Alessandro Demo edited this page Jul 14, 2025 · 14 revisions

📊 Geospatial Performance Benchmark

This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.

Use Case 1: Data Ingestion

Goal: To measure the one-time cost of preparing OSM data (.pbf) for analysis, either by loading it into a database or by creating a smaller subset.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Load entire .pbf into a database | osm2pgsql + PostGIS | italy-latest.osm.pbf | Import Time: ~1840s (~31 min); Disk Space: 19 GB | Represents the high upfront cost to create a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional .pbf from a national .pbf | osmium-tool | italy-latest.osm.pbf -> lombardy.geojson -> lombardy-latest.osm.pbf | Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |

(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously, so their combined performance is measured in Use Case 2.)
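
The regional extract in the table above could be timed along these lines. This is a minimal sketch, not the script used for the benchmark: the helper names `build_extract_command` and `time_command` are hypothetical, and it assumes osmium-tool is on the PATH and that `lombardy.geojson` holds the boundary polygon of the region.

```python
import subprocess
import sys
import time


def build_extract_command(source_pbf, boundary_geojson, output_pbf):
    """Build an `osmium extract` call that clips source_pbf to a polygon file."""
    return [
        "osmium", "extract",
        "-p", boundary_geojson,  # polygon file describing the region
        source_pbf,
        "-o", output_pbf,
    ]


def time_command(command):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


cmd = build_extract_command(
    "italy-latest.osm.pbf", "lombardy.geojson", "lombardy-latest.osm.pbf"
)
print(" ".join(cmd))

# Uncomment to run the real extract (needs osmium-tool and the input files):
# print(f"Execution time: {time_command(cmd):.1f}s")
```

Wall-clock time is used here because the table reports elapsed time; it includes disk I/O, so results will vary with storage speed.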


Use Case 2: Data Filtering and Extraction

Goal: To measure the efficiency of extracting a specific subset of data from a larger source.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Extract buildings from .pbf and save to .geoparquet | QuackOSM | italy-latest.osm.pbf (filtered on Milan) | Execution Time: ~349s; Output Size: 6.97 MB | Self-contained workflow. Handles MemoryErrors gracefully. Final building count for Milan: 62,133. |
| Extract buildings from DB | PostGIS | planet_osm_polygon table (query on Milan) | Query Time: ~0.4s | Extremely fast due to pre-existing spatial indexes created during the Use Case 1 ingestion phase. Final building count for Milan: 62,127. |
| Read .pbf, build GeoDataFrame, and filter by area | PyOsmium + GeoPandas | lombardy-latest.osm.pbf (filtered on Milan) | Execution Time: ~316s; Memory Usage: Very High | Pure Python approach. Final building count for Milan: 62,015. The result is now consistent after implementing an advanced handler that processes complex relations (multipolygons). |
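
To put the three building counts side by side, the short calculation below expresses each as a percentage difference from a reference count. Using the PostGIS result as the reference is an assumption made only for illustration; the source does not designate any of the three as ground truth.

```python
# Building counts for Milan as reported in the table above.
counts = {
    "QuackOSM": 62_133,
    "PostGIS": 62_127,
    "PyOsmium + GeoPandas": 62_015,
}

# Assumption: PostGIS is used as the reference purely for comparison.
baseline = counts["PostGIS"]


def relative_difference(count, reference):
    """Signed difference from the reference, as a percentage."""
    return (count - reference) / reference * 100


diffs = {tool: relative_difference(n, baseline) for tool, n in counts.items()}

for tool, diff in diffs.items():
    print(f"{tool}: {counts[tool]} buildings ({diff:+.3f}% vs PostGIS)")
```

Under this reference, all three counts lie within 0.2% of each other.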

Use Case 3: Single Table Analysis (Geometric Properties)

Goal: To evaluate performance on calculations that operate on geometries without requiring joins.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| 10 different geometric analysis operations | DuckDB Spatial | milan_buildings_...geoparquet | Execution Times: Ranged from ~0.03s to ~2.4s | Excellent performance overall. Operations requiring transformation and aggregation on all geometries (e.g., Average Area, Total Buffered Area) are the most expensive (~2s). Simpler filters or calculations are significantly faster (<0.2s). The query optimizer also intelligently skips unnecessary calculations (e.g., Simplify), resulting in near-instant times. |
| 10 different geometric analysis operations | GeoPandas | milan_buildings_...geoparquet | Execution Time (s); Memory Usage (MB) | Test TODO |
| 10 different geometric analysis operations | PostGIS | planet_osm_polygon table (filtered on Milan) | Query Time (s) | Test TODO |
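
For the pending tests that list both execution time and memory usage, one possible measurement harness is sketched below using only the standard library. The `measure` helper and the toy workload are hypothetical. Note that `tracemalloc` only tracks memory allocated through Python's allocator, so memory used by native libraries may be underreported; treat the figure as a lower bound.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label, results):
    """Record wall-clock seconds and peak traced memory (MB) for a block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[label] = {
            "seconds": elapsed,
            "peak_mb": peak_bytes / 1_000_000,
        }


results = {}

# Toy workload standing in for one of the geometric operations.
with measure("toy_workload", results):
    squares = [i * i for i in range(100_000)]

for label, metrics in results.items():
    print(f"{label}: {metrics['seconds']:.4f}s, peak {metrics['peak_mb']:.2f} MB")
```

Wrapping each of the 10 operations in its own `measure` block would yield one row of metrics per operation, matching the per-operation timings reported for DuckDB Spatial.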

Use Case 4: Complex Spatial Join

Goal: To test performance on computationally intensive join operations on large datasets.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Count buildings within 20m of main roads | DuckDB Spatial | rome_roads.parquet, rome_buildings.parquet | Query Time (s); Result Correctness | Test TODO |
| Count buildings within 20m of main roads | PostGIS | rome_roads, rome_buildings tables with GiST indexes | Index Creation Time (s); Query Time (s) | Test TODO |
| Count buildings within 20m of main roads | GeoPandas | rome_roads.gpkg, rome_buildings.gpkg | Execution Time (s); Memory Usage (MB) | Test TODO |
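
As a starting point for the PostGIS variant of this test, the join could be phrased with `ST_DWithin`, as sketched below. Everything beyond the table names is an assumption: the column names `id`, `geom`, and `highway`, the list of road classes that count as "main roads", and the requirement that both geometry columns use a projected CRS in metres so that the distance `20` means 20 m. The `%s` placeholders follow the psycopg parameter style; the snippet only builds the query and does not connect to a database.

```python
def build_within_distance_query(distance_m, road_classes):
    """Return (sql, params) counting buildings near the selected road classes.

    Assumed schema (hypothetical): rome_buildings(id, geom) and
    rome_roads(highway, geom), both in a metric projected CRS.
    """
    sql = """
        SELECT COUNT(DISTINCT b.id)
        FROM rome_buildings AS b
        JOIN rome_roads AS r
          ON ST_DWithin(b.geom, r.geom, %s)
        WHERE r.highway = ANY(%s)
    """
    return sql, (distance_m, list(road_classes))


# Assumption: these OSM highway values stand in for "main roads".
sql, params = build_within_distance_query(20, ["motorway", "trunk", "primary"])
print(sql)
print(params)
```

`COUNT(DISTINCT ...)` is used so that a building close to several road segments is counted once. `ST_DWithin` can use the GiST indexes mentioned in the table, which is why index creation time is tracked as a separate metric.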