Tables

📊 Geospatial Performance Benchmark

This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.

Use Case 1: Data Ingestion

Goal: To measure the one-time cost of preparing OSM data (.pbf) for analysis, either by loading it into a database or by creating a smaller subset.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Load entire `.pbf` into a database | `osm2pgsql + PostGIS` | `italy-latest.osm.pbf` | • Import Time: ~1840s (~31 min) • Disk Space: 19 GB | Represents the high upfront cost to create a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional `.pbf` from a national `.pbf` | `osmium-tool` | `italy-latest.osm.pbf -> lombardy.geojson -> lombardy-latest.osm.pbf` | • Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |

(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously, so their performance is measured in Use Case 2.)
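
The figures in this table are wall-clock times of command-line tools. A minimal sketch of how such a measurement could be taken from Python follows; it is not the project's actual harness. The helper name `timed_run` is hypothetical, and the `osmium extract` invocation is an assumption: it supposes that `lombardy.geojson` holds the boundary polygon used to clip the national file.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run an external command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# Assumed invocation for the regional extract (not taken from the project):
# clip the national file with a boundary polygon and write a smaller .pbf.
OSMIUM_EXTRACT_CMD = [
    "osmium", "extract",
    "-p", "lombardy.geojson",
    "italy-latest.osm.pbf",
    "-o", "lombardy-latest.osm.pbf",
]

if __name__ == "__main__":
    # Demonstrate the helper on a command that is always available.
    elapsed, code = timed_run([sys.executable, "-c", "pass"])
    print(f"exit code {code}, elapsed {elapsed:.3f}s")
```

Passing `OSMIUM_EXTRACT_CMD` to `timed_run` would time the extract itself, provided `osmium-tool` and both input files are available locally.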

Use Case 2: Data Filtering and Extraction

Goal: To measure the efficiency of extracting a specific subset of data from a larger source.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Extract buildings from `.pbf` and save to `.geoparquet` | `QuackOSM` | `italy-latest.osm.pbf` (filtered on Milan) | • Execution Time: ~349s • Output Size: 6.97 MB | Self-contained workflow. Handles MemoryErrors gracefully. Final building count for Milan: 62,133. |
| Extract buildings from DB | `PostGIS` | `planet_osm_polygon` table (query on Milan) | • Query Time: ~0.4s | Extremely fast due to pre-existing spatial indexes created during the Use Case 1 ingestion phase. Final building count for Milan: 62,127. |
| Read `.pbf`, build GeoDataFrame, and filter by area | `PyOsmium + GeoPandas` | `lombardy-latest.osm.pbf` (filtered on Milan) | • Execution Time: ~316s • Memory Usage: Very High | Pure Python approach. Final building count for Milan: 62,015. The result is now consistent after implementing an advanced handler that processes complex `relations` (multipolygons). |
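
Combined with the Use Case 1 import time, these figures allow a rough break-even estimate: how many Milan-sized extractions it takes before the one-time PostGIS import pays for itself compared with re-running QuackOSM on the national file. The sketch below uses only timings reported in this document. It assumes every extraction costs the same as the Milan one and ignores disk space and database maintenance, so the result is an order-of-magnitude indication at best. The function name is hypothetical.

```python
import math

# Timings reported in this document (seconds).
POSTGIS_IMPORT_S = 1840.0   # one-time osm2pgsql load of italy-latest.osm.pbf
POSTGIS_QUERY_S = 0.4       # per-extraction query on planet_osm_polygon
QUACKOSM_EXTRACT_S = 349.0  # per-extraction run on italy-latest.osm.pbf


def break_even_extractions(upfront_s, per_query_s, per_run_s):
    """Smallest n such that upfront_s + n * per_query_s <= n * per_run_s."""
    saving_per_run = per_run_s - per_query_s
    if saving_per_run <= 0:
        raise ValueError("the file-based run is not slower than the query")
    return math.ceil(upfront_s / saving_per_run)


n = break_even_extractions(POSTGIS_IMPORT_S, POSTGIS_QUERY_S, QUACKOSM_EXTRACT_S)
print(n)  # 6
```

Under these assumptions, the database route becomes cheaper in total time from the sixth extraction onward.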

Use Case 3: Single Table Analysis (Geometric Properties)

Goal: To evaluate performance on calculations that operate on geometries without requiring joins.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| 10 different geometric analysis operations | `DuckDB Spatial` | `milan_buildings_...geoparquet` | • Execution Times: Ranged from ~0.03s to ~2.4s | Excellent performance overall. Operations requiring transformation and aggregation on all geometries (e.g., Average Area, Total Buffered Area) are the most expensive (~2s). Simpler filters or calculations are significantly faster (<0.2s). The query optimizer also intelligently skips unnecessary calculations (e.g., Simplify), resulting in near-instant times. |
| 10 different geometric analysis operations | `GeoPandas` | `milan_buildings_...geoparquet` | • Execution Time (s) • Memory Usage (MB) | Test TODO |
| 10 different geometric analysis operations | `PostGIS` | `planet_osm_polygon` table (filtered on Milan) | • Query Time (s) | Test TODO |
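
The pending GeoPandas row calls for both execution time and memory usage. One way to collect both for a Python callable is sketched below, using only the standard library. The helper name `measure` is hypothetical. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries may be under-reported; a process-level figure such as peak RSS may be needed to capture that.

```python
import statistics
import time
import tracemalloc


def measure(func, repeats=5):
    """Return (median_seconds, peak_mib) for calling func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)

    # Trace memory in a separate run so tracing overhead does not skew timings.
    tracemalloc.start()
    func()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return statistics.median(timings), peak_bytes / (1024 * 1024)


# Stand-in workload; a real run would call one of the geometric operations.
median_s, peak_mib = measure(lambda: sum(x * x for x in range(100_000)))
print(f"median {median_s:.4f}s, peak {peak_mib:.2f} MiB")
```

Reporting the median of several runs rather than a single timing reduces the influence of cold caches and background activity.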

Use Case 4: Complex Spatial Join

Goal: To test performance on computationally intensive join operations on large datasets.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Count buildings within 20m of main roads | `DuckDB Spatial` | `rome_roads.parquet`, `rome_buildings.parquet` | • Query Time (s) • Result Correctness | Test TODO |
| Count buildings within 20m of main roads | `PostGIS` | `rome_roads`, `rome_buildings` tables with GiST indexes | • Index Creation Time (s) • Query Time (s) | Test TODO |
| Count buildings within 20m of main roads | `GeoPandas` | `rome_roads.gpkg`, `rome_buildings.gpkg` | • Execution Time (s) • Memory Usage (MB) | Test TODO |
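
Since result correctness is one of the metrics for this use case, a small brute-force reference may help validate the engines on a tiny sample. The sketch below is a simplification under stated assumptions: each building is reduced to a single point (its centroid, for instance), roads are polylines, and all coordinates are in a projected CRS with metre units. Counts can therefore differ from a true polygon-to-line distance test. The function names are hypothetical, and the nested loop over buildings and segments is meant for validation only, not for benchmarking.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b (all (x, y) tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def count_buildings_near_roads(buildings, roads, max_dist=20.0):
    """Count building points within max_dist of any road polyline."""
    count = 0
    for p in buildings:
        near = any(
            point_segment_distance(p, road[i], road[i + 1]) <= max_dist
            for road in roads
            for i in range(len(road) - 1)
        )
        count += near
    return count


roads = [[(0.0, 0.0), (100.0, 0.0)]]  # one straight road along the x-axis
buildings = [(50.0, 10.0), (50.0, 25.0), (110.0, 0.0), (130.0, 0.0)]
print(count_buildings_near_roads(buildings, roads))  # 2
```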
