Tables
Alessandro Demo edited this page Jul 14, 2025 · 14 revisions
This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.
Goal: To measure the one-time cost of preparing OSM data (.pbf) for analysis, either by loading it into a database or by creating a smaller subset.
Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
---|---|---|---|---|
Load entire .pbf into a database | osm2pgsql + PostGIS | italy-latest.osm.pbf | • Import Time: ~1840s (~31 min) • Disk Space: 19 GB | Represents the high upfront cost to create a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
Extract a regional .pbf from a national .pbf | osmium-tool | italy-latest.osm.pbf -> lombardy.geojson -> lombardy-latest.osm.pbf | • Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |
(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously and their performance is measured in Use Case 2).
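The two preparation steps in the table can be timed with a small wrapper around the command-line tools. The following is a minimal sketch, not the exact invocation used for the measurements: the flags shown are standard `osmium extract` and `osm2pgsql` options, while the database name and any tuning flags are assumptions.

```python
import subprocess
import sys
import time


def build_extract_cmd(source_pbf, boundary_geojson, output_pbf):
    """osmium-tool: cut a regional extract using a polygon boundary file."""
    return ["osmium", "extract", "--polygon", boundary_geojson,
            source_pbf, "--output", output_pbf]


def build_import_cmd(source_pbf, database):
    """osm2pgsql: load a .pbf into a PostGIS database.

    The database name is a hypothetical placeholder supplied by the caller.
    """
    return ["osm2pgsql", "--create", "--database", database, source_pbf]


def run_timed(cmd):
    """Run a command, raise if it fails, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Show the extract command for the files named in the table.
    cmd = build_extract_cmd(
        "italy-latest.osm.pbf", "lombardy.geojson", "lombardy-latest.osm.pbf"
    )
    print(" ".join(cmd))
    # Harmless demonstration of the timer using the current interpreter.
    print(f"{run_timed([sys.executable, '--version']):.3f}s")
```

Passing the command as a list avoids shell quoting issues, and `check=True` ensures a failed import is not recorded as a fast one.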
Goal: To measure the efficiency of extracting a specific subset of data from a larger source.
Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
---|---|---|---|---|
Extract buildings from .pbf and save to .geoparquet | QuackOSM | italy-latest.osm.pbf (filtered on Milan) | • Execution Time: ~349s • Output Size: 6.97 MB | Self-contained workflow. Handles MemoryErrors gracefully. Final building count for Milan: 62,133. |
Extract buildings from DB | PostGIS | planet_osm_polygon table (query on Milan) | • Query Time: ~0.4s | Extremely fast due to pre-existing spatial indexes created during the Use Case 1 ingestion phase. Final building count for Milan: 62,127. |
Read .pbf, build GeoDataFrame, and filter by area | PyOsmium + GeoPandas | lombardy-latest.osm.pbf (filtered on Milan) | • Execution Time: ~316s • Memory Usage: Very High | Pure Python approach. Final building count for Milan: 62,015. The result is now consistent after implementing an advanced handler that processes complex relations (multipolygons). |
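The three building counts in the table are close but not identical. A short calculation makes the size of the gap explicit; the counts are taken from the table, and the spread metric (gap divided by the largest count) is just one reasonable choice.

```python
# Building counts for Milan as reported in the table above.
counts = {
    "QuackOSM": 62_133,
    "PostGIS": 62_127,
    "PyOsmium + GeoPandas": 62_015,
}


def relative_spread(values):
    """Return (max - min) / max as a fraction."""
    values = list(values)
    highest, lowest = max(values), min(values)
    return (highest - lowest) / highest


gap = max(counts.values()) - min(counts.values())
spread = relative_spread(counts.values())
print(f"absolute gap: {gap}")            # 118
print(f"relative spread: {spread:.2%}")  # 0.19%
```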
Goal: Evaluate performance on calculations that do not require joins, but operate on geometries.
Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
---|---|---|---|---|
10 different geometric analysis operations | DuckDB Spatial | milan_buildings_...geoparquet | • Execution Times: Ranged from ~0.03s to ~2.4s | Excellent performance overall. Operations requiring transformation and aggregation on all geometries (e.g., Average Area, Total Buffered Area) are the most expensive (~2s). Simpler filters or calculations are significantly faster (<0.2s). The query optimizer also intelligently skips unnecessary calculations (e.g., Simplify), resulting in near-instant times. |
10 different geometric analysis operations | GeoPandas | milan_buildings_...geoparquet | • Execution Time (s) • Memory Usage (MB) | Test TODO |
10 different geometric analysis operations | PostGIS | planet_osm_polygon table (filtered on Milan) | • Query Time (s) | Test TODO |
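To illustrate why an operation such as Average Area is among the more expensive ones, here is a sketch of how such a query might look in DuckDB Spatial: every geometry is reprojected to a metric CRS before its area is aggregated. This is not the benchmark's actual query. The column name `geometry`, the source CRS EPSG:4326, and the target CRS EPSG:32632 (UTM zone 32N, which covers Milan) are assumptions, and the Parquet path is left as a parameter. On older DuckDB versions the geometry column may arrive as WKB and need `ST_GeomFromWKB` first.

```python
def average_area_sql(parquet_path, geometry_column="geometry", metric_crs="EPSG:32632"):
    """Build a DuckDB Spatial query for the mean building footprint in square metres.

    Column name and CRS codes are assumptions, not taken from the benchmark.
    """
    return (
        f"SELECT avg(ST_Area(ST_Transform({geometry_column}, 'EPSG:4326', "
        f"'{metric_crs}', always_xy := true))) AS avg_area_m2 "
        f"FROM read_parquet('{parquet_path}')"
    )


def run_average_area(parquet_path):
    """Execute the query; needs the duckdb package and its spatial extension."""
    import duckdb  # imported lazily so the SQL builder works without it

    con = duckdb.connect()
    con.execute("INSTALL spatial")
    con.execute("LOAD spatial")
    return con.execute(average_area_sql(parquet_path)).fetchone()[0]
```

The `always_xy := true` argument keeps coordinates in longitude/latitude order, which is how GeoParquet files typically store EPSG:4326 data.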
Goal: To test performance on computationally intensive join operations on large datasets.
Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
---|---|---|---|---|
Count buildings within 20m of main roads | DuckDB Spatial | rome_roads.parquet, rome_buildings.parquet | • Query Time (s) • Result Correctness | Test TODO |
Count buildings within 20m of main roads | PostGIS | rome_roads, rome_buildings tables with GiST indexes | • Index Creation Time (s) • Query Time (s) | Test TODO |
Count buildings within 20m of main roads | GeoPandas | rome_roads.gpkg, rome_buildings.gpkg | • Execution Time (s) • Memory Usage (MB) | Test TODO |
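Since these tests are still to be run, a brute-force reference can help with the Result Correctness metric on a small sample. The sketch below is a deliberate simplification and not any of the three stacks' method: each building is reduced to a single representative point, roads are polylines, and all coordinates are assumed to be in a projected CRS with metre units. The sample data is made up for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b (projected metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def count_near_roads(building_points, roads, max_distance=20.0):
    """Count buildings whose point lies within max_distance of any road."""
    count = 0
    for point in building_points:
        if any(
            point_segment_distance(point, a, b) <= max_distance
            for road in roads
            for a, b in zip(road, road[1:])
        ):
            count += 1
    return count


# Hypothetical sample: one straight 100 m road and four buildings.
roads = [[(0.0, 0.0), (100.0, 0.0)]]
buildings = [(50.0, 10.0), (50.0, 25.0), (110.0, 0.0), (130.0, 0.0)]
print(count_near_roads(buildings, roads))  # 2
```

Each building is counted at most once, even when it is near several roads, which is the behaviour a correct join-based query also needs to reproduce. The check is quadratic in the number of features, so it only suits small samples.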