Use case 3
Alessandro Demo edited this page Jun 23, 2025
- Goal: To test performance on computationally intensive join operations on large datasets.
- Example task: Given the road network and buildings in Rome, count for each main road (`highway = 'primary'`) how many buildings are within 20 meters.
- Workflows to compare:
  - DuckDB: The `ST_DWithin` query, which I have already analyzed and found to be slow. The optimization strategies I found (e.g. batching) should be discussed here as well.
  - PostGIS: The same query, which uses spatial indexes transparently once they have been created.
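  - GeoPandas: `gpd.sjoin_nearest(roads, buildings, max_distance=20)`. Note that `sjoin_nearest` returns only the nearest building(s) for each road, so a count of all buildings within 20 meters needs a distance-based join instead, for example `gpd.sjoin` with `predicate="dwithin"` and `distance=20` in recent GeoPandas versions. Both require a projected CRS in meters.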
- Metrics: Execution time, correctness of results, complexity of the code.
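
For the correctness metric, a small brute-force reference can help check the three workflows on a sample of the data. The following is only a sketch under stated assumptions: buildings are reduced to points (e.g. centroids), coordinates are already in a projected CRS in meters, and the function names and toy data are hypothetical rather than taken from the Rome dataset. The distance test is inclusive (`<=`), matching the usual "within distance" semantics.

```python
import math
import time


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to segment AB, in the units of the coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def count_buildings_near_roads(roads, buildings, max_distance=20.0):
    """roads: {road_id: [(x, y), ...]}, buildings: [(x, y), ...] -> {road_id: count}."""
    counts = {}
    for road_id, vertices in roads.items():
        segments = list(zip(vertices, vertices[1:]))
        n = 0
        for px, py in buildings:
            # A building counts once per road if any segment is close enough
            if any(
                point_segment_distance(px, py, ax, ay, bx, by) <= max_distance
                for (ax, ay), (bx, by) in segments
            ):
                n += 1
        counts[road_id] = n
    return counts


# Toy data in meters (hypothetical, for illustration only)
roads = {
    "road_a": [(0.0, 0.0), (100.0, 0.0)],
    "road_b": [(0.0, 200.0), (0.0, 300.0), (50.0, 300.0)],
}
buildings = [
    (10.0, 5.0), (50.0, 20.0), (50.0, 20.5),
    (120.0, 0.0), (25.0, 310.0), (500.0, 500.0),
]

start = time.perf_counter()
counts = count_buildings_near_roads(roads, buildings)
elapsed = time.perf_counter() - start
print(counts, f"{elapsed:.6f}s")
```

On the toy data this counts 3 buildings for `road_a` and 1 for `road_b`. The cost grows with roads × buildings × segments, so it is only practical for a small sample used to validate the results of the real workflows, not as a fourth workflow to benchmark.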