Skip to content

Conversation

smcguire-cmu
Copy link
Contributor

@smcguire-cmu smcguire-cmu commented Sep 17, 2025

Updates suffix behavior in crossmatch and join method to take a kwarg suffix_method to determine the suffix behavior. The two options currently are all_columns and overlapping_columns. all_columns applies suffixes to all columns in the catalogs, while overlapping_columns only renames the conflicting names and logs a warning with these renames.

The default currently is all_columns to match our current behavior, but there is a future warning if the user doesn't explicitly set the suffix method that the default will change in the future.

Closes #951 and closes #911 and closes #74

Copy link

github-actions bot commented Sep 17, 2025

Before [969705a] After [c5b65b2] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_open_many_columns_all
1.68±0.06s 1.76±0.01s 1.05 benchmarks.time_open_many_columns_default
5.82±0s 5.92±0.01s 1.02 benchmarks.time_create_large_catalog
92.0±0.8ms 92.5±0.9ms 1.01 benchmarks.time_kdtree_crossmatch
25.0±0.06ms 24.9±0.1ms 1.00 benchmarks.time_box_filter_on_partition
581±6ms 582±2ms 1.00 benchmarks.time_open_many_columns_list
891±6ms 886±3ms 0.99 benchmarks.time_create_midsize_catalog
41.6±1ms 40.0±0.9ms 0.96 benchmarks.time_polygon_search

Some benchmarks failed or their performance regressed significantly.

Copy link

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.16%. Comparing base (3aa9fb1) to head (9420c0f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
- Coverage   97.19%   97.16%   -0.04%     
==========================================
  Files          46       46              
  Lines        2638     2679      +41     
==========================================
+ Hits         2564     2603      +39     
- Misses         74       76       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@smcguire-cmu
Copy link
Contributor Author

I updated the code with your comments. While I was adding more tests I realized the metadata ra and dec columns weren't being updated properly, so I fixed that. The metadata also had some logic about carrying over default columns, but I remember users not liking that (If they load a catalog, xmatch, save, then load and half the columns don't load). So I just got rid of the default columns logic, but I can put it back in if you want.

Copy link
Contributor

@delucchi-cmu delucchi-cmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the straightforward readable approach for now.

Copy link

github-actions bot commented Oct 2, 2025

Before [969705a] After [fb7d2ac] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
1.70±0.06s 1.76±0.01s 1.04 benchmarks.time_open_many_columns_default
91.3±0.8ms 93.9±1ms 1.03 benchmarks.time_kdtree_crossmatch
39.8±0.3ms 40.5±0.5ms 1.02 benchmarks.time_polygon_search
5.79±0.01s 5.88±0s 1.01 benchmarks.time_create_large_catalog
879±6ms 875±4ms 1.00 benchmarks.time_create_midsize_catalog
581±7ms 579±3ms 1.00 benchmarks.time_open_many_columns_list
25.3±0.6ms 25.2±0.5ms 0.99 benchmarks.time_box_filter_on_partition

Some benchmarks failed or their performance regressed significantly.

Copy link

github-actions bot commented Oct 3, 2025

Before [5fe1775] After [f24dc14] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
25.9±0.3ms 26.6±0.3ms 1.03 benchmarks.time_box_filter_on_partition
94.1±1ms 94.6±1ms 1.01 benchmarks.time_kdtree_crossmatch
1.71±0.05s 1.72±0.06s 1.01 benchmarks.time_open_many_columns_default
41.7±0.9ms 42.1±1ms 1.01 benchmarks.time_polygon_search
5.85±0.03s 5.79±0.03s 0.99 benchmarks.time_create_large_catalog
601±8ms 593±7ms 0.99 benchmarks.time_open_many_columns_list
885±5ms 866±4ms 0.98 benchmarks.time_create_midsize_catalog

Some benchmarks failed or their performance regressed significantly.

Copy link

github-actions bot commented Oct 7, 2025

Before [5fe1775] After [20d917c] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
1.72±0.06s 1.75±0.01s 1.02 benchmarks.time_open_many_columns_default
5.81±0.01s 5.80±0.03s 1.00 benchmarks.time_create_large_catalog
587±2ms 586±2ms 1.00 benchmarks.time_open_many_columns_list
873±10ms 863±5ms 0.99 benchmarks.time_create_midsize_catalog
92.3±0.7ms 91.0±2ms 0.99 benchmarks.time_kdtree_crossmatch
25.5±0.3ms 24.9±0.1ms 0.97 benchmarks.time_box_filter_on_partition
41.1±0.6ms 40.0±0.6ms 0.97 benchmarks.time_polygon_search

Some benchmarks failed or their performance regressed significantly.

# Conflicts:
#	src/lsdb/dask/join_catalog_data.py
Copy link

github-actions bot commented Oct 8, 2025

Before [3aa9fb1] After [6016566] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
36.7±0.7ms 37.7±1ms 1.03 benchmarks.time_polygon_search
23.2±0.3ms 23.5±0.3ms 1.01 benchmarks.time_box_filter_on_partition
563±2ms 566±2ms 1.01 benchmarks.time_open_many_columns_list
82.2±1ms 82.3±1ms 1.00 benchmarks.time_kdtree_crossmatch
1.53±0.01s 1.51±0.02s 0.99 benchmarks.time_open_many_columns_default
5.52±0.02s 5.43±0.02s 0.98 benchmarks.time_create_large_catalog
830±8ms 811±6ms 0.98 benchmarks.time_create_midsize_catalog

Some benchmarks failed or their performance regressed significantly.

@smcguire-cmu smcguire-cmu merged commit 5a1a431 into main Oct 8, 2025
13 checks passed
@smcguire-cmu smcguire-cmu deleted the sean/suffixes branch October 8, 2025 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants