-
-
Notifications
You must be signed in to change notification settings - Fork 423
Add ability to query columns and upload tables #3403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3403 +/- ##
==========================================
+ Coverage 70.48% 70.68% +0.20%
==========================================
Files 232 232
Lines 19977 20070 +93
==========================================
+ Hits 14080 14186 +106
+ Misses 5897 5884 -13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zoghbi-a - Thanks for the PR, I have some review comments but this should be close to be mergeable.
The most substantial one where I would also ask Adam's insight is the naming of the new method as we have and can have in the future a similar functionality in other modules, too, so some consistency would be nice.
Parameters | ||
---------- | ||
links : `astropy.table.Table` or `astropy.table.Row` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably you called this parameter links
because these suppose to be datalinks? If yes, please add one more sentence that explains it.
return 'aws' | ||
return 'heasarc' | ||
|
||
def download_data(self, links, host=None, location='.'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not a change, but while we're at it, or in a follow-up, please make optional arguments keyword only for future proofing this.
def download_data(self, links, host=None, location='.'): | |
def download_data(self, links, *, host=None, location='.'): |
return self.query_region(pos, catalog=mission, spatial='cone', | ||
get_query_payload=get_query_payload) | ||
|
||
def query_by_column(self, catalog, params, *, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only comment here is about the naming, we don't have methods named this, but we do have query_criteria
. How would you feel about using that, too?
Also, I would suggest to use something else then 'params', maybe constraint, or filter, or condition, or something even more descriptive?
cc @keflavich to have multiple eyes/opinions on the API naming choices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to see another archive supporting this type of query, it's incredibly useful in Vizier:
https://astroquery.readthedocs.io/en/latest/vizier/vizier.html#specifying-keywords-output-columns-and-constraints-on-columns
I favor query_constraints
because it's a pre-existing nomenclature that astroquery users may be accustomed to.
Instead of params
, how about column_filters
, again by analogy to Vizier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting query_criteria
as it's used in more modules, but OTOH I admit it's kind of confusing naming. btw, here is the quick grep of what we have right now, including this PR:
1 query_aliases
1 query_apex_quicklooks
7 query_async
1 query_bibcode
1 query_bibobj
1 query_by_column
1 query_catalog
1 query_constraints_async
3 query_criteria
4 query_criteria_async
1 query_criteria_count
1 query_cross_id
1 query_cross_id_async
1 query_crossid_async
1 query_hierarchy
1 query_hips
1 query_hsa_tap
1 query_hsc_matchid_async
1 query_ida_tap
1 query_ids_catalogs
1 query_ids_maps
1 query_ids_spectra
1 query_instrument
4 query_lines_async
1 query_loc
1 query_main
1 query_metadata
1 query_mission_cols
1 query_mission_list
1 query_molecule
1 query_name_async
10 query_object
13 query_object_async
1 query_object_catalogs
1 query_object_count
1 query_object_maps
1 query_object_spectra
1 query_objectids
1 query_objects
1 query_objects_async
1 query_observations
1 query_photoobj_async
1 query_planet
1 query_raw
1 query_refcode
1 query_refcode_async
12 query_region
16 query_region_async
1 query_region_catalogs
1 query_region_count
1 query_region_iau
1 query_region_iau_async
1 query_region_maps
1 query_region_sia
1 query_region_spectra
2 query_sia
1 query_simple
1 query_specobj_async
2 query_sql_async
1 query_ssa
1 query_sso
1 query_surveys
6 query_tap
2 query_target
1 query_with_wcs
1 query_with_wcs_async
1 query_xsa_tap
return self.tap.search( | ||
query, language='ADQL', maxrec=maxrec, uploads=uploads) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick as it doesn't matter really, but just a FYI: we allow long (120) lines here, no need to do these kind of line breaks
# if maxrec is more than the server limit, we set a higher limit | ||
if maxrec is not None and maxrec > 100000: | ||
adql = adql.replace('SELECT ', f'SELECT TOP {maxrec*4} ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain this logic a little bit?
return cols | ||
|
||
def query_tap(self, query, *, maxrec=None): | ||
def query_tap(self, query, *, maxrec=None, uploads=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think propagating the table uploads could be super useful for the other methods calling query_tap
internally, so please consider adding it to query_region
and the new query_by_column
, too.
This PR adds several enhancements that have been added to include feedback from users (including reported issues; e.g #3387, #3321) since the last refactor to use the VO backend. The changes include:
query_by_column
to allow querying of different catalog columns. The user passes a dict that is parsed into a TAP WHERE statements.query_tap
.download_data
Because
query_by_column
shares code with the existingquery_region
, the shared part has now been moved to an internal function_query_execute
, which is called by bothquery_by_column
andquery_region
.