Skip to content

Conversation

zoghbi-a
Copy link
Contributor

@zoghbi-a zoghbi-a commented Sep 8, 2025

This PR adds several enhancements that have been added to include feedback from users (including reported issues; e.g #3387, #3321) since the last refactor to use the VO backend. The changes include:

  • Add query_by_column to allow querying of different catalog columns. The user passes a dict that is parsed into a TAP WHERE statements.
  • Add support for uploading tables when using TAP directly through query_tap.
  • Improve how maxrec works. If it is bigger than the default server limit, add a TOP statement.
  • Add automatic guessing for the data host in download_data

Because query_by_column shares code with the existing query_region, the shared part has now been moved to an internal function _query_execute, which is called by both query_by_column and query_region.

@zoghbi-a zoghbi-a marked this pull request as draft September 8, 2025 15:30
Copy link

codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.68%. Comparing base (b918da4) to head (eb052f6).
⚠️ Report is 119 commits behind head on main.

Files with missing lines Patch % Lines
astroquery/heasarc/core.py 95.58% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3403      +/-   ##
==========================================
+ Coverage   70.48%   70.68%   +0.20%     
==========================================
  Files         232      232              
  Lines       19977    20070      +93     
==========================================
+ Hits        14080    14186     +106     
+ Misses       5897     5884      -13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zoghbi-a zoghbi-a marked this pull request as ready for review September 8, 2025 16:18
@zoghbi-a zoghbi-a changed the title ENH: Add ability to query columns and upload tables Add ability to query columns and upload tables Sep 8, 2025
@bsipocz bsipocz added this to the 0.4.12 milestone Sep 18, 2025
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zoghbi-a - Thanks for the PR, I have some review comments but this should be close to be mergeable.

The most substantial one where I would also ask Adam's insight is the naming of the new method as we have and can have in the future a similar functionality in other modules, too, so some consistency would be nice.

Parameters
----------
links : `astropy.table.Table` or `astropy.table.Row`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably you called this parameter links because these suppose to be datalinks? If yes, please add one more sentence that explains it.

return 'aws'
return 'heasarc'

def download_data(self, links, host=None, location='.'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not a change, but while we're at it, or in a follow-up, please make optional arguments keyword only for future proofing this.

Suggested change
def download_data(self, links, host=None, location='.'):
def download_data(self, links, *, host=None, location='.'):

return self.query_region(pos, catalog=mission, spatial='cone',
get_query_payload=get_query_payload)

def query_by_column(self, catalog, params, *,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment here is about the naming, we don't have methods named this, but we do have query_criteria. How would you feel about using that, too?

Also, I would suggest to use something else then 'params', maybe constraint, or filter, or condition, or something even more descriptive?

cc @keflavich to have multiple eyes/opinions on the API naming choices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see another archive supporting this type of query, it's incredibly useful in Vizier:
https://astroquery.readthedocs.io/en/latest/vizier/vizier.html#specifying-keywords-output-columns-and-constraints-on-columns

I favor query_constraints because it's a pre-existing nomenclature that astroquery users may be accustomed to.

Instead of params, how about column_filters, again by analogy to Vizier?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting query_criteria as it's used in more modules, but OTOH I admit it's kind of confusing naming. btw, here is the quick grep of what we have right now, including this PR:

   1 query_aliases
   1 query_apex_quicklooks
   7 query_async
   1 query_bibcode
   1 query_bibobj
   1 query_by_column
   1 query_catalog
   1 query_constraints_async
   3 query_criteria
   4 query_criteria_async
   1 query_criteria_count
   1 query_cross_id
   1 query_cross_id_async
   1 query_crossid_async
   1 query_hierarchy
   1 query_hips
   1 query_hsa_tap
   1 query_hsc_matchid_async
   1 query_ida_tap
   1 query_ids_catalogs
   1 query_ids_maps
   1 query_ids_spectra
   1 query_instrument
   4 query_lines_async
   1 query_loc
   1 query_main
   1 query_metadata
   1 query_mission_cols
   1 query_mission_list
   1 query_molecule
   1 query_name_async
  10 query_object
  13 query_object_async
   1 query_object_catalogs
   1 query_object_count
   1 query_object_maps
   1 query_object_spectra
   1 query_objectids
   1 query_objects
   1 query_objects_async
   1 query_observations
   1 query_photoobj_async
   1 query_planet
   1 query_raw
   1 query_refcode
   1 query_refcode_async
  12 query_region
  16 query_region_async
   1 query_region_catalogs
   1 query_region_count
   1 query_region_iau
   1 query_region_iau_async
   1 query_region_maps
   1 query_region_sia
   1 query_region_spectra
   2 query_sia
   1 query_simple
   1 query_specobj_async
   2 query_sql_async
   1 query_ssa
   1 query_sso
   1 query_surveys
   6 query_tap
   2 query_target
   1 query_with_wcs
   1 query_with_wcs_async
   1 query_xsa_tap

Comment on lines +291 to +292
return self.tap.search(
query, language='ADQL', maxrec=maxrec, uploads=uploads)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick as it doesn't matter really, but just a FYI: we allow long (120) lines here, no need to do these kind of line breaks

Comment on lines +354 to +356
# if maxrec is more than the server limit, we set a higher limit
if maxrec is not None and maxrec > 100000:
adql = adql.replace('SELECT ', f'SELECT TOP {maxrec*4} ')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain this logic a little bit?

return cols

def query_tap(self, query, *, maxrec=None):
def query_tap(self, query, *, maxrec=None, uploads=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think propagating the table uploads could be super useful for the other methods calling query_tap internally, so please consider adding it to query_region and the new query_by_column, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants