-
Notifications
You must be signed in to change notification settings - Fork 12
DM-50550: Add exception diagnostics table to quantum provenance #495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #495 +/- ##
==========================================
- Coverage 82.68% 82.21% -0.47%
==========================================
Files 111 111
Lines 14334 14428 +94
Branches 1834 1862 +28
==========================================
+ Hits 11852 11862 +10
- Misses 2032 2117 +85
+ Partials 450 449 -1 ☔ View full report in Codecov by Sentry. |
a513b9c
to
237da91
Compare
237da91
to
3117fa7
Compare
# lookup later. Querying per data ID in the loop is painfully slow. | ||
if butler: | ||
exposure_record_lookup = { | ||
d.dataId["exposure"]: d for d in butler.query_dimension_records("exposure", explain=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish I could limit butler.query_dimension_records
to just the data IDs with exceptions in task_summary.exceptions
, instead of pulling everything. But the butler.query_dimension_records
method only takes one dataId
if passed, and I don't think it's possible to use where
to filter those sorts of things. But it works and is fast enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @timj knows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine for me if you already know the exposure IDs:
>>> butler.query_dimension_records("exposure", where="exposure in (:exp)", bind={"exp": (903334, 903336)}, instrument="HSC")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use
with butler.query() as query:
query = query.join_data_coordinates(<list of exposure data IDs>)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, change .required
to .mapping
in this line:
self.caveats.setdefault(code, []).append(dict(info["data_id"].required)) |
and then the data IDs you're starting with should have all of those implied values. They're already in the data IDs in the QG, but we were dropping them on the floor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will give it a try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
39d0914
to
0b5f032
Compare
0b5f032
to
adb8ea2
Compare
# output table, if available. Note that 'band', 'day_obs', and | ||
# 'physical_filter' already exist in `exception.data_id` below. | ||
needed_visit_records = ["exposure_time", "target_name"] | ||
needed_exposure_records = ["exposure_time", "target_name"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sorts of strings should be passed in by the caller, not included in the code; I'm imagining something like an argument that's a list of extra quantities to put in the table.
You might then be able to use:
If you can initialize that with some dimensions (i.e. from the data IDs that have a particular exception), you can give it strings like "exposure_time" or "target_name" and it will figure out which dimension they belong to (of the ones it has been given, since those kinds of columns aren't always unique).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We modified the code so that here it calls an entry point that returns a dict of dimensions with list values of the metadata fields to be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@enourbakhsh I don't see the new code we worked on here. Did you forget to push it? It was a big change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timj by the time I had it polished, @cmsaunders was already in the middle of the first review round. I didn’t want to interfere, so I’m planning to push those changes in the second round.
@@ -655,7 +660,7 @@ def _add_quantum_info( | |||
self.recovered_quanta.append(dict(info["data_id"].required)) | |||
if final_quantum_run is not None and final_quantum_run.caveats: | |||
code = final_quantum_run.caveats.concise() | |||
self.caveats.setdefault(code, []).append(dict(info["data_id"].required)) | |||
self.caveats.setdefault(code, []).append(dict(info["data_id"].mapping)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you need to change this?
butler : `lsst.daf.butler.Butler`, optional | ||
The butler used to create this summary. This is only used to get | ||
exposure dimension records for the exception diagnostics. | ||
return_exception_diagnostics_table : `bool`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any advantage to having this option. If show_exception_diagnostics
is true, then you can return this, and it can just be ignored if it's not needed. Then you can avoid some of the if/elses and the ValueError below.
------- | ||
table : `astropy.table.Table` | ||
Table with one row per data ID and columns for exception types (by | ||
task), and optionally, exposure dimension records and exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more clear here to write "columns for exception type and, optionally, dimension records and exception message."
# Define a hashable and stable tuple of data ID values. | ||
key = tuple(sorted(data_id.mapping.items())) | ||
assert len(rows[key]) == 0, f"Multiple exceptions for one data ID: {key}" | ||
assert rows[key]["Exception"] == "", f"Duplicate entry for data ID {key} in {task_label}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how the previous line could pass and then this line fail. If len(rows[key]) ==0
, doesn't that mean that nothing has been set for rows[key]["Exception"]
?
row.update(dict(key)) | ||
# Add dimension records next, if requested and available. | ||
if add_dimension_records: | ||
if visit_data_ids: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to check whether the data_id for this row is in visit_data_ids (or in exposure_data_ids at line 1289)?
This all looks reasonable, but I am concerned about how it runs on qgraphs that are not per-detector. I ran it on a qgraph with tract-level tasks and didn't get the Task-Exception-Count table, or the new exception diagnostics table. |
Checklist
doc/changes