
DM-50550: Add exception diagnostics table to quantum provenance #495


Open: wants to merge 3 commits into main

Conversation

@enourbakhsh (Contributor) commented May 8, 2025

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes


codecov bot commented May 8, 2025

Codecov Report

Attention: Patch coverage is 9.37500% with 87 lines in your changes missing coverage. Please review.

Project coverage is 82.21%. Comparing base (924f491) to head (adb8ea2).

✅ All tests successful. No failed tests found.

Files with missing lines                            | Patch % | Lines
python/lsst/pipe/base/quantum_provenance_graph.py   | 9.37%   | 87 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
- Coverage   82.68%   82.21%   -0.47%     
==========================================
  Files         111      111              
  Lines       14334    14428      +94     
  Branches     1834     1862      +28     
==========================================
+ Hits        11852    11862      +10     
- Misses       2032     2117      +85     
+ Partials      450      449       -1     


@enourbakhsh enourbakhsh force-pushed the tickets/DM-50550 branch 16 times, most recently from a513b9c to 237da91 Compare May 11, 2025 22:40
# lookup later. Querying per data ID in the loop is painfully slow.
if butler:
    exposure_record_lookup = {
        d.dataId["exposure"]: d
        for d in butler.query_dimension_records("exposure", explain=False)
    }
@enourbakhsh (Contributor, Author) commented May 12, 2025:

I wish I could limit `butler.query_dimension_records` to just the data IDs that have exceptions in `task_summary.exceptions`, instead of pulling everything. But `butler.query_dimension_records` only accepts a single `dataId` if one is passed, and I don't think `where` can filter on these values. Still, it works and is fast enough.

@enourbakhsh (Contributor, Author) commented:

Maybe @timj knows?

A Member commented:

This works fine for me if you already know the exposure IDs:

>>> butler.query_dimension_records("exposure", where="exposure in (:exp)", bind={"exp": (903334, 903336)}, instrument="HSC")

A Member commented:

Use

with butler.query() as query:
    query = query.join_data_coordinates(<list of exposure data IDs>)
    ...

A Member commented:

Actually, change .required to .mapping in this line:

self.caveats.setdefault(code, []).append(dict(info["data_id"].required))

and then the data IDs you're starting with should have all of those implied values. They're already in the data IDs in the QG, but we were dropping them on the floor.
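The distinction can be sketched with plain dicts (illustrative values only; real data IDs are `DataCoordinate` objects from daf_butler, where `.required` holds only the identifying values and `.mapping` also includes implied ones):

```python
# Hypothetical data ID contents, for illustration only.
required = {"instrument": "HSC", "exposure": 903334, "detector": 42}
implied = {"band": "i", "physical_filter": "HSC-I", "day_obs": 20130617}

# .mapping-style view: required plus implied values.
mapping = {**required, **implied}

# Switching from .required to .mapping keeps the implied values
# instead of dropping them on the floor.
assert set(mapping) - set(required) == {"band", "physical_filter", "day_obs"}
```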

@enourbakhsh (Contributor, Author) commented:

Will give it a try!

@enourbakhsh (Contributor, Author) commented:

Done!

@enourbakhsh enourbakhsh force-pushed the tickets/DM-50550 branch 2 times, most recently from 39d0914 to 0b5f032 Compare May 13, 2025 20:38
# output table, if available. Note that 'band', 'day_obs', and
# 'physical_filter' already exist in `exception.data_id` below.
needed_visit_records = ["exposure_time", "target_name"]
needed_exposure_records = ["exposure_time", "target_name"]
@TallJimbo (Member) commented May 14, 2025:

These sorts of strings should be passed in by the caller, not included in the code; I'm imagining something like an argument that's a list of extra quantities to put in the table.

You might then be able to use:

https://github.com/lsst/daf_butler/blob/07d9adb98d9bf763df84e9f7db83d4cfbb6c53d7/python/lsst/daf/butler/queries/_identifiers.py#L80

If you can initialize that with some dimensions (i.e. from the data IDs that have a particular exception), you can give it strings like "exposure_time" or "target_name" and it will figure out which dimension they belong to (of the ones it has been given, since those kinds of columns aren't always unique).

A Member commented:

We modified the code so that here it calls an entry point that returns a dict of dimensions with list values of the metadata fields to be included.
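A hypothetical sketch of what such an entry point could return (the function name and field values are illustrative, not the real implementation): a dict keyed by dimension name, with list values naming the metadata fields to include in the table.

```python
# Illustrative only: an entry point returning, per dimension, the
# metadata fields to add as columns in the diagnostics table.
def extra_diagnostic_fields() -> dict[str, list[str]]:
    return {
        "visit": ["exposure_time", "target_name"],
        "exposure": ["exposure_time", "target_name"],
    }

fields = extra_diagnostic_fields()
```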

A Member commented:

@enourbakhsh I don't see the new code we worked on here. Did you forget to push it? It was a big change.

@enourbakhsh (Contributor, Author) commented:

@timj by the time I had it polished, @cmsaunders was already in the middle of the first review round. I didn’t want to interfere, so I’m planning to push those changes in the second round.

@@ -655,7 +660,7 @@ def _add_quantum_info(
             self.recovered_quanta.append(dict(info["data_id"].required))
         if final_quantum_run is not None and final_quantum_run.caveats:
             code = final_quantum_run.caveats.concise()
-            self.caveats.setdefault(code, []).append(dict(info["data_id"].required))
+            self.caveats.setdefault(code, []).append(dict(info["data_id"].mapping))


Why did you need to change this?

butler : `lsst.daf.butler.Butler`, optional
The butler used to create this summary. This is only used to get
exposure dimension records for the exception diagnostics.
return_exception_diagnostics_table : `bool`, optional


I don't think there's any advantage to having this option. If show_exception_diagnostics is true, then you can return this, and it can just be ignored if it's not needed. Then you can avoid some of the if/elses and the ValueError below.
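The suggested simplification can be sketched like this (all names and payloads are made up for illustration; the point is that callers who don't need the table can just ignore the second return value, so no extra flag or `ValueError` branch is needed):

```python
# Illustrative sketch: return the table whenever diagnostics are
# requested, instead of gating it behind a separate boolean.
def summarize(show_exception_diagnostics: bool = False):
    summary = {"n_tasks": 3}  # placeholder for the real summary payload
    table = [] if show_exception_diagnostics else None
    return summary, table

summary, table = summarize()  # table is None and can simply be ignored
summary, table = summarize(show_exception_diagnostics=True)
```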

-------
table : `astropy.table.Table`
Table with one row per data ID and columns for exception types (by
task), and optionally, exposure dimension records and exception


It would be more clear here to write "columns for exception type and, optionally, dimension records and exception message."

# Define a hashable and stable tuple of data ID values.
key = tuple(sorted(data_id.mapping.items()))
assert len(rows[key]) == 0, f"Multiple exceptions for one data ID: {key}"
assert rows[key]["Exception"] == "", f"Duplicate entry for data ID {key} in {task_label}"


I don't understand how the previous line could pass and then this line fail. If len(rows[key]) ==0, doesn't that mean that nothing has been set for rows[key]["Exception"]?

row.update(dict(key))
# Add dimension records next, if requested and available.
if add_dimension_records:
    if visit_data_ids:


Don't you need to check whether the data_id for this row is in visit_data_ids (or in exposure_data_ids at line 1289)?

@cmsaunders commented:

This all looks reasonable, but I am concerned about how it runs on qgraphs that are not per-detector. I ran it on a qgraph with tract-level tasks and didn't get the Task-Exception-Count table, or the new exception diagnostics table.

4 participants