
Toast 3 Work in Progress #369


Draft: tskisner wants to merge 707 commits into master

Conversation

tskisner (Member) commented:

Hi @zonca and @keskitalo, this PR is for your feedback on API changes that we discussed offline. In addition to looking at the source, I have been updating the tutorials/01_Introduction/intro.ipynb notebook as a "look and feel" example. I have attached a rendered version of the notebook:

intro.pdf

Main features are:

  • Observation class as the new data model, with detdata, shared, intervals, and view attributes as the primary interfaces through which the contents are accessed and modified.

  • Improved processing model with changes to the Operator class and the new Pipeline operator. These classes are configured with traitlets.

  • New distributed map classes (PixelDistribution and PixelData) which split the calculation of the distribution from the actual data storage. These have the new Alltoallv communication pattern.

Only a handful (2-3) of operators have been ported to the new API as a demonstration. I'll continue with the config file work, which needs updating since the switch to traitlets.
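
To make those pieces concrete, here is a speculative sketch assembled from snippets quoted later in this thread; the key names ("foo", "boresight", "scanning") and the way the Observation instance is obtained are placeholders rather than the final API.

import numpy as np

def tour(obs):
    # 'obs' is assumed to be a toast.Observation created elsewhere; its
    # constructor signature is not established in this thread.

    # Per-detector data, sized using the observation's sample count:
    obs.detdata.create("foo", shape=(5,), dtype=np.int64)

    # Node-shared data (for example, telescope pointing):
    pointing = obs.shared["boresight"]  # hypothetical field name

    # Views slice detdata and shared along named interval lists:
    scans = obs.view["scanning"]        # hypothetical interval name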

tskisner requested review from zonca and keskitalo (October 15, 2020 20:50)
tskisner marked this pull request as draft (October 15, 2020 20:50)
zonca (Member) commented Oct 15, 2020:

I think the easiest way to provide feedback would be to export the notebook to a Python script in a separate pull request, so we can give line-by-line feedback there. Then that pull request can be closed without merging.

tskisner (Member, Author) commented:

Good idea @zonca, will do that soon.

tskisner commented:

Ok @zonca, I enabled the ReviewNB plugin on the toast repo. I think you can browse here:

https://app.reviewnb.com//pull/369/files/

and comment on the per-cell level of the intro.ipynb file. Since a lot has changed, there is a switch to "hide previous version". Let me know if that works, since I can't tell if this plugin is usable by everyone.

tskisner commented:

Note that the output of the notebook is stripped on github, so refer to the PDF attached to this issue to look at that.

tskisner commented:

Updated notebook output, with config section.
intro.pdf

zonca commented Oct 16, 2020 (inline on intro.ipynb, via ReviewNB):

Mention whether this can be modified in place or whether it is read-only. If it is read-only, how do I modify it?



tskisner replied:

Good point. Some options are fixed by the OS runtime environment of Python, but some can be changed after toast is imported. Will give more details.

tskisner replied:

Added text about setting the log level manually or through the environment. Same with threading.
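
As a rough illustration of those two paths (the variable name TOAST_LOGLEVEL is an assumption for this sketch; OMP_NUM_THREADS is the standard OpenMP thread-count variable):

import os

# Hypothetical: configure logging and threading through the environment
# before toast is imported.
os.environ["TOAST_LOGLEVEL"] = "DEBUG"  # assumed variable name
os.environ["OMP_NUM_THREADS"] = "4"     # standard OpenMP variable

import toast  # picks the settings up at import time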

zonca commented Oct 16, 2020 (via ReviewNB):

Better specify that "traditional CPU systems" means a supercomputer; otherwise it sounds like it is also required on a laptop.



tskisner replied:

Well, if the user has an AMD Ryzen workstation with 16 cores (for example), then they probably want to use mpi4py if they are doing anything more with toast than just running a notebook with a few samples. I will definitely clarify, though. I have started an "intro_parallel.ipynb" where I am going to discuss using IPython.parallel with mpi4py. I'll reference that in the serial notebook.

tskisner replied:

Tried to clarify that toast parallelism is mainly through MPI, so that any system with more than a few cores will benefit from having mpi4py installed.
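
For instance, a minimal MPI-enabled session might start like the sketch below; the toast.Comm constructor and its groupsize argument are assumptions here, not confirmed by this thread.

# Run with, for example:  mpirun -np 4 python intro_mpi.py
from mpi4py import MPI
import toast

# Split the world communicator into process groups; each group works
# on a subset of observations (assumed toast.Comm behavior).
comm = toast.Comm(world=MPI.COMM_WORLD, groupsize=2)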

zonca commented Oct 16, 2020 (via ReviewNB):

Do you plan to implement units here?



tskisner replied:

Good idea. astropy.units is a new addition to toast and is currently only used in the new Operator traits. I need to go through the codebase systematically and add support.

tskisner replied:

I converted the fake focalplane simulation and plotting functions to use units. However, I'll wait on the rest of the instrument classes until we can revisit the overall plan for those.
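
For readers new to astropy.units, a tiny self-contained example (the quantities are made up, not taken from the toast focalplane code):

import astropy.units as u

sample_rate = 100.0 * u.Hz
fwhm = (10.0 * u.arcmin).to(u.deg)  # explicit unit conversion
print(sample_rate, fwhm)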

zonca commented Oct 16, 2020 (via ReviewNB):

If detlabels is None, you could use the keys as labels, so we avoid building the trivial dict x:x.

Please use keyword arguments for all inputs, so people don't have to look at the help of plot_focalplane.

For the color, what about using endswith("A") instead of enumerating?



tskisner replied:

Ok, this is cleaned up.
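
A sketch of what the cleaned-up call might look like under these suggestions; the plot_focalplane signature, the pol_color argument, and the detector naming are assumptions, not the actual toast API.

import astropy.units as u

def show(focalplane):
    # Hypothetical detector names like "D0A", "D0B", ...
    # Color by polarization leg via endswith("A") instead of enumerating:
    detcolors = {
        d: ("red" if d.endswith("A") else "blue") for d in focalplane.detectors
    }
    # Keyword arguments throughout; with detlabels omitted, the detector
    # names themselves serve as labels.
    plot_focalplane(
        focalplane=focalplane,
        width=8.0 * u.deg,
        height=8.0 * u.deg,
        pol_color=detcolors,
    )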

zonca commented Oct 16, 2020 (via ReviewNB):

I prefer namespacing, what about import toast.config as tc?



zonca commented Oct 16, 2020 (via ReviewNB):

Either it is an attribute, so other_simsat.config,

or it is a method, and then it needs a verb in the name:

other_simsat.get_config()

tskisner replied:

The methods are now get_config() and get_class_config() 
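
Rough usage of the renamed accessors (the comments describing the return values are guesses, not documented behavior):

# 'other_simsat' is the operator instance from the notebook.
conf = other_simsat.get_config()              # settings of this instance
class_conf = other_simsat.get_class_config()  # defaults for the class itself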

zonca commented Oct 16, 2020 (via ReviewNB):

See comment above.



tskisner replied:

I was inspired by the traitlets methods traits() and class_traits(), but I can add a "get_" in there if it is clearer.

zonca replied:

yes please

zonca commented Oct 16, 2020 (via ReviewNB):

units are beautiful in the file!



zonca (Member) commented Oct 16, 2020:

I had heard about it, but this is the first time I used ReviewNB; it is awesome!

@tskisner, I think the Toast interface looks really good, good job! I have a bit more feedback in the notebook.

tskisner commented:

Thanks @zonca for the detailed review, just what I was looking for! One overall question about objects that act like a dictionary but have other state information. For example, obs.detdata stores DetectorData objects, but also information about the observation instance, like the number of samples. So the call to

obs.detdata.create("foo", shape=(5,), dtype=np.int64)

actually creates a DetectorData object under the hood using the number of observation samples. I could also do:

obs.detdata["foo"] = DetectorData(obs.detectors, (obs.n_sample, 5), dtype=np.int64)

And then have __setitem__ check that the number of samples matches that of the observation. This did not seem as convenient to me, but I do hate typing :-)

For the MPI shared memory class, I could replace the set method like this:

# Slice of buffer set by offset and shape of data
obs.shared["foo"].set(data, offset=(3, 6), fromrank=2)

or

# Slice of buffer set by explicit range
obs.shared["foo"][3:7, 6:8] = (data, 2)

or something else?

zonca commented Oct 16, 2020:

inside __setitem__ you can do:

    def __setitem__(self, key, value):
        if key not in self.d:
            # key is undefined: create a DetectorData sized from the
            # observation and the assigned value
            data = DetectorData(self.detectors, (self.n_sample,) + value.shape, dtype=value.dtype)
            # set this into the dict
            self.d[key] = data

For the set I don't understand: what do you need the fromrank for?

tskisner commented:

I'll try to clarify... The detdata attribute holds DetectorData objects. So I can certainly modify __setitem__ to do sanity checks on the input value, make sure that the shape is compatible, and if so just assign to the internal dictionary. However, that requires the user to pre-create the DetectorData object first, using information from the observation instance. It seemed like more work for the user this way...

For the MPIShared class, the set() operation is collective: it takes data from one process and broadcasts it to shared memory across multiple nodes. So there is an argument (fromrank) to determine which process has the data. I guess I could figure that out, though, by doing an allreduce first to see which process has a non-None value.
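
That precalculation could be as simple as the sketch below (an assumption, not pshmem code): each process contributes its rank if it holds data, and a max-allreduce selects the source.

from mpi4py import MPI

def find_source(comm, data):
    # 'data' is a numpy array on exactly one process and None elsewhere.
    return comm.allreduce(comm.rank if data is not None else -1, op=MPI.MAX)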

tskisner commented:

I think I understand now: you want to allow the obs.detdata __setitem__ to take a numpy array and assume that it is the full 2D contents of the underlying DetectorData. So then you could do:

obs.detdata["foo"] = np.zeros((len(obs.detectors), obs.n_sample), dtype=np.int64)

I can implement that, but I am still not sure it is more convenient. On the other hand, there is no reason not to support multiple ways of assignment!

tskisner commented:

Ahhh, now I see: you can create the full-size DetectorData object first, and then the slicing notation can be applied when actually assigning from the RHS.

Ok, I will try this out. I agree this would be a more convenient interface. I'll also try to modify the pshmem package to make the set() method optional, at the cost of a precalculation.

tskisner commented:

I have added setitem support to the upstream pshmem package:

https://github.com/tskisner/pshmem/releases/tag/0.2.5

And this new version is available on PyPI:

https://pypi.org/project/pshmem/0.2.5/

So now I can work on using that syntax in toast.
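
A speculative example of the new syntax; the convention that non-contributing processes pass None is inferred from this thread, not from the pshmem documentation.

import numpy as np
from mpi4py import MPI
from pshmem import MPIShared

comm = MPI.COMM_WORLD
shared = MPIShared((10, 8), np.float64, comm)

# Collective assignment: one process provides the data, the rest pass None.
data = np.ones((4, 2), dtype=np.float64) if comm.rank == 0 else None
shared[3:7, 6:8] = data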

tskisner commented:

Ok, I think I have concluded that the create() methods cannot be replaced in the proposed way. The Observation attributes detdata and shared do not have access to the right-hand-side of the assignment, since that is passed to the DetectorData.__setitem__() method after the manager class __getitem__() is called. To simplify the example, look at this snippet of code:

import numpy as np

class DetData:

    def __init__(self, ndet, shape, dtype):
        self.dtype = np.dtype(dtype)
        self.shape = [ndet]
        self.flatshape = ndet
        for s in shape:
            self.shape.append(s)
            self.flatshape *= s
        self.flatdata = np.zeros(self.flatshape, dtype=self.dtype)
        self.data = self.flatdata.reshape(self.shape)

    def __getitem__(self, key):
        print("DetData getitem {}".format(key))
        return self.data[key]

    def __setitem__(self, key, value):
        print("DetData setitem {}".format(key))
        self.data[key] = value

    def __repr__(self):
        return str(self.data)

class Mgr:
    
    def __init__(self, ndet):
        self.ndet = ndet
        self.d = dict()

    def create(self, name, shape, dtype):
        self.d[name] = DetData(self.ndet, shape, dtype)
        
    def __getitem__(self, key):
        print("Calling Mgr getitem")
        if key not in self.d:
            # Cannot guess what shape and dtype the user wants
            pass
        return self.d[key]

    def __setitem__(self, key, value):
        print("Calling Mgr setitem")
        self.d[key] = value


mgr = Mgr(2)

# This works fine, as expected:

mgr.create("A", (3, 4), np.int32)

mgr["A"][1, 0:2, 0:2] = 5

print("mgr['A'] = \n", mgr["A"])

# This works, but it is annoying, since the user has to know the name
# of the DetData class and also has to get information from the Mgr
# class:

mgr["B"] = DetData(mgr.ndet, (3, 4), np.int32)

mgr["B"][1, 0:2, 0:2] = 5

print("mgr['B'] = \n", mgr["B"])

# The code below is actually doing:
#
# Mgr.__getitem__("C").__setitem__(tuple, 5)
#
# Which means that the DetData class would have to be instantiated in
# Mgr.__getitem__() where we don't know the correct shape of the buffer
# to create.  Obviously this gives a key error:

mgr["C"][1, 0:2, 0:2] = 5

print("mgr['C'] = \n", mgr["C"])

The output of the above script is:

Calling Mgr getitem
DetData setitem (1, slice(0, 2, None), slice(0, 2, None))
Calling Mgr getitem
mgr['A'] = 
 [[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[5 5 0 0]
  [5 5 0 0]
  [0 0 0 0]]]
Calling Mgr setitem
Calling Mgr getitem
DetData setitem (1, slice(0, 2, None), slice(0, 2, None))
Calling Mgr getitem
mgr['B'] = 
 [[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[5 5 0 0]
  [5 5 0 0]
  [0 0 0 0]]]
Calling Mgr getitem
Traceback (most recent call last):
  File "setitem.py", line 75, in <module>
    mgr["C"][1, 0:2, 0:2] = 5
  File "setitem.py", line 40, in __getitem__
    return self.d[key]
KeyError: 'C'

zonca commented Oct 17, 2020:

you first need to create the thing before slicing it:

mgr = Mgr(2)

mgr["A"] = np.zeros((3,4), dtype=np.int32)

mgr["A"][1, 0:2, 0:2] = 5

It doesn't work now, but it should be implementable.

tskisner commented:

Hi @zonca, thanks for your patience, and sorry if I am being dense :-) Below I updated the toy code to be closer to the real case. The central problem is that when assigning data (see the mgr["C"] case), we don't know which detector and sample range is being specified. If the user assigns data for the full array (see the mgr["D"] case), then we can assign it; otherwise all we can do is create the buffer and then leave it.

import numpy as np

class DetData:
    def __init__(self, ndet, shape, dtype):
        self.dtype = np.dtype(dtype)
        self.shape = [ndet]
        self.flatshape = ndet
        for s in shape:
            self.shape.append(s)
            self.flatshape *= s
        self.flatdata = np.zeros(self.flatshape, dtype=self.dtype)
        self.data = self.flatdata.reshape(self.shape)

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __repr__(self):
        return str(self.data)


class Mgr:
    def __init__(self, ndet, nsample):
        self.ndet = ndet
        self.nsample = nsample
        self.d = dict()

    def create(self, name, sample_shape, dtype):
        self.d[name] = DetData(self.ndet, (self.nsample,) + sample_shape, dtype)

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        if isinstance(value, DetData):
            self.d[key] = value
        else:
            # This is an array, verify that the number of dimensions match
            sample_shape = None
            if len(value.shape) < 2:
                raise RuntimeError("Assigned array does not have sufficient dimensions")
            elif len(value.shape) == 2:
                # We assume the user meant one scalar value per sample...
                sample_shape = (1,)
            else:
                # The first two dimensions are detector and sample.  The rest of the
                # dimensions are the data shape for every sample and must be fully
                # specified when creating data like this.
                sample_shape = value.shape[2:]
            print(
                "Creating DetData with {} dets, {} samples, {} samp shape".format(
                    self.ndet, self.nsample, sample_shape
                )
            )
            self.d[key] = DetData(
                self.ndet, (self.nsample,) + sample_shape, value.dtype
            )
            # If the value has the full size of the DetData object, then we can do the
            # assignment, if not, then we cannot guess what detector / sample slice
            # the user is trying to assign.
            if (value.shape[0] == self.ndet) and (value.shape[1] == self.nsample):
                # We can do it!  The reshape handles the 2-D case, where
                # the created buffer has a trailing length-1 dimension.
                self.d[key][:] = value.reshape(self.d[key].shape)


# 2 detectors and 5 samples
mgr = Mgr(2, 5)

# This works fine, as expected:

mgr.create("A", (3, 4), np.int32)
mgr["A"][1, 2:3, 0:2, 0:2] = 5
print("mgr['A'] = \n", mgr["A"])

# This works, but it is annoying, since the user has to know the name
# of the DetData class and also has to get information from the Mgr
# class:

mgr["B"] = DetData(mgr.ndet, (mgr.nsample, 3, 4), np.int32)
mgr["B"][1, 2:3, 0:2, 0:2] = 5
print("mgr['B'] = \n", mgr["B"])

# This creates a buffer with the full number of detectors and samples and uses the
# last dimensions of the RHS to determine the shape of the data per sample.  However,
# we have no information about what LHS slice we are assigning the RHS data to.  UNLESS
# the user gives a RHS data with the full n_detector x n_sample data set:

# mgr["C"] is created by not assigned, since we don't know where to assign the data
# along the first 2 axes (detector and sample).
mgr["C"] = np.ones((1, 1, 3, 4), dtype=np.int32)
mgr["C"][1, 2:3, 0:2, 0:2] = 5
print("mgr['C'] = \n", mgr["C"])

# mgr["D"] is created AND assigned, since we specify data of the full size.
mgr["D"] = np.ones((mgr.ndet, mgr.nsample, 3, 4), dtype=np.int32)
mgr["D"][1, 2:3, 0:2, 0:2] = 5
print("mgr['D'] = \n", mgr["D"])

I think that the mgr["C"] case is actually very confusing, since we are using the right-hand-side value just to get dimensions, but not actually placing those values into the new array (since we don't have information about the location of that data within the full array).

How about we support cases A, B, and D:

  • Explicit create method
  • Assignment of a pre-created DetData, with checks for compatible number of detectors and samples
  • Assignment of a numpy array with the full-size data.

Does that seem acceptable?

zonca commented Oct 19, 2020:

here:

# mgr["C"] is created by not assigned, since we don't know where to assign the data
# along the first 2 axes (detector and sample).
mgr["C"] = np.ones((1, 1, 3, 4), dtype=np.int32)
mgr["C"][1, 2:3, 0:2, 0:2] = 5
print("mgr['C'] = \n", mgr["C"])

This case is not supported; the user needs to initialize the array in one of 2 ways:

  • provide all samples/ all detectors (see case D)
  • provide all samples/1 detector (will be replicated to all detectors)

so the use case is:

# provide just 1 timeline; it will be copied to all detectors (we should support both 3D and 4D)
mgr["C"] = np.ones((mgr.nsample, 3, 4), dtype=np.int32)
# or
mgr["C"] = np.ones((1, mgr.nsample, 3, 4), dtype=np.int32)
mgr["C"][1, 2:3, 0:2, 0:2] = 5
print("mgr['C'] = \n", mgr["C"])

Inside Mgr there should be an assert that checks that the correct axis has a length of nsample.
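
Interpreted against the toy Mgr above, that branch might look like this (one reading of the proposal, not code from the PR):

    def __setitem__(self, key, value):
        # 3-D case: a single timeline; add a length-1 detector axis
        if value.shape[0] == self.nsample:
            value = value[np.newaxis]
        # after that, axis 1 must be the sample axis
        assert value.shape[1] == self.nsample
        sample_shape = value.shape[2:]
        self.d[key] = DetData(self.ndet, (self.nsample,) + sample_shape, value.dtype)
        # broadcasting replicates one detector's timeline to all detectors
        self.d[key][:] = value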

tskisner commented:

Ok, no problem, that sounds good. Will work on implementing this and addressing your other feedback.

review-notebook-app (bot) commented:

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

keskitalo and others added 30 commits April 23, 2025 16:28
* cannot format None values

* Make filterbin compatible with the SO split operator

* Must not delete the distributed pixel covariance

* Change the loop order to improve memory access pattern

* Use more optimal access pattern

* Allow flagging in or outside a view

* Add unit test for FlagIntervals and fix bugs it found

* Improve docstrings, add unit test for inverting views. Add warning for repeated inversions
* gcc15 refuses to compile extern C that uses templates

* Indexing errors

* Add unit test for IntervalList creation from close ended intervals

* Fix unit test

* Fix comment
* Update dipole defaults to PR4 and add a method for evaluating a map of the dipole

* Update copyright year
* Fix branching in the sidereal target

* Improve the crosslinking docstring
* Do not simulate azimuthal scanning when az range is zero

* Remove unnecessary verbosity

* Make sim_ground robust against zero length scans

* Support stare scans in the atmospheric simulation
Added check on max acceleration of sine elnod
* Add tool to plot schedules

* Add script to analyze schedules

* Add background image to hitmap projection
* Add support for HWP in SSS by using InterpolateHealpixMap

* SSS now requires stokes weights
* Fully functional implementation

* Use the helper function for string parsing
…s. (#835)

* This work fixes the spt3g code in light of recent changes to intervals.

- Add the new PyPI spt3g package to the CI test environment.
- Fix stale docstring.
- When redistributing intervals, use extra care with timestamps that
  fall on process boundaries.
- Fix off-by-one when exporting intervals, due to new exclusive last
  sample.

* Fix typo
* Add support for reading and writing WCS maps in HDF5 format

* Add support for per-detector WCS maps

* Add one more suffix

* Add support for getting WCS from FITS header

* Add unit test for HDF5 detector map scanning

* WIP

* Passing unit tests again

* Remove debug statements

* WIP

* Fix unit test and a corner case in automatic bounds detection

* Finally consistent

* Fix allgather

* Remove 'parallel' from the distributed WCS I/O method names and add 'serial' to the serial method names

* Make row/column positions more explicit

* Move I/O routines for the PixelData class. (#836)

* Move I/O routines for the PixelData class.

This creates `read()` and `write()` methods for the PixelData
class that can work with both WCS and Healpix, as well as
HDF5 and FITS formats.  These use the existence of the
PixelDistribution.wcs member to distinguish between WCS and
Healpix data, and they use the filename extension to determine
the format.

The serial I/O routines can now claim the shorter names
(read_wcs, write_healpix, etc) without conflicts.  The low-level
helper functions for gathering / broadcasting map chunks are
left in their current place outside the PixelData class, since
that class is becoming large.

* Address review comments:

- When creating a PixelDistribution with healpix pixelization, store
  the NEST / RING ordering similar to how we store the WCS information

- Break up `PixelData.write()` and `PixelData.read()` into subroutines
  for each pixelization type and file format.

- Remove `healpix_nest` argument to read / write throughout the code,
  since that is now tracked by the PixelDistribution.

---------

Co-authored-by: Theodore Kisner <[email protected]>
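
A hedged usage sketch of that interface; the read() / write() method names come from the commit message above, while the call pattern and filenames are assumptions.

# 'pixdata' is a PixelData instance (construction not shown).
pixdata.write("hits.fits")  # FITS format, chosen from the file extension
pixdata.write("hits.h5")    # HDF5, likewise inferred
pixdata.read("hits.h5")     # WCS vs Healpix comes from the PixelDistribution
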
- Update RELEASE file

- Inside the cibuildwheel environment, restrict numpy to <2.3.  That
  version was just released and some of our dependencies seem to trigger
  an attempted local compilation of numpy, which is a bad idea.
* Add binned ground template to filterbin

* Use explicit indexing for very sparse templates

* Update copyright

* Clean up comments, relax definition of sparse template
* initial attempt at having demodulation return pol_only

* Add support for QU mode in stokes weights

* Stokes fix in demodulation qu only

* replace pol_only with mode to support intensity-only processing

* Update src/toast/ops/demodulation.py

Co-authored-by: Copilot <[email protected]>

* Add healpix support for scanning pure QU

* Tiny unrelated tweaks that don't merit a separate PR

---------

Co-authored-by: AERAdler <[email protected]>
Co-authored-by: Copilot <[email protected]>
* Store Az min/max metadata in the AzimuthIntervals operator

* Move azimuth range calculation into a separate operator (#844)

* Fix broadcast pattern of min / max values

---------

Co-authored-by: Reijo Keskitalo <[email protected]>
This situation may occur when demodulating data with cut
detectors, or in situations where having many processes is
useful for simulation operations that are independent of
the number of detectors.  Specific changes include:

- In distribute_discrete(), lift the restriction on having
  more groups than work units.

- In the DetectorData class, handle the case where there
  are no local detectors.

- Remove various warning / errors for this situation

- Add unit tests that test this "over distribution" of
  data with more processes than detectors.