Order of aggregation with skipna matters

### What is your issue?

This is not a bug report, but rather a pitfall that may be worth documenting.

I noticed that the order of aggregations matters when NaNs are present, `skipna=True` (the default), and the aggregation is split across separate calls. This only affects aggregations that normalize by the number of values N, e.g., `mean`; `sum` is not affected.

Example:
```python
import numpy as np
import xarray as xr

# 3x3 array with a single NaN in the last row
da = xr.DataArray(np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, np.nan, 9]
]), dims=["height", "lat"])
```
```python
da.mean(["lat", "height"])      # -> 4.625 (correct)
da.mean(["height", "lat"])      # -> 4.625 (correct)
da.mean("lat").mean("height")   # -> 5.0
da.mean("height").mean("lat")   # -> 4.5
```
The same happens when chaining `nanmean` calls in numpy, so this is not an xarray-only issue. The reason is that the second operation gives all of its inputs equal weight, even though they do not represent the same number of data points from the first operation (some rows/columns contain 2 valid values, others 3).
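For reference, a minimal numpy sketch of the same effect on the same values. The variable names are only illustrative; the last two lines show that weighting each row mean by its number of valid values recovers the overall mean.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, np.nan, 9],
])

# Single reduction over all valid values: 37 / 8
overall = np.nanmean(a)

# Chained reductions: every intermediate mean gets equal weight
rows_first = np.nanmean(np.nanmean(a, axis=1))  # mean of the row means
cols_first = np.nanmean(np.nanmean(a, axis=0))  # mean of the column means

# Weight each row mean by the number of valid values it was built from
counts = np.sum(~np.isnan(a), axis=1)
weighted = np.average(np.nanmean(a, axis=1), weights=counts)

print(overall, rows_first, cols_first, weighted)
```

The chained results differ from `overall` only because the last row contributes 2 values instead of 3.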

Xarray seems to be behaving correctly, and there may be no way around it without carrying weights across operations. However, I was still surprised by this behavior, so it might be worth documenting a warning, since users often perform aggregations in multiple steps and `skipna` is `True` by default. The differences are largest when averaging over dimensions along which the number of NaNs varies a lot.
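As a rough sketch of what "carrying weights across operations" could look like on the user side (this is just one possible workaround, not something xarray does automatically), one can keep the per-step sum and count of valid values and normalize only once at the end. The names `partial_sum` and `partial_count` are made up for illustration.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6], [7, np.nan, 9]]),
    dims=["height", "lat"],
)

# Step 1: reduce over "lat", keeping the sum and the number of valid values
partial_sum = da.sum("lat")      # NaNs are skipped by default for float data
partial_count = da.count("lat")  # non-NaN values per height

# Step 2: reduce over "height" and divide once at the very end
two_step_mean = partial_sum.sum("height") / partial_count.sum("height")

# Alternative: weight the intermediate means by their valid counts
weighted_mean = da.mean("lat").weighted(partial_count).mean("height")

print(float(two_step_mean), float(weighted_mean), float(da.mean()))
```

Both variants should agree with the single-call `da.mean()` here, at the cost of having to pass the counts along between steps.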

Order of aggregation with skipna matters #10759

What is your issue?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Order of aggregation with skipna matters #10759

Description

What is your issue?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions