resampling half hourly data to hourly data takes a long time #8035

@billcohenhydro

What happened?

I have an xarray DataArray of half-hourly data spanning several decades (100 years in the reproducible example). I am trying to resample it to hourly data and take the mean. This takes a very long time, on the order of minutes. When the array also has other large dimensions, I have had to kill the process several times after more than 10 minutes (the example below has no other dimensions).

Repeating the same resample with pandas is very quick, so pandas does not have this issue.

NOTE: with xarray, I observed that the runtime increases as the resampling interval gets shorter (i.e. as the number of output bins grows):
-monthly is fast
-weekly is fast but takes about twice as long as monthly
-daily is slow, but will finish
-hourly is very slow
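
To make the pattern above concrete, here is a minimal sketch that counts how many output bins each target frequency produces for one year of half-hourly data. It only illustrates that the slow cases are the ones with many bins; that the runtime actually scales with the bin count is an assumption, not something this report confirms. The aliases "MS" and "60min" stand in for the "M" and "H" used in the example below, purely so the snippet runs across pandas versions.

```python
import pandas as pd

# One year of half-hourly timestamps (a shorter range than the 100-year example).
time = pd.date_range("2000-01-01", "2001-01-01", freq="30min")
series = pd.Series(1.0, index=time)

# Number of output bins for each target frequency.
# "MS" and "60min" are stand-ins for monthly and hourly resampling.
bin_counts = {
    freq: len(series.resample(freq).size())
    for freq in ["MS", "W", "D", "60min"]
}

for freq, count in bin_counts.items():
    print(f"{freq:>6}: {count} bins")
```

The bin count rises from monthly to weekly to daily to hourly, in the same order as the observed slowdown.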

What did you expect to happen?

I expected the resample to complete very quickly, since most functions in xarray have very high performance.

Minimal Complete Verifiable Example

# replicate issue: xarray resampling from half hourly data to hourly takes a long time
# %% import libraries
import numpy as np
import pandas as pd
import xarray as xr

# %% print package versions
print("numpy version", np.__version__)
print("pandas version", pd.__version__)
print("xarray version", xr.__version__)

# %% create half hourly data
time = pd.date_range("2000-01-01", "2100-1-1", freq="30T")
n = len(time)
data = np.random.uniform(size=n)
xarray_array = xr.DataArray(data=data, dims=["time"], coords=dict(time=time))
pandas_series = xarray_array.to_series()

# %% time - xarray: takes a long time ~1 min 40 sec
%%timeit 
xarray_hourly = xarray_array.resample(time="H").mean()


# %% time - pandas: fast, ~ 100ms
%%timeit
pandas_hourly = pandas_series.resample("H").mean()


# %% additional note: for reference, resampling xarray monthly is fine
%%timeit
xarray_monthly = xarray_array.resample(time="M").mean()

# %% additional note: for reference, resampling xarray weekly takes a bit longer
%%timeit
xarray_weekly = xarray_array.resample(time="W").mean()

# %% additional note: for reference, resampling xarray daily also takes a long time, ~4 sec
%%timeit
xarray_daily = xarray_array.resample(time="D").mean()

# %%

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

# version
python version 3.11.3
numpy version 1.24.3
pandas version 2.0.1
xarray version 2023.7.0

# xarray resample hourly:
1min 41s ± 2.69 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

# pandas resample hourly:
104 ms ± 3.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
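
Given the gap between the two timings, one possible stopgap for the 1-D case is to do the resample in pandas and convert the result back to a DataArray. This is only a sketch under that assumption: it does not address the underlying slowness in xarray, and an array with extra dimensions would need different handling because `to_series()` would then produce a MultiIndex. The shorter date range and the "60min" alias (in place of "H") are choices made for this illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small half-hourly array so the xarray reference result is quick to compute.
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2000-03-01", freq="30min")
da = xr.DataArray(rng.uniform(size=len(time)), dims=["time"], coords={"time": time})

# Round trip: DataArray -> pandas Series -> hourly mean -> DataArray.
hourly_series = da.to_series().resample("60min").mean()
hourly_da = xr.DataArray.from_series(hourly_series)

# Sanity check against xarray's own resample on the small array.
reference = da.resample(time="60min").mean()
np.testing.assert_allclose(hourly_da.values, reference.values)
```

The round trip keeps the `time` dimension name because `to_series()` names the index after the dimension.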

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:50:54) [MSC v.1934 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Australia', '1252')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.7.0
pandas: 2.0.1
numpy: 1.24.3
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.4.1
distributed: 2023.4.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.3.1
mypy: None
IPython: 8.13.2
sphinx: None
