Conversation

@d-v-b
Contributor

@d-v-b d-v-b commented Dec 8, 2025

  • eagerly compute multiscales
  • directly copy chunk bytes and metadata documents

@codecov-commenter

codecov-commenter commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 19 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/eopf_geozarr/s2_optimization/s2_multiscale.py | 88.88% | 7 Missing ⚠️ |
| src/eopf_geozarr/zarrio.py | 93.54% | 6 Missing ⚠️ |
| src/eopf_geozarr/cli.py | 0.00% | 4 Missing ⚠️ |
| src/eopf_geozarr/s2_optimization/s2_converter.py | 97.64% | 2 Missing ⚠️ |

Comment on lines +1162 to +1164
s2_parser.add_argument(
"--omit-nodes", help="The names of groups or arrays to skip.", default="", type=str
)
Contributor Author

This argument solves #81. Passing --omit-nodes "quality/l2a_quicklook" omits that group.

cc @emmanuelmathot
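For readers who want to see how the flag behaves on the command line, here is a minimal sketch. The `add_argument` call is copied from the diff above; the stand-in parser, the `parse_omit_nodes` helper, and the whitespace separator are assumptions made for illustration, since this thread does not show how the PR splits the string into node names.

```python
import argparse

# Stand-in parser: the real CLI defines many more arguments and subcommands.
parser = argparse.ArgumentParser(prog="s2-convert-sketch")
parser.add_argument(
    "--omit-nodes", help="The names of groups or arrays to skip.", default="", type=str
)


def parse_omit_nodes(raw: str) -> set[str]:
    """Hypothetical helper: split the raw option value on whitespace.

    The separator actually used by the PR is not shown in this thread.
    """
    return set(raw.split())


args = parser.parse_args(["--omit-nodes", "quality/l2a_quicklook"])
omit = parse_omit_nodes(args.omit_nodes)
print(omit)
```

Because the default is an empty string, leaving the flag off yields an empty set under this assumed parsing, so nothing is skipped.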

@d-v-b d-v-b marked this pull request as ready for review December 15, 2025 15:11
@d-v-b
Contributor Author

d-v-b commented Dec 15, 2025

@emmanuelmathot this is ready for review.

At a high level, this PR redesigns the conversion process around a more functional and explicit pattern that relies less on xarray APIs, which are not very transparent about how data is being moved.

Architecture

I created a new module that contains utilities specific to Zarr IO operations. The functions in that module all work toward the goal of re-encoding Zarr v2 groups as Zarr v3.

The main routine is reencode_group, which iterates over all the sub-arrays and sub-groups inside the group to re-encode. reencode_group takes an array_reencoder parameter, which is a function that takes an array's path (like "measurements/reflectance/r10m/b02") and that array's metadata document, and returns a new array metadata document. Complex mission-specific logic can be packed inside the array reencoder function, which is how we can keep reencode_group mission-agnostic.

Because we are not relying on xarray for the basic copy procedure, we have to do more work on the encoding / attributes side, which is reflected in the array reencoder used for S2 conversion.
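The callback pattern described above can be sketched roughly as follows. This is not the code from the PR: the function names carry a `_sketch` suffix, metadata documents are reduced to plain dicts with only a few keys, the `mission_note` attribute is invented, and chunk copying is left out entirely. The only thing taken from the description is the shape of the callback, which receives an array path plus its metadata document and returns a new metadata document.

```python
from typing import Any, Callable

ArrayMeta = dict[str, Any]
# Assumed callback shape: (array path, old metadata) -> new metadata.
ArrayReencoder = Callable[[str, ArrayMeta], ArrayMeta]


def reencode_group_sketch(
    arrays: dict[str, ArrayMeta], array_reencoder: ArrayReencoder
) -> dict[str, ArrayMeta]:
    """Mission-agnostic loop: visit every array and delegate to the callback."""
    return {path: array_reencoder(path, meta) for path, meta in arrays.items()}


def s2_reencoder_sketch(path: str, meta: ArrayMeta) -> ArrayMeta:
    """Hypothetical mission-specific callback; real metadata has more fields."""
    new_meta: ArrayMeta = {
        "zarr_format": 3,
        "shape": list(meta["shape"]),
        "attributes": dict(meta.get("attributes", {})),
    }
    # Mission-specific decisions live here, keyed on the array's path.
    if path.startswith("measurements/reflectance/"):
        new_meta["attributes"]["mission_note"] = "reflectance band"  # invented key
    return new_meta


source = {
    "measurements/reflectance/r10m/b02": {"zarr_format": 2, "shape": [10980, 10980]},
    "quality/mask": {"zarr_format": 2, "shape": [5490, 5490], "attributes": {"a": 1}},
}
result = reencode_group_sketch(source, s2_reencoder_sketch)
```

Swapping in a different callback changes the per-array decisions without touching the traversal loop, which is the separation the description is after.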

Performance

Memory usage is improved on this branch, with peak memory down to ~4.5 GB from ~11 GB. Downsampling only adds a few GB of peak memory, which isn't too surprising.

Testing

I added quite a few tests, but we still need to see how the new output composes with the consuming code. @emmanuelmathot if you could try this branch out and check the output I would greatly appreciate it.

@d-v-b
Contributor Author

d-v-b commented Dec 17, 2025

@emmanuelmathot the redundant multiscales calculation is now fixed, and the chunk sizes / sharding are now consistent with the design goal (use as few objects as possible).

On my local system, using dask for rechunking was much slower than what I am currently doing (plain assignment via zarr python).

@d-v-b
Contributor Author

d-v-b commented Dec 17, 2025

Since we are re-encoding the Zarr groups here, I can also handle the NaN conversion in this branch, unless that's better done in a separate branch @emmanuelmathot

@d-v-b
Contributor Author

d-v-b commented Dec 17, 2025

With a1375b7 we have an option (defaulting to false) to allow invalid values (NaN and inf) in the output. When it is set to false (the default), any NaN, inf, or -inf values in the attributes field are replaced with string equivalents.
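A minimal sketch of the default behavior described above, using only the standard library. The function name and the exact replacement strings ("NaN", "Infinity", "-Infinity") are assumptions; they mirror the strings the Zarr v3 specification uses for non-finite fill values, but the comment does not say which strings the PR emits.

```python
import json
import math
from typing import Any


def replace_nonfinite(value: Any) -> Any:
    """Recursively swap NaN / inf / -inf floats for strings (assumed spellings)."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return value
    if isinstance(value, dict):
        return {key: replace_nonfinite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_nonfinite(item) for item in value]
    return value


attrs = {
    "fill": float("nan"),
    "valid_max": float("inf"),
    "nested": {"offsets": [0.0, float("-inf")]},
}
clean = replace_nonfinite(attrs)

# allow_nan=False makes json.dumps raise on NaN or inf, so this call
# succeeding shows the cleaned attributes are strictly valid JSON.
encoded = json.dumps(clean, allow_nan=False)
```

Strict JSON has no literal for NaN or infinity, which is why an attributes document containing them can trip up consumers that parse it strictly.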

@emmanuelmathot
Contributor

I just tested the latest version of this PR: https://api.explorer.eopf.copernicus.eu/raster/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807/viewer
but multiscales are still missing at the /measurements/reflectance group.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jan 5, 2026

I'll have a look later today!

3 participants