perf/direct zarr io #92
Conversation
d-v-b commented on Dec 8, 2025:
- eagerly compute multiscales
- directly copy chunk bytes and metadata documents
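The second bullet can be sketched as a raw store-to-store copy: assuming Zarr v2 stores behave like mappings of key to bytes (as zarr-python's store interface does), chunk payloads and metadata documents can be moved without any decode/encode round trip. The names below are illustrative, not the PR's actual API.

```python
# Hypothetical sketch: copy chunk bytes and metadata documents directly
# between two store-like mappings, instead of decoding and re-encoding
# arrays through xarray.

def copy_raw(src: dict, dst: dict) -> dict:
    """Copy every key from src to dst without decompressing chunk bytes.

    Chunk keys (e.g. "0.0", "1.2") and metadata documents (".zarray",
    ".zattrs") are moved as-is, so no compute or memory is spent on
    decode/encode round trips.
    """
    for key, value in src.items():
        dst[key] = value  # raw bytes pass through untouched
    return dst

# toy "store": one metadata document plus two chunk payloads
src = {".zarray": b"{}", "0.0": b"\x00\x01", "0.1": b"\x02\x03"}
dst = copy_raw(src, {})
```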
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…to perf/direct-zarr-io
```python
s2_parser.add_argument(
    "--omit-nodes", help="The names of groups or arrays to skip.", default="", type=str
)
```
This argument solves #81. You would pass `--omit-nodes "quality/l2a_quicklook"` to omit that group.
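A minimal sketch of how such a flag might be consumed, assuming the value is a comma-separated list of node paths (the actual parsing logic lives in the PR):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--omit-nodes", help="The names of groups or arrays to skip.", default="", type=str
)
# simulate: tool --omit-nodes "quality/l2a_quicklook"
args = parser.parse_args(["--omit-nodes", "quality/l2a_quicklook"])

# assume a comma-separated list; the empty default yields an empty set
omit = {name for name in args.omit_nodes.split(",") if name}
```

Conversion code can then check each group or array path against `omit` and skip matching nodes.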
@emmanuelmathot this is ready for review. At a high level, the conversion process in this PR is redesigned to use a more functional and explicit pattern with less reliance on xarray APIs, which are not very transparent about how data is being moved.

Architecture

I created a new module that contains utilities specific to Zarr IO operations. That module contains functions that all work toward the goal of re-encoding Zarr v2 groups into Zarr v3. The main routine is … Because we are not relying on xarray for the basic copy procedure, we have to do more work on the encoding / attributes side, which is reflected in the array re-encoder used for S2 conversion.

Performance

Memory usage is improved on this branch, with peak memory down to ~4.5 GB from ~11 GB. Downsampling only adds a few GB of peak memory, which isn't too surprising.

Testing

I added quite a few tests, but we need to see how the new output composes with the consuming code. @emmanuelmathot, if you could try this branch out and check the output I would greatly appreciate it.
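For illustration, here is a minimal, hypothetical sketch of the metadata side of re-encoding a Zarr v2 array into Zarr v3. The real re-encoder in this PR handles codecs, attributes, and more fields; the dtype translation below is deliberately crude.

```python
# Hypothetical sketch: translate a Zarr v2 ".zarray" metadata document
# into the corresponding Zarr v3 "zarr.json" array document.

def v2_to_v3_array_meta(zarray: dict) -> dict:
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": zarray["shape"],
        # crude dtype translation: drop the v2 byte-order prefix ("<", ">", "|")
        "data_type": zarray["dtype"].lstrip("<>|"),
        # v2 "chunks" becomes a regular chunk grid in v3
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": zarray["chunks"]},
        },
        "fill_value": zarray["fill_value"],
    }

meta = v2_to_v3_array_meta(
    {"shape": [4, 4], "chunks": [2, 2], "dtype": "<u2", "fill_value": 0}
)
```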
@emmanuelmathot the redundant multiscales calculation is now fixed, and the chunk sizes / sharding are now consistent with the design goal (use as few objects as possible). On my local system, using dask for rechunking was much slower than what I am currently doing (plain assignment via zarr-python).
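The "plain assignment" approach can be sketched as follows, with in-memory NumPy arrays standing in for zarr arrays (zarr-python arrays support the same slice-assignment syntax); the block size here is an arbitrary illustration, not the PR's actual chunking.

```python
# Sketch of plain-assignment rechunking: read a block from the source and
# assign it into a destination array, block by block, without a task graph.
import numpy as np

src = np.arange(16).reshape(4, 4)
dst = np.zeros_like(src)

block = 2  # hypothetical block size, ideally aligned to destination chunks
for i in range(0, src.shape[0], block):
    for j in range(0, src.shape[1], block):
        # with zarr arrays this would stream one block at a time,
        # keeping peak memory bounded by the block size
        dst[i:i + block, j:j + block] = src[i:i + block, j:j + block]
```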
Since we are re-encoding the Zarr groups here, I can also handle the NaN conversion in this branch, unless that's better in a separate branch, @emmanuelmathot.
With a1375b7 we have an option (defaulting to false) of allowing invalid values (NaN and inf) in the output. When set to …
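A sketch of what that option might do when invalid values are disallowed, assuming NaN and inf are replaced with the array's fill value before writing; the function name, signature, and default below are illustrative, not the PR's actual API.

```python
# Hypothetical sketch: sanitize invalid floating-point values before writing.
import numpy as np

def sanitize(data: np.ndarray, fill_value: float, allow_invalid: bool) -> np.ndarray:
    """Return data unchanged if invalid values are allowed; otherwise
    replace NaN and +/-inf with the provided fill value."""
    if allow_invalid:
        return data
    out = data.copy()
    out[~np.isfinite(out)] = fill_value  # NaN and inf become the fill value
    return out

arr = np.array([1.0, np.nan, np.inf, 2.0])
clean = sanitize(arr, fill_value=0.0, allow_invalid=False)
```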
Just tested the last version of this PR: https://api.explorer.eopf.copernicus.eu/raster/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807/viewer |
I'll have a look later today!