Skip to content

Conversation

@keewis
Copy link
Member

@keewis keewis commented Sep 17, 2025

This adds as_convention to the accessor, which allows converting to different conventions. For this to be really useful, we need decode to support reading from those conventions (and possibly auto-detect the convention).

@keewis keewis marked this pull request as draft September 17, 2025 10:11
@keewis
Copy link
Member Author

keewis commented Dec 18, 2025

@benbovy, I've implemented the changes we discussed yesterday (I'll add documentation / type hints to the Convention base class later).

However, now that we normalize to the "xdggs" convention, we can probably change Convention.decode to return a grid_info object and the cell_ids variable, and have decode (the accessor method / top-level function) create the index. The only problem with that is that then it is xdggs that needs to know how to create the underlying index for compressed coordinates.

Otherwise we could have backends call DGGSIndex.from_variables directly.

So this clearly needs some more thought to create a clean API, especially if we want people to create their own convention implementations.

@keewis
Copy link
Member Author

keewis commented Dec 19, 2025

actually, there might be an argument for the conventions to work on the objects themselves (encode already does that): cf may add grid_mapping to all variables that have the spatial dimension, which we'd have to remove that after decoding because that variable does not exist anymore. So that would mean we'd get a signature of:

class Convention:
    def decode(
        self,
        ds: xr.Dataset,
        *,
        grid_info: dict | DGGSInfo | None = None,
        name: Hashable | None = None,
        index_options: dict | None = None
    ) -> xr.Dataset:
        """
        Decode the dataset according to the convention.

        Parameters
        ----------
        ds : xr.Dataset
            The encoded dataset.
        grid_info : mapping or DGGSInfo, optional
             Overrides for the grid metadata.
        name : str, optional
            The name of the cell ids coordinate.
        index_options : mapping of str to Any, optional
            Additional options for the index.

        Returns
        -------
        decoded : xr.Dataset
            The decoded dataset with a DGGSIndex.
        """
        pass

    def encode(self, ds: xr.Dataset, *, encoding: dict | None = None) -> xr.Dataset:
        """
        Encode according to the convention.

        Parameters
        ----------
        ds : xr.Dataset
            The dataset to encode. Must have a DGGSIndex.
        encoding : mapping of str to Any, optional
            Additional options for the convention.

        Returns
        -------
        encoded : xr.Dataset
            The encoded dataset. Does not have a DGGSIndex anymore.
        """
        pass

(the convention shouldn't have to deal with the differences between DataArray and Dataset)

In theory, the grid_info and name kwargs could be packed into a encoding kwarg to make the method signatures symmetrical.

@keewis keewis marked this pull request as ready for review December 19, 2025 15:34
(this is more complicated because `coordinates` is a stringified list)
@keewis
Copy link
Member Author

keewis commented Dec 19, 2025

with this I'm pretty happy with how the convention support looks. The only weakness I see is in how the coordinate name is duplicated between the underlying index and the DGGSIndex. However, I think this PR is already big enough, so would punt resolving that to a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant