Commit a6c07d1

Establishing the initial openEO UDP conversion repository structure (README + CONTRIB) (#3)
* readme and contrib initial commit.
* Add attribution section to NDCI cyanobacteria detection notebook
1 parent e3b2d5f commit a6c07d1

3 files changed

+1291
-3
lines changed

CONTRIBUTING.md

Lines changed: 312 additions & 0 deletions
# Contributing to openEO UDP Conversion

Thank you for your interest in contributing to the Sentinel Hub evalscript to openEO conversion project! This guide will help you understand our conversion process and standards so your contributions can be integrated smoothly.

## Understanding the Conversion Process

Converting a Sentinel Hub evalscript to an openEO User-Defined Process involves more than simple translation. You need to understand both what the algorithm does scientifically and how to express that logic using openEO's process graph structure.

### The Nature of Evalscripts

Evalscripts are JavaScript functions that process satellite imagery pixel-by-pixel or in small neighborhoods. They typically define a `setup()` function that declares which bands are needed, an `evaluatePixel()` function that performs calculations on those bands, and often helper functions for color mapping or complex logic. The scripts are designed for synchronous, real-time visualization where users expect immediate visual feedback.

### openEO's Process Graph Approach

openEO represents algorithms as directed acyclic graphs where each node is a standardized process with defined inputs and outputs. Instead of procedural code that executes line-by-line, you construct a graph describing the data flow and transformations. This abstraction enables the same algorithm to run on different backends, but requires thinking about the computation differently than imperative programming.
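
To make the graph idea concrete, here is a minimal sketch of a process graph written as a plain Python dictionary. It assumes the NDCI formula `(B05 - B04) / (B05 + B04)`, a collection id of `SENTINEL2_L2A`, and an arbitrary extent and date range; none of these come from this guide, so adjust them to your backend. The helper `node_references` is hypothetical and only illustrates how nodes point at each other.

```python
import json

# Flat process graph: each key is a node id, each value names a process and its arguments.
ndci_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # assumed collection id; backends differ
            "spatial_extent": {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
            "temporal_extent": ["2023-06-01", "2023-06-30"],
            "bands": ["B04", "B05"],
        },
    },
    "ndci": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "load"},  # edge in the graph: "ndci" consumes "load"
            "dimension": "bands",
            "reducer": {
                "process_graph": {
                    "red": {
                        "process_id": "array_element",
                        "arguments": {"data": {"from_parameter": "data"}, "label": "B04"},
                    },
                    "red_edge": {
                        "process_id": "array_element",
                        "arguments": {"data": {"from_parameter": "data"}, "label": "B05"},
                    },
                    "nd": {
                        # normalized_difference computes (x - y) / (x + y)
                        "process_id": "normalized_difference",
                        "arguments": {"x": {"from_node": "red_edge"}, "y": {"from_node": "red"}},
                        "result": True,
                    },
                }
            },
        },
        "result": True,
    },
}


def node_references(node):
    """Collect the node ids that one node's top-level arguments point to."""
    return [
        arg["from_node"]
        for arg in node["arguments"].values()
        if isinstance(arg, dict) and "from_node" in arg
    ]


# The graph is ordinary JSON-serializable data.
encoded = json.dumps(ndci_graph, indent=2)
```

Note that nothing is computed when this dictionary is built. The graph only describes the data flow, and a backend decides how to execute it.
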
## Conversion Workflow

### Phase 1: Script Analysis and Understanding

Before writing any code, thoroughly analyze the original evalscript:

- **Read the script documentation** to understand:
  - Its purpose and the scientific principle it implements
  - Typical applications
  - Any known limitations
  - Example: For NDCI, understand it's designed to detect cyanobacteria blooms by estimating chlorophyll-a concentrations using specific spectral bands

- **Identify all required inputs** including:
  - Satellite bands
  - Metadata like acquisition angles or timestamps
  - User-configurable parameters such as thresholds or scaling factors

- **Map the computational structure** by tracing the data flow from input bands through calculations to final output. Note:
  - Conditional logic (if statements that might require the `if` process)
  - Mathematical operations (which map to openEO's mathematical processes)
  - Temporal or spatial aggregations (requiring processes like `aggregate_temporal` or `reduce_dimension`)
  - Color mapping schemes (which may need custom color blend implementations)

- **Determine the processing pattern** to understand whether:
  - The algorithm can be computed tile-by-tile for web map tiling (suitable for synchronous visualization)
  - It requires broader temporal or spatial context (necessitating the synchronous processing endpoint)
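
As a small aid for the input-identification step, the sketch below pulls the declared inputs out of an evalscript's `setup()` function with a regular expression. The embedded evalscript is a made-up NDCI-style example, and `list_input_bands` is a hypothetical helper; a regex will miss unusual formatting, so treat the result as a starting point for manual review rather than a parser.

```python
import re

# Hypothetical evalscript used only to exercise the helper below.
EVALSCRIPT = """
//VERSION=3
function setup() {
  return {
    input: ["B04", "B05", "dataMask"],
    output: { bands: 4 }
  };
}
function evaluatePixel(sample) {
  let ndci = (sample.B05 - sample.B04) / (sample.B05 + sample.B04);
  return [ndci, ndci, ndci, sample.dataMask];
}
"""


def list_input_bands(evalscript):
    """Return the quoted names found inside the first `input: [...]` declaration."""
    match = re.search(r"input\s*:\s*\[(.*?)\]", evalscript, flags=re.DOTALL)
    if match is None:
        return []
    # Quoted identifiers such as "B04" or 'dataMask'.
    return re.findall(r"[\"']([A-Za-z0-9_]+)[\"']", match.group(1))


bands = list_input_bands(EVALSCRIPT)
```

Here `bands` also contains `dataMask`, which is not a spectral band. Entries like this usually need a different treatment in openEO (for example a mask step), which is worth noting during analysis.
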
43+
44+
### Phase 2: Environment Setup and Data Loading
45+
46+
- **Create a new Jupyter notebook** in the appropriate category directory
47+
- Start with a clear title and introduction section
48+
- Explain what the algorithm does, why it's useful, and what the expected outputs look like
49+
50+
- **Import the necessary libraries** including:
51+
- The openEO Python client
52+
- Visualization tools
53+
- Any specialized processing libraries
54+
- Establish a connection to the openEO backend, handling authentication appropriately
55+
56+
- **Define your test area** by selecting a spatial extent where the algorithm should produce meaningful results:
57+
- Water quality algorithms (like NDCI): choose areas with known water bodies
58+
- Vegetation indices: select regions with diverse vegetation types
59+
- Fire detection: ideally use areas with documented fire events
60+
61+
- **Load your initial dataset** using `load_collection`:
62+
- Request all necessary bands
63+
- Ensure appropriate temporal coverage
64+
- Use Cloud-Optimized GeoTIFF sources when available for best performance
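
The setup steps above could look roughly like the following sketch. The extent, dates, and band list are placeholder values, and `check_extent` and `load_test_cube` are hypothetical helpers. The `openeo` import is deliberately placed inside the function so that the extent check can be run without the client installed or a network connection.

```python
# Placeholder test area and period; pick values that suit the algorithm being converted.
TEST_AREA = {"west": 5.05, "south": 51.20, "east": 5.15, "north": 51.25}
TIME_RANGE = ["2023-07-01", "2023-07-31"]
BANDS = ["B04", "B05"]


def check_extent(extent):
    """Fail early on swapped or out-of-range WGS84 coordinates.

    This simple check does not support extents that cross the antimeridian.
    """
    if not (-180 <= extent["west"] < extent["east"] <= 180):
        raise ValueError("west must be smaller than east, both within [-180, 180]")
    if not (-90 <= extent["south"] < extent["north"] <= 90):
        raise ValueError("south must be smaller than north, both within [-90, 90]")
    return extent


def load_test_cube(backend_url, collection_id):
    """Connect, authenticate, and load the test cube.

    Requires the openeo package, network access, and an account on the backend.
    """
    import openeo  # imported lazily so check_extent works without the client

    connection = openeo.connect(backend_url).authenticate_oidc()
    return connection.load_collection(
        collection_id,
        spatial_extent=check_extent(TEST_AREA),
        temporal_extent=TIME_RANGE,
        bands=BANDS,
    )
```

In a notebook you would then call something like `load_test_cube("openeo.dataspace.copernicus.eu", "SENTINEL2_L2A")`. Both arguments are assumptions here; use the backend URL and collection id that apply to your setup.
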
### Phase 3: Algorithm Implementation

Implement the algorithm incrementally, building up from simple components to the complete process:

- **Start with basic band extraction**:
  - Ensure you can access all required spectral bands correctly
  - Test that band values are in the expected range and format

- **Implement intermediate calculations separately** if the algorithm has multiple stages:
  - Example: For NDCI, implement and validate the Floating Algae Index (FAI) calculation before proceeding to the main chlorophyll index
  - This modular approach makes debugging easier and creates reusable components

- **Build the main algorithm** using appropriate openEO processes:
  - Basic arithmetic translates directly to the `add`, `subtract`, `multiply`, and `divide` processes
  - Spectral indices often use the `normalized_difference` process
  - Conditional logic uses the `if` process
  - Array operations use `array_apply`, `array_element`, and related processes

- **Implement visualizations carefully**, as this often requires the most adaptation from evalscripts:
  - Simple color mappings can use openEO's color blend capabilities
  - Complex classifications may require building custom color lookup logic using nested `if` processes
  - Document your visualization choices clearly since color scales significantly affect how users interpret results
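
The nested `if` approach to color lookups mentioned above can be generated rather than written by hand. The sketch below builds process graph nodes for a single output channel from a list of thresholds, using the standard `lt` and `if` processes. The function name, node ids, and the `x` parameter name are illustrative choices, not part of any existing notebook.

```python
def build_threshold_lookup(thresholds, values, source=None):
    """Build nested `if` nodes that map an index value to one output value.

    thresholds: ascending upper bounds, e.g. [0.0, 0.2]
    values: one output per class, so len(values) == len(thresholds) + 1
    source: reference to the input value; defaults to a parameter named "x"
    """
    if not thresholds or len(values) != len(thresholds) + 1:
        raise ValueError("need at least one threshold and one more value than thresholds")
    if source is None:
        source = {"from_parameter": "x"}

    nodes = {}
    # Start with the fallback value and wrap it in one `if` per threshold,
    # working from the highest threshold down to the lowest.
    reject = values[-1]
    for i in range(len(thresholds) - 1, -1, -1):
        nodes[f"lt{i}"] = {
            "process_id": "lt",
            "arguments": {"x": source, "y": thresholds[i]},
        }
        nodes[f"if{i}"] = {
            "process_id": "if",
            "arguments": {
                "value": {"from_node": f"lt{i}"},
                "accept": values[i],
                "reject": reject,
            },
        }
        reject = {"from_node": f"if{i}"}

    nodes["if0"]["result"] = True  # the outermost `if` yields the final value
    return nodes


# Example: red channel of a three-class scale (below 0.0, below 0.2, everything else).
red_channel = build_threshold_lookup([0.0, 0.2], [0, 128, 255])
```

Calling the function once per channel with the same thresholds keeps the class boundaries consistent across red, green, and blue.
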
### Phase 4: Testing and Validation

Validate your converted algorithm against the original evalscript:

- **Compare visual outputs**:
  - Run the same scene through both the original evalscript (in Sentinel Hub) and your openEO conversion
  - They should produce visually similar results, though minor differences in interpolation or color mapping are acceptable

- **Test with multiple scenes** representing different conditions:
  - Vegetation index: test on forests, grasslands, agricultural areas, and bare soil
  - Water algorithm: test on clear water, turbid water, and coastal areas
  - This reveals whether your conversion generalizes properly or only works for specific conditions

- **Verify edge cases** including:
  - Areas with no data
  - Cloud-affected pixels
  - Extreme index values
  - Scenes at the boundaries of your spatial or temporal extent
  - Note: The original evalscript may handle these gracefully because JavaScript is permissive with numbers (for example, `0 / 0` evaluates to `NaN` rather than raising an error), so ensure your openEO version doesn't fail on such inputs

- **Document any differences honestly**:
  - If your conversion produces slightly different results than the original, explain why
  - This might be due to different data sources, preprocessing differences, or necessary adaptations to the openEO framework
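
When both outputs are available as numbers for the same pixels, the comparison can be summarized with a few statistics. The following is a minimal sketch: `compare_outputs` is a hypothetical helper, the sample values are invented, and the tolerance of 0.02 is an arbitrary choice that you should justify for your own algorithm.

```python
import math


def compare_outputs(reference, converted, tolerance=0.02):
    """Compare two equally sized pixel sequences.

    Pixels that are no-data (None or NaN) in either sequence are skipped.
    """
    if len(reference) != len(converted):
        raise ValueError("both outputs must cover the same pixels")

    diffs = []
    for ref, conv in zip(reference, converted):
        if ref is None or conv is None or math.isnan(ref) or math.isnan(conv):
            continue
        diffs.append(abs(ref - conv))

    if not diffs:
        return {"compared": 0, "max_abs_diff": None, "within_tolerance": None}
    within = sum(d <= tolerance for d in diffs)
    return {
        "compared": len(diffs),
        "max_abs_diff": max(diffs),
        "within_tolerance": within / len(diffs),
    }


# Invented sample values: evalscript output versus openEO output for four pixels.
evalscript_pixels = [0.10, 0.25, None, 0.40]
openeo_pixels = [0.10, 0.26, 0.30, float("nan")]
report = compare_outputs(evalscript_pixels, openeo_pixels, tolerance=0.02)
```

Reporting the number of compared pixels alongside the differences makes it visible when a large share of the scene was skipped as no-data.
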
### Phase 5: Documentation and Export

Write comprehensive documentation within the notebook:

- **The introduction** should:
  - Explain the algorithm's purpose in terms accessible to users who aren't remote sensing experts
  - Provide enough technical detail for those who are
  - Describe typical use cases and what users can learn from the output

- **Document the scientific methodology** by explaining:
  - The spectral indices used
  - The physical principles behind them (e.g., why near-infrared light reflects strongly from healthy vegetation)
  - Any calibration or validation studies that support the approach
  - Appropriate citations to peer-reviewed literature

- **Describe the implementation choices** you made:
  - Particularly where the openEO version differs from the original evalscript
  - Why you chose certain processes over alternatives
  - How you adapted conditional logic
  - Any limitations users should be aware of

- **Provide clear attribution**:
  - Credit the original evalscript author
  - Link to the source script
  - Acknowledge any modifications you made during conversion
  - Thank relevant institutions or funding sources

- **Export the final process graph** (TBD for automation):
  - Capture the built graph in a variable that can be serialized to JSON
  - Include a cell that demonstrates this export and saves the JSON to an appropriate location
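
Since export automation is still to be decided, the following is only one possible shape for the export cell. It wraps a process graph dictionary in minimal UDP metadata and writes it to disk. The function `export_udp`, the process id, and the stand-in graph are all hypothetical; in a real notebook the dictionary would come from the cube you built (with the openEO Python client, a cube's `flat_graph()` method returns such a dictionary).

```python
import json
import tempfile
from pathlib import Path


def export_udp(process_id, summary, process_graph, directory):
    """Write a process graph, wrapped in UDP metadata, to <directory>/<process_id>.json."""
    udp = {
        "id": process_id,
        "summary": summary,
        "parameters": [],  # declare user-configurable parameters here
        "process_graph": process_graph,
    }
    target = Path(directory) / f"{process_id}.json"
    target.write_text(json.dumps(udp, indent=2), encoding="utf-8")
    return target


# Stand-in graph so the sketch runs on its own; replace it with the real graph.
example_graph = {
    "add1": {"process_id": "add", "arguments": {"x": 1, "y": 2}, "result": True}
}

with tempfile.TemporaryDirectory() as tmp:
    path = export_udp("ndci_example", "Hypothetical NDCI export", example_graph, tmp)
    restored = json.loads(path.read_text(encoding="utf-8"))
```

Reading the file back, as done in the last line, is a cheap way to confirm that the saved JSON is complete and parseable before committing it.
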
## Coding Standards

### Process Graph Construction

Build process graphs that are clear and maintainable:

- Use meaningful variable names that describe what each step does. Rather than generic names like `cube1` and `cube2`, use `masked_water` and `vegetation_index` so the logic flow is self-documenting.
- Break complex operations into intermediate steps with their own variables. Don't chain dozens of operations into one massive expression. Each meaningful transformation should be a separate step that can be understood and debugged independently.
- Add comments explaining non-obvious logic, particularly for complex conditional structures or mathematical operations that implement specific scientific formulas. The comment should explain the "why," not just the "what."
- Include assertions or validation checks where appropriate to catch common errors early. For example, verify that spectral bands contain values in the expected range before performing index calculations.
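
A range check of the kind described in the last point can be very small. The sketch below assumes reflectance is expected on a 0 to 1 scale and that sample values have already been downloaded into a Python list; `check_reflectance` and the sample numbers are illustrative.

```python
def check_reflectance(samples, low=0.0, high=1.0):
    """Return the sample values outside [low, high]; None marks no-data and is ignored."""
    return [v for v in samples if v is not None and not (low <= v <= high)]


# Invented samples for the same pixels: scaled reflectance versus raw digital numbers.
scaled_samples = [0.031, 0.044, None, 0.052]
raw_samples = [310, 440, None, 520]

scaled_problems = check_reflectance(scaled_samples)  # nothing flagged
raw_problems = check_reflectance(raw_samples)        # every valid pixel flagged
```

A check like this tends to catch a forgotten scale factor quickly, since an index formula that adds constants to band values gives wrong results on unscaled data without raising any error.
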
### Visualization Implementation

Visualization code requires special attention:

- Document your color scales thoroughly by explaining what each color represents, the thresholds you chose and why, and providing a legend that users can reference when interpreting results.
- Test color scales across the full range of possible output values, ensuring that extreme values render appropriately and that transitions between colors occur at meaningful thresholds rather than arbitrary boundaries.
- Consider colorblind-friendly palettes when possible, avoiding color combinations that become indistinguishable for users with common forms of color vision deficiency.

### Performance Considerations

Write efficient process graphs:

- Minimize unnecessary operations by removing redundant calculations, combining operations where possible, and avoiding repeated data loading or processing of the same temporal slices.
- Use appropriate data types, ensuring you're not requesting unnecessarily high precision or resolution for intermediate calculations. Final outputs should have appropriate precision, but intermediate steps can often use lower precision for better performance.
- Leverage Cloud-Optimized formats by preferring COG or Zarr data sources over traditional formats when both are available, as this dramatically improves response times for synchronous processing.

## Testing Requirements

Every contribution must include appropriate testing:

### Visual Validation

Include in your notebook:

- Side-by-side comparisons of your openEO output with the original evalscript output, executed on the same scene. Show multiple examples representing different environmental conditions.
- Clear labeling that helps users understand what they're looking at, including dates, locations, algorithm parameters, and any relevant metadata.
- Discussion of any differences observed, explaining whether they're acceptable variations or potential issues that need investigation.

### Numerical Validation

Where possible, validate numerical accuracy:

- For algorithms that compute specific physical quantities (like chlorophyll-a concentration in NDCI), compare output values against expected ranges from the literature or reference measurements
- Test that mathematical operations produce correct results by validating intermediate steps against manual calculations for sample pixels
- Verify that statistical aggregations (means, medians, quantiles) produce sensible results when compared to summary statistics from the original evalscript
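
Checking sample pixels against a manual calculation could be done as in this sketch. It assumes the NDCI formula `(B05 - B04) / (B05 + B04)`; the band values and the "observed" outputs are invented for illustration and would normally be read from your input data and your converted result.

```python
import math


def ndci(red, red_edge):
    """Reference calculation, assuming NDCI = (B05 - B04) / (B05 + B04)."""
    return (red_edge - red) / (red_edge + red)


# (B04, B05, value read from the converted output) for a few hand-picked pixels.
samples = [
    (0.040, 0.060, 0.2),
    (0.050, 0.050, 0.0),
    (0.080, 0.020, -0.6),
]

# Compare with a tolerance, since floating-point results rarely match exactly.
mismatches = [
    (red, red_edge, observed)
    for red, red_edge, observed in samples
    if not math.isclose(ndci(red, red_edge), observed, abs_tol=1e-6)
]
```

An empty `mismatches` list means the converted output agrees with the manual calculation for every sampled pixel.
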
### Edge Case Testing

Document how your conversion handles challenging situations:

- Missing data (no data values, masked pixels)
- Invalid values (negative indices where only positive values are physically meaningful)
- Extreme values (saturated pixels, very dark shadows)
- Temporal gaps in data availability
- Areas at the edge of satellite swath coverage
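
For reference calculations written in Python, some of these situations need explicit handling, because Python raises an error on division by zero where JavaScript would return `NaN` or `Infinity`. The helper below is a hypothetical sketch that treats `None` as the no-data marker.

```python
import math


def safe_normalized_difference(x, y):
    """Return (x - y) / (x + y), or None when the result is undefined.

    None is used as the no-data marker in this sketch.
    """
    if x is None or y is None or math.isnan(x) or math.isnan(y):
        return None  # missing or masked input
    denominator = x + y
    if denominator == 0:
        return None  # e.g. both bands are zero in deep shadow
    return (x - y) / denominator
```

Running a reference function like this over deliberately awkward pixels (all zeros, missing bands, saturated values) gives you expected results to compare against the openEO output for the same inputs.
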
## Submission Process

When you're ready to contribute your conversion:

### Preparing Your Contribution

- **Ensure your notebook runs completely** from top to bottom without errors in a fresh Python environment
  - Clear all output, then execute all cells to verify reproducibility

- **Verify that all required files are included**:
  - Exported JSON process graphs
  - Example output images for documentation
  - Any supporting data files needed for testing

- **Check that your contribution follows the repository structure**:
  - Place files in appropriate directories
  - Follow naming conventions consistent with existing notebooks

### Creating a Pull Request

- **Fork the repository and create a new branch** for your contribution
  - Use a descriptive name like `convert-ndwi-script` or `add-burned-area-detection`

- **Commit your changes** with clear, descriptive commit messages
  - Explain what you did and why
  - Each commit should represent a logical unit of work

- **Open a pull request** with a comprehensive description including:
  - The original evalscript you converted and its author
  - What the algorithm does and what it's used for
  - Any challenges you encountered during conversion
  - Notable differences from the original, if any
  - Example outputs demonstrating successful conversion
  - Links to relevant documentation or scientific papers

### Review Process

- **A maintainer will review your contribution**, checking that it:
  - Follows our standards
  - Produces valid results
  - Includes appropriate documentation
  - Integrates well with existing conversions

- **You may receive feedback** requesting changes such as:
  - Improving documentation
  - Adjusting implementation details
  - Adding additional test cases
  - Clarifying attribution

- **Once approved**, your contribution will be merged and made available to the community
  - You'll be credited in the repository contributors list and in the specific notebook you created

## Getting Help

If you encounter difficulties during conversion:

- **Check existing notebooks** for similar algorithms that might provide patterns you can follow
  - The NDCI cyanobacteria notebook is a comprehensive example covering many common patterns

- **Consult the openEO processes documentation** to understand what processes are available and how to use them correctly

- **Ask questions** in the project Slack channel or GitHub discussions
  - The community is helpful and happy to assist with technical challenges, conceptual questions about openEO, or advice on conversion strategies

- **Open a draft pull request early** if you want feedback before completing your conversion
  - Maintainers can provide guidance while you're still working
## Advanced Topics

### Creating New openEO Processes

Sometimes an evalscript uses capabilities that don't exist in standard openEO processes. When this happens:

- **First check if you can achieve the same result** by combining existing processes in creative ways
  - Many seemingly complex operations can be constructed from simpler building blocks

- **If a new process is truly needed**:
  - Document the requirement in the notebook
  - Show why existing processes are insufficient
  - Explain what the new process would need to do

- **Follow the process proposal pipeline** for openEO:
  - Create a formal process specification
  - Provide a reference implementation
  - Submit it to the openEO community for review
  - Your work on this project could contribute to expanding openEO's capabilities

### Multi-Sensor Fusion

Some evalscripts combine data from multiple satellites (such as overlaying Sentinel-1 SAR data on Sentinel-2 optical imagery). These conversions require:

- Understanding the `merge_cubes` process and its requirements for spatial and temporal alignment
- Handling different spatial resolutions appropriately, either by resampling to a common grid or documenting requirements for pre-aligned data
- Managing temporal mismatches when satellites don't acquire data simultaneously, which may require temporal aggregation or nearest-neighbor selection
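
A fusion graph that resamples one cube onto the other's grid before merging might be sketched as follows. The collection ids, band names, extent, dates, and the `bilinear` method are assumptions to adapt to your backend, and temporal alignment is intentionally left out to keep the example short. The helper `unresolved_references` is hypothetical and checks that every `from_node` reference points at an existing node.

```python
EXTENT = {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1}
DATES = ["2023-06-01", "2023-06-30"]

fusion_graph = {
    "s2": {
        "process_id": "load_collection",
        "arguments": {"id": "SENTINEL2_L2A", "spatial_extent": EXTENT,
                      "temporal_extent": DATES, "bands": ["B04", "B08"]},
    },
    "s1": {
        "process_id": "load_collection",
        "arguments": {"id": "SENTINEL1_GRD", "spatial_extent": EXTENT,
                      "temporal_extent": DATES, "bands": ["VV", "VH"]},
    },
    "s1_on_s2_grid": {
        # Bring the SAR cube onto the optical cube's spatial grid before merging.
        "process_id": "resample_cube_spatial",
        "arguments": {"data": {"from_node": "s1"}, "target": {"from_node": "s2"},
                      "method": "bilinear"},
    },
    "merged": {
        "process_id": "merge_cubes",
        "arguments": {"cube1": {"from_node": "s2"},
                      "cube2": {"from_node": "s1_on_s2_grid"}},
        "result": True,
    },
}


def unresolved_references(graph):
    """List (node id, missing target) pairs for from_node references that point nowhere."""
    missing = []
    for node_id, node in graph.items():
        for arg in node["arguments"].values():
            if isinstance(arg, dict) and "from_node" in arg and arg["from_node"] not in graph:
                missing.append((node_id, arg["from_node"]))
    return missing


problems = unresolved_references(fusion_graph)
```

Because the two cubes here carry different band names, no overlap resolver is supplied to `merge_cubes`; cubes that share band labels would need one.
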
### Temporal Processing

Evalscripts that work with time series or temporal composites require different approaches:

- The `aggregate_temporal` process handles temporal averaging, median composites, or other statistical aggregations across time
- The `reduce_dimension` process along the temporal axis can implement custom temporal logic that doesn't fit standard aggregation functions
- Multi-temporal change detection typically requires loading multiple temporal slices, aligning them carefully, and computing differences or ratios between time periods
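
As an illustration of reducing along the temporal axis, the sketch below builds a `reduce_dimension` node with a child process graph that applies a single reducer such as `median`. The function names are hypothetical, and the dimension name `t` is a common convention rather than a guarantee, so check the collection metadata on your backend.

```python
def temporal_reducer(process_id):
    """Child process graph that applies one reducer process to the values along a dimension."""
    return {
        "process_graph": {
            "reduce": {
                "process_id": process_id,
                "arguments": {"data": {"from_parameter": "data"}},
                "result": True,
            }
        }
    }


def reduce_over_time(source_node, reducer="median", dimension="t"):
    """Node that collapses the temporal dimension of `source_node` with the given reducer."""
    return {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": source_node},
            "reducer": temporal_reducer(reducer),
            "dimension": dimension,
        },
    }


# Median composite of whatever the node named "load" produces.
composite = reduce_over_time("load")
```

Custom temporal logic follows the same pattern: replace the single-node child graph with a larger one that still takes `data` as its input parameter and marks exactly one node as the result.
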
## Conclusion

Converting evalscripts to openEO UDPs makes valuable algorithms accessible across the entire openEO ecosystem while maintaining their utility for real-time exploration and analysis. Your contributions help build a library of interoperable Earth observation processes that benefit researchers, operational users, and educators worldwide.

Thank you for contributing to this important work. Your efforts make Earth observation more open, accessible, and interoperable for everyone.

---

For questions about these guidelines or the contribution process, please contact the project maintainers through GitHub issues.
