Add country field to EDLA schema #98

tmikula-dev · 2025-12-09T15:16:05Z

Summary

This pull request adds support for a new country field across the data ingestion pipeline. The main goal is to ensure that country information is properly captured, stored, and validated for both event and job records.

Release Notes:

Add country field to EDLA schema

Closes #79

Summary by CodeRabbit

New Features
- Added optional country field to data change records and job runs to track geographic location of data sources.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-09T15:16:27Z

Walkthrough

This PR adds an optional country field to EDLA and run event schemas in JSON configuration files, implements corresponding PostgreSQL INSERT column additions in the writer module, and updates test assertions to validate the new parameter positions.

Changes

Cohort / File(s)	Summary
Schema Updates `conf/topic_dlchange.json`, `conf/topic_runs.json`	Added optional `country` string property: in `topic_dlchange.json` root properties (after `timestamp_event`); in `topic_runs.json` under `jobs.items.properties` (after `catalog_id`). Field not added to required arrays.
PostgreSQL Writer `src/writers/writer_postgres.py`	Added `country` column to INSERT statements for both EDLA writes (using `message.get("country", "")`) and run job writes (using `job.get("country", "")`). Parameter positions adjusted accordingly while maintaining SQL alignment.
Test Updates `tests/writers/test_writer_postgres.py`	Updated test assertions to reflect new parameter indices for both EDLA and run write paths. Verified country field presence and empty-string defaults; adjusted subsequent parameter position validations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Changes follow consistent, repetitive patterns across files (adding optional field with empty-string default)
Schema structure and positioning logically placed relative to existing fields
SQL syntax and parameter alignment require verification
Test assertions accurately mirror implementation changes

Possibly related issues

Add country field to run and edla schemas #79: Parent/reference issue for country field schema definitions; work here directly implements the schema changes originally proposed in that issue.

Poem

🐰 A nimble rabbit hops with glee,
Adding countries, one, two, three!
From schemas sweet to SQL so neat,
Geographic data now complete,
Defaults dance as empty strings—
Adventure worldwide, the query brings! 🌍

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Add country field to EDLA schema' directly aligns with the main objective to add a country field to schemas, but incompletely describes the full scope which also includes the run schema.
Linked Issues check	✅ Passed	The PR implementation fully satisfies issue #96 requirements: country fields added to both schemas as optional, PostgreSQL writer updated with backward-compatible empty string defaults, and tests updated with correct parameter positions.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue #96: schema updates, writer function modifications, and test parameter adjustments. No extraneous changes detected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/79-add-country-field-and-edla

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-12-09T15:16:32Z

Trivy has completed a full security repository scan ✅ You can find the analysis results for this PR branch on this overview.
Below is the summary of the findings:

TRIVY	HIGH	MEDIUM	LOW	TOTAL
vulnerability	0	0	0	0
secret	6	0	0	6
misconfiguration	0	1	10	11
license	0	0	0	0
➡️ Total	6	1	10	17

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/writers/writer_postgres.py (1)

187-221: Align jobs INSERT column order with spec: place country after catalog_id.

In the jobs INSERT, country currently appears before catalog_id:

event_id,
country,
catalog_id,
...

The PR objectives require inserting country after catalog_id. While named columns keep the query functionally correct, it’s better to match the requested order and what readers will expect.

You can adjust the column and value order like this:

-        INSERT INTO {table_jobs}
-        (
-                event_id,
-                country,
-                catalog_id,
-                status,
-                timestamp_start,
-                timestamp_end,
-                message,
-                additional_info
-        )
+        INSERT INTO {table_jobs}
+        (
+                event_id,
+                catalog_id,
+                country,
+                status,
+                timestamp_start,
+                timestamp_end,
+                message,
+                additional_info
+        )
         VALUES
         (
             %s,
             %s,
             %s,
             %s,
             %s,
             %s,
             %s,
             %s
         )""",
         (
-                message["event_id"],
-                job.get("country", ""),
-                job["catalog_id"],
+                message["event_id"],
+                job["catalog_id"],
+                job.get("country", ""),
                 job["status"],
                 job["timestamp_start"],
                 job["timestamp_end"],
                 job.get("message"),
                 (json.dumps(job.get("additional_info")) if "additional_info" in job else None),
         ),

This keeps the default behavior (job.get("country", "")) while aligning the position with the spec.

tests/writers/test_writer_postgres.py (1)

92-135: Update runs test indices to match country being after catalog_id in jobs INSERT.

Once postgres_run_write is updated to place country after catalog_id, the job parameter indices in this test should reflect:

index 1 → catalog_id

index 2 → country

indices 6 and 7 remain message and additional_info.

Concretely:
-    # Check first job
-    _job1_sql, job1_params = cur.executions[1]
-    assert job1_params[1] == ""
-    assert job1_params[2] == "c1"
+    # Check first job
+    _job1_sql, job1_params = cur.executions[1]
+    assert job1_params[1] == "c1"
+    assert job1_params[2] == ""
@@
-    # Check second job
-    _job2_sql, job2_params = cur.executions[2]
-    assert job2_params[1] == "bw"
-    assert job2_params[2] == "c2"
+    # Check second job
+    _job2_sql, job2_params = cur.executions[2]
+    assert job2_params[1] == "c2"
+    assert job2_params[2] == "bw"
     assert job2_params[6] == "err"
     assert json.loads(job2_params[7]) == {"k": "v"}
This keeps the existing coverage while bringing the tests in line with the intended column order.

🧹 Nitpick comments (1)

conf/topic_runs.json (1)

42-45: Move country below catalog_id to match the stated schema ordering.

The issue description asks for country to be positioned after catalog_id in jobs.items.properties. Here it appears before catalog_id. While JSON property order is not semantic, aligning the order with the spec and DB inserts will reduce confusion.

You could reorder as:

-                "properties": {
-                    "country": {
-                      "type": "string",
-                      "description": "The country the data is related to."
-                    },
-                    "catalog_id": {
+                "properties": {
+                    "catalog_id": {
                         "type": "string",
                         "description": "Identifier for the data definition (Glue/Hive) database and table name for  example"
+                    },
+                    "country": {
+                      "type": "string",
+                      "description": "The country the data is related to."
                     },

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0922f and 481ae29.

📒 Files selected for processing (4)

conf/topic_dlchange.json (1 hunks)
conf/topic_runs.json (1 hunks)
src/writers/writer_postgres.py (5 hunks)
tests/writers/test_writer_postgres.py (5 hunks)

🔇 Additional comments (3)

conf/topic_dlchange.json (1)

28-31: EDLA country schema addition looks correct and aligned with the spec.

country is optional, typed as string, and positioned immediately after timestamp_event without being added to required, matching the stated objectives.

src/writers/writer_postgres.py (1)

90-137: EDLA writer’s country column and default handling look correct.

country is inserted immediately after timestamp_event and uses message.get("country", "") so missing values fall back to an empty string as required. The parameter ordering in the VALUES tuple matches the column list and the updated tests.

tests/writers/test_writer_postgres.py (1)

41-90: EDLA tests correctly cover both presence and defaulting of country.

The tests assert the new parameter count, verify country at index 6 when present, and confirm the empty-string default and index shifts when optional fields are omitted. This aligns tightly with the EDLA writer behavior and objectives.

tmikula-dev added 6 commits December 8, 2025 14:25

Adding HandlerToken class as a part of refactoring process.

e77b9a4

black formatter tool fix

c939254

CodeRabbit review comments implementation.

71d1bee

CodeRabbit nitpicks.

9051785

Merge branch 'master' into feature/79-add-country-field-and-edla

37c73cd

Field country added to the EDLA schema.

481ae29

tmikula-dev self-assigned this Dec 9, 2025

tmikula-dev requested a review from oto-macenauer-absa as a code owner December 9, 2025 15:16

tmikula-dev added the enhancement New feature or request label Dec 9, 2025

tmikula-dev requested review from lsulak and petr-pokorny-absa as code owners December 9, 2025 15:16

coderabbitai bot reviewed Dec 9, 2025

View reviewed changes

oto-macenauer-absa approved these changes Dec 12, 2025

View reviewed changes

tmikula-dev merged commit be28362 into master Dec 12, 2025
16 checks passed

tmikula-dev deleted the feature/79-add-country-field-and-edla branch December 12, 2025 10:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add country field to EDLA schema #98

Add country field to EDLA schema #98

Uh oh!

tmikula-dev commented Dec 9, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Dec 9, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 9, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add country field to EDLA schema #98

Add country field to EDLA schema #98

Uh oh!

Conversation

tmikula-dev commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Release Notes:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Dec 9, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tmikula-dev commented Dec 9, 2025 •

edited

Loading

coderabbitai bot commented Dec 9, 2025 •

edited

Loading