Skip to content

Conversation

@tmikula-dev
Copy link
Collaborator

@tmikula-dev tmikula-dev commented Dec 9, 2025

Summary

This pull request adds support for a new country field across the data ingestion pipeline. The main goal is to ensure that country information is properly captured, stored, and validated for both event and job records.

Release Notes:

  • Add country field to EDLA schema

Closes #79

Summary by CodeRabbit

  • New Features
    • Added optional country field to data change records and job runs to track geographic location of data sources.

✏️ Tip: You can customize this high-level summary in your review settings.

@tmikula-dev tmikula-dev self-assigned this Dec 9, 2025
@tmikula-dev tmikula-dev added the enhancement New feature or request label Dec 9, 2025
@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Walkthrough

This PR adds an optional country field to EDLA and run event schemas in JSON configuration files, implements corresponding PostgreSQL INSERT column additions in the writer module, and updates test assertions to validate the new parameter positions.

Changes

Cohort / File(s) Summary
Schema Updates
conf/topic_dlchange.json, conf/topic_runs.json
Added optional country string property: in topic_dlchange.json root properties (after timestamp_event); in topic_runs.json under jobs.items.properties (after catalog_id). Field not added to required arrays.
PostgreSQL Writer
src/writers/writer_postgres.py
Added country column to INSERT statements for both EDLA writes (using message.get("country", "")) and run job writes (using job.get("country", "")). Parameter positions adjusted accordingly while maintaining SQL alignment.
Test Updates
tests/writers/test_writer_postgres.py
Updated test assertions to reflect new parameter indices for both EDLA and run write paths. Verified country field presence and empty-string defaults; adjusted subsequent parameter position validations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Changes follow consistent, repetitive patterns across files (adding optional field with empty-string default)
  • Schema structure and positioning logically placed relative to existing fields
  • SQL syntax and parameter alignment require verification
  • Test assertions accurately mirror implementation changes

Possibly related issues

Poem

🐰 A nimble rabbit hops with glee,
Adding countries, one, two, three!
From schemas sweet to SQL so neat,
Geographic data now complete,
Defaults dance as empty strings—
Adventure worldwide, the query brings! 🌍

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Add country field to EDLA schema' directly aligns with the main objective to add a country field to schemas, but incompletely describes the full scope which also includes the run schema.
Linked Issues check ✅ Passed The PR implementation fully satisfies issue #96 requirements: country fields added to both schemas as optional, PostgreSQL writer updated with backward-compatible empty string defaults, and tests updated with correct parameter positions.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #96: schema updates, writer function modifications, and test parameter adjustments. No extraneous changes detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/79-add-country-field-and-edla

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Dec 9, 2025

Trivy has completed a full security repository scan ✅ You can find the analysis results for this PR branch on this overview.
Below is the summary of the findings:

TRIVY CRITICAL HIGH MEDIUM LOW TOTAL
vulnerability 0 0 0 0 0
secret 0 6 0 0 6
misconfiguration 0 0 1 10 11
license 0 0 0 0 0
➡️ Total 0 6 1 10 17

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/writers/writer_postgres.py (1)

187-221: Align jobs INSERT column order with spec: place country after catalog_id.

In the jobs INSERT, country currently appears before catalog_id:

event_id,
country,
catalog_id,
...

The PR objectives require inserting country after catalog_id. While named columns keep the query functionally correct, it’s better to match the requested order and what readers will expect.

You can adjust the column and value order like this:

-        INSERT INTO {table_jobs}
-        (
-                event_id,
-                country,
-                catalog_id,
-                status,
-                timestamp_start,
-                timestamp_end,
-                message,
-                additional_info
-        )
+        INSERT INTO {table_jobs}
+        (
+                event_id,
+                catalog_id,
+                country,
+                status,
+                timestamp_start,
+                timestamp_end,
+                message,
+                additional_info
+        )
         VALUES
         (
             %s,
             %s,
             %s,
             %s,
             %s,
             %s,
             %s,
             %s
         )""",
         (
-                message["event_id"],
-                job.get("country", ""),
-                job["catalog_id"],
+                message["event_id"],
+                job["catalog_id"],
+                job.get("country", ""),
                 job["status"],
                 job["timestamp_start"],
                 job["timestamp_end"],
                 job.get("message"),
                 (json.dumps(job.get("additional_info")) if "additional_info" in job else None),
         ),

This keeps the default behavior (job.get("country", "")) while aligning the position with the spec.

tests/writers/test_writer_postgres.py (1)

92-135: Update runs test indices to match country being after catalog_id in jobs INSERT.

Once postgres_run_write is updated to place country after catalog_id, the job parameter indices in this test should reflect:

  • index 1 → catalog_id
  • index 2 → country
  • indices 6 and 7 remain message and additional_info.

Concretely:

-    # Check first job
-    _job1_sql, job1_params = cur.executions[1]
-    assert job1_params[1] == ""
-    assert job1_params[2] == "c1"
+    # Check first job
+    _job1_sql, job1_params = cur.executions[1]
+    assert job1_params[1] == "c1"
+    assert job1_params[2] == ""
@@
-    # Check second job
-    _job2_sql, job2_params = cur.executions[2]
-    assert job2_params[1] == "bw"
-    assert job2_params[2] == "c2"
+    # Check second job
+    _job2_sql, job2_params = cur.executions[2]
+    assert job2_params[1] == "c2"
+    assert job2_params[2] == "bw"
     assert job2_params[6] == "err"
     assert json.loads(job2_params[7]) == {"k": "v"}

This keeps the existing coverage while bringing the tests in line with the intended column order.

🧹 Nitpick comments (1)
conf/topic_runs.json (1)

42-45: Move country below catalog_id to match the stated schema ordering.

The issue description asks for country to be positioned after catalog_id in jobs.items.properties. Here it appears before catalog_id. While JSON property order is not semantic, aligning the order with the spec and DB inserts will reduce confusion.

You could reorder as:

-                "properties": {
-                    "country": {
-                      "type": "string",
-                      "description": "The country the data is related to."
-                    },
-                    "catalog_id": {
+                "properties": {
+                    "catalog_id": {
                         "type": "string",
                         "description": "Identifier for the data definition (Glue/Hive) database and table name for  example"
+                    },
+                    "country": {
+                      "type": "string",
+                      "description": "The country the data is related to."
                     },
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0922f and 481ae29.

📒 Files selected for processing (4)
  • conf/topic_dlchange.json (1 hunks)
  • conf/topic_runs.json (1 hunks)
  • src/writers/writer_postgres.py (5 hunks)
  • tests/writers/test_writer_postgres.py (5 hunks)
🔇 Additional comments (3)
conf/topic_dlchange.json (1)

28-31: EDLA country schema addition looks correct and aligned with the spec.

country is optional, typed as string, and positioned immediately after timestamp_event without being added to required, matching the stated objectives.

src/writers/writer_postgres.py (1)

90-137: EDLA writer’s country column and default handling look correct.

country is inserted immediately after timestamp_event and uses message.get("country", "") so missing values fall back to an empty string as required. The parameter ordering in the VALUES tuple matches the column list and the updated tests.

tests/writers/test_writer_postgres.py (1)

41-90: EDLA tests correctly cover both presence and defaulting of country.

The tests assert the new parameter count, verify country at index 6 when present, and confirm the empty-string default and index shifts when optional fields are omitted. This aligns tightly with the EDLA writer behavior and objectives.

@tmikula-dev tmikula-dev merged commit be28362 into master Dec 12, 2025
16 checks passed
@tmikula-dev tmikula-dev deleted the feature/79-add-country-field-and-edla branch December 12, 2025 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add country field to run and edla schemas

3 participants