-
Notifications
You must be signed in to change notification settings - Fork 0
Add country field to EDLA schema #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR adds an optional Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related issues
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Trivy has completed a full security repository scan ✅ You can find the analysis results for this PR branch on this overview.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/writers/writer_postgres.py (1)
187-221: Align jobs INSERT column order with spec: placecountryaftercatalog_id.In the jobs INSERT,
countrycurrently appears beforecatalog_id:event_id, country, catalog_id, ...The PR objectives require inserting
countryaftercatalog_id. While named columns keep the query functionally correct, it’s better to match the requested order and what readers will expect.You can adjust the column and value order like this:
- INSERT INTO {table_jobs} - ( - event_id, - country, - catalog_id, - status, - timestamp_start, - timestamp_end, - message, - additional_info - ) + INSERT INTO {table_jobs} + ( + event_id, + catalog_id, + country, + status, + timestamp_start, + timestamp_end, + message, + additional_info + ) VALUES ( %s, %s, %s, %s, %s, %s, %s, %s )""", ( - message["event_id"], - job.get("country", ""), - job["catalog_id"], + message["event_id"], + job["catalog_id"], + job.get("country", ""), job["status"], job["timestamp_start"], job["timestamp_end"], job.get("message"), (json.dumps(job.get("additional_info")) if "additional_info" in job else None), ),This keeps the default behavior (
job.get("country", "")) while aligning the position with the spec.tests/writers/test_writer_postgres.py (1)
92-135: Update runs test indices to matchcountrybeing aftercatalog_idin jobs INSERT.Once
postgres_run_writeis updated to placecountryaftercatalog_id, the job parameter indices in this test should reflect:
- index 1 →
catalog_id- index 2 →
country- indices 6 and 7 remain
messageandadditional_info.Concretely:
- # Check first job - _job1_sql, job1_params = cur.executions[1] - assert job1_params[1] == "" - assert job1_params[2] == "c1" + # Check first job + _job1_sql, job1_params = cur.executions[1] + assert job1_params[1] == "c1" + assert job1_params[2] == "" @@ - # Check second job - _job2_sql, job2_params = cur.executions[2] - assert job2_params[1] == "bw" - assert job2_params[2] == "c2" + # Check second job + _job2_sql, job2_params = cur.executions[2] + assert job2_params[1] == "c2" + assert job2_params[2] == "bw" assert job2_params[6] == "err" assert json.loads(job2_params[7]) == {"k": "v"}This keeps the existing coverage while bringing the tests in line with the intended column order.
🧹 Nitpick comments (1)
conf/topic_runs.json (1)
42-45: Movecountrybelowcatalog_idto match the stated schema ordering.The issue description asks for
countryto be positioned aftercatalog_idinjobs.items.properties. Here it appears beforecatalog_id. While JSON property order is not semantic, aligning the order with the spec and DB inserts will reduce confusion.You could reorder as:
- "properties": { - "country": { - "type": "string", - "description": "The country the data is related to." - }, - "catalog_id": { + "properties": { + "catalog_id": { "type": "string", "description": "Identifier for the data definition (Glue/Hive) database and table name for example" + }, + "country": { + "type": "string", + "description": "The country the data is related to." },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
conf/topic_dlchange.json(1 hunks)conf/topic_runs.json(1 hunks)src/writers/writer_postgres.py(5 hunks)tests/writers/test_writer_postgres.py(5 hunks)
🔇 Additional comments (3)
conf/topic_dlchange.json (1)
28-31: EDLAcountryschema addition looks correct and aligned with the spec.
countryis optional, typed as string, and positioned immediately aftertimestamp_eventwithout being added torequired, matching the stated objectives.src/writers/writer_postgres.py (1)
90-137: EDLA writer’scountrycolumn and default handling look correct.
countryis inserted immediately aftertimestamp_eventand usesmessage.get("country", "")so missing values fall back to an empty string as required. The parameter ordering in the VALUES tuple matches the column list and the updated tests.tests/writers/test_writer_postgres.py (1)
41-90: EDLA tests correctly cover both presence and defaulting ofcountry.The tests assert the new parameter count, verify
countryat index 6 when present, and confirm the empty-string default and index shifts when optional fields are omitted. This aligns tightly with the EDLA writer behavior and objectives.
Summary
This pull request adds support for a new
countryfield across the data ingestion pipeline. The main goal is to ensure that country information is properly captured, stored, and validated for both event and job records.Release Notes:
Closes #79
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.