-
Notifications
You must be signed in to change notification settings - Fork 377
feat(arrow): Convert Arrow schema to Iceberg schema with auto assigned field ids #1928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// Builds the schema. | ||
| pub fn build(self) -> Result<Schema> { | ||
| // If field IDs need to be reassigned, do it first before validation | ||
| if let Some(start_from) = self.reassign_field_ids_from { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation of this change? The problem with this change is that it may make the error more difficult to read. For example, if user passed a non exists identifier id, originally the error message would be quite clear and easy to understand. But this change will confuse user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point!
My original intention was to avoid building id-to-field map again when we are trying to reassign field ids.
| self.temp_field_id_counter += 1; | ||
| Ok(temp_id) | ||
| } else { | ||
| get_field_id_from_metadata(field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a significant hehavior change, please add some doc to highlight it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are already docs above to explain reassign_field_ids_from:
/// When set, the schema builder will reassign field IDs starting from this value
/// using level-order traversal (breadth-first).
Which issue does this PR close?
What changes are included in this PR?
arrow_schema_to_schema_auto_assign_idsAre these changes tested?
Added uts