Collaborator

@CTTY CTTY commented Dec 12, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Modified ArrowSchemaConverter to enable id reassignment
  • Added a new pub helper: arrow_schema_to_schema_auto_assign_ids
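
As a rough illustration of the kind of ID resolution the converter change implies, here is a minimal standalone sketch. It is not the PR's code: `SketchField`, `IdResolver`, the `auto_assign` flag, and the `"PARQUET:field_id"` key are assumptions made for this example. Only the `temp_field_id_counter` name mirrors the diff under review. In auto-assign mode the sketch hands out temporary IDs and does not read metadata; otherwise it requires an ID in the field metadata.

```rust
use std::collections::HashMap;

/// Metadata key assumed for this sketch; the real converter may use a different constant.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
struct SketchField {
    name: String,
    metadata: HashMap<String, String>,
}

/// Hypothetical converter state. `auto_assign` is an illustrative flag,
/// not a field confirmed by the PR.
struct IdResolver {
    auto_assign: bool,
    temp_field_id_counter: i32,
}

impl IdResolver {
    /// Returns a temporary ID in auto-assign mode, otherwise the ID stored in metadata.
    fn resolve(&mut self, field: &SketchField) -> Result<i32, String> {
        if self.auto_assign {
            // Metadata is not consulted here: every field gets the next temporary ID.
            let temp_id = self.temp_field_id_counter;
            self.temp_field_id_counter += 1;
            Ok(temp_id)
        } else {
            field
                .metadata
                .get(FIELD_ID_KEY)
                .ok_or_else(|| format!("field '{}' has no field id in metadata", field.name))?
                .parse::<i32>()
                .map_err(|e| format!("field '{}' has an invalid field id: {e}", field.name))
        }
    }
}
```

The temporary IDs in this sketch are placeholders only; a later reassignment pass would be expected to replace them with the final IDs.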

Are these changes tested?

Added unit tests.

@CTTY CTTY changed the title from "feat(arrow): Arrow schema to Iceberg schema with auto assigned field ids" to "feat(arrow): Convert Arrow schema to Iceberg schema with auto assigned field ids" on Dec 13, 2025
/// Builds the schema.
pub fn build(self) -> Result<Schema> {
    // If field IDs need to be reassigned, do it first before validation
    if let Some(start_from) = self.reassign_field_ids_from {
Contributor

What's the motivation for this change? The problem is that it may make errors harder to read. For example, if a user passes a nonexistent identifier field id, the original error message is clear and easy to understand, but with this change the error will confuse the user.

Collaborator Author

This is a good point!

My original intention was to avoid building the id-to-field map a second time when reassigning field ids.
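
To make the trade-off in this thread concrete, here is a minimal sketch of one possible ordering: validate identifier field ids against the ids the caller supplied, then reassign. `SketchBuilder` and its flattened `Vec<i32>` fields are hypothetical and assume unique field ids; only the name `reassign_field_ids_from` comes from the diff. This is not the builder's actual logic.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, flattened stand-in for a schema builder (assumes unique field ids).
struct SketchBuilder {
    field_ids: Vec<i32>,
    identifier_field_ids: Vec<i32>,
    reassign_field_ids_from: Option<i32>,
}

impl SketchBuilder {
    /// Returns (field ids, identifier field ids) after optional reassignment.
    fn build(self) -> Result<(Vec<i32>, Vec<i32>), String> {
        // Validate against the ids the caller supplied, so that an error
        // names an id the caller will recognize.
        let known: HashSet<i32> = self.field_ids.iter().copied().collect();
        for id in &self.identifier_field_ids {
            if !known.contains(id) {
                return Err(format!("identifier field id {id} does not exist in the schema"));
            }
        }

        // Without a reassignment request, the ids are returned unchanged.
        let Some(start_from) = self.reassign_field_ids_from else {
            return Ok((self.field_ids, self.identifier_field_ids));
        };

        // Map each old id to a fresh sequential id, then rewrite both lists.
        let mapping: HashMap<i32, i32> = self
            .field_ids
            .iter()
            .enumerate()
            .map(|(i, old)| (*old, start_from + i as i32))
            .collect();
        let new_fields: Vec<i32> = self.field_ids.iter().map(|old| mapping[old]).collect();
        let new_identifiers: Vec<i32> =
            self.identifier_field_ids.iter().map(|old| mapping[old]).collect();
        Ok((new_fields, new_identifiers))
    }
}
```

The cost of this ordering is the extra pass that builds `known`, which is the kind of duplicated work the reply above was trying to avoid.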

        self.temp_field_id_counter += 1;
        Ok(temp_id)
    } else {
        get_field_id_from_metadata(field)
Contributor

This is a significant behavior change; please add some docs to highlight it.

Collaborator Author

I think there are already docs above to explain reassign_field_ids_from:

/// When set, the schema builder will reassign field IDs starting from this value
/// using level-order traversal (breadth-first).
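
For readers unfamiliar with the ordering that doc comment describes, the following is a minimal sketch of breadth-first ID reassignment over a nested field tree. `Node` and `reassign_level_order` are hypothetical names for this example, and the real schema builder may order nested types differently in detail.

```rust
use std::collections::VecDeque;

/// Hypothetical nested field: an id plus child fields (empty for primitives).
#[derive(Debug)]
struct Node {
    id: i32,
    children: Vec<Node>,
}

/// Overwrites every id in level order, starting at `start_from`.
/// Returns the next unused id.
fn reassign_level_order(roots: &mut [Node], start_from: i32) -> i32 {
    let mut next_id = start_from;
    // Seed the queue with the top-level fields so they receive the lowest ids.
    let mut queue: VecDeque<&mut Node> = roots.iter_mut().collect();
    while let Some(node) = queue.pop_front() {
        node.id = next_id;
        next_id += 1;
        // Children go to the back of the queue, so a whole level is
        // numbered before any deeper level.
        queue.extend(node.children.iter_mut());
    }
    next_id
}
```

With two top-level fields where the first has two children, the top-level fields receive `start_from` and `start_from + 1`, and the children receive the next two ids.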

Development

Successfully merging this pull request may close these issues.

Add a helper to auto assign field ids when converting arrow schema to iceberg schema
