Skip to content

Conversation

@gverkes
Copy link
Contributor

@gverkes gverkes commented Oct 13, 2025

Fixes #6669

Tracking issue

Closes flyteorg/flyte#6669

Why are the changes needed?

When deserializing Pydantic models containing FlyteFile or FlyteDirectory fields using model_validate(), the deserialized objects were
missing private attributes (_remote_source, _downloader, etc.). This caused an AttributeError when attempting to re-serialize these
objects with model_dump(), breaking the serialize → deserialize → serialize cycle.

This is a critical bug that prevents users from using FlyteFile/FlyteDirectory within Pydantic BaseModel classes in a normal way.

What changes were proposed in this pull request?

Added Pydantic model validators to both FlyteFile and FlyteDirectory classes that ensure private attributes are properly initialized during
deserialization:

  1. FlyteFile (flytekit/types/file/file.py):

    • Enhanced deserialize_flyte_file validator to check if private attributes exist
    • If missing, reconstructs the FlyteFile using dict_to_flyte_file() transformer
    • If attributes already exist (e.g., when passing already-constructed FlyteFile), returns as-is
  2. FlyteDirectory (flytekit/types/directory/types.py):

    • Applied same fix to deserialize_flyte_dir validator
    • Uses dict_to_flyte_directory() to properly reconstruct the object

How was this patch tested?

Added two new unit tests in test_pydantic_basemodel_transformer.py:

  1. test_flytefile_pydantic_model_dump_validate_cycle - Verifies FlyteFile can be serialized, deserialized, and re-serialized without errors
  2. test_flytedirectory_pydantic_model_dump_validate_cycle - Same for FlyteDirectory

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

  • This pull request introduces a critical bug in the deserialization of Pydantic models containing FlyteFile and FlyteDirectory fields, which could lead to AttributeErrors during re-serialization.
  • Model validators are added to ensure that private attributes are properly initialized during deserialization.
  • New unit tests have been added to verify the functionality of these changes, ensuring that the serialize-deserialize-serialize cycle works without errors.
  • Overall, this pull request addresses deserialization issues in Pydantic models, introduces critical bugs, and adds unit tests.

@welcome
Copy link

welcome bot commented Oct 13, 2025

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

if info.context is None or info.context.get("deserialize") is not True:
return self
# Check if all private attributes are already set up (e.g., from __init__)
if hasattr(self, "_downloader") and hasattr(self, "_remote_source"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:
Do we need to check for the other private vars like _downloaded too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or will they all together be set or not set anyways?


dict_obj = {"path": str(self.path)}
metadata = getattr(self, "metadata", None)
if metadata is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check:
@gverkes did you test whether we need to do this for flyte directory too?

fg91
fg91 previously approved these changes Oct 13, 2025
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and can confirm that the fix works, thanks for fixing 🙏

My approval appears to be not enough to merge, I'm not a code owner for those dirs. Could you please take a look @davidmirror-ops or @pingsutw ? 🙏

@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from 82990a6 to 628cf96 Compare October 13, 2025 19:23
@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.03%. Comparing base (531d2aa) to head (f16d4c1).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/types/directory/types.py 75.00% 0 Missing and 1 partial ⚠️
flytekit/types/file/file.py 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3339      +/-   ##
==========================================
- Coverage   76.21%   76.03%   -0.18%     
==========================================
  Files         216      216              
  Lines       22700    22718      +18     
  Branches     2975     2981       +6     
==========================================
- Hits        17301    17274      -27     
- Misses       4540     4582      +42     
- Partials      859      862       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from e279566 to f4a16e4 Compare October 18, 2025 13:49
  Fixes #6669

Signed-off-by: Govert Verkes <[email protected]>
Signed-off-by: Govert Verkes <[email protected]>
@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from f4a16e4 to 0b4bf36 Compare December 1, 2025 10:29
Extract hasattr check into _was_initialized_via_init() helper for clarity.
Remove redundant dict_to_flyte_* calls - fall through to to_python_value instead.

Signed-off-by: Govert Verkes <[email protected]>
@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from 8d7c203 to 38bf633 Compare December 6, 2025 22:15
@machichima
Copy link
Member

HI @gverkes,
Feel free to ping me when this PR is ready for review. Thanks!

Copy link
Member

@machichima machichima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good! Just few nits.
Thank you!

gverkes and others added 3 commits December 18, 2025 10:53
Co-authored-by: Nary Yeh <[email protected]>
Signed-off-by: Govert Verkes <[email protected]>
Signed-off-by: gverkes <[email protected]>
Co-authored-by: Nary Yeh <[email protected]>
Signed-off-by: Govert Verkes <[email protected]>
Signed-off-by: gverkes <[email protected]>
…basemodel_transformer.py

Co-authored-by: Nary Yeh <[email protected]>
Signed-off-by: Govert Verkes <[email protected]>
Signed-off-by: gverkes <[email protected]>
@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from f16d4c1 to 6bcdbef Compare December 18, 2025 10:54
@gverkes gverkes requested a review from machichima December 18, 2025 10:55
Copy link
Member

@machichima machichima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@machichima
Copy link
Member

@gverkes Could you please fix the CI? Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Deserializing Pydantic models with FlyteFile fields creates FlyteFile instances that can't be serialized again.

4 participants