Speed up Python GraphBinary deserialization#3493

Open
kirill-stepanishin wants to merge 2 commits into apache:master from kirill-stepanishin:python-graphbinary-int-dispatch

@kirill-stepanishin

The GraphBinary reader built a DataType enum member from the type byte for every object it decoded. That per-object enum construction heavily degrades deserialization performance on large result sets.

The reader now builds a {type code: deserializer} lookup table once up front and dispatches on the raw integer instead, avoiding per-object enum construction. Behavior is unchanged: an unknown type code still raises ValueError("... is not a valid DataType").
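
A minimal sketch of the two dispatch styles, for readers who want to see the difference side by side. This is not the code in graphbinaryV4.py: the `DataType` subset, the reader functions, and the names `READERS_BY_ENUM`, `READERS_BY_CODE`, `read_object_by_enum`, and `read_object_by_code` are all hypothetical, and the sketch skips everything the real format carries besides the type byte and the value.

```python
import struct
from enum import Enum
from io import BytesIO


class DataType(Enum):
    # Hypothetical, illustrative subset of type codes.
    INT = 0x01
    LONG = 0x02
    STRING = 0x03


def read_int(buf):
    return struct.unpack(">i", buf.read(4))[0]


def read_long(buf):
    return struct.unpack(">q", buf.read(8))[0]


def read_string(buf):
    size = struct.unpack(">i", buf.read(4))[0]
    return buf.read(size).decode("utf-8")


READERS_BY_ENUM = {
    DataType.INT: read_int,
    DataType.LONG: read_long,
    DataType.STRING: read_string,
}

# Built once up front: the same readers, keyed by the raw integer code.
READERS_BY_CODE = {member.value: fn for member, fn in READERS_BY_ENUM.items()}


def read_object_by_enum(buf):
    # Old style: DataType(code) runs an enum lookup for every object decoded.
    code = buf.read(1)[0]
    return READERS_BY_ENUM[DataType(code)](buf)


def read_object_by_code(buf):
    # New style: plain dict lookup on the integer, no enum construction.
    code = buf.read(1)[0]
    try:
        reader = READERS_BY_CODE[code]
    except KeyError:
        # Keep the same error message the enum lookup would have produced.
        raise ValueError(f"{code} is not a valid DataType") from None
    return reader(buf)
```

Both functions return the same values for known codes, for example `read_object_by_code(BytesIO(bytes([0x01]) + struct.pack(">i", 42)))` decodes the integer 42. The enum is only consulted while the table is being built.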

Performance

Benchmarked on two cross-region EC2 instances (server in US-EAST-2, client in US-WEST-2) to capture realistic network latency, against the Modern graph over GraphBinary V4 on Python 3.11. Each query was run with and without this change, alternating back to back across 3 sweeps, reporting the median.

| Query | Before | After | Change |
|---|---|---|---|
| `g.V().repeat(both()).times(12)` (~200k results) | 7.97 s | 5.85 s | 26% faster |
| `g.V()` (6 results) | 0.107 s | 0.109 s | no change |

The improvement is significant on large result sets, where per-object deserialization cost dominates, and scales with the number of objects returned.
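
The alternating, median-of-sweeps procedure described above can be sketched roughly as follows. This is not the harness used for the numbers in this PR: `timed`, `alternate_and_median`, and `reduction` are hypothetical helpers, and `baseline` and `candidate` stand in for whatever callables run the query without and with the change.

```python
import statistics
import time


def timed(fn):
    # Wall-clock duration of a single call, in seconds.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def alternate_and_median(baseline, candidate, sweeps=3):
    # Run baseline and candidate back to back within each sweep so that
    # drift in network or server conditions affects both sides similarly.
    before, after = [], []
    for _ in range(sweeps):
        before.append(timed(baseline))
        after.append(timed(candidate))
    return statistics.median(before), statistics.median(after)


def reduction(before, after):
    # Fraction of wall-clock time saved relative to the baseline.
    return (before - after) / before
```

Applied to the reported medians, `reduction(7.97, 5.85)` is roughly 0.266, in line with the improvement quoted for the large query.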

Assisted-by: Claude Code:claude-opus-4-8
@codecov-commenter

codecov-commenter commented Jun 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.16%. Comparing base (a28cd1f) to head (2cfc742).
⚠️ Report is 181 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3493      +/-   ##
============================================
- Coverage     76.35%   76.16%   -0.20%     
- Complexity    13424    13935     +511     
============================================
  Files          1012     1030      +18     
  Lines         60341    63138    +2797     
  Branches       7075     7427     +352     
============================================
+ Hits          46076    48087    +2011     
- Misses        11548    12045     +497     
- Partials       2717     3006     +289     

Comment thread gremlin-python/src/main/python/gremlin_python/structure/io/graphbinaryV4.py Outdated
@kenhuuu

kenhuuu commented Jun 30, 2026

VOTE +1

@Cole-Greer Cole-Greer left a comment

VOTE +1

@kirill-stepanishin kirill-stepanishin deleted the python-graphbinary-int-dispatch branch July 2, 2026 16:33