feat: preserve durable context across summarization by ShenAC-SAC · Pull Request #3887 · bytedance/deer-flow

ShenAC-SAC · 2026-06-30T12:30:35Z

Why

DeerFlow has several long-lived runtime facts that need to survive message-window compaction, but those facts should not be stored as ordinary chat transcript entries.

Before this change, summarization wrote the generated summary back into messages as a hidden HumanMessage(name="summary"). Skill preservation had a similar shape: recently loaded skill read_file messages were kept around so the model could still see skill instructions after compaction. That worked, but it coupled durable runtime state to the mutable transcript:

UI/frontend merge logic, RunJournal processing, middleware scans, and model context all had to understand hidden control messages inside messages.
A generated summary could be treated like a real user message by later middleware unless every consumer remembered to special-case it.
Completed subagent work could disappear after summarization, causing the lead agent to delegate the same work again.
Preserving skill context by retaining raw read_file results could keep full SKILL.md bodies in checkpoints and future model windows.
Runtime facts had no explicit reducer, cap, or rendering policy separate from the transcript.

Concrete failure modes this addresses:

After summarization, completed task tool results could be compacted out of the live message window, so the lead agent could lose evidence that a subagent already finished the work and delegate the same task again.
Loaded skill instructions previously survived compaction only if the raw read_file call and paired ToolMessage stayed in recent messages, which made skill availability depend on the transcript retention policy.
Summary text stored as HumanMessage(name="summary") could be mistaken for user input by later middleware unless every middleware remembered to filter it.
Keeping hidden control messages in messages made frontend merge/archive logic and backend middleware scans depend on transcript-specific conventions instead of explicit runtime state.

This PR moves the first set of durable runtime facts out of messages and into explicit thread-state channels, then projects those facts back into model requests only at call time.

Design Summary

The important design shift is:

Raw user, assistant, and tool messages remain the transcript.
Middleware extracts durable facts into typed ThreadState channels.
Summarization compacts old raw messages into summary_text instead of creating a hidden summary message.
Each model call receives an ephemeral durable-context projection built from thread state.
That projection is not written back into checkpointed messages.

So the model still sees the summary, delegations, and skill references, but the checkpointed transcript no longer has to carry hidden control messages for those facts.

What Changed

DurableContextMiddleware

Adds DurableContextMiddleware, registered before summarization in the lead-agent chain.

It captures durable facts from the current transcript before summarization can compact the paired tool-call/tool-result messages, then injects a bounded model-request-only durable context projection on later calls.

The projection is intentionally split by authority level:

Static durable-context handling rules are injected as a SystemMessage.
Runtime-provided values such as summary_text, delegation results, and skill descriptions are injected separately as a hidden HumanMessage data block marked with additional_kwargs["durable_context_data"] and hide_from_ui.

This prevents user/model/tool/subagent text captured in durable fields from being promoted to system-role instructions. Captured values are escaped or bounded before rendering.

Summary state channel

Summarization now writes generated prose to ThreadState.summary_text instead of inserting a hidden summary message into messages.

When a later compaction happens, the summarizer receives both:

the previous summary_text; and
the new raw messages that need to be folded in.

The new summary replaces summary_text, while compacted raw messages are removed through the normal message reducer and the recent message window is preserved.

The summary prompt budgeting was also hardened after review:

existing summary text participates in trigger token counting;
previous summary and new messages are trimmed as separate content sections before XML-like wrapper tags are added;
partial trimming uses LangChain trim_messages with a character splitter so single long messages can still be budgeted;
deterministic fallback truncation is now a hard cap, including very small budgets.

Legacy HumanMessage(name="summary") messages remain filtered by middleware paths for old threads, but new compactions no longer create them.

Delegation ledger

Adds a deterministic delegations channel for task tool dispatches and paired ToolMessage results.

Each ledger entry stores:

task call id;
short description;
subagent type;
current or terminal status;
bounded result brief;
result hash;
source result reference;
creation timestamp.

The rendered ledger is newest-first and tells the model when in-progress work is already delegated and when completed work should be reused instead of delegated again. Failed, cancelled, and timed-out entries are represented as retryable prior attempts rather than reusable results.

The ledger now uses the shared backend/frontend extract_subagent_status contract instead of duplicating prefix parsing. Dispatches start as in_progress; unknown or streaming task outputs stay in_progress rather than being persisted as fake failed entries, so durable context does not mask contract drift or streaming intermediate states.

The reducer caps retained ledger entries to the newest window, and capture is cap-aware so entries intentionally evicted by the cap are not repeatedly re-emitted and allowed to evict newer entries.

Skill context references

Adds a durable skill_context channel for loaded SKILL.md references.

The extractor only captures successful read calls that:

use a configured read tool name;
resolve under the configured skills root;
point specifically to a SKILL.md file;
are not errored tool reads.

The persisted entry is a compact reference: name, path, frontmatter description, and loaded_at. It deliberately does not persist the full skill body. The model sees an active-skill reminder with the path and description; if it needs exact instructions, it should re-read the skill file.

skill_file_read_tool_names: [] now disables this durable skill-reference capture instead of falling back to default read tool names.

Middleware compatibility

Legacy summary messages are excluded from user-targeted middleware scans:

InputSanitizationMiddleware
DynamicContextMiddleware
SkillActivationMiddleware

This preserves compatibility for existing checkpointed threads that may already contain HumanMessage(name="summary") while keeping new threads on the state-channel design.

Documentation and config

Updates backend and frontend docs to reflect the final design:

summaries are stored in summary_text, not injected back as regular messages;
durable runtime values are projected as hidden durable-context data, not promoted to system-role instructions;
skill context stores references, not full skill bodies;
frontend middleware docs include DurableContextMiddleware in the correct order;
config.example.yaml explains that skill_file_read_tool_names: [] disables durable skill-reference capture.

Review Hardening Since Initial Version

The review pass found and fixed several edge cases:

Untrusted durable field values are no longer rendered inside SystemMessage.
Existing summary_text is counted when deciding whether summarization should trigger.
Previous summaries cannot blow up the summarization prompt unchecked.
Summary prompt trimming cannot return only wrapper markup with no actual message content.
Very small fallback budgets now stay within the requested cap.
Unknown/non-terminal task results remain in_progress instead of becoming fake failed ledger entries.
The delegation ledger cap no longer causes old evicted entries to be re-captured and evict newer entries.
Live durable skill-context assertions were updated to look for the hidden durable-context data message rather than a SystemMessage.
Docs and PR-facing config comments were corrected to avoid overclaiming that full skill instructions are persisted or re-injected.

Scope Notes

This PR intentionally keeps the durable context surface narrow:

delegation capture is limited to the existing task tool;
skill context captures only SKILL.md references, not arbitrary files under the skills directory;
slash-skill activation still injects the current-turn skill body directly and is not expanded into a durable channel here;
this does not attempt to redesign every possible summary or memory type.

The goal is to establish the durable channel pattern and apply it to the first high-value facts that are already affected by message-window compaction.

Follow-up / Remaining Decoupling Work

This PR moves the persistence and model-projection surface for summaries, delegated work, and loaded skill references out of messages, but it does not eliminate every message-derived capture path in DeerFlow.

The remaining production coupling is on the capture side:

delegations is still initially derived from AIMessage.tool_calls for the task tool and later upgraded from paired terminal ToolMessage results.
Task status and result text still rely on the existing task-result string contract, with shared backend/frontend prefix handling for historical compatibility.
skill_context is still initially derived from successful skill-file read tool calls and the paired ToolMessage.content, from which only the SKILL.md frontmatter description is parsed into a compact reference.
Backend and frontend still keep compatibility shims for old HumanMessage(name="summary") control messages and legacy subtask-result text, so historical threads remain renderable.

A good follow-up PR would move these capture points closer to their authoritative producers: emit structured task-result metadata directly from the task tool/runtime boundary, and emit structured skill-read metadata from the file-read/tool boundary instead of re-parsing tool transcript content. After that migration window, the legacy prefix and hidden-summary compatibility shims can be retired separately.

This is intentionally left out of this PR because it changes tool/result contracts across backend, frontend, runtime state, and historical-thread compatibility. The important step here is that once the facts are captured, they no longer need raw transcript messages to remain in the context window.

Testing

Local verification after resolving Copilot feedback and merging latest upstream/main:

cd backend && uv run ruff check packages/harness/deerflow/agents/thread_state.py packages/harness/deerflow/agents/middlewares/delegation_ledger.py packages/harness/deerflow/agents/middlewares/durable_context_middleware.py packages/harness/deerflow/runtime/journal.py tests/test_delegation_ledger.py tests/test_durable_context_middleware.py tests/test_thread_state_reducers.py tests/test_run_journal.py tests/test_delegation_ledger_live.py
- All checks passed!
cd backend && uv run pytest tests/test_delegation_ledger.py tests/test_thread_state_reducers.py tests/test_durable_context_middleware.py tests/test_run_journal.py::TestChatModelStartHumanMessage tests/test_summarization_summary_text.py tests/test_summarization_middleware.py tests/test_skill_context.py tests/test_input_sanitization_middleware.py tests/test_dynamic_context_middleware.py tests/test_slash_skills.py -q
- 257 passed, 1 warning
git diff --check and git diff --cached --check
- both pass

CI should be used as the final source of truth after the latest pushed commit.

Copilot

Pull request overview

This PR decouples long-lived runtime context (summary, delegated work results, and loaded skill references) from the mutable transcript by persisting them in explicit ThreadState channels and projecting them into model requests only at call time.

Changes:

Add DurableContextMiddleware to capture + inject summary_text, delegation_ledger, and skill_context as hidden durable-context data (with a separate authority SystemMessage contract).
Update summarization to write prose summaries into ThreadState.summary_text instead of inserting HumanMessage(name="summary") into messages, with improved budgeting/trimming behavior.
Add reducers + tests for the new thread-state channels; update docs/config and frontend docs to reflect the new middleware ordering and durable-context behavior.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
frontend/src/core/threads/hooks.ts	Adds backward-compat handling for legacy `HumanMessage(name="summary")` markers in frontend summarization tracking.
frontend/src/content/zh/harness/middlewares.mdx	Updates middleware ordering/docs to include `DurableContextMiddleware` and the new summary behavior.
frontend/src/content/en/harness/middlewares.mdx	Updates middleware ordering/docs to include `DurableContextMiddleware` and the new summary behavior.
config.example.yaml	Bumps config version and documents durable skill-reference capture via `skill_file_read_tool_names` (legacy preserve settings removed).
backend/tests/test_thread_state_reducers.py	Adds reducer coverage for `delegation_ledger` + `skill_context` and asserts summary channel existence.
backend/tests/test_summarization_summary_text.py	New tests validating summarization writes `summary_text`, budgets correctly, and fails safely.
backend/tests/test_summarization_middleware.py	Updates existing tests to assert no synthetic summary message and validates `summary_text` emission.
backend/tests/test_slash_skills.py	Adjusts skill activation tests for hidden/durable-context behavior and adds legacy-summary guard.
backend/tests/test_skill_context.py	New tests for extracting + rendering compact skill references (not full bodies).
backend/tests/test_run_journal.py	Removes test that ensured `name="summary"` messages were skipped when extracting first human message.
backend/tests/test_input_sanitization_middleware.py	Updates tests around genuine-user-message classification for legacy summary messages.
backend/tests/test_dynamic_context_middleware.py	Adds guard to prevent legacy summary messages from being injection targets.
backend/tests/test_durable_context_middleware.py	New integration tests for capture + injection across summarization and with skill refs.
backend/tests/test_delegation_ledger.py	New tests for extracting + rendering delegation ledger entries and bounds/escaping.
backend/tests/test_delegation_ledger_live.py	Adds an opt-in live E2E test validating ledger + summary persistence across real summarization.
backend/tests/fixtures/replay/write_read_file.ultra.events.json	Updates replay fixture keys to include new thread-state channels.
backend/packages/harness/deerflow/runtime/journal.py	Changes first-human extraction logic (now only filters `hide_from_ui`).
backend/packages/harness/deerflow/config/summarization_config.py	Removes legacy skill-preserve fields and documents skill-read tool names for durable capture.
backend/packages/harness/deerflow/agents/thread_state.py	Adds `delegation_ledger`, `skill_context`, `summary_text` channels and reducers.
backend/packages/harness/deerflow/agents/middlewares/summarization_middleware.py	Writes summaries to `summary_text`, counts prior summary in triggers, and improves trimming/failure handling.
backend/packages/harness/deerflow/agents/middlewares/skill_context.py	New: deterministic extraction + rendering of compact skill references under skills root.
backend/packages/harness/deerflow/agents/middlewares/input_sanitization_middleware.py	Updates documentation text around how injected HumanMessages are identified.
backend/packages/harness/deerflow/agents/middlewares/durable_context_middleware.py	New: capture + inject durable context (summary/ledger/skills) without persisting injected messages.
backend/packages/harness/deerflow/agents/middlewares/delegation_ledger.py	New: deterministic extraction + rendering of completed `task` delegations (with bounding + escaping).
backend/packages/harness/deerflow/agents/lead_agent/agent.py	Registers `DurableContextMiddleware` before summarization; removes old summarization skill-rescue wiring.
backend/docs/summarization.md	Updates documentation to reflect `summary_text` + durable projection model.
backend/AGENTS.md	Updates thread-state and middleware-order documentation to include durable-context channels/middleware.

        if caller == "lead_agent" and not self._first_human_msg and messages:
            for batch in reversed(messages):
                for m in reversed(batch):
-                    if isinstance(m, HumanMessage) and m.name != "summary" and m.additional_kwargs.get("hide_from_ui") is not True:
+                    if isinstance(m, HumanMessage) and m.additional_kwargs.get("hide_from_ui") is not True:
                        self.set_first_human_message(m.text)


+        super().__init__()
+        self._skills_root = (skills_container_path or _DEFAULT_SKILLS_ROOT).rstrip("/")
+        self._skill_read_tool_names = frozenset(_DEFAULT_SKILL_READ_TOOL_NAMES if skill_file_read_tool_names is None else skill_file_read_tool_names)
+


WillemJiang · 2026-07-01T03:05:06Z

@ShenAC-SAC, please take a look at the review comment of Copilot and resolve the conflict with the main branch.

…y-context # Conflicts: # backend/AGENTS.md # backend/packages/harness/deerflow/agents/thread_state.py # backend/tests/fixtures/replay/write_read_file.ultra.events.json # backend/tests/test_delegation_ledger.py

ShenAC-SAC · 2026-07-01T04:19:08Z

@WillemJiang Thanks for the reminder. I addressed the Copilot review comments and resolved the conflict with latest main.

What changed in this update:

Restored legacy-summary filtering in RunJournal, so old checkpointed HumanMessage(name="summary") messages are not recorded as llm.human.input or used as the first user message.
Replaced the skills_container_path.rstrip("/") normalization with POSIX path normalization, preserving / and slash-only roots instead of collapsing them to an empty string.
Merged latest upstream/main into this branch.

Conflict resolution note:

The conflict came from #3877 adding a system-maintained delegation ledger in parallel with this PR's durable-context work. I resolved it by keeping a single delegation state channel, ThreadState.delegations, instead of maintaining two ledgers.

The merged design keeps #3877's useful behavior for in_progress delegated tasks and terminal-status downgrade protection, while preserving this PR's richer durable context projection with terminal result metadata (result_brief, result_sha256, result_ref). I also removed the separate DelegationLedgerMiddleware registration path so delegation state is not injected twice; DurableContextMiddleware now owns the unified capture/projection path for summary, delegations, and skill references.

I also updated the PR body and docs to describe the current delegations design.

Verification:

cd backend && uv run ruff check ... -> all checks passed
cd backend && uv run pytest ... -q -> 257 passed, 1 warning
GitHub checks are now green for backend unit tests, e2e tests, frontend unit tests, lint backend/frontend, backend-blocking-io, and replay E2E layers.

Current PR state is MERGEABLE; it is still blocked only by required review.

ShenAC-SAC added 4 commits June 30, 2026 20:29

feat: preserve durable context across summarization

90b1a99

fix: harden durable context review gaps

7d4323f

style: format delegation ledger live test

4badfa6

chore: remove stale delegation ledger prefix

7f7f930

ShenAC-SAC marked this pull request as ready for review June 30, 2026 14:09

ShenAC-SAC requested review from Huixin615, WillemJiang, fancyboi999, ggnnggez and hetaoBackend June 30, 2026 14:18

WillemJiang requested a review from Copilot July 1, 2026 02:42

Copilot started reviewing on behalf of WillemJiang July 1, 2026 02:43 View session

Copilot AI reviewed Jul 1, 2026

View reviewed changes

ShenAC-SAC added 2 commits July 1, 2026 11:32

fix: address durable context review feedback

ba5ec99

Merge remote-tracking branch 'upstream/main' into feat/durable-summar…

e4e9a1a

…y-context # Conflicts: # backend/AGENTS.md # backend/packages/harness/deerflow/agents/thread_state.py # backend/tests/fixtures/replay/write_read_file.ultra.events.json # backend/tests/test_delegation_ledger.py

WillemJiang approved these changes Jul 1, 2026

View reviewed changes

WillemJiang merged commit 442248d into bytedance:main Jul 1, 2026
14 checks passed

WillemJiang added this to the 2.1.0 milestone Jul 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: preserve durable context across summarization#3887

feat: preserve durable context across summarization#3887
WillemJiang merged 6 commits into
bytedance:mainfrom
ShenAC-SAC:feat/durable-summary-context

ShenAC-SAC commented Jun 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

WillemJiang commented Jul 1, 2026

Uh oh!

ShenAC-SAC commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ShenAC-SAC commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Design Summary

What Changed

DurableContextMiddleware

Summary state channel

Delegation ledger

Skill context references

Middleware compatibility

Documentation and config

Review Hardening Since Initial Version

Scope Notes

Follow-up / Remaining Decoupling Work

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

WillemJiang commented Jul 1, 2026

Uh oh!

ShenAC-SAC commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ShenAC-SAC commented Jun 30, 2026 •

edited

Loading