feat: preserve durable context across summarization #3887
Pull request overview
This PR decouples long-lived runtime context (summary, delegated work results, and loaded skill references) from the mutable transcript by persisting them in explicit ThreadState channels and projecting them into model requests only at call time.
Changes:
- Add `DurableContextMiddleware` to capture and inject `summary_text`, `delegation_ledger`, and `skill_context` as hidden durable-context data (with a separate authority `SystemMessage` contract).
- Update summarization to write prose summaries into `ThreadState.summary_text` instead of inserting `HumanMessage(name="summary")` into `messages`, with improved budgeting/trimming behavior.
- Add reducers and tests for the new thread-state channels; update docs/config and frontend docs to reflect the new middleware ordering and durable-context behavior.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/src/core/threads/hooks.ts | Adds backward-compat handling for legacy HumanMessage(name="summary") markers in frontend summarization tracking. |
| frontend/src/content/zh/harness/middlewares.mdx | Updates middleware ordering/docs to include DurableContextMiddleware and the new summary behavior. |
| frontend/src/content/en/harness/middlewares.mdx | Updates middleware ordering/docs to include DurableContextMiddleware and the new summary behavior. |
| config.example.yaml | Bumps config version and documents durable skill-reference capture via skill_file_read_tool_names (legacy preserve settings removed). |
| backend/tests/test_thread_state_reducers.py | Adds reducer coverage for delegation_ledger + skill_context and asserts summary channel existence. |
| backend/tests/test_summarization_summary_text.py | New tests validating summarization writes summary_text, budgets correctly, and fails safely. |
| backend/tests/test_summarization_middleware.py | Updates existing tests to assert no synthetic summary message and validates summary_text emission. |
| backend/tests/test_slash_skills.py | Adjusts skill activation tests for hidden/durable-context behavior and adds legacy-summary guard. |
| backend/tests/test_skill_context.py | New tests for extracting + rendering compact skill references (not full bodies). |
| backend/tests/test_run_journal.py | Removes test that ensured name="summary" messages were skipped when extracting first human message. |
| backend/tests/test_input_sanitization_middleware.py | Updates tests around genuine-user-message classification for legacy summary messages. |
| backend/tests/test_dynamic_context_middleware.py | Adds guard to prevent legacy summary messages from being injection targets. |
| backend/tests/test_durable_context_middleware.py | New integration tests for capture + injection across summarization and with skill refs. |
| backend/tests/test_delegation_ledger.py | New tests for extracting + rendering delegation ledger entries and bounds/escaping. |
| backend/tests/test_delegation_ledger_live.py | Adds an opt-in live E2E test validating ledger + summary persistence across real summarization. |
| backend/tests/fixtures/replay/write_read_file.ultra.events.json | Updates replay fixture keys to include new thread-state channels. |
| backend/packages/harness/deerflow/runtime/journal.py | Changes first-human extraction logic (now only filters hide_from_ui). |
| backend/packages/harness/deerflow/config/summarization_config.py | Removes legacy skill-preserve fields and documents skill-read tool names for durable capture. |
| backend/packages/harness/deerflow/agents/thread_state.py | Adds delegation_ledger, skill_context, summary_text channels and reducers. |
| backend/packages/harness/deerflow/agents/middlewares/summarization_middleware.py | Writes summaries to summary_text, counts prior summary in triggers, and improves trimming/failure handling. |
| backend/packages/harness/deerflow/agents/middlewares/skill_context.py | New: deterministic extraction + rendering of compact skill references under skills root. |
| backend/packages/harness/deerflow/agents/middlewares/input_sanitization_middleware.py | Updates documentation text around how injected HumanMessages are identified. |
| backend/packages/harness/deerflow/agents/middlewares/durable_context_middleware.py | New: capture + inject durable context (summary/ledger/skills) without persisting injected messages. |
| backend/packages/harness/deerflow/agents/middlewares/delegation_ledger.py | New: deterministic extraction + rendering of completed task delegations (with bounding + escaping). |
| backend/packages/harness/deerflow/agents/lead_agent/agent.py | Registers DurableContextMiddleware before summarization; removes old summarization skill-rescue wiring. |
| backend/docs/summarization.md | Updates documentation to reflect summary_text + durable projection model. |
| backend/AGENTS.md | Updates thread-state and middleware-order documentation to include durable-context channels/middleware. |
```diff
 if caller == "lead_agent" and not self._first_human_msg and messages:
     for batch in reversed(messages):
         for m in reversed(batch):
-            if isinstance(m, HumanMessage) and m.name != "summary" and m.additional_kwargs.get("hide_from_ui") is not True:
+            if isinstance(m, HumanMessage) and m.additional_kwargs.get("hide_from_ui") is not True:
                 self.set_first_human_message(m.text)
```
```python
super().__init__()
self._skills_root = (skills_container_path or _DEFAULT_SKILLS_ROOT).rstrip("/")
self._skill_read_tool_names = frozenset(_DEFAULT_SKILL_READ_TOOL_NAMES if skill_file_read_tool_names is None else skill_file_read_tool_names)
```
@ShenAC-SAC, please take a look at Copilot's review comments and resolve the conflict with the main branch.
…y-context

Conflicts:
- backend/AGENTS.md
- backend/packages/harness/deerflow/agents/thread_state.py
- backend/tests/fixtures/replay/write_read_file.ultra.events.json
- backend/tests/test_delegation_ledger.py
@WillemJiang Thanks for the reminder. I addressed the Copilot review comments and resolved the conflict with latest …

What changed in this update: …

Conflict resolution note: The conflict came from #3877 adding a system-maintained delegation ledger in parallel with this PR's durable-context work. I resolved it by keeping a single delegation state channel, … The merged design keeps #3877's useful behavior for … I also updated the PR body and docs to describe the current …

Verification: …

Current PR state is …
Why
DeerFlow has several long-lived runtime facts that need to survive message-window compaction, but those facts should not be stored as ordinary chat transcript entries.
Before this change, summarization wrote the generated summary back into `messages` as a hidden `HumanMessage(name="summary")`. Skill preservation had a similar shape: recently loaded skill `read_file` messages were kept around so the model could still see skill instructions after compaction. That worked, but it coupled durable runtime state to the mutable transcript:
- … `messages`.
- `read_file` results could keep full `SKILL.md` bodies in checkpoints and future model windows.

Concrete failure modes this addresses:
- `task` tool results could be compacted out of the live message window, so the lead agent could lose evidence that a subagent already finished the work and delegate the same task again.
- … `read_file` call and paired `ToolMessage` stayed in recent messages, which made skill availability depend on the transcript retention policy.
- `HumanMessage(name="summary")` could be mistaken for user input by later middleware unless every middleware remembered to filter it.
- … `messages` made frontend merge/archive logic and backend middleware scans depend on transcript-specific conventions instead of explicit runtime state.

This PR moves the first set of durable runtime facts out of `messages` and into explicit thread-state channels, then projects those facts back into model requests only at call time.

Design Summary
The important design shift is:
- … `ThreadState` channels.
- … `summary_text` instead of creating a hidden summary message.
- … `messages`.

So the model still sees the summary, delegations, and skill references, but the checkpointed transcript no longer has to carry hidden control messages for those facts.
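As a rough illustration of this shift, the sketch below keeps durable facts in separate state fields and builds the model input from them at call time, without writing anything back into `messages`. The simplified `ThreadState` shape (plain strings instead of message objects), the `replace_summary` reducer, and the `project_request` helper are hypothetical stand-ins, not DeerFlow's actual types or functions.

```python
from typing import List, Optional, TypedDict


class ThreadState(TypedDict):
    """Hypothetical, simplified stand-in for DeerFlow's ThreadState."""

    messages: List[str]          # checkpointed transcript
    summary_text: Optional[str]  # durable prose summary channel
    delegations: List[str]       # simplified rendered ledger lines
    skill_context: List[str]     # simplified compact skill references


def replace_summary(_old: Optional[str], new: Optional[str]) -> Optional[str]:
    """Reducer sketch: a new summary replaces the old one instead of appending a message."""
    return new


def project_request(state: ThreadState) -> List[str]:
    """Build the model input at call time; the checkpointed state is never mutated."""
    durable: List[str] = []
    if state["summary_text"]:
        durable.append(f"summary: {state['summary_text']}")
    durable.extend(f"delegation: {d}" for d in state["delegations"])
    durable.extend(f"skill: {s}" for s in state["skill_context"])
    # Durable context is prepended to a fresh list; state["messages"] is left as-is.
    return durable + list(state["messages"])


state: ThreadState = {
    "messages": ["user: continue the report"],
    "summary_text": "earlier turns drafted sections 1-2",
    "delegations": ["task-1 completed"],
    "skill_context": [],
}
request = project_request(state)
```

In this sketch the projection is recomputed on every call, so a later compaction only has to update `summary_text`; the projected lines never become checkpointed transcript entries.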
What Changed
DurableContextMiddleware
Adds `DurableContextMiddleware`, registered before summarization in the lead-agent chain.

It captures durable facts from the current transcript before summarization can compact the paired tool-call/tool-result messages, then injects a bounded, model-request-only durable-context projection on later calls.
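To see why registration order matters, here is a toy pipeline under simplified assumptions: messages are plain strings, `capture` copies task results into a ledger, and `summarize` replaces all but the newest message with a lossy prose count. Both functions are hypothetical and only stand in for the real middleware. Running capture before summarization keeps the task result; running it afterwards finds nothing left to capture.

```python
from typing import Any, Dict


def capture(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical capture step: copy task results from the transcript into a ledger."""
    ledger = list(state["ledger"])
    for msg in state["messages"]:
        if msg.startswith("task_result:") and msg not in ledger:
            ledger.append(msg)
    return {**state, "ledger": ledger}


def summarize(state: Dict[str, Any], keep_last: int = 1) -> Dict[str, Any]:
    """Hypothetical summarizer (keep_last >= 1): older messages become a lossy count."""
    msgs = state["messages"]
    if len(msgs) <= keep_last:
        return state
    compacted, kept = msgs[:-keep_last], msgs[-keep_last:]
    prior = state["summary_text"]
    note = f"{len(compacted)} message(s) summarized"
    # The prior summary is carried into the new one, which replaces it.
    summary = f"{prior}; {note}" if prior else note
    return {**state, "messages": kept, "summary_text": summary}


start = {
    "messages": ["user: build report", "task_result: report.md written", "user: thanks"],
    "ledger": [],
    "summary_text": None,
}
capture_first = summarize(capture(start))
summarize_first = capture(summarize(start))
```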
The projection is intentionally split by authority level:
- … `SystemMessage`.
- `summary_text`, delegation results, and skill descriptions are injected separately as a hidden `HumanMessage` data block marked with `additional_kwargs["durable_context_data"]` and `hide_from_ui`.

This prevents user, model, tool, or subagent text captured in durable fields from being promoted to system-role instructions. Captured values are escaped or bounded before rendering.
Summary state channel
Summarization now writes generated prose to `ThreadState.summary_text` instead of inserting a hidden summary message into `messages`.

When a later compaction happens, the summarizer receives both:
- … `summary_text`; and
- …

The new summary replaces `summary_text`, while compacted raw messages are removed through the normal message reducer and the recent message window is preserved.

The summary prompt budgeting was also hardened after review:
- … `trim_messages` with a character splitter so single long messages can still be budgeted;

Legacy `HumanMessage(name="summary")` messages remain filtered by middleware paths for old threads, but new compactions no longer create them.

Delegation ledger
Adds a deterministic `delegations` channel for `task` tool dispatches and paired `ToolMessage` results.

Each ledger entry stores:
- …
The rendered ledger is newest-first and tells the model when in-progress work is already delegated and when completed work should be reused instead of delegated again. Failed, cancelled, and timed-out entries are represented as retryable prior attempts rather than reusable results.
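The newest-first rendering and the reuse-versus-retry distinction could look roughly like this. `LedgerEntry`, its fields, the status strings, and the hint wording are hypothetical; the point is only that in-progress and completed entries discourage re-delegation, while failed, cancelled, and timed-out entries are presented as retryable.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical ledger entry; field names are illustrative only."""

    seq: int           # dispatch order; higher means newer
    description: str
    status: str        # in_progress | completed | failed | cancelled | timed_out
    result: str = ""


RETRYABLE = {"failed", "cancelled", "timed_out"}


def render_ledger(entries: List[LedgerEntry]) -> str:
    lines = []
    for entry in sorted(entries, key=lambda e: e.seq, reverse=True):  # newest first
        if entry.status == "in_progress":
            hint = "already delegated; do not dispatch again"
        elif entry.status == "completed":
            hint = f"completed; reuse this result instead of delegating again: {entry.result}"
        elif entry.status in RETRYABLE:
            hint = f"prior attempt {entry.status}; may be retried"
        else:
            hint = f"status {entry.status}"
        lines.append(f"- [{entry.seq}] {entry.description}: {hint}")
    return "\n".join(lines)
```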
The ledger now uses the shared backend/frontend `extract_subagent_status` contract instead of duplicating prefix parsing. Dispatches start as `in_progress`; unknown or streaming task outputs stay `in_progress` rather than being persisted as fake `failed` entries, so durable context does not mask contract drift or streaming intermediate states.

The reducer caps retained ledger entries to the newest window, and capture is cap-aware: entries already evicted by the cap are not re-emitted on later captures, where they would otherwise evict newer entries.
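The cap and the cap-aware capture interact in a subtle way, so here is a sketch under stated assumptions: the reducer keeps entries in arrival order and retains only the last `CAP`, entries are keyed by a hypothetical `call_id`, and `normalize_status` stands in for whatever the shared status contract returns. If capture naively re-emitted every dispatch still visible in the transcript, an entry the cap had already evicted would be appended as if it were new and push out a newer one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CAP = 2  # hypothetical retention window, kept tiny for illustration
TERMINAL = {"completed", "failed", "cancelled", "timed_out"}


@dataclass(frozen=True)
class Entry:
    """Hypothetical ledger entry keyed by the task tool-call id."""

    call_id: str
    seq: int
    status: str = "in_progress"


def normalize_status(parsed: Optional[str]) -> str:
    """Unknown or streaming outputs (None or unrecognized) stay in_progress, never failed."""
    return parsed if parsed in TERMINAL else "in_progress"


def reduce_ledger(existing: List[Entry], updates: List[Entry], cap: int = CAP) -> List[Entry]:
    """Arrival-ordered merge that keeps only the last `cap` entries (cap >= 1)."""
    merged: Dict[str, Entry] = {e.call_id: e for e in existing}
    for u in updates:
        prev = merged.get(u.call_id)
        if prev is not None and prev.status in TERMINAL and u.status == "in_progress":
            continue  # never downgrade a finished entry
        merged[u.call_id] = u  # known ids keep their slot; new ids are appended
    return list(merged.values())[-cap:]


def capture(observed: List[Entry], ledger: List[Entry], cap: int = CAP) -> List[Entry]:
    """Emit new or changed entries, skipping ones the cap has already evicted."""
    known = {e.call_id: e for e in ledger}
    floor = min(e.seq for e in ledger) if len(ledger) >= cap else None
    updates = []
    for e in observed:
        if e.call_id not in known and floor is not None and e.seq < floor:
            continue  # re-emitting this would push a newer entry out of the window
        if known.get(e.call_id) != e:
            updates.append(e)
    return updates
```

With `CAP = 2`, after entries `a`, `b`, `c` arrive the ledger holds `b` and `c`. Feeding all three back through the reducer naively yields `c` and `a`, while `capture` emits nothing for the unchanged transcript.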
Skill context references
Adds a durable `skill_context` channel for loaded `SKILL.md` references.

The extractor only captures successful read calls that:
- … `SKILL.md` file;

The persisted entry is a compact reference: `name`, `path`, frontmatter `description`, and `loaded_at`. It deliberately does not persist the full skill body. The model sees an active-skill reminder with the path and description; if it needs exact instructions, it should re-read the skill file.

`skill_file_read_tool_names: []` now disables this durable skill-reference capture instead of falling back to the default read tool names.

Middleware compatibility
Legacy summary messages are excluded from user-targeted middleware scans:
- `InputSanitizationMiddleware`
- `DynamicContextMiddleware`
- `SkillActivationMiddleware`

This preserves compatibility for existing checkpointed threads that may already contain `HumanMessage(name="summary")` while keeping new threads on the state-channel design.

Documentation and config
Updates backend and frontend docs to reflect the final design:
- … `summary_text`, not injected back as regular messages;
- … `DurableContextMiddleware` in the correct order;
- `config.example.yaml` explains that `skill_file_read_tool_names: []` disables durable skill-reference capture.

Review Hardening Since Initial Version
The review pass found and fixed several edge cases:
- … `SystemMessage`.
- `summary_text` is counted when deciding whether summarization should trigger.
- … `in_progress` instead of becoming fake failed ledger entries.
- … `SystemMessage`.

Scope Notes
This PR intentionally keeps the durable context surface narrow:
- … `task` tool;
- `SKILL.md` references, not arbitrary files under the skills directory;

The goal is to establish the durable channel pattern and apply it to the first high-value facts that are already affected by message-window compaction.
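The skill-reference rule (successful reads of `SKILL.md` under the skills root only, storing a compact reference rather than the body) can be sketched as follows. The default tool name, the `/mnt/skills` root, deriving `name` from the parent directory, and the minimal frontmatter parser are all assumptions for illustration. `resolve_tool_names` mirrors the `skill_file_read_tool_names` handling described earlier: `None` falls back to defaults, while an explicit empty list yields an empty set and so disables capture.

```python
import posixpath
from typing import Dict, FrozenSet, Iterable, Optional

DEFAULT_SKILL_READ_TOOLS = frozenset({"read_file"})  # hypothetical default set
SKILLS_ROOT = "/mnt/skills"                           # hypothetical skills root


def resolve_tool_names(configured: Optional[Iterable[str]]) -> FrozenSet[str]:
    """None falls back to defaults; an explicit [] yields an empty set, disabling capture."""
    return frozenset(DEFAULT_SKILL_READ_TOOLS if configured is None else configured)


def _frontmatter_description(content: str) -> str:
    """Minimal parser: read `description:` from a leading `---` frontmatter block."""
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip().strip("\"'")
    return ""


def extract_skill_ref(
    tool_name: str,
    path: str,
    content: str,
    succeeded: bool,
    tool_names: FrozenSet[str],
    loaded_at: str,
    root: str = SKILLS_ROOT,
) -> Optional[Dict[str, str]]:
    if not succeeded or tool_name not in tool_names:
        return None
    norm = posixpath.normpath(path)  # collapses ".." so traversal cannot escape the root
    prefix = root.rstrip("/") + "/"
    if posixpath.basename(norm) != "SKILL.md" or not norm.startswith(prefix):
        return None
    return {
        "name": posixpath.basename(posixpath.dirname(norm)),
        "path": norm,
        "description": _frontmatter_description(content),
        "loaded_at": loaded_at,
    }  # the skill body itself is deliberately not stored
```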
Follow-up / Remaining Decoupling Work
This PR moves the persistence and model-projection surface for summaries, delegated work, and loaded skill references out of `messages`, but it does not eliminate every message-derived capture path in DeerFlow.

The remaining production coupling is on the capture side:
- `delegations` is still initially derived from `AIMessage.tool_calls` for the `task` tool and later upgraded from paired terminal `ToolMessage` results.
- `skill_context` is still initially derived from successful skill-file read tool calls and the paired `ToolMessage.content`, from which only the `SKILL.md` frontmatter description is parsed into a compact reference.
- … `HumanMessage(name="summary")` control messages and legacy subtask-result text, so historical threads remain renderable.

A good follow-up PR would move these capture points closer to their authoritative producers: emit structured task-result metadata directly from the task tool/runtime boundary, and emit structured skill-read metadata from the file-read/tool boundary instead of re-parsing tool transcript content. After that migration window, the legacy prefix and hidden-summary compatibility shims can be retired separately.
This is intentionally left out of this PR because it changes tool/result contracts across backend, frontend, runtime state, and historical-thread compatibility. The important step here is that once the facts are captured, they no longer need raw transcript messages to remain in the context window.
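To make the remaining capture-side coupling concrete, the sketch below derives ledger entries purely from transcript messages: a `task` tool call opens an `in_progress` entry, and a paired tool result with the same id upgrades it only when the status parser recognizes a terminal status. `AIMsg`, `ToolMsg`, the `description` argument key, and the injected `parse_status` callable are hypothetical stand-ins for LangChain message types and the shared status contract.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Union


@dataclass
class AIMsg:
    """Hypothetical stand-in for an AI message carrying tool calls."""

    tool_calls: List[Dict[str, Any]]


@dataclass
class ToolMsg:
    """Hypothetical stand-in for a tool result paired by tool_call_id."""

    tool_call_id: str
    content: str


def derive_delegations(
    transcript: List[Union[AIMsg, ToolMsg]],
    parse_status: Callable[[str], Optional[str]],
) -> Dict[str, Dict[str, str]]:
    ledger: Dict[str, Dict[str, str]] = {}
    for msg in transcript:
        if isinstance(msg, AIMsg):
            for call in msg.tool_calls:
                if call.get("name") == "task":  # dispatch opens an in_progress entry
                    args = call.get("args") or {}
                    ledger[call["id"]] = {
                        "description": str(args.get("description", "")),
                        "status": "in_progress",
                    }
        elif isinstance(msg, ToolMsg) and msg.tool_call_id in ledger:
            status = parse_status(msg.content)
            if status is not None:  # unrecognized output keeps the entry in_progress
                ledger[msg.tool_call_id]["status"] = status
    return ledger
```

In this sketch, if compaction drops a `ToolMsg` before capture runs, its entry can never be upgraded; that dependency on transcript retention is what moving capture to the producers would remove.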
Testing
Local verification after resolving Copilot feedback and merging latest `upstream/main`:

```bash
cd backend && uv run ruff check packages/harness/deerflow/agents/thread_state.py packages/harness/deerflow/agents/middlewares/delegation_ledger.py packages/harness/deerflow/agents/middlewares/durable_context_middleware.py packages/harness/deerflow/runtime/journal.py tests/test_delegation_ledger.py tests/test_durable_context_middleware.py tests/test_thread_state_reducers.py tests/test_run_journal.py tests/test_delegation_ledger_live.py
```

All checks passed!

```bash
cd backend && uv run pytest tests/test_delegation_ledger.py tests/test_thread_state_reducers.py tests/test_durable_context_middleware.py tests/test_run_journal.py::TestChatModelStartHumanMessage tests/test_summarization_summary_text.py tests/test_summarization_middleware.py tests/test_skill_context.py tests/test_input_sanitization_middleware.py tests/test_dynamic_context_middleware.py tests/test_slash_skills.py -q
```

257 passed, 1 warning

`git diff --check` and `git diff --cached --check`

CI should be used as the final source of truth after the latest pushed commit.