Commit f8b7adc
feat: add reflection layer for task verification
This adds a "judge" layer that verifies whether the agent completed a task
correctly. After each turn, if reflection is enabled, it:
1. Collects the initial task, recent tool calls, and final result
2. Sends this context to a judge model for evaluation
3. If the judge finds the task incomplete, passes its feedback to the agent and forces it to continue
4. Stops after 3 reflection attempts to prevent infinite loops
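The four steps above can be sketched as a small retry loop. This is a minimal illustration, not the code in this commit: Verdict, Outcome, and run_with_reflection are hypothetical names, and the agent and judge are plain closures standing in for model calls.

```rust
/// Verdict returned by the judge (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Verdict {
    completed: bool,
    feedback: Option<String>,
}

/// How the loop ended.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed { attempts: u32 },
    GaveUp { attempts: u32 },
}

/// Runs the agent, asks the judge for a verdict, and retries with the judge's
/// feedback until the task is judged complete or `max_attempts` verdicts have
/// been used. With `max_attempts == 0` the agent is never run.
fn run_with_reflection<A, J>(task: &str, max_attempts: u32, mut agent: A, mut judge: J) -> Outcome
where
    A: FnMut(&str, Option<&str>) -> String,
    J: FnMut(&str, &str) -> Verdict,
{
    let mut feedback: Option<String> = None;
    for attempt in 1..=max_attempts {
        // Steps 1 and 2: produce a result and hand it to the judge with the task.
        let result = agent(task, feedback.as_deref());
        let verdict = judge(task, &result);
        if verdict.completed {
            return Outcome::Completed { attempts: attempt };
        }
        // Step 3: carry the judge's feedback into the next agent turn.
        feedback = verdict.feedback;
    }
    // Step 4: the attempt limit bounds the loop.
    Outcome::GaveUp { attempts: max_attempts }
}
```

Bounding the loop by a counter rather than by judge confidence keeps termination independent of the judge model's behavior.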
Enable with: --enable reflection on the command line, or reflection = true under [features] in the config
test: add reflection layer integration test for Azure OpenAI
Add integration tests that verify the reflection layer works correctly
with Azure OpenAI. Tests create hello.py and test_hello.py, run pytest,
and verify the reflection layer evaluates task completion.
Also fix missing wiremock imports in view_image.rs tests.
test: make reflection test model configurable via AZURE_OPENAI_MODEL
Defaults to gpt-5-mini if not set.
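A minimal sketch of the environment-variable fallback described above. The helper names resolve_model and model_from_env are hypothetical, and treating an empty value the same as an unset one is an assumption not stated in the commit message.

```rust
use std::env;

/// Returns the override if it is present and non-blank, otherwise the default.
/// Treating a blank value as "not set" is an assumption of this sketch.
fn resolve_model(override_value: Option<String>) -> String {
    override_value
        .filter(|v| !v.trim().is_empty())
        .unwrap_or_else(|| "gpt-5-mini".to_string())
}

/// Reads AZURE_OPENAI_MODEL from the process environment.
fn model_from_env() -> String {
    resolve_model(env::var("AZURE_OPENAI_MODEL").ok())
}
```

Splitting the pure fallback logic from the environment read makes the default testable without mutating process-wide state.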
docs: update reflection layer documentation
- Document how reflection layer works
- Add configuration instructions
- Update test running instructions with AZURE_OPENAI_MODEL env var
test: add SWE-bench style eval suite for reflection layer
Add evaluation tests inspired by SWE-bench to measure the impact of
the reflection layer on coding task performance. Tests include:
- Task 1: Off-by-one errors in array processing
- Task 2: String logic errors (palindrome, word count)
- Task 3: Missing edge case handling
Each task can be run with or without reflection to compare results.
Includes an eval_summary test that runs all tasks and reports the comparison.
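One way such a with/without comparison could be tallied is sketched below. TaskResult and summarize are hypothetical names for illustration, and the report format is an assumption; the actual eval_summary test may be organized differently.

```rust
/// Result of one eval task run in both modes (hypothetical record).
struct TaskResult {
    name: &'static str,
    passed_without_reflection: bool,
    passed_with_reflection: bool,
}

/// Counts passes in each mode and builds a per-task comparison report.
/// Returns (passes without reflection, passes with reflection, report text).
fn summarize(results: &[TaskResult]) -> (usize, usize, String) {
    let label = |passed: bool| if passed { "pass" } else { "fail" };
    let without_count = results.iter().filter(|r| r.passed_without_reflection).count();
    let with_count = results.iter().filter(|r| r.passed_with_reflection).count();

    let mut report = String::new();
    for r in results {
        report.push_str(&format!(
            "{}: without={} with={}\n",
            r.name,
            label(r.passed_without_reflection),
            label(r.passed_with_reflection),
        ));
    }
    report.push_str(&format!(
        "total: without={}/{} with={}/{}",
        without_count,
        results.len(),
        with_count,
        results.len()
    ));
    (without_count, with_count, report)
}
```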
docs: add eval suite documentation to reflection.md
Document the SWE-bench style evaluation tests, including:
- Task descriptions and bug types
- Commands to run individual and comparative tests
- Sample output showing reflection layer improvement
Update docs
feat(reflection): implement judge_model parameter and improve error detection
- Add model_override support to ModelClient for judge model selection
- Add max_attempts field to ReflectionContext (removes hardcoded constant)
- Add output_indicates_error(), which matches tool output against 30+ error patterns
- Exclude false positives such as "error handling" and "no errors"
- Update tests for new ReflectionContext signature
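The pattern-plus-exclusion idea behind output_indicates_error() can be sketched as follows. Only the function name and the two benign phrases come from the commit message; the pattern list here is a small illustrative subset chosen for this example, not the 30+ patterns in the commit, and the matching strategy is an assumption.

```rust
/// Phrases that mention errors without signalling a failure (from the commit message).
const BENIGN_PHRASES: &[&str] = &["error handling", "no errors"];

/// Substrings that suggest a command failed (illustrative subset only).
const ERROR_PATTERNS: &[&str] = &[
    "error",
    "traceback (most recent call last)",
    "panicked at",
    "command not found",
];

/// Returns true if the output looks like it reports a failure.
fn output_indicates_error(output: &str) -> bool {
    let mut text = output.to_lowercase();
    // Blank out benign phrases first so they cannot trigger a pattern match.
    for phrase in BENIGN_PHRASES {
        text = text.replace(phrase, " ");
    }
    ERROR_PATTERNS.iter().any(|pattern| text.contains(pattern))
}
```

Removing the benign phrases before matching, rather than returning early when one is found, means output that contains both "no errors" and a real error line is still flagged.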
1. Protocol (codex-rs/protocol/src/protocol.rs)
- Added ReflectionVerdictEvent struct with fields: completed, confidence, reasoning, feedback, attempt, max_attempts
- Added ReflectionVerdict variant to EventMsg enum
2. Core (codex-rs/core/src/codex.rs)
- Added import for ReflectionVerdictEvent
- Emit ReflectionVerdict event right after getting the verdict from the judge model (line ~2253)
3. Rollout policy (codex-rs/core/src/rollout/policy.rs)
- Added ReflectionVerdict to persisted events (so it shows in rollout files)
4. TUI (codex-rs/tui/src/)
- Added new_reflection_verdict() function in history_cell.rs
- Added on_reflection_verdict() handler in chatwidget.rs
5. TUI2 (codex-rs/tui2/src/) - same changes as TUI
6. Exec (codex-rs/exec/src/event_processor_with_human_output.rs)
- Added reflection verdict display for CLI exec mode
7. MCP Server (codex-rs/mcp-server/src/codex_tool_runner.rs)
- Added ReflectionVerdict to the match arm for event handling
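The protocol change in item 1 and the consumer changes in items 4 to 7 might look roughly like this. The field names come from the list above; the field types, the derives, the Other placeholder variant, and the describe helper are assumptions made for this sketch.

```rust
/// Event carrying the judge's verdict. Field names follow the commit message;
/// the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct ReflectionVerdictEvent {
    completed: bool,
    confidence: f32, // assumed to be in the range 0.0..=1.0
    reasoning: String,
    feedback: Option<String>,
    attempt: u32,
    max_attempts: u32,
}

/// Cut-down stand-in for the protocol's event enum.
#[derive(Debug, Clone, PartialEq)]
enum EventMsg {
    ReflectionVerdict(ReflectionVerdictEvent),
    /// Placeholder for the other variants of the real enum.
    Other,
}

/// Example consumer: each front end adds a match arm for the new variant.
fn describe(msg: &EventMsg) -> String {
    match msg {
        EventMsg::ReflectionVerdict(ev) => format!(
            "reflection verdict: completed={}, attempt {}/{}",
            ev.completed, ev.attempt, ev.max_attempts
        ),
        EventMsg::Other => "other event".to_string(),
    }
}
```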
What You'll See Now
When reflection runs, you'll see output like:
On success:
✓ reflection: Task completed (confidence: 95%)
The agent successfully created the hello package with tests...
On incomplete (will retry):
⟳ reflection: Task incomplete - attempt 1/3 (confidence: 40%)
Reasoning: Tests were not run after code changes
Feedback: Please run the tests to verify the implementation works
To test, run codex with your task; the reflection verdict should now appear at the end.
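A sketch of how the two output shapes shown above could be rendered from a verdict. format_verdict is a hypothetical helper, not new_reflection_verdict() from the TUI, and it assumes confidence is stored as a fraction between 0 and 1.

```rust
/// Renders a verdict in the style of the sample output above.
/// `confidence` is assumed to be a fraction in 0.0..=1.0.
fn format_verdict(
    completed: bool,
    confidence: f32,
    reasoning: &str,
    feedback: Option<&str>,
    attempt: u32,
    max_attempts: u32,
) -> String {
    let percent = (confidence.clamp(0.0, 1.0) * 100.0).round() as u32;
    if completed {
        format!("✓ reflection: Task completed (confidence: {percent}%)\n{reasoning}")
    } else {
        let mut out = format!(
            "⟳ reflection: Task incomplete - attempt {attempt}/{max_attempts} (confidence: {percent}%)\nReasoning: {reasoning}"
        );
        // Feedback is only shown when the judge supplied some.
        if let Some(fb) = feedback {
            out.push_str(&format!("\nFeedback: {fb}"));
        }
        out
    }
}
```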
feat(reflection): add JSON schema for structured verdict output
- Add verdict_json_schema() to ensure judge model returns valid JSON
- Use output_schema in reflection prompt for structured outputs
- Add demo1 Python hello world app with tests
1 parent bef36f4 commit f8b7adc
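For the structured-verdict change in the last commit message above, a schema along these lines would constrain the judge's reply. Only the function name verdict_json_schema() comes from the commit; the schema body, the choice of required fields, and returning a string literal instead of a parsed JSON value are assumptions of this sketch.

```rust
/// Returns a JSON Schema describing the verdict object the judge must emit.
/// The property set mirrors the verdict fields named earlier in this commit;
/// the constraints themselves are illustrative.
fn verdict_json_schema() -> &'static str {
    r#"{
  "type": "object",
  "properties": {
    "completed": { "type": "boolean" },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "reasoning": { "type": "string" },
    "feedback": { "type": ["string", "null"] }
  },
  "required": ["completed", "confidence", "reasoning", "feedback"],
  "additionalProperties": false
}"#
}
```

Listing every property as required and disallowing additional properties leaves the judge no room to omit a field or add free-form keys, which keeps parsing on the caller's side simple.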
File tree
26 files changed, +2602 -5 lines changed
- codex-rs
  - core
    - src
      - config
      - rollout
    - tests/suite
  - exec/src
  - mcp-server/src
  - protocol/src
  - tui2/src
  - tui/src
- demo1
  - hello
  - tests
- docs