Skip to content

server: fix inaccurate WriteSQLRespDuration#69536

Open
ChangRui-Ryan wants to merge 2 commits into
pingcap:masterfrom
ChangRui-Ryan:changrui_resp_time
Open

server: fix inaccurate WriteSQLRespDuration#69536
ChangRui-Ryan wants to merge 2 commits into
pingcap:masterfrom
ChangRui-Ryan:changrui_resp_time

Conversation

@ChangRui-Ryan

@ChangRui-Ryan ChangRui-Ryan commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #64562

Problem Summary

WriteSQLRespDuration may be inaccurate for SELECT statements when writing the result set response fails. Some error paths in result set writing returned immediately after writePacket / writeEOF / related write operations failed, before adding the elapsed response write time to statement execution details.

What changed and how does it work

This PR updates the result set response writing paths to always record elapsed write time before returning, including failure paths. It adds small helper functions to start and accumulate WriteSQLRespDuration, then uses them in writeChunks and writeChunksWithFetchSize so SELECT result set writes and cursor fetch writes include time spent before a write error. A regression test simulates a failed row packet write for a normal SELECT and verifies the delayed write time is reflected in WriteSQLRespDuration.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes
    • Improved accuracy of response-time tracking for SQL result writing by sourcing statement execution details from request context.
    • Refined write-duration metrics for streamed result sets and fetch-size reads, ensuring correct segmentation across row writing, EOF markers, and related error paths.
    • Updated/added coverage to ensure write-duration metrics include failed row-write scenarios as expected (including delayed write failures).

@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 30, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 30, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign qw4990 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: aa0d82bd-b284-44c5-9c49-22bcad2eaa8a

📥 Commits

Reviewing files that changed from the base of the PR and between 0fcdbc9 and 9f5374b.

📒 Files selected for processing (1)
  • pkg/server/conn.go

📝 Walkthrough

Walkthrough

This PR refactors WriteSQLRespDuration accounting in server connection handling. It adds shared context/timing helpers, splits response-write timing into smaller segments in result-set paths, and adds a test that checks failed writes are counted in the duration metric.

Changes

SQL response duration timing refactor

Layer / File(s) Summary
Context helpers and flush timing
pkg/server/conn.go
Adds stmtExecDetailsFromContext, beginWriteSQLRespDuration, and finishWriteSQLRespDuration, and updates flush to use them for WriteSQLRespDuration accumulation.
writeChunks timing segments
pkg/server/conn.go
writeChunks now reads stmtDetail from context and splits WriteSQLRespDuration across column metadata, row writes, and the final EOF write.
writeChunksWithFetchSize timing
pkg/server/conn.go
writeChunksWithFetchSize now reads stmtDetail from context and separates timing around the cursor loop, FetchNotifier.OnFetchReturned(), and writeEOF.
Failed-write duration test
pkg/server/conn_stmt_test.go
Adds a delayed writer test double and a result-set test that asserts WriteSQLRespDuration includes the configured failed write delay.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Poem

A rabbit timed each hop with care,
and split the pauses in the air.
From row to EOF, the clock runs true,
with failed writes counted through and through. 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately describes the main fix to WriteSQLRespDuration accuracy.
Description check ✅ Passed The description includes the required issue number, problem summary, change summary, tests, and release note section.
Linked Issues check ✅ Passed The code updates response-write timing and adds a regression test, matching the issue’s goal of accurately reporting Send response time.
Out of Scope Changes check ✅ Passed The changes stay focused on WriteSQLRespDuration timing and related tests, with no obvious unrelated scope creep.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/server/conn.go (2)

1281-1339: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Preserve delayed finalization on every dispatch-error exit.

Line 1339 runs after the io.EOF / result-undetermined early returns, so a delayed no-result statement can skip FinishExecuteStmt entirely on those exits. Also, writeError(ctx, err) uses the Run-loop context, while the per-statement execdetails context is created inside handleStmt, so error-packet write time may still not be charged before finalization. Move delayed finalization into the early-return paths and make the error response write use the statement execdetails context/state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 1281 - 1339, The dispatch-error handling in
conn.go can skip delayed statement finalization on the io.EOF and
result-undetermined early returns, so make sure FinishExecuteStmt-equivalent
delayed cleanup via cc.finishDelayedNoResultStmt is executed in those branches
too. Also, update the error-packet write path around cc.writeError and
handleStmt/execdetails so it uses the per-statement execdetails context/state
instead of the outer Run-loop ctx, ensuring write time is charged before
finalization.

2720-2754: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Do not include cursor fetch time in WriteSQLRespDuration.

Line 2720 starts timing before the fetch loop, but line 2739 can advance a lazy iterator and call rs.Next, so executor fetch time is accumulated as response-write time. Keep the timer scoped to row serialization / packet writes, matching writeChunks.

Proposed fix
-	start = beginWriteSQLRespDuration(stmtDetail)
-
 	iter := rs.GetRowIterator()
 	// send the rows to the client according to fetchSize.
 	for i := 0; i < fetchSize && iter.Current(ctx) != iter.End(); i++ {
 		row := iter.Current(ctx)
 
+		start = beginWriteSQLRespDuration(stmtDetail)
 		data = data[0:4]
 		data, err = column.DumpBinaryRow(data, rs.Columns(), row, cc.rsEncoder)
 		if err != nil {
 			addWriteSQLRespDuration(stmtDetail, start)
 			return err
@@
 			addWriteSQLRespDuration(stmtDetail, start)
 			return err
 		}
+		addWriteSQLRespDuration(stmtDetail, start)
 		writtenRows++
 
 		iter.Next(ctx)
 	}
 	if iter.Error() != nil {
-		addWriteSQLRespDuration(stmtDetail, start)
 		return iter.Error()
 	}
@@
 	// don't include the time consumed by `cl.OnFetchReturned()` in the `WriteSQLRespDuration`
-	addWriteSQLRespDuration(stmtDetail, start)
-
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 2720 - 2754, The COM_STMT_FETCH path in
`handleFetch` is counting cursor execution time in `WriteSQLRespDuration`
because `beginWriteSQLRespDuration(stmtDetail)` starts before the row loop and
`iter.Next(ctx)` may trigger lazy fetch work. Move the timing scope so it covers
only row serialization and `cc.writePacket` calls, and keep the final
`addWriteSQLRespDuration(stmtDetail, start)` updates aligned with the new scope,
matching the behavior in `writeChunks`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/server/conn.go`:
- Around line 2362-2367: The delayed finalization in
cc.handleFileTransInConn/finishDelayedNoResultStmt is dropping a write failure
when handled is false, because the else branch passes nil and clears
ExecStmtVarKey. Update the fallback path in the Run-loop flow so the actual err
from handleFileTransInConn is forwarded into finishDelayedNoResultStmt for both
handled and unhandled cases, preserving the no-result write error for later
summary/log handling.

---

Outside diff comments:
In `@pkg/server/conn.go`:
- Around line 1281-1339: The dispatch-error handling in conn.go can skip delayed
statement finalization on the io.EOF and result-undetermined early returns, so
make sure FinishExecuteStmt-equivalent delayed cleanup via
cc.finishDelayedNoResultStmt is executed in those branches too. Also, update the
error-packet write path around cc.writeError and handleStmt/execdetails so it
uses the per-statement execdetails context/state instead of the outer Run-loop
ctx, ensuring write time is charged before finalization.
- Around line 2720-2754: The COM_STMT_FETCH path in `handleFetch` is counting
cursor execution time in `WriteSQLRespDuration` because
`beginWriteSQLRespDuration(stmtDetail)` starts before the row loop and
`iter.Next(ctx)` may trigger lazy fetch work. Move the timing scope so it covers
only row serialization and `cc.writePacket` calls, and keep the final
`addWriteSQLRespDuration(stmtDetail, start)` updates aligned with the new scope,
matching the behavior in `writeChunks`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 67c5ab38-4f34-4544-b61a-dae7680f2191

📥 Commits

Reviewing files that changed from the base of the PR and between 1174be2 and 6fb290d.

📒 Files selected for processing (4)
  • pkg/server/conn.go
  • pkg/server/conn_stmt.go
  • pkg/server/conn_stmt_test.go
  • pkg/session/session.go

Comment thread pkg/server/conn.go
@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.3324%. Comparing base (bbf8075) to head (9f5374b).
⚠️ Report is 31 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #69536        +/-   ##
================================================
- Coverage   76.3169%   74.3324%   -1.9846%     
================================================
  Files          2041       2045         +4     
  Lines        561123     578087     +16964     
================================================
+ Hits         428232     429706      +1474     
- Misses       131990     147836     +15846     
+ Partials        901        545       -356     
Flag Coverage Δ
integration 40.9323% <78.3783%> (+1.3042%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4610% <ø> (ø)
parser ∅ <ø> (∅)
br 47.1941% <ø> (-15.5570%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 1, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/server/conn.go (1)

2668-2725: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Scope WriteSQLRespDuration to the dump/write path only. With lazyRowIterator, the first Current(ctx) and each Next(ctx) can call rs.Next(ctx, chk), so the current window counts executor fetch time as response-write time. writeChunks keeps fetch outside that window; this path should do the same by timing only DumpBinaryRow/writePacket.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 2668 - 2725, `writeChunksWithFetchSize` is
counting lazy iterator fetch work inside `WriteSQLRespDuration` because
`iter.Current(ctx)` and `iter.Next(ctx)` may trigger `rs.Next(ctx, chk)`. Move
the timing window in `clientConn.writeChunksWithFetchSize` so it only covers the
`DumpBinaryRow` and `writePacket` path, matching `writeChunks` and excluding
executor fetch time; keep the existing EOF/notification handling outside that
measured span.
🧹 Nitpick comments (1)
pkg/server/conn_stmt_test.go (1)

219-224: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Document the magic failOnWrite: 4 value.

The buffer size of 1 forces byte-level flushing, so failOnWrite: 4 is implicitly tied to the exact wire-protocol byte layout (column info + row packet bytes) at the time this test was written. If packet-writing internals change later, this could silently target the wrong write (e.g., before column info completes) without an obvious failure signal. A short comment explaining what the 4th write corresponds to would make the test easier to maintain.

💡 Suggested comment
 	const delay = 50 * time.Millisecond
+	// failOnWrite=4 targets a byte write that occurs mid row-data write (after
+	// column info is flushed), so the failure happens during actual row
+	// transmission and WriteSQLRespDuration should include the delay.
 	c.pkt.SetBufWriter(bufio.NewWriterSize(&failedWriteResponseWriter{
 		delay:       delay,
 		failOnWrite: 4,
 	}, 1))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn_stmt_test.go` around lines 219 - 224, The test in
conn_stmt_test uses a magic failOnWrite value that is tied to the current packet
write sequence. Add a brief explanatory comment near the
failedWriteResponseWriter setup in conn_stmt_test clarifying what the 4th write
represents in the query response flow, so future changes to the wire protocol or
buffering behavior do not make the test’s intent unclear. Refer to the
c.Dispatch path and failedWriteResponseWriter setup when placing the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/server/conn.go`:
- Around line 2668-2725: `writeChunksWithFetchSize` is counting lazy iterator
fetch work inside `WriteSQLRespDuration` because `iter.Current(ctx)` and
`iter.Next(ctx)` may trigger `rs.Next(ctx, chk)`. Move the timing window in
`clientConn.writeChunksWithFetchSize` so it only covers the `DumpBinaryRow` and
`writePacket` path, matching `writeChunks` and excluding executor fetch time;
keep the existing EOF/notification handling outside that measured span.

---

Nitpick comments:
In `@pkg/server/conn_stmt_test.go`:
- Around line 219-224: The test in conn_stmt_test uses a magic failOnWrite value
that is tied to the current packet write sequence. Add a brief explanatory
comment near the failedWriteResponseWriter setup in conn_stmt_test clarifying
what the 4th write represents in the query response flow, so future changes to
the wire protocol or buffering behavior do not make the test’s intent unclear.
Refer to the c.Dispatch path and failedWriteResponseWriter setup when placing
the comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: add73d87-c4f4-4215-9e60-8144261c5988

📥 Commits

Reviewing files that changed from the base of the PR and between 9ca1c0c and 0fcdbc9.

📒 Files selected for processing (2)
  • pkg/server/conn.go
  • pkg/server/conn_stmt_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WriteSQLRespDuration is sometimes inaccurate

1 participant