Skip to content

Conversation

@mymeiyi
Copy link
Contributor

@mymeiyi mymeiyi commented Dec 26, 2025

What problem does this PR solve?

_clean_unused_rowset_metas could incorrectly remove rowset metas (and its delete_bitmap) that are still tracked in the _unused_rowsets map, which holds rowsets pending deletion.
The fix adds a check to skip these rowsets during cleanup, preventing premature removal of _unused_rowsets. Otherwise, if query a mow table with rowsets in _unused_rowsets map, and _clean_unused_rowset_metas delete delete_bitmap of these rowset, will get duplicated keys.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings December 26, 2025 06:38
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the storage engine where _clean_unused_rowset_metas could incorrectly remove metadata for rowsets that are still tracked in the _unused_rowsets map, which holds rowsets pending deletion. The fix adds a check to skip these rowsets during cleanup, preventing premature metadata removal.

Key Changes:

  • Added thread-safe check to skip rowsets in _unused_rowsets map during metadata cleanup
  • Enhanced debug code to test the interaction between unused rowset deletion and trash sweeping

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
be/src/olap/storage_engine.cpp Added check_rowset_id_in_unused_rowsets check at line 961 to prevent cleaning metadata for rowsets pending deletion in the _unused_rowsets map
be/src/olap/tablet.cpp Reformatted debug code and added start_trash_sweep call for testing the complete cleanup workflow

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@deardeng deardeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mymeiyi
Copy link
Contributor Author

mymeiyi commented Dec 26, 2025

run buildall

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@doris-robot
Copy link

TPC-H: Total hot run time: 35391 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ca41413bb2b59259110f538e21d666ac11541da0, data reload: false

------ Round 1 ----------------------------------
q1	17626	4320	4112	4112
q2	2025	372	237	237
q3	10143	1340	775	775
q4	10241	933	327	327
q5	7535	2135	1918	1918
q6	190	170	138	138
q7	1018	875	720	720
q8	9357	1422	1167	1167
q9	7058	5284	5438	5284
q10	6793	2414	1996	1996
q11	536	343	318	318
q12	670	743	568	568
q13	17778	3688	3099	3099
q14	296	299	272	272
q15	595	520	522	520
q16	702	698	636	636
q17	727	827	510	510
q18	7455	7211	7187	7187
q19	928	981	612	612
q20	408	366	248	248
q21	4228	3898	3821	3821
q22	1050	1004	926	926
Total cold run time: 107359 ms
Total hot run time: 35391 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4171	4058	4102	4058
q2	334	396	313	313
q3	2156	2674	2325	2325
q4	1328	1739	1297	1297
q5	4230	4678	4710	4678
q6	224	182	130	130
q7	2061	1955	1806	1806
q8	2707	2528	2575	2528
q9	7718	7405	7462	7405
q10	3105	3286	2848	2848
q11	614	526	534	526
q12	726	771	631	631
q13	3667	3983	3229	3229
q14	300	322	290	290
q15	556	518	486	486
q16	676	722	631	631
q17	1153	1448	1362	1362
q18	8118	7727	7470	7470
q19	878	870	917	870
q20	2026	2062	2002	2002
q21	4979	4511	4144	4144
q22	1102	1024	997	997
Total cold run time: 52829 ms
Total hot run time: 50026 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 180589 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ca41413bb2b59259110f538e21d666ac11541da0, data reload: false

query5	4427	593	453	453
query6	329	237	215	215
query7	4219	463	280	280
query8	315	274	275	274
query9	8763	2571	2548	2548
query10	496	390	358	358
query11	15336	15365	15004	15004
query12	190	120	121	120
query13	1276	502	392	392
query14	5511	3068	2750	2750
query14_1	2665	2646	2639	2639
query15	210	199	177	177
query16	922	484	458	458
query17	1156	737	621	621
query18	2451	452	359	359
query19	233	240	220	220
query20	120	124	119	119
query21	227	139	120	120
query22	3919	4066	4047	4047
query23	16619	16224	16173	16173
query23_1	16043	16356	16183	16183
query24	7361	1663	1238	1238
query24_1	1227	1250	1258	1250
query25	592	510	461	461
query26	1247	277	175	175
query27	2745	475	313	313
query28	4482	2151	2133	2133
query29	857	593	475	475
query30	320	248	222	222
query31	853	727	631	631
query32	84	77	72	72
query33	559	345	309	309
query34	893	899	548	548
query35	781	804	741	741
query36	860	963	810	810
query37	126	90	82	82
query38	2960	2950	2924	2924
query39	759	744	717	717
query39_1	691	697	709	697
query40	226	150	128	128
query41	68	64	63	63
query42	105	100	101	100
query43	418	440	410	410
query44	1326	727	743	727
query45	191	192	186	186
query46	865	1025	614	614
query47	1692	1706	1697	1697
query48	309	325	247	247
query49	657	436	347	347
query50	663	298	222	222
query51	3826	3889	3829	3829
query52	115	115	105	105
query53	324	355	290	290
query54	287	256	251	251
query55	76	81	73	73
query56	297	307	297	297
query57	1177	1175	1118	1118
query58	276	257	257	257
query59	2428	2459	2520	2459
query60	327	307	299	299
query61	204	167	164	164
query62	706	720	665	665
query63	328	296	297	296
query64	5089	1291	998	998
query65	4009	3958	3946	3946
query66	1454	450	316	316
query67	15501	15094	15011	15011
query68	6981	983	716	716
query69	496	347	312	312
query70	1051	962	951	951
query71	368	295	284	284
query72	6151	5022	5058	5022
query73	678	603	310	310
query74	9029	8985	8751	8751
query75	3183	3182	2804	2804
query76	3843	1141	747	747
query77	533	405	298	298
query78	9614	9472	8759	8759
query79	1539	910	612	612
query80	741	703	561	561
query81	538	269	237	237
query82	408	137	101	101
query83	266	262	243	243
query84	258	117	119	117
query85	936	515	473	473
query86	406	280	281	280
query87	3134	3182	3081	3081
query88	3286	2290	2295	2290
query89	465	417	392	392
query90	2186	160	156	156
query91	179	172	146	146
query92	87	68	66	66
query93	1762	901	539	539
query94	477	312	276	276
query95	584	384	313	313
query96	593	458	209	209
query97	2260	2303	2206	2206
query98	207	196	194	194
query99	1313	1325	1307	1307
Total cold run time: 259255 ms
Total hot run time: 180589 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 27.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ca41413bb2b59259110f538e21d666ac11541da0, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.06	0.05
query3	0.26	0.09	0.10
query4	1.61	0.11	0.12
query5	0.27	0.25	0.25
query6	1.20	0.64	0.64
query7	0.03	0.02	0.02
query8	0.06	0.04	0.04
query9	0.56	0.50	0.49
query10	0.56	0.55	0.55
query11	0.15	0.11	0.11
query12	0.15	0.11	0.11
query13	0.62	0.61	0.59
query14	0.98	1.00	0.99
query15	0.80	0.79	0.80
query16	0.40	0.42	0.42
query17	1.00	1.04	1.04
query18	0.23	0.22	0.22
query19	1.86	1.86	1.86
query20	0.02	0.01	0.01
query21	15.44	0.27	0.14
query22	5.05	0.05	0.05
query23	16.12	0.27	0.10
query24	1.12	0.88	0.26
query25	0.11	0.09	0.06
query26	0.15	0.14	0.13
query27	0.05	0.06	0.08
query28	5.14	1.24	1.03
query29	12.63	4.00	3.20
query30	0.29	0.14	0.12
query31	2.81	0.64	0.40
query32	3.23	0.53	0.47
query33	2.96	3.01	2.99
query34	17.06	5.19	4.53
query35	4.56	4.60	4.56
query36	0.65	0.50	0.48
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.16	0.16	0.13
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 98.84 s
Total hot run time: 27.35 s

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 66.67% (4/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.40% (18952/35489)
Line Coverage 39.27% (175803/447678)
Region Coverage 33.87% (136157/402009)
Branch Coverage 34.78% (58761/168940)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.21% (25051/34694)
Line Coverage 58.92% (263048/446486)
Region Coverage 53.88% (218847/406174)
Branch Coverage 55.34% (93810/169504)

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 26, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 1f6d752 into apache:master Dec 26, 2025
43 of 44 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 26, 2025
…in _unused_rowsets map (#59390)

### What problem does this PR solve?

`_clean_unused_rowset_metas` could incorrectly remove rowset metas (and
its delete_bitmap) that are still tracked in the `_unused_rowsets` map,
which holds rowsets pending deletion.
The fix adds a check to skip these rowsets during cleanup, preventing
premature removal of `_unused_rowsets`. Otherwise, if query a mow table
with rowsets in `_unused_rowsets` map, and `_clean_unused_rowset_metas`
delete delete_bitmap of these rowset, will get duplicated keys.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
github-actions bot pushed a commit that referenced this pull request Dec 26, 2025
…in _unused_rowsets map (#59390)

### What problem does this PR solve?

`_clean_unused_rowset_metas` could incorrectly remove rowset metas (and
its delete_bitmap) that are still tracked in the `_unused_rowsets` map,
which holds rowsets pending deletion.
The fix adds a check to skip these rowsets during cleanup, preventing
premature removal of `_unused_rowsets`. Otherwise, if query a mow table
with rowsets in `_unused_rowsets` map, and `_clean_unused_rowset_metas`
delete delete_bitmap of these rowset, will get duplicated keys.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.x dev/4.0.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants