
[refactor](fe) Refactor external metadata cache with MetaCacheEntry #65126

Draft

wenzhenghu wants to merge 7 commits into apache:master from HYDCP:wzh/external-meta-cache-refactor


wenzhenghu (Contributor) commented Jul 1, 2026

### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary:
This PR refactors the external metadata cache to use MetaCacheEntry-based publication and invalidation, and incorporates follow-up fixes from review and validation. It aligns the negative lookup semantics across the init and replay paths, keeps replay lookups non-blocking, preserves the expected behavior of lower-case modes 0, 1, and 2, and adds regression and manual validation coverage for non-blocking refresh and for a runtime-mutable name-miss refresh switch.
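
As a rough illustration of generation-aware publication and invalidation, the sketch below keeps a per-key generation counter: invalidation bumps the counter and drops the cached value, and a refresh publishes its result only if the counter has not moved since the load started. `GenerationCache` and all of its methods are hypothetical names for illustration, not the actual `MetaCacheEntry` API; the per-key monitor is an assumption that loosely models a single-key lock stripe, and the loader is assumed never to return null.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of generation-aware publication; not the Doris MetaCacheEntry API.
final class GenerationCache<K, V> {
    private final Map<K, V> values = new ConcurrentHashMap<>();
    // One generation counter per key; the counter object doubles as that key's lock.
    private final Map<K, AtomicLong> generations = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumed to never return null

    GenerationCache(Function<K, V> loader) {
        this.loader = loader;
    }

    private AtomicLong gen(K key) {
        return generations.computeIfAbsent(key, k -> new AtomicLong());
    }

    long generation(K key) {
        return gen(key).get();
    }

    Optional<V> getIfPresent(K key) {
        return Optional.ofNullable(values.get(key));
    }

    // Remember the generation, load without holding the lock, then try to publish.
    boolean reload(K key) {
        long observed = generation(key);
        V fresh = loader.apply(key);
        return publish(key, observed, fresh);
    }

    // Publish only if no invalidation bumped the generation while the load ran.
    boolean publish(K key, long observedGeneration, V value) {
        AtomicLong g = gen(key);
        synchronized (g) {
            if (g.get() != observedGeneration) {
                return false; // stale result: drop it instead of resurrecting old metadata
            }
            values.put(key, value);
            return true;
        }
    }

    void invalidateKey(K key) {
        AtomicLong g = gen(key);
        synchronized (g) {
            g.incrementAndGet();
            values.remove(key);
        }
    }

    // Bulk paths funnel through invalidateKey so they share the per-key coordination.
    void invalidateIf(Predicate<K> predicate) {
        for (K key : generations.keySet()) {
            if (predicate.test(key)) {
                invalidateKey(key);
            }
        }
    }

    void invalidateAll() {
        invalidateIf(key -> true);
    }
}
```

For example, if a refresh observes generation 0 and an `invalidateKey` call bumps the generation to 1 before the refresh finishes, `publish` rejects the stale value. Routing `invalidateIf` and `invalidateAll` through `invalidateKey` keeps bulk and single-key invalidation on the same coordination path.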

### Release note

Improve consistency and validation coverage for external metadata cache refresh and name lookup behavior.

### Check List (For Author)

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor or code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason

Manual test details:

  • Validate name-miss refresh semantics with a runtime-mutable switch on a real Doris instance with tools/manual_external_meta_cache_regression/verify_name_miss_refresh_mutable.sh
  • Validate that tableNames refresh is non-blocking on a real Doris instance with tools/manual_external_meta_cache_regression/verify_table_names_refresh_non_blocking.sh
  • Validate that schema refresh is non-blocking on a real Doris instance with tools/manual_external_meta_cache_regression/verify_schema_refresh_non_blocking.sh
  • Validate lower-case mode 0, 1 and 2 SQL paths on a real Doris instance

Unit test details:

  • ./run-fe-ut.sh --run org.apache.doris.datasource.metacache.MetaCacheEntryTest

  • Behavior changed:

    • No.
    • Yes. External metadata cache name lookup, invalidation coordination, and refresh behavior are aligned with the refactor design and follow-up fixes.
  • Does this need documentation?

    • No.
    • Yes.

### Check List (For Reviewer who merges this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Attached Documents

- fix MetaCacheEntry publication races for manual miss load and async refresh
- reduce MetaCacheEntry stripe overhead by using single-key stripes for names caches and a configurable default for object caches
- make NameCacheValue snapshots effectively immutable against external Pair mutation
- align ExternalDatabase table object cache construction with the normal non-removal-listener path
- add FE unit tests for replay exact-hit, mode 2 lookup, names-only invalidation, id map updates, system database paths, and atomic cache publication
- update the JDBC regression case after removing the obsolete manual miss load config
- source database id map updates from db.getId() in updateDatabaseCache
- cover the migrated tableNames refresh path in the JDBC regression case
- reuse invalidateKey() for invalidateAll() and invalidateIf() so bulk invalidation shares the same per-key publication coordination
- simplify asyncReload() to a best-effort generation-aware refresh path
- strengthen MetaCacheEntry FE unit tests for second-generation checks, async refresh completion, bulk invalidate combinations, and direct executor refresh
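
The NameCacheValue item in the list above is about defensive copying: a cached names snapshot should not change if a caller later mutates the pair objects it was built from. A minimal sketch, assuming a hypothetical `MutablePair` that stands in for an externally mutable pair type (this is not the actual Doris `Pair` or `NameCacheValue` code), copies field values into private collections and exposes only unmodifiable views.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mutable pair standing in for an externally owned, mutable pair type.
final class MutablePair<F, S> {
    F first;
    S second;

    MutablePair(F first, S second) {
        this.first = first;
        this.second = second;
    }
}

// Hypothetical snapshot: copies (name -> id) into private collections at construction.
final class NameSnapshot {
    private final List<String> names;
    private final Map<String, Long> idByName;

    NameSnapshot(List<MutablePair<String, Long>> source) {
        List<String> n = new ArrayList<>(source.size());
        Map<String, Long> m = new LinkedHashMap<>();
        for (MutablePair<String, Long> p : source) {
            // Copy the field values, not the pair reference, so later mutation is invisible.
            n.add(p.first);
            m.put(p.first, p.second);
        }
        // Nobody else holds n or m, so the unmodifiable views are effectively immutable.
        this.names = Collections.unmodifiableList(n);
        this.idByName = Collections.unmodifiableMap(m);
    }

    List<String> names() {
        return names;
    }

    Long idOf(String name) {
        return idByName.get(name);
    }
}
```

Because the constructor copies `p.first` and `p.second` rather than retaining the pair objects, renaming or re-numbering the source pairs afterwards cannot change what a concurrent reader of the snapshot sees.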

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: The negative lookup behavior of the external meta cache refactor needed further alignment across the init and replay paths. The intended semantics are:

- a cold names miss should load the names snapshot only once;
- a hot-snapshot miss should reload names only when enable_external_meta_cache_name_miss_refresh is enabled;
- replay misses should remain non-blocking and must not perform synchronous load-through;
- the miss-refresh switch must take effect dynamically on an existing catalog or database instance instead of only at construction time.

This change aligns the init object-loader paths, preserves cache-only replay miss semantics on hot snapshots, and updates the replay comments to reflect that hot cache hits may still schedule asynchronous refresh-after-write without blocking the caller.
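
The negative-lookup semantics described above can be sketched as a small lookup routine. This is a hypothetical illustration, not the Doris code path: `NameLookup`, the `replay` flag, and the `BooleanSupplier` that models enable_external_meta_cache_name_miss_refresh are all assumptions. Reading the switch through a supplier on every call is what lets a change take effect on an existing instance. The sketch also assumes that a replay miss on a cold snapshot returns empty instead of loading.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the negative-lookup rules; not the Doris implementation.
final class NameLookup {
    private final Supplier<Set<String>> namesLoader;   // e.g. a remote "list tables" call
    private final BooleanSupplier missRefreshEnabled;  // read per call, so toggling applies live
    private volatile Set<String> snapshot;             // null until the first load
    private int loadCount;                             // illustration only

    NameLookup(Supplier<Set<String>> namesLoader, BooleanSupplier missRefreshEnabled) {
        this.namesLoader = namesLoader;
        this.missRefreshEnabled = missRefreshEnabled;
    }

    Optional<String> lookup(String name, boolean replay) {
        Set<String> current = snapshot;
        if (current != null && current.contains(name)) {
            return Optional.of(name);                  // hot hit
        }
        if (replay) {
            return Optional.empty();                   // replay: cache-only, never loads synchronously
        }
        if (current != null && !missRefreshEnabled.getAsBoolean()) {
            return Optional.empty();                   // hot miss with the switch off: no reload
        }
        // Cold miss, or hot miss with the switch on: load once and answer from the new snapshot.
        return load().contains(name) ? Optional.of(name) : Optional.empty();
    }

    synchronized int loadCount() {
        return loadCount;
    }

    private synchronized Set<String> load() {
        loadCount++;
        Set<String> fresh = Set.copyOf(namesLoader.get());
        snapshot = fresh;
        return fresh;
    }
}
```

Each non-replay miss performs at most one load per call and then answers from the fresh snapshot, so a name that is genuinely absent does not trigger a second reload within the same lookup.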

### Release note

None

### Check List (For Author)

- Test: Unit Test
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: A local development design draft was mistakenly committed into the repository during the external meta cache refactor. The file is only used as an implementation-time guidance document and is not part of the Doris source tree. This change removes it from version control while keeping the local copy intact, so future code changes stay focused on actual product source files.

### Release note

None

### Check List (For Author)

- Test: No need to test (repository hygiene change only)
- Behavior changed: No
- Does this need documentation: No

hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

wenzhenghu marked this pull request as draft on July 1, 2026 14:20
wenzhenghu (Contributor, Author)

run buildall

wenzhenghu (Contributor, Author)

run buildall

hello-stephen (Contributor)
TPC-H: Total hot run time: 29809 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2231ed676f5c15ca4fc6f8f67ec2adf5b4923005, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17613	4059	4032	4032
q2	2036	315	193	193
q3	10284	1450	817	817
q4	4686	469	346	346
q5	7515	839	567	567
q6	179	166	140	140
q7	790	824	636	636
q8	9376	1623	1709	1623
q9	5571	4408	4432	4408
q10	6810	1768	1520	1520
q11	510	343	321	321
q12	702	545	430	430
q13	18138	3377	2757	2757
q14	269	255	238	238
q15	(no data)
q16	784	776	704	704
q17	950	1000	999	999
q18	6800	5834	5606	5606
q19	1289	1325	1043	1043
q20	808	655	563	563
q21	5990	2720	2569	2569
q22	440	357	297	297
Total cold run time: 101540 ms
Total hot run time: 29809 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4365	4288	4296	4288
q2	287	319	208	208
q3	4665	4964	4485	4485
q4	2074	2145	1371	1371
q5	4451	4306	4342	4306
q6	234	180	129	129
q7	1753	1992	1833	1833
q8	2576	2195	2305	2195
q9	8043	8144	7763	7763
q10	4864	4773	4378	4378
q11	597	412	388	388
q12	766	772	564	564
q13	3207	3608	2953	2953
q14	295	313	287	287
q15	(no data)
q16	706	738	668	668
q17	1346	1326	1480	1326
q18	7975	7314	7196	7196
q19	1192	1136	1079	1079
q20	2204	2181	1937	1937
q21	5244	4577	4401	4401
q22	561	465	409	409
Total cold run time: 57405 ms
Total hot run time: 52164 ms

hello-stephen (Contributor)
TPC-DS: Total hot run time: 173774 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2231ed676f5c15ca4fc6f8f67ec2adf5b4923005, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4335	647	495	495
query6	462	236	207	207
query7	4867	569	337	337
query8	339	188	169	169
query9	8747	4006	4024	4006
query10	508	344	306	306
query11	5961	2321	2169	2169
query12	164	103	101	101
query13	1260	598	459	459
query14	6271	5334	4986	4986
query14_1	4273	4307	4310	4307
query15	212	205	184	184
query16	1009	469	482	469
query17	1029	742	585	585
query18	2488	476	355	355
query19	214	195	163	163
query20	111	110	109	109
query21	242	157	132	132
query22	13665	13595	13441	13441
query23	17416	16531	16135	16135
query23_1	16299	16203	16245	16203
query24	7618	1749	1264	1264
query24_1	1340	1291	1293	1291
query25	578	460	395	395
query26	1341	338	212	212
query27	2619	551	385	385
query28	4517	2048	2018	2018
query29	1105	622	494	494
query30	336	264	228	228
query31	1123	1105	974	974
query32	143	66	63	63
query33	556	333	252	252
query34	1166	1197	663	663
query35	770	790	677	677
query36	1407	1418	1219	1219
query37	198	99	90	90
query38	1872	1704	1652	1652
query39	927	913	883	883
query39_1	884	883	889	883
query40	306	164	139	139
query41	69	63	64	63
query42	96	93	94	93
query43	327	320	284	284
query44	1446	776	760	760
query45	206	204	181	181
query46	1103	1185	714	714
query47	2396	2335	2253	2253
query48	428	406	299	299
query49	594	431	321	321
query50	1119	419	326	326
query51	4415	4501	4325	4325
query52	86	85	76	76
query53	265	276	208	208
query54	288	235	207	207
query55	76	70	67	67
query56	313	294	291	291
query57	1434	1414	1331	1331
query58	271	255	293	255
query59	1560	1683	1431	1431
query60	298	267	271	267
query61	157	155	170	155
query62	704	638	600	600
query63	248	202	211	202
query64	2528	767	604	604
query65	4869	4810	4751	4751
query66	1824	525	429	429
query67	29833	29581	29468	29468
query68	3382	1526	1038	1038
query69	422	308	271	271
query70	1069	973	975	973
query71	370	335	304	304
query72	2928	2647	2365	2365
query73	827	790	434	434
query74	5127	5040	4781	4781
query75	2618	2582	2250	2250
query76	2330	1179	788	788
query77	350	385	287	287
query78	12331	12345	11826	11826
query79	1449	1108	782	782
query80	762	557	471	471
query81	469	334	288	288
query82	878	158	121	121
query83	381	324	295	295
query84	325	168	140	140
query85	1003	621	535	535
query86	437	314	282	282
query87	1830	1821	1767	1767
query88	3717	2798	2764	2764
query89	472	409	364	364
query90	1927	205	198	198
query91	204	192	160	160
query92	64	58	59	58
query93	1554	1501	1014	1014
query94	624	349	316	316
query95	772	486	479	479
query96	1077	791	340	340
query97	2675	2702	2554	2554
query98	220	206	203	203
query99	1166	1144	1021	1021
Total cold run time: 259861 ms
Total hot run time: 173774 ms

hello-stephen (Contributor)
ClickBench: Total hot run time: 25.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2231ed676f5c15ca4fc6f8f67ec2adf5b4923005, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.01	0.01	0.01
query2	0.10	0.06	0.05
query3	0.26	0.13	0.13
query4	1.61	0.16	0.14
query5	0.24	0.23	0.22
query6	1.32	1.05	1.07
query7	0.04	0.00	0.00
query8	0.06	0.04	0.04
query9	0.38	0.30	0.32
query10	0.55	0.60	0.55
query11	0.20	0.15	0.14
query12	0.19	0.15	0.15
query13	0.47	0.48	0.47
query14	1.02	1.01	1.02
query15	0.60	0.60	0.59
query16	0.33	0.33	0.32
query17	1.08	1.09	1.16
query18	0.22	0.20	0.21
query19	2.01	1.92	1.89
query20	0.02	0.01	0.01
query21	15.44	0.22	0.15
query22	4.85	0.06	0.05
query23	16.13	0.33	0.12
query24	2.98	0.45	0.34
query25	0.13	0.05	0.04
query26	0.73	0.20	0.16
query27	0.07	0.04	0.03
query28	3.58	0.95	0.56
query29	12.46	4.31	3.43
query30	0.30	0.14	0.15
query31	2.77	0.63	0.31
query32	3.23	0.60	0.49
query33	3.26	3.18	3.19
query34	15.51	4.26	3.56
query35	3.54	3.54	3.60
query36	0.56	0.41	0.44
query37	0.10	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.19	0.15	0.15
query41	0.11	0.04	0.03
query42	0.04	0.02	0.03
query43	0.04	0.04	0.03
Total cold run time: 96.82 s
Total hot run time: 25.22 s

hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 81.87% (307/375) 🎉
Increment coverage report
Complete coverage report

hello-stephen (Contributor)
TPC-H: Total hot run time: 29829 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6c7d1d903a6f137cbc1835cda462d23e54f4eb67, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17665	4099	4028	4028
q2	2029	319	203	203
q3	10275	1432	830	830
q4	4716	471	337	337
q5	7591	839	569	569
q6	184	170	137	137
q7	774	839	650	650
q8	10274	1626	1656	1626
q9	5943	4422	4452	4422
q10	6841	1824	1538	1538
q11	509	342	314	314
q12	744	565	433	433
q13	18087	3371	2734	2734
q14	266	269	241	241
q15	(no data)
q16	790	780	703	703
q17	1010	1022	978	978
q18	6951	5851	5619	5619
q19	1181	1253	1075	1075
q20	788	638	597	597
q21	5705	2662	2493	2493
q22	439	366	302	302
Total cold run time: 102762 ms
Total hot run time: 29829 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4333	4318	4305	4305
q2	290	318	209	209
q3	4566	4993	4401	4401
q4	2073	2146	1397	1397
q5	4428	4304	4280	4280
q6	241	180	127	127
q7	2004	1936	1667	1667
q8	2453	2131	2072	2072
q9	7800	7769	7664	7664
q10	4738	4674	4272	4272
q11	563	424	382	382
q12	939	797	554	554
q13	3296	3522	3000	3000
q14	304	312	302	302
q15	(no data)
q16	720	724	642	642
q17	1352	1322	1329	1322
q18	8043	7308	6880	6880
q19	1116	1089	1116	1089
q20	2186	2221	1960	1960
q21	5297	4633	4482	4482
q22	508	450	408	408
Total cold run time: 57250 ms
Total hot run time: 51415 ms

hello-stephen (Contributor)
TPC-DS: Total hot run time: 173258 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6c7d1d903a6f137cbc1835cda462d23e54f4eb67, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4316	639	504	504
query6	473	227	201	201
query7	4834	565	316	316
query8	331	188	181	181
query9	8775	4037	4003	4003
query10	434	357	294	294
query11	5881	2343	2163	2163
query12	157	106	108	106
query13	1265	638	428	428
query14	6301	5342	5002	5002
query14_1	4305	4290	4309	4290
query15	208	206	183	183
query16	1030	484	438	438
query17	1144	712	603	603
query18	2440	488	358	358
query19	208	194	157	157
query20	112	109	109	109
query21	231	172	150	150
query22	13721	13575	13312	13312
query23	17464	16517	16106	16106
query23_1	16406	16233	16290	16233
query24	7530	1765	1317	1317
query24_1	1336	1306	1293	1293
query25	583	463	412	412
query26	1336	371	221	221
query27	2568	596	377	377
query28	4494	2048	2041	2041
query29	1084	628	502	502
query30	333	263	230	230
query31	1116	1099	1042	1042
query32	99	59	58	58
query33	522	326	239	239
query34	1164	1173	653	653
query35	756	782	649	649
query36	1416	1385	1248	1248
query37	155	102	93	93
query38	1877	1690	1645	1645
query39	952	919	892	892
query39_1	878	878	903	878
query40	249	162	135	135
query41	69	67	64	64
query42	97	97	95	95
query43	324	323	288	288
query44	1412	764	761	761
query45	206	194	184	184
query46	1129	1187	747	747
query47	2378	2333	2165	2165
query48	422	392	271	271
query49	585	415	319	319
query50	1071	433	330	330
query51	4460	4477	4306	4306
query52	87	84	75	75
query53	260	273	214	214
query54	276	236	207	207
query55	74	71	71	71
query56	289	312	287	287
query57	1421	1416	1294	1294
query58	299	261	249	249
query59	1534	1604	1395	1395
query60	302	269	251	251
query61	158	143	149	143
query62	696	647	619	619
query63	247	209	198	198
query64	2518	777	631	631
query65	4873	4792	4734	4734
query66	1834	505	391	391
query67	29600	29533	29399	29399
query68	3167	1565	1056	1056
query69	408	291	268	268
query70	1083	973	962	962
query71	369	341	293	293
query72	2893	2581	2332	2332
query73	857	789	447	447
query74	5097	4943	4764	4764
query75	2611	2565	2225	2225
query76	2318	1192	781	781
query77	348	381	288	288
query78	12378	12499	11796	11796
query79	1388	1187	749	749
query80	828	527	462	462
query81	502	319	288	288
query82	569	162	131	131
query83	393	323	297	297
query84	279	165	130	130
query85	954	614	515	515
query86	396	309	297	297
query87	1842	1824	1730	1730
query88	3684	2767	2764	2764
query89	453	408	359	359
query90	1808	203	195	195
query91	201	217	163	163
query92	66	58	58	58
query93	1682	1595	977	977
query94	618	356	288	288
query95	778	497	491	491
query96	1087	840	347	347
query97	2676	2712	2617	2617
query98	217	206	199	199
query99	1181	1164	1034	1034
Total cold run time: 258682 ms
Total hot run time: 173258 ms

hello-stephen (Contributor)
ClickBench: Total hot run time: 25.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6c7d1d903a6f137cbc1835cda462d23e54f4eb67, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.01	0.01	0.01
query2	0.10	0.05	0.05
query3	0.26	0.15	0.13
query4	1.61	0.13	0.13
query5	0.24	0.23	0.23
query6	1.27	1.07	1.04
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.37	0.31	0.33
query10	0.60	0.55	0.53
query11	0.19	0.14	0.14
query12	0.19	0.15	0.15
query13	0.46	0.47	0.47
query14	1.00	1.01	1.00
query15	0.61	0.59	0.59
query16	0.31	0.32	0.32
query17	1.11	1.10	1.09
query18	0.23	0.21	0.21
query19	2.06	1.98	1.93
query20	0.02	0.01	0.02
query21	15.44	0.23	0.14
query22	4.76	0.06	0.05
query23	16.13	0.30	0.12
query24	3.00	0.45	0.33
query25	0.10	0.05	0.04
query26	0.73	0.20	0.15
query27	0.05	0.03	0.04
query28	3.56	0.90	0.56
query29	12.47	4.34	3.48
query30	0.28	0.15	0.15
query31	2.77	0.59	0.31
query32	3.23	0.59	0.48
query33	3.16	3.27	3.22
query34	15.63	4.23	3.51
query35	3.55	3.53	3.52
query36	0.56	0.43	0.41
query37	0.10	0.07	0.07
query38	0.05	0.04	0.03
query39	0.04	0.03	0.03
query40	0.17	0.16	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 96.68 s
Total hot run time: 25.27 s

hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 47.20% (177/375) 🎉
Increment coverage report
Complete coverage report
