Environment: google-genai 2.10.0 · Python 3.13 · macOS · Gemini API (not Vertex) · model gemini-3.5-flash with the file_search tool.
Summary
When using File Search via the Interactions API, there is no way to inspect the full set of chunks the retriever returned and injected into the model's context. The retrieved chunk text is observable only indirectly, via model_output → file_citation.source annotations — i.e. only for chunks the model ends up citing. This blocks reliability/eval work and debugging "what context did the model actually see."
What's observable today
- file_citation annotations expose source (chunk text), document_uri, file_name, custom_metadata, and start_index/end_index, but only for cited chunks. (Side note: the source field appears undocumented; the public File Search docs still describe annotations as metadata-only.)
- The file_search_result step contains only call_id, type, and an opaque base64 signature (~16KB, encrypted), with no readable retrieved content.
- file_search_stores.documents.get/list return metadata only; download_media on a document returns 403 PERMISSION_DENIED.
- Token usage doesn't help: total_input_tokens stays flat and total_tool_use_tokens is ~constant regardless of top_k, so the retrieved-context size isn't reflected anywhere.
Why it matters
Rigorous RAG faithfulness/groundedness evaluation must judge an answer against the actual retrieved context, not the full source document. Today we can only approximate it with the cited chunks. Empirically, at the default top_k the model cites every retrieved chunk (cited count tracks top_k 1:1 up to ~5), so cited ≈ retrieved in that regime — but this is inferred, not guaranteed, and it breaks down at higher top_k (e.g. top_k=10 → 8 cited, top_k=20 → 16 cited), where retrieved-but-uncited chunks become invisible. (Same concern raised re: server-side context being a black box: https://x.com/_philschmid/status/2069458803074986376)
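To make the gap concrete, the arithmetic behind the two data points above can be sketched as follows. It assumes the retriever returned at most top_k chunks on each run, which is exactly what cannot be verified today, so the result is only an upper bound on the invisible chunks. The helper name max_uncited is hypothetical.

```python
def max_uncited(top_k: int, cited: int) -> int:
    """Upper bound on retrieved-but-uncited chunks, assuming at most top_k were retrieved."""
    return max(top_k - cited, 0)


# Cited counts reported in this issue, keyed by top_k.
observed = {10: 8, 20: 16}

for top_k, cited in observed.items():
    hidden = max_uncited(top_k, cited)
    print(f"top_k={top_k}: cited={cited}, up to {hidden} chunks ({hidden / top_k:.0%}) not inspectable")
```

Under that assumption, up to 2 of 10 and up to 4 of 20 chunks (20% in both cases) could have influenced the answer without being available to an evaluator.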
Repro
from google import genai

# Assumes GEMINI_API_KEY (or GOOGLE_API_KEY) is set in the environment.
client = genai.Client()

store = "fileSearchStores/<your-store>"
it = client.interactions.create(
    model="gemini-3.5-flash",
    input="<a question your store can answer>",
    tools=[{"type": "file_search", "file_search_store_names": [store], "top_k": 20}],
    store=True,
)
full = client.interactions.get(it.id)
# file_search_result step: only call_id / type / signature, no retrieved chunks.
# Retrieved chunk text is reachable only via model_output file_citation.source,
# i.e. only the CITED chunks; with a large top_k the cited count is < top_k, so
# the retrieved-but-uncited chunks are invisible.
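For counting the cited chunks in a stored interaction, a minimal sketch might look like the following. It assumes the interaction has first been converted to plain dicts and lists (for example via a pydantic-style model_dump(), which I have not verified for this SDK object), and that citation annotations carry type == "file_citation" along with the source and document_uri fields described above. Because the exact nesting under model_output is not documented here, the walker searches the whole structure instead of relying on a fixed path. Both helper names are hypothetical.

```python
from typing import Any, Iterator


def iter_file_citations(node: Any) -> Iterator[dict]:
    """Yield every dict tagged type == "file_citation" anywhere inside node."""
    if isinstance(node, dict):
        if node.get("type") == "file_citation":
            yield node
        for value in node.values():
            yield from iter_file_citations(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_file_citations(item)


def distinct_cited_chunks(interaction: dict) -> list[dict]:
    """Deduplicate citations by (document_uri, source text), keeping first-seen order."""
    seen: set[tuple] = set()
    chunks: list[dict] = []
    for cite in iter_file_citations(interaction):
        key = (cite.get("document_uri"), cite.get("source"))
        if key not in seen:
            seen.add(key)
            chunks.append(cite)
    return chunks


# Assumed usage (not verified against the SDK):
#   cited = distinct_cited_chunks(full.model_dump())
#   print(len(cited), "distinct cited chunks for top_k=20")
```

Deduplicating on the chunk text matters because the same chunk can be cited at several positions in the answer, which would otherwise inflate the cited count relative to top_k.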
Request
Surface the resolved File Search retrieval on the file_search_result step (or via an opt-in flag): the full set of retrieved chunks — text + document/source ref + relevance score, including uncited ones. This makes the retrieved context inspectable for evaluation and debugging, and lets developers trim/manage it as suggested in the GA guidance.
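As an illustration of how such a payload could be consumed, here is a purely hypothetical sketch. Nothing in it exists in google-genai today: the RetrievedChunk type, its field names, and eval_context are invented for this example and only mirror the fields requested above (text, document reference, relevance score, cited or not).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """HYPOTHETICAL shape for one retrieved chunk; not part of google-genai today."""
    text: str
    document_uri: str
    score: float  # relevance score, as requested above
    cited: bool   # whether the model ended up citing this chunk


def eval_context(chunks: list[RetrievedChunk]) -> str:
    """Join every retrieved chunk (cited or not), highest score first, for a groundedness judge."""
    ordered = sorted(chunks, key=lambda c: c.score, reverse=True)
    return "\n\n".join(c.text for c in ordered)
```

With something along these lines, a faithfulness judge could be given the full retrieved context, and the cited flag would show directly which chunks the model saw but did not cite.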