feat: auto-generate llms #2

0xHieu01 · 2025-11-17T09:19:16Z

This pull request automates generation of llms.txt. It keeps the llms.txt files up to date, adds detailed descriptions through LLM enrichment, and stays cost-efficient by updating only what changed.
The implementation includes:

- Adds scripts/generate_llms.sh to generate llms.txt.
- Incremental mode:
  - Reads the base commit from each file's header and compares it to origin/<default_branch>.
  - Only new or modified files call the LLM; removed files are dropped; unchanged lines are reused verbatim.
  - Compare runs without the LLM update only the header and structural additions/removals.
  - Fast path: if no files changed, only the header commit/timestamp is refreshed.
- Output is grouped by directory; raw links are pinned to clix-so/clix-flutter-sdk@main.
- Writes are atomic (via a staging file) to prevent truncation; LLM prompts are concise and logged.
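A minimal sketch of how the incremental mode could bucket changed files, assuming the script diffs the header's base commit against the default branch with `git diff --name-status`. The PR text does not show the actual classification code, so `classify_changes` and its "bucket path" output format are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split `git diff --name-status` output into the buckets
# the incremental mode needs. Only "new"/"modified" entries would go to the LLM;
# "removed" entries are dropped from llms.txt; unlisted files are reused as-is.
set -euo pipefail

classify_changes() {
  # Reads tab-separated name-status lines on stdin, prints "<bucket> <path>".
  local status path newpath
  while IFS=$'\t' read -r status path newpath; do
    case "$status" in
      A)  echo "new $path" ;;
      M)  echo "modified $path" ;;
      D)  echo "removed $path" ;;
      R*) # A rename (R<score>) is treated as removing the old path and adding the new one.
          echo "removed $path"
          echo "new $newpath" ;;
      *)  ;; # copies, type changes, etc. are ignored in this sketch
    esac
  done
}

# Usage in a real checkout (assumes BASE and HEAD_REF are already set):
#   git diff --name-status "${BASE}...${HEAD_REF}" | classify_changes
```

Treating renames as remove-plus-add keeps the logic simple at the cost of one extra LLM call per renamed file; paths containing unusual characters would need `git diff -z` instead.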
GitHub Action:
- Triggers automatically on release.
- Runs the script in the scripts folder and opens a pull request that updates llms.txt.

Tested with an initial simulated run; the committed llms.txt was created by running the script.

Verified that the links in llms.txt are accessible: Summary: total=37 ok=37 fail=0
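A link check like the one summarized above could be scripted as follows. This is a sketch, not the check the author ran: it assumes every entry is a markdown link `[name](url)`, uses curl HEAD requests by default, and exposes a hypothetical `CHECK_CMD` hook so the counting logic can be exercised without network access.

```bash
#!/usr/bin/env bash
# Sketch of a link checker for llms.txt that prints a total/ok/fail summary.
# extract_links, url_ok, check_links and CHECK_CMD are hypothetical names.
set -euo pipefail

extract_links() {
  # Prints the URL inside each "[text](url)" found on stdin, one per line.
  grep -oE ']\(https?://[^)[:space:]]+\)' | sed -E 's/^]\((.*)\)$/\1/'
}

url_ok() {
  # Default checker: HEAD request, follow redirects, fail on HTTP >= 400.
  curl -fsSIL --max-time 10 -o /dev/null "$1"
}

check_links() {
  # $1: file to scan. Returns non-zero if any link failed.
  local file="$1" checker="${CHECK_CMD:-url_ok}" total=0 ok=0 fail=0 url
  while IFS= read -r url; do
    total=$((total + 1))
    # </dev/null keeps the checker from consuming the loop's input.
    if "$checker" "$url" </dev/null; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
      echo "FAIL $url" >&2
    fi
  done < <(extract_links < "$file")
  echo "Summary: total=$total ok=$ok fail=$fail"
  [[ $fail -eq 0 ]]
}
```

Run as `check_links llms.txt`; the non-zero return on any failure makes it usable as a CI gate.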

Summary by CodeRabbit

Release Notes

New Features
- Added comprehensive repository inventory documenting Flutter SDK structure, modules, and components.
Chores
- Introduced automated workflow for generating and maintaining repository documentation index.

coderabbitai · 2025-11-17T09:19:24Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduces automated infrastructure for generating an LLMS documentation index for the Flutter SDK. Adds a GitHub Actions workflow to orchestrate generation, a Bash script that analyzes and indexes repository contents with optional LLM enrichment, and an initial LLMS inventory file.

Changes

Cohort / File(s)	Change Summary
Workflow Automation `.github/workflows/generate-llms.yml`	New workflow triggered on `workflow_dispatch` and release events to automate LLMS generation with support for full or differential regeneration, branch creation, Git commits, and PR creation.
Generation Tooling `scripts/generate_llms.sh`	New Bash script for generating LLMS index with support for full scans or incremental updates, optional LLM-based description enrichment via OpenAI API, file filtering, and structured output formatting.
Output Documentation `llms.txt`	New repository-root documentation file providing comprehensive inventory of Flutter SDK contents including platforms, modules, bindings, and samples with high-level descriptions.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: auto-generate llms' directly summarizes the main change: introducing automation to generate llms.txt files. It is concise, clear, and accurately reflects the primary purpose of the PR.


Copilot

Pull Request Overview

This PR implements automated generation of llms.txt documentation files using LLM-powered descriptions. The script provides incremental updates to minimize API costs, atomic file writes for safety, and GitHub Actions integration triggered on releases.

Key changes:

Added scripts/generate_llms.sh with incremental diff-based LLM enrichment and surgical updates
Created GitHub Actions workflow (.github/workflows/generate-llms.yml) to auto-generate on release events
Generated initial llms.txt with file descriptions grouped by directory
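Grouping entries by directory with raw links pinned to clix-so/clix-flutter-sdk@main could look roughly like this. The `## <dir>` headings, the bare filename as link text, and `group_by_dir` itself are assumptions; the real script also appends an LLM-written description to each entry.

```bash
#!/usr/bin/env bash
# Sketch: group repo-relative paths by directory and emit one markdown link per
# file. Heading style and link text are hypothetical; files at the repository
# root would be grouped under "## .".
set -euo pipefail

RAW_BASE="https://raw.githubusercontent.com/clix-so/clix-flutter-sdk/main"

group_by_dir() {
  local dir path current=""
  # Prefix each path with its directory so sort keeps every directory contiguous
  # (plain path sorting would interleave lib/ and lib/src/ entries).
  while IFS= read -r path; do
    printf '%s\t%s\n' "$(dirname "$path")" "$path"
  done | LC_ALL=C sort -t $'\t' -k1,1 -k2,2 |
  while IFS=$'\t' read -r dir path; do
    if [[ "$dir" != "$current" ]]; then
      # Blank line between sections, but not before the first heading.
      if [[ -n "$current" ]]; then echo; fi
      echo "## $dir"
      current="$dir"
    fi
    echo "- [$(basename "$path")]($RAW_BASE/$path)"
  done
}
```

Sorting on a (directory, path) key rather than the raw path is what guarantees a single heading per directory.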

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File	Description
scripts/generate_llms.sh	Main bash script implementing LLM-powered file description generation with incremental mode, GitHub API integration, and atomic writes
.github/workflows/generate-llms.yml	GitHub Actions workflow triggering on release or manual dispatch to run generation script and create PR
llms.txt	Generated output file containing organized file index with LLM-enriched descriptions
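The atomic writes listed for scripts/generate_llms.sh, combined with the header-only fast path from the PR description, can be sketched like this. The header layout `<!-- commit: <sha> generated: <timestamp> -->`, the assumption that it is the first line of the file, and `refresh_header` are hypothetical; the PR only shows, through the workflow's sed expression, that the header begins with `<!-- commit: <sha>`.

```bash
#!/usr/bin/env bash
# Sketch of the fast path: rewrite only the header line of llms.txt through a
# staging file in the same directory, then rename it over the original.
set -euo pipefail

refresh_header() {
  # $1: llms.txt path, $2: commit SHA, $3: timestamp string
  local file="$1" sha="$2" ts="$3" tmp
  # Stage next to the target so the final mv is a same-filesystem rename.
  tmp="$(mktemp "$(dirname "$file")/.llms.staging.XXXXXX")"
  if {
    echo "<!-- commit: $sha generated: $ts -->"
    tail -n +2 "$file"   # everything after the old header, unchanged
  } > "$tmp"; then
    chmod 644 "$tmp"     # mktemp creates files with mode 0600
    mv -f "$tmp" "$file" # readers see the old or the new file, never a partial one
  else
    rm -f "$tmp"         # e.g. the source file was missing; leave nothing behind
    return 1
  fi
}
```

Writing the full content to a sibling file and renaming it means a crash mid-write leaves the previous llms.txt intact instead of a truncated one.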


scripts/generate_llms.sh

llms.txt

.github/workflows/generate-llms.yml

Co-authored-by: Copilot

coderabbitai

Actionable comments posted: 6

♻️ Duplicate comments (4)

llms.txt (1)

19-22: Similar duplicate entries in iOS/platform section.

Messages.g.swift (lines 19 & 21) and ClixPlugin.swift (lines 20 & 22) are duplicated with slightly different descriptions. This reinforces the systematic duplication issue noted in the Android section.
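Duplicates like these can be caught mechanically before the PR is opened by keying entries on their link target, which flags both pairs even though their descriptions differ. A minimal sketch, assuming each entry is a markdown link `[name](url)`; `find_duplicate_links` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print every link target that appears more than once in a file.
# Empty output means no duplicate entries.
set -euo pipefail

find_duplicate_links() {
  # "|| true" keeps a file with no links from failing under pipefail.
  { grep -oE ']\(https?://[^)[:space:]]+\)' "$1" || true; } \
    | sed -E 's/^]\((.*)\)$/\1/' \
    | LC_ALL=C sort | uniq -d
}
```

Running it as a final step of generation (and failing when output is non-empty) would stop duplicated iOS or Android entries from reaching the generated PR.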
scripts/generate_llms.sh (2)
111-111: Sed command in workflow (line 111 of .github/workflows/generate-llms.yml) lacks file existence check.

120-126: Case statement falls through without skipping excluded directories.

The case statement matches excluded directories on lines 120-121 but doesn't skip them: it falls through to the find check on line 123, potentially adding excluded directories to the scan list anyway. This is the exact issue flagged in the past review.

Add continue after the empty command on line 121 to actually skip excluded directories:
  while IFS= read -r topdir; do
    case "$topdir" in
-     .git|.github|.vscode|build|.dart_tool|ios|android|gradle|.swiftpm) ;; # checked below by content
+     .git|.github|.vscode|build|.dart_tool|ios|android|gradle|.swiftpm) continue ;; # skip excluded directories
    esac
-   if find "$topdir" -type f \( -name "*.dart" -o -name "*.kt" -o -name "*.swift" \) -print -quit >/dev/null 2>&1; then
+   # find exits 0 even when nothing matches, so test its output instead
+   if [[ -n "$(find "$topdir" -type f \( -name "*.dart" -o -name "*.kt" -o -name "*.swift" \) -print -quit 2>/dev/null)" ]]; then
      dirs+=("$topdir")
    fi
On closer inspection, though, the inline comment says "checked below by content", which suggests the excluded directories should be added when they contain target files. If that is the intent, the logic is correct as-is and only needs clarification. However, excluded top-level directories such as ios/ contain nested paths like ios/Pods that can themselves hold matching files, so a content check on ios/ may produce false matches. Please verify the intent.
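Separately from the exclusion question, the content check has a pitfall of its own: `find ... -print -quit` exits 0 whether or not anything matched, so using its exit status in `if` never filters a directory out. A minimal sketch that tests find's output instead and skips excluded names with `continue`; it iterates over a `*/` glob rather than the script's `while read` input, and `scan_dirs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: list top-level directories that contain Dart, Kotlin, or Swift
# sources, skipping the same exclusion list as the snippet above.
set -euo pipefail

scan_dirs() {
  local topdir
  for topdir in */; do
    topdir="${topdir%/}"
    case "$topdir" in
      .git|.github|.vscode|build|.dart_tool|ios|android|gradle|.swiftpm) continue ;;
    esac
    # Non-empty output means at least one matching file exists.
    if [[ -n "$(find "$topdir" -type f \( -name '*.dart' -o -name '*.kt' -o -name '*.swift' \) -print -quit 2>/dev/null)" ]]; then
      echo "$topdir"
    fi
  done
}
```

In a tree with lib/, example/lib/, docs/ (Markdown only) and ios/, this prints example and lib: docs/ is dropped because find prints nothing, and ios/ is skipped by the case statement.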
.github/workflows/generate-llms.yml (1)
107-114: Sed command on line 111 may fail silently on first run when llms.txt doesn't exist.

When llms.txt does not exist (first run), the sed command on line 111 produces no output. The || true suppresses the error, so BASE is empty at that point; only the [[ -z ]] fallback on the next line keeps COMPARE_RANGE="${BASE}...${HEAD_REF}" from becoming an invalid compare range like ...origin/main.

Add an explicit file existence check before attempting to extract the commit:
  # Read base SHA from existing file header
- BASE="$(sed -nE 's/^<!--[[:space:]]*commit:[[:space:]]*([0-9a-f]+).*/\1/p' llms.txt | head -1 || true)"
- [[ -z "${BASE}" ]] && BASE="${HEAD_REF}"
- COMPARE_RANGE="${BASE}...${HEAD_REF}"
+ if [[ -f llms.txt ]]; then
+   BASE="$(sed -nE 's/^<!--[[:space:]]*commit:[[:space:]]*([0-9a-f]+).*/\1/p' llms.txt | head -1)"
+ fi
+ BASE="${BASE:-${HEAD_REF}}"
+ COMPARE_RANGE="${BASE}...${HEAD_REF}"
This ensures the file is only parsed if it exists, and a sensible fallback (HEAD_REF) is used when the file is missing or commit header is not found.
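The suggested fallback, wrapped in a function so it can be exercised in isolation (`read_base` is a hypothetical name). It keeps the original `|| true` so a pipeline failure under `set -o pipefail` cannot abort the step. Note that when BASE falls back to HEAD_REF, the range `HEAD_REF...HEAD_REF` yields an empty diff, so a first run presumably still needs the full-regeneration mode the workflow supports.

```bash
#!/usr/bin/env bash
# Sketch: read the base SHA from the llms.txt header if the file exists,
# otherwise fall back to the given ref.
set -euo pipefail

read_base() {
  local file="$1" head_ref="$2" base=""
  if [[ -f "$file" ]]; then
    base="$(sed -nE 's/^<!--[[:space:]]*commit:[[:space:]]*([0-9a-f]+).*/\1/p' "$file" | head -1 || true)"
  fi
  # Covers both a missing file and a file without a commit header.
  echo "${base:-$head_ref}"
}

# Usage (assumes HEAD_REF is set by the workflow):
#   COMPARE_RANGE="$(read_base llms.txt "${HEAD_REF}")...${HEAD_REF}"
```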

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea958db and 8770d21.

📒 Files selected for processing (3)

.github/workflows/generate-llms.yml (1 hunks)
llms.txt (1 hunks)
scripts/generate_llms.sh (1 hunks)

🧰 Additional context used

🪛 LanguageTool

llms.txt

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...token retrieval, and event tracking. - [Clix Config](https://raw.githubusercontent.c...