
docs(benchmarks): add Independent benchmarks section #460

Merged: DietrichGebert merged 1 commit into main from docs/independent-benchmarks on Jun 30, 2026.


@DietrichGebert (Owner)

Adds an Independent benchmarks section to benchmarks/README.md: a curated table of third-party runs with method, headline result, and date, framed clearly as independent corroboration rather than official figures.

Two rows, both plugin-installed runs only (pasted-prompt runs approximate full and skew results, so they're excluded):

  • KuldeepB19 — installed plugin, 24 tasks, 480 builds, Opus 4.8, graded by executing the code. ~44% less code, no correctness/security regression.
  • RicardoCostaGit — multi-turn agentic via Cursor SDK, isolated worktrees, rule toggled per run. Leaner output, higher process cost on completion-forced tasks.

Gives the recurring 'did anyone independent check this?' ask a canonical home, and credits/links both authors (both granted permission in-thread).

Closes #121
Closes #236

Note: overlaps #122 (which credits/links Ricardo only). This supersedes it by also adding Kuldeep and the framing/curation rule. Suggest closing #122 in favor of this, or I can rework it the other way if you prefer.

Curated, plugin-installed third-party runs (KuldeepB19, RicardoCostaGit) with
method, headline, and date, clearly framed as independent corroboration rather
than official figures. Gives the recurring 'did anyone independent check this?'
ask a canonical home.

Closes #121
Closes #236
DietrichGebert merged commit 16f6cbf into main on Jun 30, 2026 (1 check passed).
DietrichGebert deleted the docs/independent-benchmarks branch on June 30, 2026 at 00:04.
github-actions Bot pushed a commit to harshav167/ponytail that referenced this pull request Jun 30, 2026