
Public benchmark · We maintain it

CodeSecBench — the AI-app SAST benchmark

We grade getdebug, gitleaks, trufflehog, bandit, and semgrep on the same hand-crafted AI-app fixtures and the same real-world repositories. The corpus is MIT-licensed; the truth files are public; any tool can submit a result via PR. The full benchmark lives on its own neutral domain, codesecbench.org.

Headline numbers

Live recall numbers from the corpus, head-to-head. Full per-category and per-target breakdowns at codesecbench.org/results.

Section A · JS/TS fixtures

Recall, head-to-head

getdebug      75%
gitleaks      25%
trufflehog     0%

Section B · Python fixtures

Recall, head-to-head

getdebug     100%
bandit        20%
semgrep       20%
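
For readers unfamiliar with the metric, recall is the share of labelled ground-truth issues that a tool actually reports. The Python sketch below is a minimal illustration under stated assumptions, not the CodeSecBench scorer: the Span type, the "same file and at least one shared line" matching rule, and the sample paths are all hypothetical. The actual span-label and scoring rules are described on the Methodology page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labelled region of a file. Hypothetical shape, not the real truth-file schema."""
    path: str
    start_line: int
    end_line: int


def overlaps(a: Span, b: Span) -> bool:
    """True when both spans are in the same file and share at least one line."""
    return (
        a.path == b.path
        and a.start_line <= b.end_line
        and b.start_line <= a.end_line
    )


def recall(truth: list[Span], findings: list[Span]) -> float:
    """Fraction of ground-truth spans matched by at least one tool finding."""
    if not truth:
        raise ValueError("recall is undefined for an empty truth set")
    hits = sum(1 for t in truth if any(overlaps(t, f) for f in findings))
    return hits / len(truth)


# Illustrative data only: four labelled issues, three tool findings.
truth = [
    Span("app/chat.ts", 10, 12),
    Span("app/chat.ts", 40, 40),
    Span("app/keys.ts", 3, 3),
    Span("app/tools.ts", 22, 30),
]
findings = [
    Span("app/chat.ts", 11, 11),   # falls inside the first truth span
    Span("app/keys.ts", 3, 3),     # exact match on the third truth span
    Span("app/other.ts", 1, 1),    # matches nothing, so recall is unaffected
]

print(f"{recall(truth, findings):.0%}")  # 2 of 4 truth spans matched: 50%
```

Note that the unmatched finding in app/other.ts does not lower recall; extra or spurious findings affect precision, which a recall figure alone does not capture.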

Why a separate domain

A benchmark that grades getdebug needs to live somewhere that isn't getdebug's marketing site. codesecbench.org/governance documents the multi-maintainer model that takes over the moment a second tool maintainer joins — mirroring how MLPerf, SPEC, and TPC operate. getdebug currently maintains the corpus and the harness; the methodology, the truth files, and the scorer are all MIT-licensed and re-runnable.

What's on codesecbench.org

  • Landscape — every SAST tool we know of, by category, with AI-app coverage marked honestly
  • Results — head-to-head leaderboard across 4 sections (JS/TS fixtures, Python fixtures, app-shaped corpus, 24 real-world repos)
  • Targets — every fixture and target repo catalogued
  • Methodology — span labels, hallucination control, category disagreement scoring, borderline rows
  • Governance — neutrality model, open maintainer seats, contested-label adjudication
  • Blog — calibration cycle writeups, methodology decisions, tool submissions

Maintain a SAST tool?

Run it against the public corpus.

The corpus, the truth files, and the scorer go public once Tier C lands its final two repositories (cycles 5 and 6 are in flight). Until then, the methodology, the results, and the SAST landscape are all browsable on codesecbench.org. Once the public release lands, this is where the "submit a tool" flow will live.