Section A · JS/TS fixtures
Recall, head-to-head
- getdebug: 75%
- gitleaks: 25%
- trufflehog: 0%
Public benchmark · We maintain it
We grade getdebug, gitleaks, trufflehog, bandit, and semgrep on the same hand-crafted AI-app fixtures and the same real-world repositories. The corpus is MIT-licensed; the truth files are public; any tool can submit a result via PR. The full benchmark lives on its own neutral-org domain.
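Recall, the metric reported head-to-head, is the fraction of ground-truth findings that a tool also reports. As a rough sketch of how a scorer might compute it: the `Finding` shape, the exact-match rule, and every path and category below are made-up illustrations, not the benchmark's actual truth-file format or scorer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the real truth-file schema may differ."""
    path: str
    line: int
    category: str


def recall(truth: set[Finding], reported: set[Finding]) -> float:
    """Fraction of ground-truth findings that the tool also reported."""
    if not truth:
        raise ValueError("truth set is empty; recall is undefined")
    return len(truth & reported) / len(truth)


# Illustrative data only: four planted issues in an imaginary fixture.
truth = {
    Finding("app/config.ts", 12, "hardcoded-secret"),
    Finding("app/db.ts", 40, "sql-injection"),
    Finding("app/chat.ts", 7, "prompt-injection"),
    Finding("app/auth.ts", 88, "hardcoded-secret"),
}

# An imaginary tool's output: two true hits and one false positive.
reported = {
    Finding("app/config.ts", 12, "hardcoded-secret"),
    Finding("app/db.ts", 40, "sql-injection"),
    Finding("app/util.ts", 3, "hardcoded-secret"),  # not in truth; ignored by recall
}

score = recall(truth, reported)
print(f"recall = {score:.0%}")  # recall = 50%
```

Recall ignores false positives, so a scorer that also wants to penalize noisy tools would need a precision figure alongside it.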
Live recall numbers from the corpus, head-to-head, for Section A (JS/TS fixtures) and Section B (Python fixtures). Full per-category and per-target breakdowns are at codesecbench.org/results.
A benchmark that grades getdebug needs to live somewhere that isn't getdebug's marketing site. codesecbench.org/governance documents the multi-maintainer model that takes over the moment a second tool maintainer joins, mirroring how MLPerf, SPEC, and TPC operate. getdebug currently maintains the corpus and the harness; the methodology, the truth files, and the scorer are all MIT-licensed and re-runnable.
Maintain a SAST tool?
The corpus, the truth files, and the scorer go public once Tier C lands its final two repositories (cycles 5 and 6 are in flight). Until then, the methodology, the results, and the SAST landscape are all browsable on codesecbench.org. Once the public release lands, this is where the "submit a tool" flow will live.