CodeSecBench

Corpus · Four tiers

Targets

Four classes of target: hand-crafted micro-fixtures (JS/TS and Python), hand-authored app-shaped repos, and 24 real-world AI app repositories. Each class measures something different — see methodology.

Section A
15

JS/TS fixtures

Section B
10

Python fixtures

Section C
4 / 6

App-shaped, 94 labels

Real-world
24

GitHub repos

Section A — JS/TS micro-fixtures

Hand-crafted JavaScript / TypeScript fixtures across six AI-app categories. Each fixture is either deliberately vulnerable or deliberately safe. Scored on /results against getdebug + gitleaks + trufflehog.

client-side-llm-key

5 fixtures (3 vuln, 2 safe)
  • client-side-llm-key/safe/express-backend-proxy safe
  • client-side-llm-key/safe/next-api-proxy safe
  • client-side-llm-key/vulnerable/direct-hardcode-browser vulnerable
  • client-side-llm-key/vulnerable/next-public-prefix vulnerable
  • client-side-llm-key/vulnerable/vite-import-meta vulnerable

pii-in-prompt

2 fixtures (1 vuln, 1 safe)
  • pii-in-prompt/safe/redact-to-display-fields safe
  • pii-in-prompt/vulnerable/stringify-user-object vulnerable

prompt-injection

2 fixtures (1 vuln, 1 safe)
  • prompt-injection/safe/role-separated-channels safe
  • prompt-injection/vulnerable/string-concat-prompt vulnerable

unbounded-stream

2 fixtures (1 vuln, 1 safe)
  • unbounded-stream/safe/abort-on-disconnect-and-timeout safe
  • unbounded-stream/vulnerable/no-abort-no-timeout vulnerable

unsafe-role-merge

2 fixtures (1 vuln, 1 safe)
  • unsafe-role-merge/safe/persona-allowlist-into-user-role safe
  • unsafe-role-merge/vulnerable/user-persona-into-system vulnerable

unsafe-tool-output

2 fixtures (1 vuln, 1 safe)
  • unsafe-tool-output/safe/validated-tool-output-allowlist safe
  • unsafe-tool-output/vulnerable/shell-exec-tool-output vulnerable

Section B — Python micro-fixtures

Hand-crafted Python AI-app fixtures across five categories. Scored on /results against getdebug + bandit + semgrep.

pii-in-prompt

2 fixtures (1 vuln, 1 safe)
  • pii-in-prompt/safe/redact-to-display-fields-py safe
  • pii-in-prompt/vulnerable/stringify-user-object-py vulnerable

prompt-injection

2 fixtures (1 vuln, 1 safe)
  • prompt-injection/safe/role-separated-channels-py safe
  • prompt-injection/vulnerable/string-concat-prompt-py vulnerable

unbounded-stream

2 fixtures (1 vuln, 1 safe)
  • unbounded-stream/safe/abort-on-disconnect-and-timeout-py safe
  • unbounded-stream/vulnerable/no-abort-no-timeout-py vulnerable

unsafe-role-merge

2 fixtures (1 vuln, 1 safe)
  • unsafe-role-merge/safe/persona-allowlist-into-user-role-py safe
  • unsafe-role-merge/vulnerable/user-persona-into-system-py vulnerable

unsafe-tool-output

2 fixtures (1 vuln, 1 safe)
  • unsafe-tool-output/safe/validated-tool-output-allowlist-py safe
  • unsafe-tool-output/vulnerable/shell-exec-tool-output-py vulnerable

Section C — App-shaped repositories

Six hand-authored AI-app repositories deliberately seeded with the six AI-app vulnerability categories at app density. Four baselined, two pending.

Pending (2)

  • cst-fastapi-tools in progress
  • cst-crewai-multiagent in progress

Real-world — 24 GitHub AI-app repositories

Public repos pulled in three sub-categories: a known-leaky baseline (high recall expected), popular references (high precision expected, near-zero false positives), and a sample of mid-popularity AI app templates. No span labels — these are scored by total finding count on /results.