Can AI understand legacy code?
LegacyCodeBench evaluates how well AI systems understand and document legacy COBOL. We test whether documentation is accurate enough to regenerate working code—not just whether it looks plausible.
Leaderboard
| # | Model | LCB Score (0-100) | Structural Completeness | Documentation Quality | Behavioral Fidelity | T1 Basic | T4 Enterprise |
|---|---|---|---|---|---|---|---|
What We Measure
If AI truly understands code, you should be able to recreate it from the documentation. That's what we test.
Structural Completeness 30%
Static analysis extracts all business rules, data structures, control flow, and external calls. We check if the AI documented each one.
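As a rough illustration, a coverage check of this kind can be sketched as follows (the element names and the simple substring matching are hypothetical; the benchmark's own extractor and matching rules are more involved):

```python
def structural_completeness(extracted_elements: list[str], documentation: str) -> float:
    """Fraction of statically extracted elements that the documentation mentions."""
    if not extracted_elements:
        return 1.0
    doc = documentation.lower()
    covered = sum(1 for element in extracted_elements if element.lower() in doc)
    return covered / len(extracted_elements)

# Elements a static analyzer might extract from a COBOL program (hypothetical).
elements = ["WS-CUSTOMER-RECORD", "VALIDATE-BALANCE", "INTRCALC"]
doc_text = "Reads WS-CUSTOMER-RECORD and validates it via the VALIDATE-BALANCE paragraph."
print(round(structural_completeness(elements, doc_text), 2))  # 0.67
```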
Documentation Quality 20%
Algorithmic assessment of structure, readability, traceability, and abstraction level. No LLM-as-judge required.
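For instance, a traceability sub-check might look like the sketch below (an illustrative heuristic, not the benchmark's actual rubric or weighting):

```python
def traceability(doc_sections: list[str], source_identifiers: set[str]) -> float:
    """Share of documentation sections that cite at least one source identifier."""
    if not doc_sections:
        return 0.0
    cited = sum(
        1 for section in doc_sections
        if any(ident in section for ident in source_identifiers)
    )
    return cited / len(doc_sections)
```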
Behavioral Fidelity 50%
Claim verification via execution, with a Behavioral Specification Matching fallback when execution infrastructure is unavailable. Documentation must accurately describe what the code actually does, verified through test generation for logic claims and pattern matching for dependency claims.
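The three sub-scores combine into the overall LCB Score using the weights above. A minimal sketch, assuming each sub-score is normalized to 0-1 (the benchmark's exact scaling may differ):

```python
def lcb_score(structural: float, quality: float, behavioral: float) -> float:
    """Weighted combination: 30% structural, 20% quality, 50% behavioral."""
    return 100 * (0.30 * structural + 0.20 * quality + 0.50 * behavioral)

print(round(lcb_score(structural=0.80, quality=0.70, behavioral=0.60), 1))  # 68.0
```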
Run the Benchmark
Build the COBOL execution sandbox first:
`cd docker/cobol-sandbox && docker build -t legacycodebench-cobol:latest .`
Without Docker, Behavioral Fidelity evaluation falls back to heuristic verification (claim-quality analysis). For full accuracy, Docker is recommended.
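A hedged sketch of this Docker-or-fallback flow (the function names and the placeholder heuristic are illustrative stand-ins, not the benchmark's actual API):

```python
import shutil

def behavioral_fidelity(claims: list[dict], cobol_source: str) -> float:
    if shutil.which("docker") is not None:
        # Full path: generate tests from the documented claims and execute
        # them against the program inside the COBOL sandbox image.
        return run_in_sandbox(claims, cobol_source)
    # Fallback: heuristic verification based on claim quality alone.
    return heuristic_claim_score(claims)

def run_in_sandbox(claims: list[dict], cobol_source: str) -> float:
    # Placeholder for execution-based verification (test generation + run).
    raise NotImplementedError("requires the legacycodebench-cobol image")

def heuristic_claim_score(claims: list[dict]) -> float:
    # Illustrative heuristic: reward claims that reference a concrete source
    # element (paragraph, data item, or called program) over vague ones.
    if not claims:
        return 0.0
    specific = sum(1 for claim in claims if claim.get("references_source"))
    return specific / len(claims)
```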
Evaluation results are written to the results/ directory.