Documentation

This site is static. All execution happens via the CLI, similar to SWE-bench. Use the commands below to run LegacyCodeBench locally.

Installation

git clone https://github.com/kalmantic/legacycodebench.git
cd legacycodebench
pip install -e .
legacycodebench --help

Requires Python 3.10+, Git, and Docker (for verification).
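
Before installing, it can help to confirm the prerequisites above are present. The following is a minimal sketch, not part of LegacyCodeBench itself; the function name missing_prerequisites is hypothetical, and it only checks that git and docker are on PATH, not that the Docker daemon is running.

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `which` and `version` are injectable so the check can be exercised
    without touching the real environment.
    """
    problems = []
    # The docs state Python 3.10+ is required.
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # Git and Docker only need to be discoverable on PATH for this check.
    for tool in ("git", "docker"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Run as a script, it prints one line per missing prerequisite and nothing when everything is found.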

Core CLI Commands

Load Datasets

legacycodebench load-datasets

Create Tasks

legacycodebench create-tasks

Run a Model

legacycodebench run-ai --model gpt-4o

Full Evaluation

legacycodebench evaluate
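
The four commands above can be chained from a small script. This is a sketch under one assumption: that they are meant to be run in the order listed (load datasets, create tasks, run a model, evaluate). The docs do not say whether evaluate already covers earlier steps, and the helper names pipeline_commands and run_pipeline are hypothetical.

```python
import subprocess


def pipeline_commands(model="gpt-4o"):
    """Build the CLI invocations in the order they appear in the docs."""
    return [
        ["legacycodebench", "load-datasets"],
        ["legacycodebench", "create-tasks"],
        ["legacycodebench", "run-ai", "--model", model],
        ["legacycodebench", "evaluate"],
    ]


def run_pipeline(model="gpt-4o", runner=subprocess.run):
    """Run each step in turn; check=True stops at the first failing command."""
    for command in pipeline_commands(model):
        runner(command, check=True)
```

Calling run_pipeline("claude-sonnet-4") would run the same steps with the other model name mentioned in Troubleshooting.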

Interactive Mode

legacycodebench interactive

Reference Documentation Workflow

Reference docs are created by COBOL experts using CLI templates:

python scripts/create_reference_template.py \
    --task-id LCB-DOC-001 \
    --expert-id expert1

Experts complete the template, inter-annotator agreement is measured, and merged references are stored under references/.
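
The docs do not say which agreement metric is used. As an illustration only, the sketch below computes Cohen's kappa, one common choice for two annotators, over hypothetical per-item labels (for example, whether each expert marked a given business rule as documented). The function name and the label scheme are assumptions, not LegacyCodeBench's actual method.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.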

Troubleshooting

OpenAI quota or 5xx errors?

Wait a few minutes and retry, or switch to another provider with legacycodebench run-ai --model claude-sonnet-4. The CLI falls back to mock responses when APIs are unavailable.

Leaderboard not printing?

Run legacycodebench leaderboard --print, and make sure results/leaderboard.json exists; it is generated by legacycodebench evaluate.
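
A quick way to tell a missing leaderboard file from a malformed one is to load it directly. This sketch assumes only that results/leaderboard.json is valid JSON; its internal structure is not described here, so the code does not inspect it. The helper name load_leaderboard is hypothetical.

```python
import json
from pathlib import Path


def load_leaderboard(path="results/leaderboard.json"):
    """Return the parsed leaderboard, or None if the file does not exist.

    A json.JSONDecodeError propagates if the file exists but is not valid JSON.
    """
    file = Path(path)
    if not file.is_file():
        return None
    with file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

If this returns None, rerunning legacycodebench evaluate should regenerate the file.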

Need new tasks?

Edit legacycodebench/task_generator.py and rerun legacycodebench create-tasks. Intelligent selection ensures PRD-aligned coverage.