What Is LegacyCodeBench?

LegacyCodeBench is a benchmark designed to evaluate how well AI systems can understand and document decades-old legacy software, especially COBOL.

Modernization fails not because conversion is hard, but because the business rules, the intent, and the unwritten conventions are buried inside the legacy code.

You can’t modernize what you don’t understand. Yet no one measures whether AI actually understands these systems. That’s why LegacyCodeBench tests whether AI can accurately extract and explain that knowledge.

What LegacyCodeBench Measures

LegacyCodeBench evaluates two core capabilities:

  1. Documentation Quality — Can AI explain the program clearly?
  2. Program Understanding — Can AI extract the structure of the system?

Why Understanding Matters

Modernization projects often rush into COBOL→Java translation, automated refactoring, or AI code generation without first understanding the business rules and intent buried in the legacy code.

This lack of understanding leads to failed migrations.

Documentation Tasks

Explain business purpose, rules, edge cases, and data structures for legacy programs.

Understanding Tasks

Extract dependency graphs, business rules, and data flows. Scored with F1, the harmonic mean of precision and recall.
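
As an illustration of F1 scoring on an extraction task, here is a minimal sketch in Python. It assumes that an extracted dependency graph is compared to a reference as a set of (caller, callee) edges with exact matching; the function name, the program names, and the matching rule are hypothetical and are not taken from the LegacyCodeBench harness.

```python
def f1_score(predicted, expected):
    """Return the F1 score of a predicted set of items against a reference set.

    Assumption: items match only when they are exactly equal.
    Assumption: an empty prediction or an empty reference scores 0.0.
    """
    predicted, expected = set(predicted), set(expected)
    if not predicted or not expected:
        return 0.0

    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(expected)      # share of the reference that was found
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference edges for a small COBOL system.
expected_edges = {
    ("PAYROLL", "TAXCALC"),
    ("PAYROLL", "EMPFILE"),
    ("TAXCALC", "RATETBL"),
}

# Hypothetical model output: two correct edges, one spurious, one missed.
predicted_edges = {
    ("PAYROLL", "TAXCALC"),
    ("PAYROLL", "EMPFILE"),
    ("PAYROLL", "AUDITLOG"),
}

score = f1_score(predicted_edges, expected_edges)
print(f"F1 = {score:.3f}")
```

Here precision and recall are both 2/3, so F1 is also 2/3. The same function could score extracted business rules or data flows, provided they are normalized into comparable items first.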

Evaluation

The overall score is 50% documentation and 50% understanding, computed from weighted metrics with expert validation.
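
The 50/50 split can be sketched as a simple weighted sum. This is only an illustration: it assumes both component scores are already normalized to the range 0 to 1, and the function and parameter names are hypothetical. How each component is itself computed from its weighted metrics is not specified here.

```python
DOCUMENTATION_WEIGHT = 0.5  # from the stated 50% documentation share
UNDERSTANDING_WEIGHT = 0.5  # from the stated 50% understanding share


def overall_score(documentation, understanding):
    """Combine the two component scores into one overall score.

    Assumption: both inputs are normalized to the range [0, 1].
    """
    for name, value in (("documentation", documentation),
                        ("understanding", understanding)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be between 0 and 1, got {value}")
    return (DOCUMENTATION_WEIGHT * documentation
            + UNDERSTANDING_WEIGHT * understanding)


# Hypothetical component scores for one model.
combined = overall_score(documentation=0.75, understanding=0.5)
print(combined)  # 0.625
```

With equal weights, a model cannot reach a high overall score by writing fluent documentation alone; weak structural extraction pulls the result down by the same amount.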

Methodology Highlights