Datasets

LegacyCodeBench ships with a curated collection of open-source, synthetic, and anonymized COBOL systems. All datasets are available for download on GitHub and can also be fetched through the CLI:

legacycodebench load-datasets

Compatible with Windows, macOS, and Linux. Requires Git and Python 3.10+.

Available Datasets

AWS Card Demo

Banking · 2.4K LOC · License: Apache-2.0

Card authorization workload with CICS, IMS, VSAM, and DB2 components. Includes copybooks and JCL jobs.

Checksum: 835c1a…

Woodgrove Legacy

Finance · 3.1K LOC · License: MIT

Hybrid ATM + 3270 banking application. Includes transaction, customer care, and ATM sub-systems.

Checksum: f19c4b…

Rocket Bank Demo

Core Banking · 4.6K LOC · License: BSD-3

Mainframe banking workload with COBOL, BMS maps, JCL, and VSAM datasets. Mirrors real enterprise layouts.

Checksum: a51b09…

Task Statistics

15 Tasks

8 documentation tasks and 7 understanding tasks, calibrated for difficulty.

LOC Range

Task files range from 500 to 2,000 lines, long enough to contain rich business logic.
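
To see which source files in a local checkout fall inside that range, a minimal sketch follows. It assumes that "lines" means physical lines, including comments and blank lines; how LegacyCodeBench itself counts lines is not stated here, and the helper names count_lines and in_task_range are hypothetical.

```python
# Stated task range: files between 500 and 2,000 lines.
MIN_LINES = 500
MAX_LINES = 2000


def count_lines(path):
    """Count physical lines in a file (assumption: comments and blanks count)."""
    # errors="replace" keeps the count working on files with non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def in_task_range(path, low=MIN_LINES, high=MAX_LINES):
    """Return True if the file's line count lies within [low, high]."""
    return low <= count_lines(path) <= high
```

For example, in_task_range("datasets/aws-carddemo/cbl/SOME-PROGRAM.cbl") would report whether that (placeholder) file is a candidate by size alone.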

Multi-file Coverage

Tasks include copybooks, subprograms, and sample I/O artifacts when required.

Dataset Structure

datasets/
├── aws-carddemo/
│ ├── cbl/
│ ├── cpy/
│ ├── jcl/
│ └── data/
├── az-legacy/
└── rocket-bank/

Each dataset retains its original directory layout to preserve context (copybooks, JCL jobs, data files).
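
To confirm that a download produced the layout shown above, a short sketch can list each dataset and count the files under its subdirectories. The function name summarize_layout is hypothetical and not part of the LegacyCodeBench CLI; the only assumption is that the datasets sit under a single root directory such as datasets/.

```python
from pathlib import Path


def summarize_layout(root):
    """Map each dataset directory under root to {subdirectory: file count}."""
    summary = {}
    for dataset in sorted(Path(root).iterdir()):
        if not dataset.is_dir():
            continue  # skip stray files at the top level
        summary[dataset.name] = {
            sub.name: sum(1 for item in sub.rglob("*") if item.is_file())
            for sub in sorted(dataset.iterdir())
            if sub.is_dir()
        }
    return summary
```

Calling summarize_layout("datasets") returns a dictionary keyed by dataset name, so a missing cbl/ or cpy/ entry is easy to spot.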

Dataset Availability

If datasets fail to load, confirm Git is installed and rerun legacycodebench load-datasets.
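
The prerequisites can be checked before rerunning the command. The sketch below is one way to do that using only the Python standard library; check_prerequisites is a hypothetical helper, not part of the LegacyCodeBench CLI, and it tests only the two stated requirements (Git and Python 3.10+).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # stated requirement: Python 3.10+


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {major}.{minor}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("Git not found on PATH")
    return problems
```

If check_prerequisites() returns an empty list, rerun legacycodebench load-datasets; otherwise resolve the reported problems first.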