CBP2025 Simulator Framework
Simulator
- The simulator reads instructions from a trace in .gz format directly.
- Each instruction from the trace may generate several “pieces”. The number of pieces depends on the number of 64-bit output values and the number of output registers. For instance:
  - A vector instruction that writes one 128-bit register will cause two pieces to be generated.
  - A scalar instruction that writes two general-purpose registers will cause two pieces to be generated.
  - A vector instruction that writes two 128-bit registers will cause four pieces to be generated.
  - A scalar load instruction with base update will produce two pieces: one load piece and one ALU piece for the base-register update.
- The simulator schedules instructions according to register dependencies, memory dependencies (note: oracle memory disambiguation is modeled), instruction execution latencies, and structural constraints:
  - Window size
  - Superscalar width
  - Execution lanes
  - 3-level cache hierarchy
- Contestants are provided with a hook to predict conditional branches.
  - All other branches are predicted perfectly.
- The simulator reports instruction count, cycle count, conditional branch mispredictions, and other relevant measurements, both for the full simulation and for the post-warmup portion of the simulation.
  - The first half of the simulation is considered warmup.
  - The second half is used for measuring results.
- Also see the README for more detailed information.
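
The piece-count examples listed earlier in this section can be reproduced with a small sketch. It assumes that each output register contributes one piece per 64 bits of its width, which matches the four examples but is not confirmed as the simulator's exact rule. The function name `count_pieces` and the convention that an instruction with no register outputs still yields one piece are both hypothetical.

```python
def count_pieces(output_widths_bits):
    """Return the assumed number of pieces for one trace instruction.

    output_widths_bits: width in bits of each output register the
    instruction writes (hypothetical representation, not the trace format).
    """
    # Assumed rule: one piece per 64-bit output value, summed over registers.
    pieces = sum((width + 63) // 64 for width in output_widths_bits)
    # Assumption: an instruction without register outputs still forms one piece.
    return max(1, pieces)


# The four examples from the list above, expressed as output-register widths.
examples = {
    "vector, one 128-bit register": [128],
    "scalar, two general-purpose registers": [64, 64],
    "vector, two 128-bit registers": [128, 128],
    "scalar load with base update": [64, 64],  # loaded value + updated base
}

for description, widths in examples.items():
    print(f"{description}: {count_pieces(widths)} piece(s)")
```

Under these assumptions the four cases yield 2, 2, 4, and 2 pieces, in agreement with the list. The README remains the authoritative description of how pieces are formed.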
Training Traces
- There are 105 training traces.
- Each trace contains between 10 million and 130 million instructions.
- To untar the traces: tar -xvf foo.tar.xz
- Contestants may also download the traces using gdown:
  - pip install gdown
  - gdown --folder https://drive.google.com/drive/folders/10CL13RGDW3zn-Dx7L0ineRvl7EpRsZDW
- The CBP2025 organizers characterized data-dependent conditional branches in the training traces, which may inspire interesting directions for branch predictor design. The characterization is available as a PDF file.
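
For scripted setups, the untar step above can also be done from Python with the standard-library `tarfile` module. This is a minimal sketch rather than part of the CBP2025 tooling: the helper name `extract_traces` and the destination directory are hypothetical, and `foo.tar.xz` is the same placeholder archive name used in the `tar` command.

```python
import tarfile
from pathlib import Path


def extract_traces(archive_path, dest_dir):
    """Extract an xz-compressed tar archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:xz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members.
            archive.extractall(dest, filter="data")
        else:
            # Older Python versions without extraction filters.
            archive.extractall(dest)
    return names
```

A call such as `extract_traces("foo.tar.xz", "traces")` would unpack the archive into a `traces` directory, assuming the archive exists in the current directory.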