authors: Jason Lowe-Power (modified by Saugata Ghose)

More complex config for gem5 v25.1

In the previous section, we learned the basics of setting up a Python configuration script for use with gem5. In this section, we will learn more about the components we used previously, as well as how to use other components in the gem5 standard library to set up a simulation.

gem5 stdlib File Structure

The gem5 stdlib is located in src/python/gem5/. Of interest here are the components and prebuilt folders:

gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/processors

gem5/src/python/gem5/prebuilt
----/demo
----/riscvmatched

The components folder contains the components from which you can build systems. The prebuilt folder contains various prebuilt systems, including demo systems for the X86, Arm, and RISC-V ISAs, and riscvmatched, which is a model of the SiFive Unmatched board.

gem5/src/python/gem5/components
----/boards
    ----/simple
    ----/arm_board
    ----/riscv_board
    ----/x86_board
----/cachehierarchies
----/memory
----/processors

Boards are what components plug into. The SimpleBoard supports SE mode only, the ArmBoard supports FS mode only, and the X86Board and RiscvBoard support both FS and SE modes.

gem5/src/python/gem5/components
----/boards
----/cachehierarchies
    ----/chi
    ----/classic
    ----/ruby
----/memory
----/processors

Cache hierarchy components have a fixed interface to processors and memory.

Ruby: detailed cache coherence and interconnect
CHI: Arm CHI-based protocol implemented in Ruby
Classic caches: Hierarchy of crossbars with inflexible coherence

As of gem5 v24.1, it is possible to use any Ruby cache coherence protocol with the ALL gem5 build. This is the build included in pre-compiled binaries.

gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
    ----/single_channel
    ----/multi_channel
    ----/dramsim
    ----/dramsys
    ----/hbm
----/processors

The memory directory contains pre-configured (LP)DDR3/4/5 DIMMs. Single- and multi-channel memory systems are available. There is also integration with DRAMSim and DRAMSys, which, while not needed for accuracy, is useful for comparisons. The hbm directory contains a model of an HBM stack.
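
As a rough illustration of the single- and multi-channel options, the sketch below instantiates one memory system of each kind. SingleChannelDDR4_2400 is the class used later in this section; DualChannelDDR4_2400 and its import path are assumed to be available in your gem5 version, and the variable names are only illustrative. Like any gem5 configuration fragment, this runs only inside the gem5 binary, not in a plain Python interpreter.

```python
# Sketch only: run with the gem5 binary, not with plain Python.
# Assumption: DualChannelDDR4_2400 is exported from memory.multi_channel
# in your gem5 version.
from gem5.components.memory.multi_channel import DualChannelDDR4_2400
from gem5.components.memory.single_channel import SingleChannelDDR4_2400

# One DDR4-2400 channel backing 2 GiB of memory.
single_channel_memory = SingleChannelDDR4_2400(size="2GiB")

# Two DDR4-2400 channels sharing the same 2 GiB capacity.
dual_channel_memory = DualChannelDDR4_2400(size="2GiB")
```

Either object can be passed as the memory argument of a board.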

gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/processors
    ----/generators
    ----/simple
    ----/switchable

The processors directory mostly contains configurable processors to build off of.

Generators create synthetic traffic, but act like processors. They have linear, random, and more interesting patterns.

Simple processors only have default parameters and one ISA.

Switchable processors allow you to change processor types during simulation.

More on processors

Processors are made up of cores. Each core has a “BaseCPU” as a member; this is the actual CPU model. The Processor is what interfaces with the CacheHierarchy and the Board. Processors are organized, structured sets of cores: they define how cores connect with each other, with outside components, and with the board through a standard interface.

gem5 has three (or four or five) different processor models

They are as follows:

CPUTypes.TIMING: A simple in-order CPU model. This is a “single cycle” CPU: each instruction takes the time to fetch and executes immediately. Memory operations take the latency of the memory system. OK for doing memory-centric studies, but not good for most research.

CPUTypes.O3: An out-of-order CPU model. Highly detailed model based on the Alpha 21264. Has a ROB, physical registers, an LSQ, etc. Don’t use SimpleProcessor if you want to configure this.

CPUTypes.MINOR: An in-order core model. A high-performance in-order core model with a configurable four-stage pipeline. Don’t use SimpleProcessor if you want to configure this.

CPUTypes.ATOMIC: Used in “atomic” mode (more later).

CPUTypes.KVM: This is covered in detail in the 2024 gem5 bootcamp.
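
To make the processor kinds and CPU types a little more concrete, here is a minimal sketch of how each kind of processor might be instantiated. It assumes a gem5 binary that includes the Arm ISA; the variable names are illustrative, and the traffic generator is left at its default parameters. It runs only inside the gem5 binary.

```python
# Sketch only: run with a gem5 binary that includes the Arm ISA.
from gem5.components.processors.cpu_types import CPUTypes
from gem5.components.processors.linear_generator import LinearGenerator
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.simple_switchable_processor import (
    SimpleSwitchableProcessor,
)
from gem5.isas import ISA

# Simple processor: one CPU model with default parameters and one ISA.
timing_processor = SimpleProcessor(
    cpu_type=CPUTypes.TIMING, isa=ISA.ARM, num_cores=1
)

# Switchable processor: starts on one CPU model and can change to another
# during simulation when processor.switch() is called.
switchable_processor = SimpleSwitchableProcessor(
    starting_core_type=CPUTypes.ATOMIC,
    switch_core_type=CPUTypes.TIMING,
    isa=ISA.ARM,
    num_cores=1,
)

# Generator: issues synthetic linear traffic instead of executing code.
# Default parameters are used here; generators are paired with a board
# that accepts traffic generators rather than a binary workload.
linear_generator = LinearGenerator(num_cores=1)
```

Because SimpleProcessor only exposes default parameters, tuning an O3 or MINOR core calls for extending BaseCPUProcessor instead.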

Full System (FS) vs. Syscall Emulation (SE) Mode

gem5 can run in two different modes: “syscall emulation” (SE) mode and “full system” (FS) mode. In full system mode, gem5 emulates the entire hardware system and runs an unmodified kernel. Full system mode is similar to running a virtual machine.

Syscall emulation mode, on the other hand, does not emulate all of the devices in a system and focuses on simulating the CPU and memory system. Syscall emulation is much easier to configure since you are not required to instantiate all of the hardware devices required in a real system. However, syscall emulation only emulates Linux system calls, and thus only models user-mode code.

If you do not need to model the operating system for your research questions, and you want extra performance, you should use SE mode. However, if you need high-fidelity modeling of the system, or OS interactions such as page table walks are important, then you should use FS mode.

SE mode relays application syscalls to the host OS. This means we don’t need to simulate an OS for applications to run. In addition, we can access host resources such as files or libraries to dynamically link in. For class, we will stick to SE mode.

Don’t treat SE mode as “FS but faster”: You must understand what you’re simulating and whether it will impact results.

Not all syscalls will ever be implemented: We’d love to have all the syscalls implemented, but Linux changes rapidly. We try to cover common use-cases, but we can’t cover everything. If a syscall is missing, you can implement it, ignore it, or use FS mode.

Binaries with elevated privileges do not work in SE mode: If you’re running a binary that requires elevated privileges, you’ll need to run it in FS mode.

FS mode does everything SE mode does (and more!) but can take longer to get to the region of interest. You have to wait for the OS to boot each time (unless you accelerate the simulation).

However, as SE mode doesn’t simulate the OS, you risk missing important events triggered via syscalls, I/O, or the operating system, which may mean your simulated system doesn’t properly reflect the real system.

Think through what SE mode is doing and if it’s right for your use-case. If in doubt, use FS mode. It’s (generally) not worth the risk using SE mode if you’re not sure.

Full Boot Example (Do Not Implement)

For an example of a configuration file that runs the entire boot of Ubuntu 24.04 on an X86 system, see the gem5 stdlib documentation. Of note is that we need to define an exit event handler in order to get through the entire boot:

def exit_event_handler():
    print("First exit: kernel booted")
    yield False  # gem5 is now executing systemd startup
    print("Second exit: Started `after_boot.sh` script")
    # The after_boot.sh script is executed after the kernel and systemd have
    # booted.
    # Here we switch the CPU type to Timing.
    print("Switching to Timing CPU")
    processor.switch()
    yield False  # gem5 is now executing the `after_boot.sh` script
    print("Third exit: Finished `after_boot.sh` script")
    # The after_boot.sh script will run a script if it is passed via
    # m5 readfile. This is the last exit event before the simulation exits.
    yield True

simulator = Simulator(
    board=board,
    on_exit_event={
        ExitEvent.EXIT: exit_event_handler(),
    },
)

At the first exit event, the generator yields False to continue the simulation. At the second exit event, the generator switches the CPUs, then yields False again. At the third exit event, it yields True to end the simulation.
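
The generator protocol can be tried out without gem5. The following is a toy model in plain Python, not gem5's actual implementation: toy_run is a hypothetical stand-in for Simulator.run(), the exit events are scripted strings, and stopping when no handler is registered or the generator is exhausted is an assumption of this sketch (the real Simulator has its own default behavior for each event).

```python
from typing import Dict, Generator, Iterable, List


def exit_event_handler() -> Generator[bool, None, None]:
    """Same shape as the handler above: one yield per exit event."""
    print("First exit: continue")
    yield False
    print("Second exit: continue")
    yield False
    print("Third exit: stop")
    yield True


def toy_run(
    exit_events: Iterable[str],
    handlers: Dict[str, Generator[bool, None, None]],
) -> List[str]:
    """Hypothetical stand-in for Simulator.run().

    Walks a scripted sequence of exit events and returns the events that
    were processed before a handler asked to stop.
    """
    processed = []
    for event in exit_events:
        processed.append(event)
        handler = handlers.get(event)
        # Assumption of this sketch: stop if there is no handler or the
        # generator has nothing left to yield.
        should_stop = True if handler is None else next(handler, True)
        if should_stop:
            break
    return processed


# Five exit events are scripted, but the handler stops at the third one.
processed = toy_run(["EXIT"] * 5, {"EXIT": exit_event_handler()})
print(processed)
```

Note that the dictionary holds the generator object returned by calling exit_event_handler(), not the function itself, mirroring the on_exit_event entry above. Each exit event advances the generator by exactly one yield.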

There are various types of exit events. The Simulator has default behavior for these events, but they can be overridden.

ExitEvent.EXIT
ExitEvent.CHECKPOINT
ExitEvent.FAIL
ExitEvent.SWITCHCPU
ExitEvent.WORKBEGIN
ExitEvent.WORKEND
ExitEvent.USER_INTERRUPT
ExitEvent.MAX_TICK

An aside on FS simulations:

Note that FS simulations take a long time; like “1 hour to load the kernel” long time! There are ways to “fast-forward” a simulation and then resume the detailed simulation at the interesting point, but these are beyond the scope of this chapter.

Key idea: The Simulator object controls simulation

To put this in the context of gem5 as a whole:

Models (or SimObjects) are the fine-grained objects that are connected together in Python scripts to form a simulation.

Components are the coarse-grained objects that are defined in Python scripts as a set of configured models and delivered as part of the standard library.

The standard library allows users to specify a board and to set the properties of the board by specifying the components that are connected to it.

The Simulator takes a board, launches the simulation, and provides an API for controlling it: specifying when the simulation should stop and restart, replacing components “on the fly”, etc.

See src/python/gem5/simulate/simulator.py for the Simulator source.

Simulator parameters are as follows:

board: The Board to simulate (required).
full_system: Whether to simulate a full system (default: False; this can be inferred from the board, so it does not need to be specified in most cases).
on_exit_event: A complex data structure that allows you to control the simulation. The simulator exits for many reasons, and this allows you to customize what happens. We just saw an example.
checkpoint_path: If we’re restoring from a checkpoint, this is the path to the checkpoint. More on checkpoints later.
id: An optional name for this simulation. Used in multisim. More on this in the future.

Some useful functions are below:

run(): Run the simulation.
get_max_ticks() / set_max_ticks(max_tick): Get or set the absolute tick at which to stop the simulation. Generates a MAX_TICK exit event that can be handled.
schedule_max_insts(inst_number): Set the number of instructions to run before stopping. Generates a MAX_INSTS exit event that can be handled. Note that if running multiple cores, this happens if any core reaches this number of instructions.
get_stats(): Get the statistics from the simulation. Returns a dictionary of statistics.
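
As a sketch of how these pieces might fit together, the helper below runs an already-configured board for a bounded number of ticks and returns its statistics. The names run_for_ticks and max_tick_handler are hypothetical. The sketch passes the limit through the max_ticks argument of run() rather than calling set_max_ticks(); treat that choice, and the exact point at which the limit is applied, as assumptions to check against simulator.py for your gem5 version. It runs only inside the gem5 binary.

```python
# Sketch only: run with the gem5 binary, passing in a configured board.
from gem5.simulate.exit_event import ExitEvent
from gem5.simulate.simulator import Simulator


def max_tick_handler():
    """Stop the simulation the first time the tick limit is reached."""
    print("Reached the tick limit; stopping.")
    yield True


def run_for_ticks(board, max_ticks):
    """Hypothetical helper: simulate `board` for at most `max_ticks` ticks."""
    simulator = Simulator(
        board=board,
        # Override the handling of the MAX_TICK exit event.
        on_exit_event={ExitEvent.MAX_TICK: max_tick_handler()},
    )
    # Assumption: run() accepts the tick limit as `max_ticks`.
    simulator.run(max_ticks=max_ticks)
    print(f"Stopped at tick {simulator.get_current_tick()}")
    # get_stats() returns the statistics as a dictionary.
    return simulator.get_stats()
```

If the workload finishes before the limit, the simulation ends through its normal exit event and the MAX_TICK handler is never advanced.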

See src/python/gem5/simulate/simulator.py for more details.

Creating new standard library components

The gem5 standard library is designed around extension and encapsulation, not parameterization. If you want to create a component with different parameters, extend an existing component using object-oriented semantics.

We will now create a new component. We will specialize/extend the “BaseCPUProcessor” to create an Arm processor with a single out-of-order core.

Let’s create a new configuration file:

touch configs/tutorial/part1/components.py

First, let’s add our imports to this new config file:

from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.private_l1_shared_l2_cache_hierarchy import (
    PrivateL1SharedL2CacheHierarchy,
)
from gem5.components.memory.single_channel import SingleChannelDDR4_2400
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator
from gem5.isas import ISA

from gem5.components.processors.base_cpu_core import BaseCPUCore
from gem5.components.processors.base_cpu_processor import BaseCPUProcessor

from m5.objects import ArmO3CPU
from m5.objects import TournamentBP

Next, let’s make a new subclass to specialize the core’s parameters:

class MyOutOfOrderCore(BaseCPUCore):
    def __init__(self, width, rob_size, num_int_regs, num_fp_regs):
        super().__init__(ArmO3CPU(), ISA.ARM)
        self.core.fetchWidth = width
        self.core.decodeWidth = width
        self.core.renameWidth = width
        self.core.issueWidth = width
        self.core.wbWidth = width
        self.core.commitWidth = width

        self.core.numROBEntries = rob_size

        self.core.numPhysIntRegs = num_int_regs
        self.core.numPhysFloatRegs = num_fp_regs

        self.core.branchPred = TournamentBP()

        self.core.LQEntries = 128
        self.core.SQEntries = 128

Next, let’s make a processor using this core. The BaseCPUProcessor assumes a list of cores that are BaseCPUCores. We’ll just make one core and pass the parameters to it:

class MyOutOfOrderProcessor(BaseCPUProcessor):
    def __init__(self, width, rob_size, num_int_regs, num_fp_regs):
        cores = [MyOutOfOrderCore(width, rob_size, num_int_regs, num_fp_regs)]
        super().__init__(cores)

Next, let’s use these components to set up a processor for the simulation:

my_ooo_processor = MyOutOfOrderProcessor(
    width=8, rob_size=192, num_int_regs=256, num_fp_regs=256
)

Finally, let’s set up the rest of the simulation:

main_memory = SingleChannelDDR4_2400(size="2GB")

cache_hierarchy = PrivateL1SharedL2CacheHierarchy(l1d_size="64KiB", l1i_size="64KiB", l2_size="8MiB")

board = SimpleBoard(
    processor=my_ooo_processor,
    memory=main_memory,
    cache_hierarchy=cache_hierarchy,
    clk_freq="3GHz",
)

board.set_workload(obtain_resource("arm-gapbs-bfs-run"))

simulator = Simulator(board)
simulator.run()

You can now run this simulation with the following command:

./build/ARM/gem5.opt configs/tutorial/part1/components.py