====== Flexus Hands-on Session ======

This section of the SimFlex tutorial covers a hands-on example of using Flexus in three high-level steps: (1) booting a simulated system in a Simics-only environment, (2) importing a test application into the simulated system and creating a Simics checkpoint, and (3) using the Flexus tools along with the sampling concepts to measure the newly generated workload.

For this part of the tutorial, you only need a Windows-based laptop with Remote Desktop Connection. All work will be performed through remote Windows machines at CMU.

Split into groups of 2 or 3. Each group will be assigned a number. To connect, first open the Remote Desktop Connection tool (Start → All Programs → Accessories → Communications → Remote Desktop Connection) and connect to the following address: scirocco.ece.cmu.edu:300X, where X is your designated group number (for example, if you are in the protoflex3 group, connect to scirocco.ece.cmu.edu:3003). The username/password is pf_user/protoflex. This username/password pair is used for all remaining steps of this tutorial, unless otherwise noted.

From the desktop, open the SSH Secure Shell Client and click “Quick Connect”. In the “Host Name” field, type protoflexN.scotch.ece.cmu, where N is your group number (e.g., protoflex1.scotch.ece.cmu), and in the “User Name” field type pf_user. Click Connect and enter the password “protoflex”. Now open “SSH Secure File Transfer” and repeat the same steps.

=====1. Preparing a Simics checkpoint=====

In this section, we give a short tutorial on how to set up and create your own Simics checkpoints. A Simics checkpoint is simply a snapshot of simulated machine state in the form of one or more CPUs' worth of registers, a physical main memory image, and device state. Checkpoints allow us to stage and position our workloads without having to reboot the target machine over and over.
When running Simics, the simulation of a target machine can be interrupted at any moment in order to save a checkpoint.

SKIP - Installing Solaris in a simulated machine

We will omit this step, because installing an OS on the simulated target machine is an (uninteresting) process that can take many hours. We have provided you with a fresh disk image of the target system (prior to boot).

++++SKIP - Expand/Collapse|

  - The first step is to acquire the Solaris 10 CDROM ISO images, which are freely available for download from http://www.sun.com/software/solaris/get.jsp. The specific edition of Solaris 10 we have tested with is **Solaris 10 8/07, labeled as sol-10-u4-ga-sparc**. Note: you MUST download the **CDROM** ISO images, since the Simics scripts do not handle the DVD version. As of this writing, the 5 CDROM ISO image files that you should expect to have are ''sol-10-u4-ga-sparc-{v1, v2, v3, v4, v5}.iso''.
  - The Simics package includes scripts to automate the installation of Solaris within a simulated target machine. These scripts can be found under the ''/simics-3.0.22/targets'' directory. The specific target system that we use for our configuration of ProtoFlex is the **serengeti** target. To keep things simple, copy all of the ISO images downloaded in the previous step into this folder.
  - Within the ''/simics-3.0.22/targets/serengeti'' folder, there are a large number of scripts that automate the Solaris installation process. To customize the target machine configuration, first open and edit the ''serengeti-6800-system.include'' file.
  - Near the top of the file, you will notice some high-level options for your simulated target machine. Specifically, we are interested in **the number of CPUs** and the **number of megs per CPU**. Solaris 10 requires at least 256MB of memory. With the FPGA/board we are using, we are currently limited to 4 CPUs and at most 1.9GB of simulated main memory.
  - **VERY IMPORTANT STEP (DO NOT SKIP!)**: At the top of ''serengeti-6800-system.include'', change ''$cpu_class = "ultrasparc-iii-plus"'' to ''$cpu_class = "ultrasparc-iii"''.
  - To speed up the installation, set the number of CPUs to **1** and the amount of main memory per CPU to **512MB**. These parameters can be changed later, after the OS installation completes and the machine is rebooted.
  - Once you have completed this step, open and edit the ''abisko-sol10-cd-install1.simics'' file. Set the path to the first CD image with the line ''$cdrom_path = "sol-10-u4-ga-sparc-v1.iso"''.
  - Start the Simics installation by typing ''../../scripts/start-simics -x abisko-sol10-cd-install1.simics'' and wait for the entire process to complete. A terminal from the target machine should appear and show you the progress of the OS installation.
  - During the installation, you may be asked to answer a few questions manually (since the Simics scripts are slightly out of date). You will get one question about NFS (just hit ESC-2 twice) and another on setting the root password (choose whatever you want). You will also be asked to enable/disable remote services (select 'no').
  - The entire installation may take several hours, depending on the performance of your host PC workstation.
  - When the script terminates, the installation from the first CD is finished, and Solaris will have tried to reboot the system. You will need to exit Simics at this point by hitting ''CTRL-C'' at the Simics console and typing ''quit''.
  - Edit the ''abisko-sol10-cd-install2.simics'' script and set the proper ''$cdrom_path'' as before. Now run the second script by typing ''../../scripts/start-simics -x abisko-sol10-cd-install2.simics''. During the second script, you may be asked for additional input, such as the preferred keyboard type. At some point, you will be asked to select the media type. Choose 'CD/DVD'.
  - When the second script is finished, the Solaris installation will have tried to reboot the system. As before, hit ''CTRL-C'' and type ''quit'' at the Simics console.
  - Start the third script by typing ''../../scripts/start-simics -x abisko-sol10-cd-install3.simics''. This should only take a few minutes to complete. Afterwards, you will be presented with a login prompt. Type ''root'' and the password you specified earlier.
  - The machine will shut down momentarily. At this point, a large Simics disk image called **abisko-sol10-install.disk** and a state file called **abisko-sol10.state** will have been created. After the machine shuts down, type ''quit'' at the Simics console.

++++

==== Boot Solaris and Save Checkpoint ====

++++CLICK - Expand/Collapse|

**To create our first Simics checkpoint, we will need to boot a simulated target system and save out a new checkpoint. For your convenience, we have already created a disk image that contains a freshly installed copy of Solaris 10.**

  - Navigate to ''/home/pf_user/simics-3.0.22/targets/serengeti'' on the primary PC.
  - Open and edit the ''abisko-common.simics'' file and add the following lines near the top:

      $os = solaris10
      $num_cpus = 4
      $megs_per_cpu = 256

  - These parameters allow us to configure the target machine at boot time according to our preferences. The design we will be demonstrating is a 4-CPU system with 256MB of memory per CPU, as set by ''$megs_per_cpu''.
  - For the purposes of this tutorial, create a new folder ''~/checkpoints''. We will store all Simics-generated checkpoints in this directory.
  - Once you have edited the parameters, type ''../../scripts/start-simics -x abisko-common.simics'' to boot the machine.
  - A simulated terminal should appear and show the Solaris 10 boot process. Type ''c'' to begin simulating at the console.
  - Once you reach the interactive terminal, log in using the username "root" and the password "cmu". Once you are at the simulated command prompt, you are ready to save the first checkpoint.
  - Hit ''CTRL-C'' in the Simics console, and type ''write-configuration ~/checkpoints/after-boot-4cpu''.
  - Type ''quit'' to exit Simics.
  - To load your checkpoint again, type ''../../scripts/start-simics''. Once you are at the Simics console, type ''read-configuration ~/checkpoints/after-boot-4cpu''. You should see your simulated terminal reappear where you last left it.

\\
++++

======2. Preparing a test workload======

++++CLICK - Expand/Collapse|

In this section, we cover the basics necessary to prepare a simple multithreaded microbenchmark for execution within the target system. This process of moving the workload into the target machine and executing until a breakpoint is usually carried out entirely within a Simics-only environment.

The microbenchmark that we provide is a simple pthreads example that can be downloaded from {{:documentation:microbenchmarks.tgz|}}. Within the tarball, there are two source files: ''counter.c'' and ''spinlock.c''. These two files have already been precompiled using a SPARC compiler and can be executed within the target machine. In the next step, we will go through the steps needed to move these files into the simulated target system.

First, you will need to acquire the {{:documentation:simicsfs.iso.zip|simicsfs.iso}} file, which contains a CDROM image of the Simics files that facilitate target-to-host file transfers.

  - Start up a checkpoint that was saved out in the previous section (e.g., ''~/simics-3.0.22/scripts/start-simics ~/checkpoints/after-boot-4cpu''). At the Simics console, type ''new-file-cdrom simicsfs.iso'' (make sure you started Simics in the directory that contains the simicsfs.iso file; otherwise, type the full path of the simicsfs.iso file).
  - Then type ''cd0.insert iso0''.
  - Type ''c'' to begin simulating at the console. You may need to wait a few minutes until the simulated CDROM drive has loaded the image.
  - Once you have done this, navigate to ''/cdrom/cdrom0'' within the target machine.
You will see several files named ''mount_simicsfs'' and ''simicsfs-sol*''.

  - Type the following commands:

      bash
      mkdir -p /usr/lib/fs/simicsfs
      cp /cdrom/cdrom0/mount_simicsfs /usr/lib/fs/simicsfs/mount
      cp /cdrom/cdrom0/simicsfs-sol10 /usr/kernel/fs/sparcv9/simicsfs
      export TERM=vt100
      vi /etc/vfstab

  * Inside the vfstab file, add a new line at the very end (with each entry tab-delimited):

      simicsfs - /host simicsfs - no -

  * Hit ESC and type '':wq'' to save the file and exit.
  * Type ''mkdir /host''.
  * This is usually a good time to save out a checkpoint, right before you mount the host file system. At the Simics console, type ''CTRL-C'' followed by something like ''write-configuration /''
  * Type ''c'' at the Simics console to resume.
  * Within the simulated console, type ''mount /host''.
  * Type ''ls /host'' to see the underlying host machine's root directory.

At this point, you should place the microbenchmark files somewhere on the host machine and copy them over to the target machine. Save out a NEW checkpoint called ''~/checkpoints/benchloaded'' and quit Simics. Now open the checkpoint you saved with vi by typing ''vi ~/checkpoints/benchloaded'', then locate and delete the following lines:

  OBJECT iso0 TYPE file-cdrom {
          file: "tutorial_files/simicsfs.iso"
          in_use: 0
  }

  cd_media: iso0

Save and exit vi by hitting the ESC key and typing '':wq''.

In this next section, we will create a Simics script that allows us to detect breakpoints inserted within our application in order to stage the workload. A breakpoint (also known as a 'magic breakpoint' in Virtutech parlance) is simply a predefined assembly instruction inlined into your code. This instruction usually has no effect (e.g., a write to register 0) but is recognized by Simics. You can take a look at all the magic breakpoint instructions in the ''magic-instruction.h'' file within the microbenchmarks tarball downloaded earlier.
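The staging logic used in the next step amounts to mapping the argument of a magic instruction to a reason for stopping the simulation. The following is a minimal Python sketch of that mapping, written so it can run outside Simics. It assumes, as in the tutorial's callback, that argument 1 marks entry to main() and argument 2 marks the first spawned thread. The names ''BREAK_REASONS'' and ''break_reason'' are hypothetical and are not part of Simics or the microbenchmarks.

```python
# Hypothetical table: magic-instruction argument -> message shown on break.
# The values 1 and 2 follow the tutorial's hap callback; extend as needed.
BREAK_REASONS = {
    1: "Entered main()",
    2: "First thread spawned",
}


def break_reason(arg):
    """Return the stop message for a magic argument, or None to keep running."""
    return BREAK_REASONS.get(arg)


if __name__ == "__main__":
    # Arguments 1 and 2 map to a message; any other value means "do not break".
    for arg in (1, 2, 3):
        reason = break_reason(arg)
        print(arg, "->", reason if reason is not None else "no break")
```

Inside a real Simics script, the hap callback would pass the returned message to ''SIM_break_simulation'' whenever it is not ''None''. Keeping the mapping in a table makes it easier to add further staging points later.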
  - Create a new Simics script called break.simics and fill it in with this:

  @def hap_callback(user_arg, cpu, arg):
      if arg == 1:
          SIM_break_simulation("Entered main()")
      if arg == 2:
          SIM_break_simulation("First thread spawned")

  @SIM_hap_add_callback("Core_Magic_Instruction", hap_callback, None)
  read-configuration ~/checkpoints/benchloaded

  - Launch Simics by typing ''start-simics break.simics''.
  - Within the simulated console, navigate to the directory where you copied the microbenchmark files.
  - Type ''./spinlock 4 1000000000 10 10 0'' (this indicates that we want 4 threads and want to run for effectively an infinite number of iterations).
  - Simics should immediately break to the console and output ''Entered main()''.
  - Typing ''c'' again will break once the first thread reaches the beginning of its handler.
  - You can see how the source code inserts the magic instructions by looking at ''spinlock.c''.
  - **Save out a final checkpoint by typing ''write-configuration ~/checkpoints/spinlock''**

\\
++++

======3. Working with Flexus======

With the workload we just created, you will get the chance to run some sample jobs with Flexus and create a flexpoint library. By this point, you should have a valid initial checkpoint stored as ''~/checkpoints/spinlock''.

  - Before starting, create a few initial directories in your home directory (which we will explain in the next steps):

      mkdir -p ~/checkpoints
      mkdir -p ~/images
      mkdir -p ~/specs

  - The Flexus simulator is stored as ~/tutorial_files/flexus_tutorial.tgz. Copy this file to your home directory and extract the tarball. You should now have a directory called ~/flexus.

====Getting familiar with the run_job script====

In this section, we cover the run_job script, which is used in all cases to run Flexus simulations.

++++CLICK - Expand/Collapse|

The run_job script should be run from the ~/flexus/ directory and requires the user's home directory to contain a .run_job.rc.tcl file.
Additionally, a ''~/specs'' directory must contain at least an interactive job configuration.

  - Copy the example RC file from ~/flexus/scripts/.run_job.rc.tcl into ~/
  - Create a ~/specs/interactive/ directory and place a user-preload.simics file there (an empty file is OK for the tutorial).
  - Execute the run_job script from the ~/flexus/ directory to confirm correct setup (the command-line help will be displayed when the prerequisites are met).

The .run_job.rc.tcl file contains "rungen" sections with directives for each workload. When executing run_job, the rungen is selected with the "-run" parameter. Typical rungens are "phase" for phase generation, "flexpoint" for flexpoint generation, "trace" for functional simulation jobs, and "timing" for detailed cycle-accurate simulations.

The run_job script has already been configured for you.

  * Take a look at the various paths and options that are specified in it by examining the ~/flexus/scripts/global.run_job.rc.tcl file.

Flexus scripts expect a specific directory hierarchy for the checkpoints.

  * Create a new directory called ~/ckpts/spinlock/baseline/phase_000/simics/ for our "spinlock" workload. (NOTE: we are creating a NEW sub-directory called ckpts, and NOT using the original ~/checkpoints folder.)

For the run_job script to accept a path as a valid workload, the directory must contain a job-postload.simics file with commands that are always run in Simics when the workload is loaded.

  * Create an empty job-postload.simics file in ~/ckpts/spinlock/baseline/specs/ (there are no special commands to run for our spinlock example workload).

Before we proceed with creating Flexus-compatible checkpoints, a number of post-processing steps must be performed directly on the Simics checkpoint we created earlier. A Simics script is provided to perform these steps.
  * Load the initial checkpoint in Simics (using the ''start-simics'' script):
    * simics> ''read-configuration ~/checkpoints/spinlock''
    * simics> ''run-command-file ~/flexus/scripts/create_mem_and_io_proxy.simics''
    * simics> ''write-configuration ~/ckpts/spinlock/baseline/phase_000/simics/phase_000''

To verify that the basic run_job settings are correct and that the spinlock workload is properly set up, use run_job to launch Simics with the spinlock workload (NONE indicates that no Flexus simulator library should be loaded):

  * ''~/flexus/scripts/run_job NONE spinlock'' (an error message about "flexus" missing is OK)

Add configuration for the "spinlock" benchmark to the "trace" rungen of ~/.run_job.rc.tcl:

  * configure the simulation to stop at 100000000 (100M) cycles
  * configure the statistics region interval at 50000000 (50M) cycles

Run a "spinlock" trace job with TraceCMPFlex.

  * Example trace configuration can be found in the ''scripts/trace/user-*load.simics'' files.
  * ''~/flexus/scripts/run_job -run trace -cfg test_cfg_trace -local TraceCMPFlex spinlock''
    * Explanation of "local": -local requests to run a batch of jobs locally. Without -local, an interactive run is assumed, which waits at the simics> prompt instead of running.
    * Explanation of "remote": -remote will submit jobs to a remote cluster (e.g., Condor, PBS, etc.) [not available for the tutorial].

++++

====Displaying statistics through the stat-manager tool====

++++CLICK - Expand/Collapse|

Find the run directory for the trace job in ~/results/ and examine the resulting statistics database:

  * ''~/flexus/stat-manager/stat-manager list-measurements''
  * See the cache hit/miss statistics, branch predictor stats, and instruction mix breakdown:
    * ''~/flexus/stat-manager/stat-manager print "Region 000" | less''
    * ''~/flexus/stat-manager/stat-manager print "Region 001" | less''
  * By default, stat-manager aggregates statistics across all cores. You can override this behavior with the -per-node flag:
    * ''~/flexus/stat-manager/stat-manager -per-node print "Region 001" | less''

++++

====Creating a flexpoint library====

In this section, you will create a flexpoint library (warm microarchitectural state and Simics checkpoints).

++++CLICK - Expand/Collapse|

  * Configure the "flexpoint" rungen in .run_job.rc.tcl to create 20 flexpoints, spaced 200000 (200K) instructions apart.
  * ''~/flexus/scripts/run_job -ckpt-gen -postprocess "$HOME/flexus/scripts/postprocess_ckptgen.sh flexpoint 20 mystate" -local -cfg test_cfg_trace -run flexpoint TraceCMPFlex spinlock''
    * ''-ckpt-gen'' ensures that state is written out at the end of simulation.
    * ''-postprocess'' specifies the script to run after each job.
    * The postprocess_ckptgen.sh script re-runs the job for each flexpoint, saving cache state under the 'mystate' directory. Note that the quoted parameters passed to postprocess_ckptgen.sh specify the number of flexpoints and the state name.
  * Examine the results in ''~/ckpts/''. The procedure should create ''simics/'' and ''mystate/'' directories for each flexpoint, containing the Simics checkpoint and the corresponding Flexus microarchitectural state, respectively.
  * Untar flexstate.tar.gz to /tmp to examine the warmed Flexus microarchitectural state files.

++++

====Running a timing-accurate Flexus simulation====

In this section, you will use the "timing" mode of Flexus.

++++CLICK - Expand/Collapse|

Add configuration for the "spinlock" benchmark to the "timing" rungen of ~/.run_job.rc.tcl:

  * configure the simulation to stop at 15000 (15K) cycles
  * configure the statistics region interval at 5000 (5K) cycles

Run a "spinlock" timing job with CMPFlex.OoO.

  * Example timing configuration can be found in the scripts/timing_v9/user-*load.simics files.
  * ''~/flexus/scripts/run_job -run timing -cfg test_cfg_timing -local -ma -state mystate CMPFlex.OoO spinlock''
    * NOTE: When running timing simulations, one must pass the ''-ma'' parameter to Simics.
    * NOTE: Don't forget to specify ''-state'' to load the microarchitectural state created with the trace simulator; otherwise, each flexpoint is run from cold microarchitectural state, severely biasing the results!

Find the run directory for the timing job in ~/results/ and examine the resulting statistics databases with stat-manager.

  * Notice the much more detailed statistics for the timing simulator compared to the trace simulator.
  * Find the IPC of some of the flexpoints' results using stat-manager:
    * ''~/flexus/stat-manager/stat-manager format-string "" "Region 001"''

The default postprocess.sh script (which runs after each job if a -postprocess override is not specified) automatically creates a stats_db.out.selected.gz file that contains only statistics between 100K and 150K instructions. Use stat-sample to combine all the stats_db.out.selected.gz files into a single statistics file.

  * ''~/flexus/stat-manager/stat-sample stats_db.out.gz */stats_db.out.selected.gz''
  * Examine the resulting stats_db.out.gz file, which contains the combined results of all flexpoints.
  * Examine the IPCs of the various flexpoints:
    * ''cat */UIPC''
    * If bringing UIPCs into Excel, compute =STDEV() and =CONFIDENCE() for 95% confidence.
  * Bring time breakdowns into Excel:
    * ''~/flexus/stat-manager/stat-manager print sum | grep ":Bkd:" > breakdown.tsv''
    * Use Excel to split the data by the colon (":") character into columns. Apply the Pivot Chart feature to plot the time breakdown.

\\
++++

====Cleanup====

To clean up, execute the following commands:

  rm ~/.run_job.rc.tcl
  rm -rf ~/ckpts/
  rm -rf ~/specs/
  rm -rf ~/results/
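
As a closing note on the timing results: the spreadsheet step described earlier (computing =STDEV() and =CONFIDENCE() over the per-flexpoint UIPC values) can also be scripted. The following is a minimal Python sketch and is not part of the Flexus distribution. The function name ''uipc_summary'' and the sample values are hypothetical, and the sketch assumes the UIPC numbers have already been collected into a list (for example, from the output of ''cat */UIPC''). It uses the normal-distribution form of the confidence half-width, z * s / sqrt(n), which is what Excel's CONFIDENCE function computes.

```python
import math
import statistics


def uipc_summary(values, confidence=0.95):
    """Return (mean, sample stdev, confidence half-width) for UIPC samples."""
    if len(values) < 2:
        raise ValueError("need at least two flexpoint measurements")
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation, like =STDEV()
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev / math.sqrt(len(values))
    return mean, stdev, half_width


if __name__ == "__main__":
    # Hypothetical UIPC values, one per flexpoint; replace with real measurements.
    sample = [0.42, 0.45, 0.40, 0.44, 0.43]
    mean, stdev, half = uipc_summary(sample)
    print(f"mean UIPC = {mean:.4f} +/- {half:.4f} (stdev {stdev:.4f})")
```

With only 20 flexpoints, an interval based on Student's t distribution would be slightly wider than this normal approximation, so treat the result as a quick estimate.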