====== ProtoFlex Hands-on Session ======

Welcome to the second half of the MICRO 2009 tutorial! In this section, we will cover the ProtoFlex portion of the tutorial. You can download the {{:documentation:micro_tutorial_2009.pptx|tutorial slides here}}.

===== Introduction =====

During this tutorial we will first briefly cover the basic **ProtoFlex simulation architecture** concepts and we will then move onto a hands-on session, where we will test-drive the recently released **ProtoFlex Simulator**. In particular we will go over the hardware and software installation procedures, and the process for staging and running your first simulation on the FPGA. This tutorial assumes you are familiar with basic computer architecture concepts and general simulation tools. No prior knowledge of FPGAs is required.

====The ProtoFlex Simulator====

The **ProtoFlex Simulator** is an open-sourced simulator developed at Carnegie Mellon University to facilitate scalable, shared-memory multiprocessor research using FPGAs. In its basic form, the ProtoFlex Simulator simulates a functional model of an N-way UltraSPARC III server system and is able to run unmodified, multithreaded applications on a Solaris operating system. The ProtoFlex Simulator is a parameterizable simulator and has been shown to simulate up to 16 processors on a {{http://bee2.eecs.berkeley.edu/|BEE2 FPGA}} platform. The version of the ProtoFlex Simulator that you will be using has been ported over to the {{http://www.xilinx.com/univ/xupv5-lx110t.htm|XUPV5-LX110T platform}}, which is a widely-available commodity FPGA platform.

==== Preliminaries ====

Throughout this tutorial, we will assume the following terminology. A **target system** refers to the simulated machine that we are interested in modeling (in the case of ProtoFlex, this is the Serengeti-based UltraSPARC III server). A **host system** refers to the underlying collection of hardware and software used to support the simulation of the target system.
This includes the FPGA platform as well as software components that run on an x86-based workstation.

The **target** machine that we will be simulating on the FPGA is a functional model of a 4-CPU UltraSPARC III shared-memory server. The target application that runs on this model will be the Solaris 10 operating system. We will also stage and run a simple multithreaded microbenchmark within the operating system.

The **Primary PC** is a PC with a PCI Express x1 slot that hosts the {{http://www.xilinx.com/univ/xupv5-lx110t.htm|XUPV5 FPGA board}}. The **Secondary PC** is another PC that is used for configuring the FPGA and monitoring the RS232 output. The figure below shows a high-level view of our remote setup, which consists of 4 PC pairs.

{{:documentation:pc_setup.png?600|}}

Since we only have three FPGA boards available in our remote infrastructure, please divide into **three** separate groups (protoflex1, protoflex2, protoflex4). Each group should have a Windows laptop with the Remote Desktop client available. (**NOTE: protoflex3 is currently unavailable**).

===Remote Desktop Access to CMU's FPGA infrastructure===

For this tutorial you will only need to connect to the Secondary PC that is assigned to your group; all steps below will be performed through this Secondary PC, which runs Windows XP. To connect, first open the ++Remote Desktop Connection tool|(Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection)++ and connect to the following address: ''scirocco.ece.cmu.edu:300X'', where X is your designated group number (for example, if you are in the protoflex2 group, connect to ''scirocco.ece.cmu.edu:3002''). The username/password is: **pf_user/protoflex**. This username/password pair is common to all remaining steps of this tutorial, unless otherwise noted.

**Now please go ahead and connect to your designated Secondary PC!**

Once connected to the Secondary PC you will need to establish an SSH connection to the Primary PC, which runs Linux and hosts the FPGA board. From the desktop, open SSH Secure Shell Client and click on "Quick Connect". In the "Host Name" field type protoflex**N**.scotch.ece.cmu, where N is your group number (e.g. protoflex1.scotch.ece.cmu), and in the "User Name" field type pf_user. Hit connect and enter the password "protoflex". Now open "SSH Secure File Transfer" and repeat the same steps one more time. ++At this point you should see something like this|{{:documentation:rdp_connection_w_ssh.png?600|}}++

======Hands-on Session======

During the hands-on session of this tutorial you will become familiar with and try out most of the steps involved in setting up and using the ProtoFlex Simulator. Some time-consuming steps will be omitted. Such steps are marked with **SKIP** in their title. For instance, instead of going through the lengthy and automated process of generating the FPGA bitstream, we will directly provide you with pre-generated files. More details on the ProtoFlex Simulator can be found in the **[[documentation:userguide|User Guide]]**.

Below is a timeline highlighting the basic steps we will be going through. Note that any steps marked with **CLICK** should be followed through.

{{:documentation:micro_tutorial_timeline.png?400|}}
sudo zypper install gcc
sudo zypper install gcc-c++
sudo zypper install subversion
sudo zypper install ncurses-devel
====Installing and configuring Virtutech Simics 3.0.22 on the Primary PC====
To install and run Simics on the Primary PC, it is necessary to acquire a FlexLM license from Virtutech (www.virtutech.com) and have it installed on a FlexLM server. Instructions for acquiring and installing an academic license can be found here: http://www.virtutech.com/academia/licensing.html. Instructions for downloading the Simics package can be found at www.simics.net. The package should be: ''package-20-3.0.22-linux.tar.gz''.
To install Simics, unpack the archive into your home folder:
gunzip -c package-20-3.0.22-linux.tar.gz | tar -xvf -
This should create a simics folder: ~/simics-3.0.22
Create a new file called .flexlmrc in your HOME directory (e.g., ~/.flexlmrc) and add:
VTECH_LICENSE_FILE=
To accept the license agreement, cd to ''~/simics-3.0.22/scripts'' and type ''./start-simics''. When you are asked to, agree to the licensing terms and type ''Yes''.
We recommend reading the {{:documentation:simics-user-guide-unix.pdf|Simics User Guide for Unix}}, following through the "First Steps" guide, and familiarizing yourself with the concepts of Simics checkpoints and machine targets. Specifically, the simulated system that ProtoFlex supports is a ''Serengeti''-based server system that utilizes UltraSPARC III processors.
**WARNING: YOU MUST ABSOLUTELY USE VERSION 3.0.22. The Simics API library changes between versions, and we cannot offer any support if you choose to use an unsupported version. Our use of the Simics API library is extensive, and it is unlikely that any untested version will work.**\\
==== Installing and configuring Bluespec System Verilog on the Primary PC ====
Acquiring the Bluespec compiler requires you to directly contact Bluespec, Inc. @ http://www.bluespec.com/support/index.htm to request an academic FlexLM license. This license must be installed on your FlexLM server. You must then register on the forum at http://bluespec.com/forum, which is currently used to host the Bluespec compiler releases.
Once you have unpacked the Bluespec compiler onto the Primary PC, you should double-check that your **.bashrc** file contains the following:
export LM_LICENSE_FILE=
export BLUESPEC_HOME=/Bluespec-2008.11.C
export BLUESPECDIR=$BLUESPEC_HOME/lib
export PATH=$PATH:$BLUESPEC_HOME/bin
To verify that your Bluespec compiler is ready for use, type: **bsc --help**. At the bottom, you should see something similar to:
License BCOMP expires in 362 days.
\\ ++++
===== 1. Downloading and compiling the ProtoFlex source code to the Primary PC =====
{{ :documentation:micro_tutorial_stage1.png?400| }}
All of these steps should be performed on the Primary PC (Linux). **Note**: For your convenience we have already placed all of the files that you would normally download in the ''tutorial_files'' folder in your home directory.
++++CLICK - Expand/Collapse|
* We recommend placing all of the source code in a folder such as ''/home/pf_user/protoflex''. We will refer to this directory as tar -zxvf protoflex_1.0.tgz
* The following commands modify your ''.bashrc'' file. To save you some typing, this is what your final ''.bashrc'' should look like once you are done:
export LM_LICENSE_FILE=1703@dmv.ece.cmu.edu:1717@dmv.ece.cmu.edu
source /home/ise-10.1/ISE/settings32.sh
source /home/edk-10.1/EDK/settings32.sh
##########################
# Bluespec
##########################
export BLUESPEC_HOME=/home/pf_user/Bluespec-2008.11.C
export BLUESPECDIR=$BLUESPEC_HOME/lib
export PATH=$PATH:$BLUESPEC_HOME/bin
##########################
# ProtoFlex
##########################
export PF_HOME=/home/pf_user/protoflex
export PF_SIMICS=/home/pf_user/simics-3.0.22
export PF_DIAG=/home/pf_user/diags
export PF_REG=/home/pf_user/regress
export PF_SUN_HOST=none
source /home/pf_user/protoflex/settings.sh
* An explanation of the variables is shown below:
\\
^ Environment variable ^ Description ^ Example ^
| PF_SIMICS | Base directory where Simics is installed | export PF_SIMICS=/home/pf_user/simics-3.0.22 |
| PF_HOME | Directory where Protoflex source was checked out | export PF_HOME=/home/pf_user/protoflex |
| PF_DIAG | Directory used to store SPARC diagnostics | export PF_DIAG=/home/pf_user/diags |
| PF_REG | Directory used to store regressions | export PF_REG=/home/pf_user/regress |
| PF_SUN_HOST | | export PF_SUN_HOST=none |
\\
* **Note:** Before proceeding, make sure you log out and re-establish an SSH connection to the Primary PC.
* To build the ProtoFlex software modules (which are used to facilitate PC-to-FPGA communication), type:
$> cd
$> make sw
* After observing some compilation output, you should verify that the following files have been generated:
/apps/pfmon/bin/pfmon
/modules/simics_remote_ctrl/simics_listener/x86-linux/lib/simics_cpu_listener.so
/modules/simics_remote_ctrl/simics_listener/x86-linux/lib/simics_device_listener.so
/modules/simics_remote_ctrl/simics_listener/x86-linux/lib/sparc-irq-bus.so
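
The file check above can also be scripted. The following is a minimal sketch and is not part of the ProtoFlex release: it assumes the four paths listed above are relative to a base directory that you supply yourself, and the names ''EXPECTED_ARTIFACTS'' and ''missing_artifacts'' are hypothetical.

```python
import os

# Expected build outputs, copied from the list above. They are treated here as
# paths relative to a caller-supplied base directory (an assumption).
EXPECTED_ARTIFACTS = [
    "apps/pfmon/bin/pfmon",
    "modules/simics_remote_ctrl/simics_listener/x86-linux/lib/simics_cpu_listener.so",
    "modules/simics_remote_ctrl/simics_listener/x86-linux/lib/simics_device_listener.so",
    "modules/simics_remote_ctrl/simics_listener/x86-linux/lib/sparc-irq-bus.so",
]


def missing_artifacts(base_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected files that do not exist under base_dir."""
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(base_dir, rel))]
```

Calling ''missing_artifacts'' with the directory you built in returns an empty list when all four files are present; any returned entry points at a module that failed to build.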
++++
source /home/ise-10.1/ISE/settings32.sh
source /home/edk-10.1/EDK/settings32.sh
* After installation, you will need to install the ''libdb'' library using ''yast'' (otherwise Xilinx EDK will not run properly). At the command-line, type ''sudo /sbin/yast2''. Under ''Software->Software Management'', search for the ''db43'' (Berkeley DB Database Library) package and install it. After installation, type the following commands:
cd /usr/lib
sudo ln -s libdb-4.3.so libdb-4.1.so
==== Patching PCI Express ====
* Due to Xilinx licensing restrictions, there are certain HDL files and netlists related to the PCI express components that we cannot include in the packaged release. These files must be downloaded and generated separately and will require the appropriate IP core licenses. Fortunately, most academic groups enrolled in the Xilinx University Program (http://www.xilinx.com/univ) are eligible to receive this license for free.
* We will start by first generating the netlist + Verilog files for the PCI Express Endpoint Plus IP block. To implement these steps, follow the instructions beginning on slide 10 from http://www.xilinx.com/univ/xupv5-lx110t/design_files/PCIe/XUPV5-LX110T_PCIe_x1_Endpoint_Plus_Design_Creation.pdf ({{:documentation:xupv5-lx110t_pcie_x1_endpoint_plus_design_creation.pdf|local copy}}) until slide 18. To launch coregen simply type coregen
on the Primary PC console. **When you reach slide 15, rather than entering ''5050'' for the ''Device ID'' field, enter ''0007'' instead.** Note: if Coregen appears to have an out-of-date endpoint block (not 1.9), then you forgot to update your Coregen IP library.
* Assuming that you created a folder called ''xupv5_pcie_x1_plus'' in the previous step, there should be a file named **''endpoint_blk_plus_v1_9.ngc''**. Copy this file to ''
BMD_32.v
BMD_64.v
BMD_EP.v
BMD.v
BMD_INTR_CTRL_DELAY.v
BMD_32_RX_ENGINE.v
BMD_64_TX_ENGINE.v
pcie_endpoint_product.v
BMD_CFG_CTRL.v
BMD_32_TX_ENGINE.v
BMD_RD_THROTTLE.v
BMD_TO_CTRL.v
BMD_EP_MEM.v
BMD_INTR_CTRL.v
BMD_64_RX_ENGINE.v
BMD_EP_MEM_ACCESS.v
pci_exp_64b_app.v
* On the Primary PC, type:
cd /platforms/edk/xupv5-1.0/pcores/pcie_ram/hdl/verilog
patch -p1 -i bmd.patch
* This command will patch the HDL files to fit our application requirements. You should expect to see the following output:
patching file BMD_64_RX_ENGINE.v
patching file BMD_64_TX_ENGINE.v
patching file BMD_EP_MEM_ACCESS.v
patching file BMD_EP_MEM.v
patching file BMD_EP.v
patching file BMD.v
patching file pci_exp_1_lane_64b_ep.v
patching file pci_exp_64b_app.v
patching file xilinx_pci_exp_ep.v
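
If you capture the output of the ''patch'' command, the comparison against the expected list can be automated. This is a hedged sketch only: the expected file names are copied from the output shown above, and the helper names ''patched_files'' and ''unpatched'' are hypothetical.

```python
# File names copied from the expected patch output shown above.
EXPECTED_PATCHED = {
    "BMD_64_RX_ENGINE.v",
    "BMD_64_TX_ENGINE.v",
    "BMD_EP_MEM_ACCESS.v",
    "BMD_EP_MEM.v",
    "BMD_EP.v",
    "BMD.v",
    "pci_exp_1_lane_64b_ep.v",
    "pci_exp_64b_app.v",
    "xilinx_pci_exp_ep.v",
}

PREFIX = "patching file "


def patched_files(patch_output):
    """Collect the file names from 'patching file <name>' lines."""
    names = set()
    for line in patch_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            names.add(line[len(PREFIX):])
    return names


def unpatched(patch_output):
    """Return expected files that patch did not report, sorted by name."""
    return sorted(EXPECTED_PATCHED - patched_files(patch_output))
```

An empty result from ''unpatched'' means every expected file was reported as patched; it does not detect rejected hunks, which ''patch'' reports on separate lines.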
\\
$cdrom_path = "sol-10-u4-ga-sparc-v1.iso"
- Start the Simics installation by typing ''../../scripts/start-simics -x abisko-sol10-cd-install1.simics''
and wait for the entire process to complete. A terminal from the target machine should appear and show you the progress of the OS installation.
- During the installation, you may be asked to answer a few questions manually (since the Simics scripts are slightly out-of-date). You will get one question about NFS (just hit ESC-2 twice) and another on setting the root password (put whatever you want). You will also be asked to enable/disable remote services (select 'no').
- The entire installation may take several hours, depending on the performance of your host PC workstation.
- When the script terminates, the installation from the first CD is finished, and Solaris will have tried to reboot the system. You will need to exit Simics at this point by hitting ''CTRL-C'' at the Simics console, and typing ''quit''.
- Edit the ''abisko-sol10-cd-install2.simics'' script and set the proper ''$cdrom_path'' as before. Now run the second script by typing: ''../../scripts/start-simics -x abisko-sol10-cd-install2.simics''. During the 2nd script, you may be asked for additional input, such as the preferred keyboard type. At some point, you will be asked to select the media type. Choose 'CD/DVD'.
- When the second script is finished, the Solaris installation will have tried to reboot the system. Like before, hit ''CTRL-C'' and type ''quit'' at the Simics console.
- Start the third script by typing ''../../scripts/start-simics -x abisko-sol10-cd-install3.simics''. This should only take a few minutes to complete. Afterwards, you will be presented with a login prompt. Type ''root'' and the password you specified earlier.
- The machine will shut down momentarily and at this point, a large Simics disk image called **abisko-sol10-install.disk** and a state file called **abisko-sol10.state** will have been created. After the machine shuts down, type ''quit'' at the Simics console.
++++
==== Boot Solaris and Save Checkpoint ====
++++CLICK - Expand/Collapse|
**With a finalized disk image, we are now ready to boot the operating system and create our first Simics checkpoint.**
- Navigate over to ''/home/pf_user/simics-3.0.22/targets/serengeti'' on the Primary PC
- Open and edit the ''abisko-common.simics'' file and add the following lines near the top:
$os = solaris10
$num_cpus = 4
$megs_per_cpu = 64
- These parameters allow us to configure the target machine at boot time according to our preferences. The design we will be demonstrating is a 4-CPU system with a total of 256MB of memory (4 CPUs x 64MB per CPU). These settings must match the capabilities of the FPGA platform that is used. In the case of XUPv5, the maximum number of CPUs we are able to support at the moment is 4, and the maximum amount of memory is 1.9GB (although Simics requires this to be a power of two, so 1GB is the true maximum).
- Once you have edited the parameters, type ''../../scripts/start-simics -x abisko-common.simics'' to boot our machine.
- A simulated terminal should appear and show the Solaris 10 boot process. Type ''c'' to begin simulating at the console.
- Once you reach the interactive terminal, login using the username "root" and the password "cmu". We are now ready to save our first checkpoint.
- Hit ''CTRL-C'' in the Simics console, and type ''write-configuration /
bash
mkdir -p /usr/lib/fs/simicsfs
cp /cdrom/cdrom0/mount_simicsfs /usr/lib/fs/simicsfs/mount
cp /cdrom/cdrom0/simicsfs-sol10 /usr/kernel/fs/sparcv9/simicsfs
export TERM=vt100
vi /etc/vfstab
* Inside the vfstab file, add a new line to the very end (with each entry tab-delimited):
simicsfs - /host simicsfs - no -
* Hit ESC and type '':wq'' to save the file and exit.
* Type ''mkdir /host''
* This is usually a good time to save out a checkpoint right before you mount the host file system. At the Simics console, type ''CTRL-C'' followed by something like ''write-configuration
OBJECT iso0 TYPE file-cdrom {
file: "tutorial_files/simicsfs.iso"
in_use: 0
}
cd_media: iso0
Save and exit vi by hitting the ESC key and typing '':wq''.
In this next section, we will create a Simics script that will allow us to detect breakpoints inserted within our application in order to stage the workload. A breakpoint (also known as a 'magic breakpoint' in Virtutech parlance) is simply a predefined assembly instruction inlined into your code. This instruction usually has no effect (e.g., a write to register 0) but is recognized by Simics. You can take a look at all the magic breakpoint instructions within the ''magic-instruction.h'' file within the microbenchmarks tarball downloaded earlier.
- Create a new Simics script called break.simics and fill it in with this:
@def hap_callback(user_arg, cpu, arg):
    if arg == 1:
        SIM_break_simulation("Entered main()")
    if arg == 2:
        SIM_break_simulation("First thread spawned")
@SIM_hap_add_callback("Core_Magic_Instruction", hap_callback, None)
read-configuration
- Launch Simics by typing ''start-simics break.simics''
- Within the simulated console, navigate to the directory where you copied over the microbenchmark files.
- Type: ''./spinlock 4 1000 10 10 0''
- Simics should immediately break to the console and output ''Entered main()''
- Typing ''c'' again will break once the first thread reaches the beginning of its handler
- You can see how the source code inserts the magic instructions by looking at ''spinlock.c''
- **Save out a final checkpoint**
- **FINAL STEP**. This final step is needed to maximize the performance of the underlying simulated I/O system. Simics is typically the initiator of DMA transactions, which occur at some bulk-sized granularity. This granularity is set by default to a very low value (64 bytes) in default Simics checkpoints. Since Simics is a software-based simulator, issuing many small bulk transfers imposes no simulation overhead. In our system, large bulk transfers are far more desirable. To change this default setting, you will need to **EDIT** the checkpoint file and make one small change.
- Type the following commands:
cd
perl -pi -e 's/dma_block_size: 64/dma_block_size: 8192/'
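
For readers who prefer not to use the Perl one-liner, the same substitution can be sketched in Python. This is an illustrative equivalent, not a tool shipped with ProtoFlex: the function names are hypothetical, the checkpoint path is supplied by the caller, and unlike the Perl pattern this version adds a word boundary so that a value such as ''640'' is left untouched.

```python
import re


def set_dma_block_size(checkpoint_text, new_size=8192, old_size=64):
    """Return (new_text, replacements) with 'dma_block_size: <old>' rewritten."""
    pattern = re.compile(r"dma_block_size: %d\b" % old_size)
    return pattern.subn("dma_block_size: %d" % new_size, checkpoint_text)


def rewrite_checkpoint(path, new_size=8192, old_size=64):
    """Edit a checkpoint file in place and return how many entries changed."""
    with open(path) as handle:
        text = handle.read()
    new_text, count = set_dma_block_size(text, new_size, old_size)
    if count:
        with open(path, "w") as handle:
            handle.write(new_text)
    return count
```

A return value of 0 from ''rewrite_checkpoint'' means no matching setting was found, which is worth checking before loading the checkpoint into the simulator.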
\\
++++
====Validating a Workload for ProtoFlex====
++++CLICK - Expand/Collapse|
Prior to loading any Simics checkpoint into the ProtoFlex simulator, it is necessary to verify that the checkpoint has no transient state that cannot be loaded into FPGA hardware. For example, Simics allows a checkpoint to be saved while a pending interrupt is queued up for a processor (or while a DMA transaction is waiting on the event queue). To check for this, you should run this script prior to loading any Simics checkpoint:
checkpfckpt
If there are no errors, the script will return with no messages. If problems are reported, the solution is to load the checkpoint, advance its state by some amount of time, and save out a new checkpoint. This usually allows the transient operations (e.g., DMA, interrupts) to complete. In I/O-intensive applications, this may take several tries before you can get the system to be "quiet".
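
The retry procedure described above amounts to a simple loop. The sketch below shows only the control flow: ''is_quiet'' and ''advance_and_save'' are hypothetical callables that you would have to implement yourself (for example, by running the checker script and by driving Simics), and nothing here reflects the actual interface of either tool.

```python
def quiesce_checkpoint(checkpoint, is_quiet, advance_and_save, max_tries=5):
    """Advance a checkpoint until it has no transient state.

    is_quiet(checkpoint) -> bool and advance_and_save(checkpoint) -> new
    checkpoint are caller-supplied placeholders. Returns the quiet checkpoint
    and the number of advance steps that were needed.
    """
    tries = 0
    while not is_quiet(checkpoint):
        if tries == max_tries:
            raise RuntimeError(
                "checkpoint still has transient state after %d tries" % max_tries)
        checkpoint = advance_and_save(checkpoint)
        tries += 1
    return checkpoint, tries
```

Bounding the number of tries keeps an I/O-heavy workload from looping forever; if the limit is hit, advancing by a larger amount of simulated time per step is the natural next thing to try.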
\\
++++
====== 2. Generating the Bitstream ======
{{ :documentation:micro_tutorial_stage2.png?400| }}
++++CLICK - Expand/Collapse|
In this section, we will "simulate" the basic steps needed to generate the bitstream file that will be used to program the XUPV5-LX110T FPGA (we will not actually be waiting for the tool to finish). The top level project we use is a modified version of an XUPv5-LX110T reference design (taken from http://www.xilinx.com/univ/xupv5-lx110t-refdes.htm) based on Xilinx Embedded Development Kit 10.1 (EDK), which is a tool for building System-on-Chips in Xilinx FPGAs. In our design, we have created our own ''pcore'' (in Xilinx parlance), which is an IP core that contains our multithreaded UltraSPARC III core called the **BlueSPARC**. BlueSPARC is written using a high-level, synthesizable hardware description language called Bluespec SystemVerilog (BSV).
The BSV compiler takes our Bluespec description in the form of ''*.bsv'' files and generates purely synthesizable Verilog code. In our flow, once this Verilog code is generated, we then synthesize it into an .NGC netlist file using Xilinx XST 10.1. This .NGC file is then imported into a template ''pcore'', which is then inserted into our EDK project. Once we have done this, we simply "press a button" and EDK will generate a bitstream for us that can be programmed onto the FPGA.
The process of generating the bitstream typically takes several hours. For demonstration purposes, we will have you "simulate" the steps needed to begin the bitstream generation (and verify that your setup has no errors), and then skip forward using pregenerated files instead.
++++
==== Generating and synthesizing RTL on the Primary PC ====
++++CLICK - Expand/Collapse|
- To generate the UltraSPARC III core model (BlueSPARC) used in the simulator, navigate over to the RTL directory at: ''
cp ~/tutorial_files/pcie/verilog/* ~/protoflex/platforms/builds/xupv5-001-Oct-04/pcores/pcie_ram_v2_00_a/hdl/verilog/.
cp ~/tutorial_files/endpoint_blk_plus_v1_9.ngc ~/protoflex/platforms/builds/xupv5-001-Oct-04/pcores/pcie_ram_v2_00_a/netlist/.
- The **FINAL** step is to copy over the NGC file into the generated EDK project. Example: **''cp ~/tutorial_files/mkBluesparc.ngc
cd /platforms/build/xupv5-001-Sep-18
xps -nw xupv5.xmp
% run init_bram
**Note**: Since this step takes about 3 hours to complete we have placed the pregenerated ''download.bit'' and ''pfserver.elf'' files under the ''tutorial_files'' directory.
* When this step is completed (about 3 hours), a final bitstream file will be located under ''
cd /cygdrive/c
xmd
% connect mb mdm
% dow pfserver.elf
% con
* At this point you should observe output in the HyperTerminal window indicating that the XUPv5 hardware is ready. In the XMD shell the ''con'' command runs the Microblaze program (pfserver) and the ''stop'' command is used to halt execution. For more information on other xmd commands type ''help'' in the XMD shell.
* We are now ready to boot the Primary PC and connect to the FPGA over PCI express.
++++
=== Configuring the PCI express driver ===
++++CLICK - Expand/Collapse|
* Power on the Primary PC. **Don't forget to select the correct kernel at the GRUB menu (we use 2.6.27.29-0.1)**.
* During bootup, your startup screen should show the FPGA board as a ''Memory Controller'':
{{:documentation:bios.png?350|Bootup Screen}}
* After the Primary PC is booted, navigate over to the ''
xupv5: module license 'unspecified' taints kernel.
xupv5_module_init(395): Initialization
vendor=8086 device=27d0
xupv5_pcird 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
xupv5_probe(178): BAR0 length: 1024
xupv5_probe(180): BAR0 physical address: e1000000
xupv5_probe(183): BAR0 virtual address: f91b0000
xupv5_probe(255): Probe completed
* This message will only show up once the PCI Express driver detects the XUPv5 FPGA board.
\\
++++
====== 5. Running the PFMON Tool ======
{{ :documentation:micro_tutorial_stage5.png?400| }}
++++CLICK - Expand/Collapse|
* In this section, we will cover the basic process of starting up the ProtoFlex simulator and running the test workload you created earlier.
* A ProtoFlex simulation is initiated and controlled by a software tool called PFMON, which is a top-level controller that orchestrates interprocess communication between the Simics modules and the actual hardware running on the FPGA. PFMON is mainly operated through scripts and a command-line interface.
* Before we start, it is necessary to create a memory cache directory (this is a cache of memory images dumped out by Simics for loading into the FPGA's memory). This can be done by typing: ''mkdir -p /home/pf_user/imgcache''.
* At the command-line, start by typing: ''pfmon -job
connect -hw simcpu
connect -hw simdev
connect -hw default
select -dev simdev0
timer -hsrc simdev0
reginit -hs simcpu0
memdump -hw simcpu0 -path /home/pf_user/imgcache
memload -path /home/pf_user/imgcache
setcpu -en
* The ''connect'' commands as shown above are used to initialize and establish a connection between various platforms used throughout our system. Specifically, the ''connect -hw simcpu'' and ''connect -hw simdev'' commands will instantiate the Simics processes in the background that will be responsible for providing initial checkpoint state as well as facilitating simulated I/O devices.
* The ''connect -hw default'' command is synonymous with ''connect -hw fpga_pcie'' (as was set in the initial launch command). This command establishes a connection between the Primary PC and the server code that runs on the Microblaze within the XUPv5 FPGA.
* The ''select -dev'' command is simply for convenience and reduces the number of arguments that have to be passed into subsequent commands that involve devices.
* The ''timer'' command programs the default hardware (in this case ''fpga_pcie'') with the expected rate at which the CPU/System timers (i.e., TICK, STICK) in the target system should advance.
* The ''reginit'' commands copy over the full register file + TLBs of each simulated CPU over from Simics to the FPGA.
* The ''memdump'' command instructs Simics to generate a binary image of the target system's main memory as it was when the checkpoint was taken. The ''-path'' argument dumps the image to a cache directory (this avoids repeating this step each time the simulation is started up).
* The ''memload'' command searches the cache directory and initializes the FPGA's memory system with the binary image. This process typically takes 60-70 seconds for a 1GB memory target system.
* The ''setcpu'' command enables particular CPUs for running.
* The ''stats -reset'' command is explained in the ''Statistics'' section below.
* **Note, all of these commands can be placed into a script file and passed into pfmon without re-typing them each time**. For example, if the above commands were pasted into a file named ''connect.scr'', then one could simply type: ''pfmon -job
4643 19.1 0.0 7412 1620 pts/4 R+ 18:35 12:14 pfmon -job /home/pf_user/checkpoints/spec2k-4cpu-1gb-ready -defaulthw fpga_pcie -script scripts/connect.scr
4644 0.0 0.0 4196 1372 pts/4 S+ 18:35 0:00 /bin/sh /home/pf_user/protoflex/modules/simics_remote_ctrl/simics_listener/run_simics_cpus.sh
4646 0.0 2.7 100096 85260 pts/4 Sl+ 18:35 0:01 /home/pf_user/simics-3.0.22/x86-linux/bin/simics-common -no-win -stall -x launch_cpus.simics
4664 0.0 0.0 4196 1368 pts/4 S+ 18:35 0:00 /bin/sh /home/pf_user/protoflex/modules/simics_remote_ctrl/simics_listener/run_simics_devices.sh
4666 6.3 2.8 101860 87088 pts/4 Sl+ 18:35 4:03 /home/pf_user/simics-3.0.22/x86-linux/bin/simics-common -stall -x launch_devices.simics
* At this point, we should be ready to begin executing our first simulation. To begin, type: ''run -n 10000000 -q 1000000''. This command will instruct the FPGA platform to begin executing 10 million instructions.
* The ''-n'' argument specifies the total number of instructions that are to be executed across all CPUs that are enabled. The ''-q'' argument is also in units of instructions and indicates how frequently pfmon should halt the simulation on the FPGA and issue probes to the hardware. Having periodic "breaks" also allows us to halt the FPGA on demand using ''CTRL-C'' if necessary. For example, a typical way to execute 1 billion instructions would be: ''run -n 1000000000 -q 10000000''. A smaller ''-q'' value lets us monitor the state of the simulation more frequently, at the expense of performance overhead.
* Once the simulation is running, you will notice a few statistics being updated in real time, for example:
10850M/100000000M 813s avgmips:13.3 [probe:23457 mtp:5843799 ior:19647 iow:4277 irpt:499 dma-i:288kB dma-o:18423kB]
* A complete run from beginning to end is shown below:
pfmon v0.3 last rev: 7/1/09
Type 'help'
Work directory: /home/pf_user/pflogs//spec2k-4cpu-1gb-ready_263_183535
pfmon> connect -hw simcpu
Successful simics interface registration
Waiting for connection to simics...
Successful connection
simcpu0 created
pfmon> connect -hw simdev
Successful simics interface registration
Waiting for connection to simics devices...
Successful connection
simdev0 created
pfmon> connect -hw default -ip 192.168.1.10
Successful fpga interface registration
Connecting over PCI express...
Opening PCIE
fpga_pcie0 created
fpga_pcie0 set as default hw
pfmon> select -dev simdev0
selecting simdev0 as default device instance
pfmon> timer -hsrc simdev0
programming cpu timers (stick ratio: 6)
pfmon> reginit -hs simcpu0
Setting # of cpus for fpga_pcie0 to 4
loaded from
loaded from
loaded from
loaded from
pfmon> memdump -hw simcpu0 -path /home/pf_user/imgcache
/home/pf_user/imgcache/_home_pf_user_checkpoints_spec2k-4cpu-1gb-ready.img already exists.
pfmon> setcpu -en
enabling
enabling
enabling
enabling
pfmon> memload -path /home/pf_user/imgcache
|==================================================| 100% of 1024MB loaded
memory image from /home/pf_user/imgcache/_home_pf_user_checkpoints_spec2k-4cpu-1gb-ready.img loaded into fpga_pcie0 (72s)
pfmon> stats -reset
Statistics reset
pfmon> step -n 100000000000000 -q 10000000
fpga stepping 100000000000000 instructions
10850M/100000000M 813s avgmips:13.3 [probe:23457 mtp:5843799 ior:19647 iow:4277 irpt:499 dma-i:288kB dma-o:18423kB]
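
The progress line printed during a run can be picked apart with a short script, for example to log throughput over time. The sketch below is based solely on the sample line shown above: the field layout is inferred from that one line rather than from a documented format, and the names ''parse_status'' and ''recomputed_mips'' are hypothetical.

```python
import re

# Sample progress line copied from the transcript above.
SAMPLE = ("10850M/100000000M 813s avgmips:13.3 [probe:23457 mtp:5843799 "
          "ior:19647 iow:4277 irpt:499 dma-i:288kB dma-o:18423kB]")

# Layout inferred from the sample: done/total (millions of instructions),
# elapsed seconds, average MIPS, then bracketed key:value counters.
STATUS_RE = re.compile(
    r"^\s*(?P<done>\d+)M/(?P<total>\d+)M\s+(?P<secs>\d+)s\s+"
    r"avgmips:(?P<avgmips>[\d.]+)\s+\[(?P<counters>[^\]]*)\]\s*$"
)


def parse_status(line):
    """Split one progress line into a dictionary of fields."""
    match = STATUS_RE.match(line)
    if match is None:
        raise ValueError("unrecognized status line: %r" % line)
    counters = dict(item.split(":", 1)
                    for item in match.group("counters").split())
    return {
        "done_m": int(match.group("done")),
        "total_m": int(match.group("total")),
        "seconds": int(match.group("secs")),
        "avgmips": float(match.group("avgmips")),
        "counters": counters,  # values kept as strings, e.g. "288kB"
    }


def recomputed_mips(status):
    """Millions of instructions completed per elapsed second."""
    return status["done_m"] / status["seconds"]
```

For the sample line, dividing the 10850M completed instructions by the 813 elapsed seconds gives roughly 13.3, which matches the reported ''avgmips'' value and suggests that the field is a running average over the whole run.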
\\
++++
=====Statistics=====
++++CLICK - Expand/Collapse|
* At the PFMON command-line, you can view various runtime statistics by typing: ''stats''
* To view stats that are specific to a single CPU, type ''stats -cpu
========================== Aggregate BlueSPARC statistics ===========================
Unless otherwise noted, % values in parenthesis indicate rate of the event per total # of instructions
cycles: 10526913715 // total # of cycles (this is start & stopped during 'step' commands)
stalls: 12586525 (0.120%) // total # cycles stalled due to resource hazard (does not include memory stalls)
instructions: 1570000000 // total # instructions executed
stalls per 100 inst: 0.8
privileged insts: 1306905953 (83.242%) // total # privileged instructions executed
cpu progress breakdown: // percentage of instructions executed by specific CPUs
cpu 0 (30.0%) cpu 1 (33.5%) cpu 2 (9.4%) cpu 3 (27.1%)
aggregate ipc: 0.149 // average IPC of the BlueSPARC pipeline
micro-transplants: 1078068 (0.068667%) // # micro-transplants executed by the Microblaze
pipeline retries: 9248885 (0.589%) // # aborted instructions (e.g., due to resource hazard)
assist instructions: 21044559 (1.340%) // # micro-instructions used to facilitate complex instructions
fetches: 1570000000 // # SPARC instructions fetched and executed
fetch misses: 18456469 (1.176%) // # BlueSPARC I-cache misses
stores: 49057631 (3.125%) // # store instructions
store misses: 2202635 (0.140%) // # store misses
loads: 165757375 (10.558%) // # load instructions
load misses: 15656816 (0.997%) // # load misses
interrupts recv'd: 1271 (0.000081%) // total # of interrupts
device interrupts: 25 (0.000002%) // # device interrupts
cpu cross-calls sent: 1246 (0.000079%) // # cpu-to-cpu interrupts
cross-calls aborted: 208561 // # cpu-to-cpu interrupts that aborted due to busy CPU
i/o reads: 147 (0.000009%) // # of memory-mapped I/O reads
i/o writes: 159 (0.000010%) // # of memory-mapped I/O writes
simics i/o cnt: 306 // total # I/Os
simics i/o lat (us): 1544 // average latency of Simics I/O transplant (in microseconds)
simics lat (us): 1108 // average latency (Simics-only overhead)
flushes: 107682 (0.006859%) // total # of i- and d-cache flushes
tick interrupts: 0 // # interrupts generated by TICK register
stick interrupts: 1488 // # interrupts generated by STICK register
illtraps: 0 (0.000000%) // # illegal traps (should be 0 otherwise something is wrong)
fp_disabled: 0 (0.000000%) // # floating-point disabled traps
fetch_align: 0 (0.000000%) // # misaligned fetches (should be 0)
privileged_op: 0 (0.000000%) // # trapped non-privileged accesses
total # branch: 0 (0.000000%) // # of branch instructions (requires OPT_BRANCH_STATS = True)
# taken branch: 0 (0.000000%) // # taken branches (same as above)
total # priv branch: 0 (0.000000%) // # of branches in privileged mode
# taken priv branch: 0 (0.000000%) // # of taken branches in privileged mode
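
Several of the derived figures in the listing above can be reproduced from the raw counters, which is a useful sanity check when post-processing your own runs. This is a minimal sketch using the numbers from the sample output; the function name ''derived_stats'' and the dictionary keys are hypothetical.

```python
def derived_stats(cycles, instructions, stalls, privileged):
    """Recompute a few ratios from the raw counters in the stats listing."""
    return {
        # average instructions per cycle of the pipeline
        "ipc": instructions / cycles,
        # stall cycles per 100 executed instructions
        "stalls_per_100_inst": 100.0 * stalls / instructions,
        # stall cycles as a percentage of all cycles
        "stall_cycle_pct": 100.0 * stalls / cycles,
        # privileged instructions as a percentage of all instructions
        "privileged_pct": 100.0 * privileged / instructions,
    }


# Raw counters copied from the sample listing above.
sample = derived_stats(
    cycles=10526913715,
    instructions=1570000000,
    stalls=12586525,
    privileged=1306905953,
)
```

With the sample counters, the recomputed values round to an IPC of 0.149, 0.8 stalls per 100 instructions, and 83.242% privileged instructions, matching the listing. Note that the 0.120% shown next to ''stalls'' corresponds to stall cycles divided by total cycles, not by instructions.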
++++
\\
====== Congratulations! You made it! ======
{{:documentation:micro_tutorial_done.png?600|}}
\\
\\
======References======
**{{:documentation:mc09.pdf|ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs}}**\\
Eric S. Chung, Michael K. Papamichael, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, and Ken Mai.\\
//ACM Transactions on Reconfigurable Technology and Systems//, 2009.\\
**{{http://www.ece.cmu.edu/~echung/memocode-camera.pdf|Implementing a High-performance Multithreaded Microprocessor: A Case Study in High-level Design and Validation}}**\\
Eric S. Chung and James C. Hoe.\\
//Formal Methods and Models for Codesign (MEMOCODE)//, July 2009.\\
**{{http://www.ece.cmu.edu/~echung/fpga08-chung.pdf|A Complexity-Effective Architecture for Accelerating Full-System Multiprocessor Simulations Using FPGAs}}**\\
Eric S. Chung, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, and Ken Mai.\\
//International Symposium on Field Programmable Gate Arrays//, February 2008, Monterey, CA.\\
\\
======Resources======
* Support email: [[protoflex@ece.cmu.edu]]
* Bluespec forum: http://www.bluespec.com/forum
* XUPv5 reference pages: http://www.xilinx.com/univ/xupv5-lx110t.htm
* {{:documentation:usiiiv2.pdf|UltraSPARC III Cu Reference Manual}}
\\