Flexible Architecture for Simulation and Testing (FAST)

 

Research on multithreaded microprocessor architectures has reached the point where further progress requires a hardware prototype. This prototype should support more than two parallel threads and thread-level speculation (TLS). No current commercial microprocessor provides these multithreading capabilities. This prevents the serious OS, compiler, and application development needed to take full advantage of multiple threads and TLS. Without the resulting optimized software, it will be difficult to understand the true benefits of these capabilities or to make the right hardware/software design tradeoffs for the best performance. The obstacle is that building a microprocessor requires VLSI chip design, which is a resource-intensive process. Verifying a chip design before tape-out is a particularly large task, and it can make microprocessor design a difficult undertaking in an academic environment. This is the primary reason that all prior multithreading and TLS research has relied on software simulators.

 

FAST is a flexible platform for simulating chip multiprocessor (CMP) and multithreaded architectures on real hardware. It supports complex system designs and executes millions of instructions per second. It also allows the memory hierarchy and other key components to be modified. FPGAs provide the interconnect both between the four processor tiles and within each tile. Figure 1 illustrates the FAST PCB at a high level. The yellow tiles are processor tiles, and the blue tile serves as the internal and external system interconnect. The initial implementation of FAST reuses existing Hydra architecture components, but other CMP designs can be realized by changing the FPGA configuration.

 

Figure 1: Generic FAST implementation on a PCB.

 

Figure 2 shows an expanded view of a processor tile. Each tile consists of a CPU, an FPU, L1 memory, and FPGAs. This configuration allows the tile to run both floating-point and integer applications. The FPGAs also provide the flexibility to modify the L1 memory configuration and to add other components, such as multithreading support and hardware for collecting profiling metrics.

 

Figure 2: Expanded view of the FAST processor tile.

 

We believe it is possible to build a flexible research prototype from fixed-function processors and FPGAs without doing any VLSI design. We intend to demonstrate this with FAST, a flexible CMP prototyping environment built from existing chips. The key idea is to combine ten-year-old microprocessor chips with state-of-the-art FPGA chips. This yields a single-board multiprocessor prototyping environment that supports TLS and operates at hardware speeds. It still has latency and bandwidth characteristics equivalent to those of a modern CMP architecture. Figure 3 is a simplified scale drawing of the FAST PCB. A processor tile occupies each corner of the board. The L2 memory, the L2 memory controller, the read/write controller, and the internal and external glue components occupy the center of the board.

 

Figure 3: Simplified component layout for the FAST PCB.

 

Figure 3 uses the same color coding as Figure 2 to distinguish the processor tile components: CPU (red), FPU (green), FPGAs (purple), and L1 memory (yellow). The center of the PCB holds the L2 memory (light blue) and all of the internal and external glue components (dark blue). At the top center of the PCB is the embedded Ethernet board, which provides the external communication interface. Next to it is a CPLD that provides additional glue logic to augment the microcontroller on the embedded Ethernet board. Below these components are the two Xilinx XC2V6000 FPGAs, which control the L2 memory and the external memory interfaces. The power connector and DC-to-DC voltage regulators are shown in orange. Each voltage regulator is labeled with its output voltage, which supplies an FPGA core voltage.

 

Figure 4: Completed FAST PCB with component labels.

The overall goal of the FAST project is to develop a hardware prototype system that allows hardware and software experimentation with fine-grained and speculative multithreading. More specifically:

 

Hardware Goals:

  • Explore the spectrum of multithreading architectures by exploiting the flexibility of the FPGAs.
  • Explore alternative ways to use multithreaded hardware, e.g., support fault tolerance.

 

Software Goals:

  • Explore the design of operating systems, programming environments, programming paradigms, and applications.
  • Determine the full potential speedup that a TLS architecture provides for general-purpose applications.

 

Education Goals:

  • Provide a project development environment for advanced digital design and computer architecture classes.

 

Grants & Donations

This project is supported by NSF grant CCR-0220138 and by donations from Xilinx, Inc.