Flexible Architecture for Simulation and Testing (FAST)
We are at the point in the area of multithreaded microprocessor architectures where further progress will require the development of a hardware prototype. This prototype should support more than two parallel threads and thread-level speculation (TLS). Currently, no commercial microprocessor has these multithreading capabilities and this prevents the serious OS, compiler and application development that is required to take full advantage of multiple threads and TLS. Without the resulting optimized software, it will be difficult to understand the true benefits of these capabilities or make the appropriate hardware/software design tradeoffs to achieve the best performance. However, the problem with building a microprocessor is that it requires VLSI chip design, a resource-intensive process. The immense task of chip design verification before tape out, in particular, can make microprocessor design a difficult undertaking in an academic environment. This is the primary reason that all prior multithreading and TLS research has relied on software simulators.
FAST is a flexible hardware platform for simulating chip multiprocessor (CMP) and multithreaded architectures. Because it runs on real hardware, it can execute millions of instructions per second, which makes complex system design practical. The memory hierarchy and other key components can be reconfigured. FPGAs provide the interfaces both between the four processor tiles and within each tile. Figure 1 below illustrates the FAST PCB at a high level. The yellow tiles are processor tiles, and the blue tile serves as the internal and external system interconnect. The initial implementation of FAST leverages existing Hydra architecture components, but other CMP designs can be realized by changing the FPGA configuration.
Figure 1: Generic FAST implementation on a PCB.
Figure 2 shows an expanded view of the processor tiles. Each tile consists of a CPU, an FPU, L1 memory, and FPGAs. This configuration lets a tile run both integer and floating-point applications, while the FPGAs make it possible to modify the L1 memory configuration and to add other components, such as multithreading support and profiling metrics.
Figure 2: Expanded view of the FAST processor tile.
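The tile composition described above can be sketched as a small data model. This is only an illustrative sketch, not part of the FAST design files: the class names (ProcessorTile, FastBoard), the field names, and the feature labels are all hypothetical. It separates what is fixed on each tile (CPU, FPU) from what the FPGAs let an experimenter change (L1 configuration, added features).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessorTile:
    """Hypothetical model of one FAST processor tile."""
    name: str
    cpu: str = "CPU"              # fixed-function integer core
    fpu: str = "FPU"              # fixed-function floating-point unit
    l1_config: str = "default"    # assumed label; L1 is reconfigurable via FPGAs
    fpga_features: List[str] = field(default_factory=list)

    def add_feature(self, feature: str) -> None:
        """Record an FPGA-implemented add-on, ignoring duplicates."""
        if feature not in self.fpga_features:
            self.fpga_features.append(feature)


@dataclass
class FastBoard:
    """Hypothetical model of the board: tiles plus a central interconnect."""
    tiles: List[ProcessorTile]
    interconnect: str = "FPGA interconnect"


def make_board(num_tiles: int = 4) -> FastBoard:
    """Build a board with the four processor tiles described in the text."""
    return FastBoard(tiles=[ProcessorTile(name=f"tile{i}") for i in range(num_tiles)])


board = make_board()
for tile in board.tiles:
    tile.add_feature("profiling")          # added through the FPGAs
board.tiles[0].l1_config = "experimental"  # only the reconfigurable part changes
```

In this model, changing the experiment touches only l1_config and fpga_features; the cpu and fpu fields stay constant, mirroring the split between fixed-function chips and FPGA logic.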
We believe it is possible to build a flexible research prototype from fixed-function processors and FPGAs without doing any VLSI design. We intend to demonstrate this by building FAST, a flexible CMP prototyping environment made from existing chips. The key idea is that combining ten-year-old microprocessor chips with state-of-the-art FPGAs yields a single-board multiprocessor prototyping environment that supports TLS and operates at hardware speeds, yet has latency and bandwidth characteristics equivalent to those of a modern CMP architecture. Figure 3 below is a simplified scaled drawing of the FAST PCB. A processor tile occupies each corner of the PCB. The L2 memory, L2 memory controller, read/write controller, and internal and external glue components occupy the center.
Figure 3: Simplified component layout for the FAST PCB.
Figure 3 uses the same color coding as Figure 2 for the processor tile components: CPU (red), FPU (green), FPGAs (purple), and L1 memory (yellow). The center of the PCB holds the L2 memory, shown in light blue, and all the internal and external glue components, shown in dark blue. Starting from the top center of the PCB, there is the embedded Ethernet board, which provides the external communication interface. Next to it is the CPLD, which provides additional glue logic to augment the capabilities of the microcontroller on the embedded Ethernet board. Below these components are the two XC2V6000 FPGAs, which control the L2 memory and external memory interfaces. The power connector and
Figure 4: Completed FAST PCB with component labels.
The overall goal of the FAST project is to develop a hardware prototype system that allows hardware and software experimentation with fine-grain and speculative multithreading. More specifically:
Grants & Donations
This project is supported by NSF grant # CCR-0220138, as well as donations from Xilinx, Inc.