Chip Multiprocessor Watch
Just a few years ago, the idea of putting multiple processors on a chip was far-fetched. Now it is accepted and commonplace, and virtually every new high-performance processor is a chip multiprocessor of some sort. This webpage is a starting point for organizing our understanding of the landscape of chip multiprocessors.
Domain Specific Multiprocessors
Sony/Toshiba/IBM Cell Processor
A joint project of Sony, Toshiba, and IBM, Cell is envisioned primarily as a computing engine for media applications. It will be the centerpiece of Sony's PlayStation 3, and should ship in 2006.
Main Attributes
- One dual threaded, dual issue in-order PowerPC core @3.2 GHz
- 8 Processing Elements (PEs), each with a 256 KB local store, a vector Single Precision FPU and a conventional Double Precision FPU @3.2 GHz
- Bi-directional ring interconnect between all 9 PEs
- Rambus XDR memory controller
Links
IBM/Microsoft Xenon
An IBM-designed processor customized for Microsoft, Xenon is the CPU of the Xbox 360. It shipped to consumers in November 2005.
Main Attributes
- 3 dual threaded, dual issue in-order PowerPC cores @3.2 GHz
- 1 MB shared L2 cache
Links
ClearSpeed CSX600
ClearSpeed has developed a highly parallel architecture for high-performance computing, based around an array of 64-96 "poly" processing elements for arithmetic computation and an 8-threaded "mono" processing unit designed for control tasks. Each of ClearSpeed's processing elements contains:
- Double Precision FP Adder and Multiplier
- Integer ALU including MAC
- 6 KB SRAM & 128 Byte register file
- Next-neighbor connections to other Processing Elements
Interestingly, ClearSpeed's architecture requires that the same instruction stream pass through every poly processing element, which in some ways blurs the definition of a computing "core". However, each processing element can enable and disable changes to its own state by pushing and popping predicate bits on a control stack, so the processing elements are capable of branching in a limited sense.
The ClearSpeed architecture also contains memory controllers and external interfaces to form a complete system-on-chip.
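The predicate-stack idea can be illustrated with a small Python sketch. This is not ClearSpeed's instruction set: the `PolyPE` class, its method names, and the `run_simd` driver are hypothetical, and the sketch assumes that a processing element commits a write only when every bit on its stack is true. Every element sees the same sequence of steps, yet negative values take one "branch" and non-negative values take the other.

```python
class PolyPE:
    """Hypothetical model of one processing element with a predicate stack."""

    def __init__(self, value):
        self.value = value
        self.predicates = []  # control stack of enable bits

    def enabled(self):
        # Assumption: a state change commits only if all pushed bits are true.
        return all(self.predicates)

    def push(self, condition):
        self.predicates.append(bool(condition))

    def invert_top(self):
        # Models the "else" side of a branch by flipping the newest bit.
        self.predicates[-1] = not self.predicates[-1]

    def pop(self):
        self.predicates.pop()


def run_simd(values):
    """Apply 'if v < 0: v = -v else: v = v * 2' with one shared instruction stream."""
    pes = [PolyPE(v) for v in values]

    for pe in pes:               # instruction 1: push predicate (v < 0)
        pe.push(pe.value < 0)
    for pe in pes:               # instruction 2: negate, gated by the predicate
        if pe.enabled():
            pe.value = -pe.value
    for pe in pes:               # instruction 3: switch to the "else" side
        pe.invert_top()
    for pe in pes:               # instruction 4: double, gated by the predicate
        if pe.enabled():
            pe.value = pe.value * 2
    for pe in pes:               # instruction 5: pop predicate
        pe.pop()

    return [pe.value for pe in pes]


print(run_simd([-3, 2, 0, -1]))  # [3, 4, 0, 1]
```

Both sides of the branch are issued to every element, so the cost of a conditional is the sum of its two paths. That is the sense in which the branching is limited.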
Links
Cisco CRS-1 Metro
Cisco takes the idea of "processor as the NAND gate of the future" to an advanced level by using a massively many-core custom network processor in its highest-end routers. These routers contain 192 customized Tensilica processors, each of which contains small instruction and data caches, a customized DMA engine that allows up to three outstanding DMA requests at any given time, and "tens of KBs" of local instruction memory.
Links
Intel IXP
Intel has developed a series of network processors, the most advanced of which is the IXP2800. The IXP2800 contains 16 multithreaded microengines specialized for networking data-plane operations, an XScale processor for control-plane operations, a hash unit, a crypto unit, a 16 KB scratchpad memory, network interfaces, and memory controllers (4 SRAM and 3 RDRAM).
Each microengine operates at up to 1.4 GHz, and contains:
- A 128 entry register file
- An integer ALU with limited support for multiplication
- A hash unit
- Local memory (640 words)
- A small CAM
- 128 entry next neighbor registers to communicate with neighboring microengines
- Instruction memory (8KB)
Links
General Purpose Multiprocessors
Sun UltraSPARC T1 - Niagara
Sun's Niagara architecture is adopted from Afara Websystems Inc, a startup that pioneered the development of throughput-oriented microprocessor technology optimized for commercial server applications. The 90nm Niagara chip integrates eight cores onto one die, where each core has one pipeline that can support four threads simultaneously with zero context-switch overhead. It symbolizes a shift in the server microprocessor design paradigm towards fine-grained chip multithreading. The chip is marketed as UltraSPARC T1 in the Sun Fire CoolThreads T1000 and T2000 servers.
The next-generation Niagara2 architecture is due in 2007 in 65nm. With eight cores, two pipelines per core, and eight threads per core, it is expected to double the performance of Niagara.
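To see why zero-overhead thread switching helps throughput, consider a toy model of a single pipeline. The sketch below is only an illustration: the function name `simulate_core`, the round-robin selection policy, and the stall parameters are assumptions made for the example, not details of Sun's design.

```python
def simulate_core(num_threads, instructions_per_thread, miss_every, miss_latency):
    """Return the cycles one pipeline needs to issue every instruction.

    Toy model (all parameters are illustrative assumptions):
    - one instruction issues per cycle, from the next ready thread in
      round-robin order, and switching threads costs nothing;
    - every `miss_every`-th instruction of a thread stalls that thread for
      `miss_latency` cycles, standing in for a cache miss.
    """
    remaining = [instructions_per_thread] * num_threads
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    issued = [0] * num_threads
    next_thread = 0
    cycle = 0

    while any(remaining):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued[t] += 1
                if issued[t] % miss_every == 0:
                    ready_at[t] = cycle + 1 + miss_latency
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the pipeline simply idles this cycle.
        cycle += 1

    return cycle


one_thread = simulate_core(1, 8, miss_every=4, miss_latency=3)
four_threads = simulate_core(4, 8, miss_every=4, miss_latency=3)
print(one_thread, four_threads)  # 11 32
```

With these made-up numbers, one thread needs 11 cycles for 8 instructions because the pipeline idles during its stall, while four threads issue 32 instructions in 32 cycles because another thread is always ready. At the chip level, eight cores with four threads each give 32 hardware threads, and Niagara2's eight threads per core give 64.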
Links
Power5
IBM's Power5 is a dual-core architecture that debuted in 2003 in 0.13um at 2GHz. It is binary- and structurally compatible with its predecessor, Power4, and scales to systems with 64 physical processors (128 cores). Users can choose from a variety of packaging options for the dual-core Power5 chip, ranging from the 4-core Dual Chip Module, or DCM (two dual-core chips on one package), to the 8-core Multi Chip Module, or MCM (four chips on one package, see picture). For high-end server systems, the large 95mm x 95mm MCM contains four dual-core chips, 1GHz inter-chip buses, and 144MB of L3 cache (36MB for each chip).
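The packaging arithmetic can be checked with a few lines of Python. The helper names below are hypothetical, and the only inputs are the figures quoted above. Note that dividing the 144MB of L3 among the four chips on an MCM gives 36MB per dual-core chip.

```python
CORES_PER_CHIP = 2  # Power5 is a dual-core chip


def package_cores(chips_per_package):
    """Cores in one package holding the given number of Power5 chips."""
    return chips_per_package * CORES_PER_CHIP


def l3_per_chip_mb(total_l3_mb, chips_per_package):
    """L3 capacity available to each chip when the total is split evenly."""
    return total_l3_mb / chips_per_package


dcm_cores = package_cores(2)            # Dual Chip Module: 2 chips
mcm_cores = package_cores(4)            # Multi Chip Module: 4 chips
mcm_l3_per_chip = l3_per_chip_mb(144, 4)
max_system_cores = 64 * CORES_PER_CHIP  # 64 physical processors

print(dcm_cores, mcm_cores, mcm_l3_per_chip, max_system_cores)
# 4 8 36.0 128
```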
Links
PWRficient
P.A. Semi's PWRficient family of 64-bit multicore processors is based on the IBM Power architecture. It uses 5-13 watts of power while operating at 2GHz in 65nm. The PWRficient architecture features two DDR2 memory controllers, 2MB of L2 cache, and a flexible I/O subsystem for computing and embedded applications. The I/O subsystem provides 24 configurable serdes lanes for high-speed serial I/O, which may be used for PCI Express, XAUI, or SGMII interconnect in a wide range of configurations. The 1682M includes 8 PCI Express engines, supporting link widths of 1, 2, 4, 8, and 16 lanes for general peripheral connection, with up to 4GB/s bandwidth per engine. The two XAUI (10 Gigabit Ethernet) and four SGMII (10/100/1000 Mb/s Ethernet) protocol engines each feature packet processing, including line-rate packet filtering, VLAN flow control, and TCP/IP acceleration.
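The 4GB/s figure is consistent with a 16-lane link under first-generation PCI Express signaling. The sketch below assumes 2.5 GT/s per lane with 8b/10b encoding and counts one direction only. Those parameters come from the PCI Express 1.x specification rather than from P.A. Semi's description, and the function name is made up for the example.

```python
def pcie_gen1_bandwidth_gbytes(lanes):
    """Payload bandwidth in GB/s, per direction, for a PCIe 1.x link.

    Assumptions (not stated in the text above): 2.5 GT/s raw signaling per
    lane and 8b/10b encoding, so 8 of every 10 transferred bits carry data.
    """
    raw_gbits = 2.5 * lanes
    payload_gbits = raw_gbits * 8 / 10
    return payload_gbits / 8  # bits to bytes


for width in (1, 2, 4, 8, 16):
    print(width, pcie_gen1_bandwidth_gbytes(width))
# 1 0.25
# 2 0.5
# 4 1.0
# 8 2.0
# 16 4.0
```

Under these assumptions only the widest supported link reaches the quoted per-engine peak. A single lane carries 0.25GB/s in each direction.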
Links
Multiprocessor Articles and News to Watch