Massively Parallel Processor Arrays
The Ambric architecture is a member of an emerging class of massively parallel chips called Massively Parallel Processor Arrays, or MPPAs. MPPA is a generic name for a category of chip, much like FPGA, ASIC, or DSP; Ambric does not use MPPA as a brand name.
MPPAs are distinguished from 'multi-core' general-purpose processors. These conventional multi-core devices typically have only a few processors and a shared-memory, shared-bus architecture. Most multi-core devices have been aimed at general-purpose computing, for running huge existing applications and complex operating systems. By contrast, MPPAs employ massive parallelism of at least hundreds of peer-to-peer processing elements such as complete processors, ALUs, finite state machines and distributed memories, and a rich word-wide flexible interconnect fabric which can be statically, or sometimes dynamically, reconfigured. MPPAs are usually aimed at embedded computing, where they run high-performance dedicated applications such as media or network processing, which have strict cost, power and real-time requirements.
Starting with some standard semiconductor market categorization from Gartner Group, here is a diagram suggesting where MPPAs fit.

How do MPPAs differ? To start with, there are two classes of MPPA chips: SIMD (single instruction, multiple data) and MIMD (multiple instruction, multiple data).
SIMD MPPAs have only a single instruction stream (or very few), which drives all the processors the same way on similar data. Some SIMD chips can mask off a few different parts of the array at runtime, but the model is still basically SIMD.
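The single instruction stream and the masking described above can be illustrated with a minimal Python sketch. It models the idea only, not any particular chip; the names `simd_step`, `lanes`, and `mask` are hypothetical and chosen for this illustration.

```python
from typing import Callable, List, Optional

def simd_step(op: Callable[[int], int], lanes: List[int],
              mask: Optional[List[bool]] = None) -> List[int]:
    """Apply one instruction `op` to every lane in the same step.

    Lanes whose mask bit is False sit the step out and keep their value.
    """
    if mask is None:
        mask = [True] * len(lanes)  # no mask: every lane executes
    return [op(x) if enabled else x for x, enabled in zip(lanes, mask)]

data = [1, 2, 3, 4]

# Every lane executes the same instruction on its own data element.
doubled = simd_step(lambda x: x * 2, data)

# Masking disables some lanes, but there is still only one instruction.
masked = simd_step(lambda x: x * 2, data, [True, False, True, False])
```

Note that masking does not give the disabled lanes anything else to do: they simply idle for that step, which is why a masked array is still basically SIMD.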
SIMD is very efficient for simple DSP filters and other regular vector processing. It used to be good for media and network processing, but modern applications such as H.264 video compression and adaptive network intrusion detection gain their power by becoming more complex and less regular. H.264 video codecs, for example, have many components with extremely complex looping, forward and backward data dependencies, and highly variable, data-dependent compute times, all of which are ill-suited to SIMD MPPAs.
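To see why data-dependent compute times hurt a lockstep array, consider a toy cost model, sketched here in Python. The model and the work figures are assumptions made up for illustration, not measurements of any codec or chip: in lockstep, a batch of items finishes only when its slowest item does, while independent processors each pick up the next item as soon as they are free.

```python
from typing import List

def lockstep_cycles(work: List[int], lanes: int) -> int:
    """Lockstep model: each batch of `lanes` items costs as much as its slowest item."""
    return sum(max(work[i:i + lanes]) for i in range(0, len(work), lanes))

def independent_cycles(work: List[int], processors: int) -> int:
    """Independent model: each item goes to whichever processor becomes free first."""
    busy_until = [0] * processors
    for cost in work:
        p = busy_until.index(min(busy_until))  # earliest-free processor
        busy_until[p] += cost
    return max(busy_until)

regular = [4] * 8                      # uniform work, e.g. a simple filter
irregular = [1, 9, 2, 8, 1, 1, 9, 1]   # hypothetical data-dependent costs

regular_lockstep = lockstep_cycles(regular, 4)            # 8
regular_independent = independent_cycles(regular, 4)      # 8
irregular_lockstep = lockstep_cycles(irregular, 4)        # 18
irregular_independent = independent_cycles(irregular, 4)  # 11
```

With uniform work the two models tie, which matches the observation that SIMD is efficient for regular vector processing. With irregular work the lockstep array spends most of its time waiting on the slowest lane.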
MIMD MPPAs are much more capable. Every processor can have its own instruction stream, running in parallel on its own data. (A MIMD MPPA running the same code everywhere becomes SIMD.) MIMD's diversity of parallel control is most effective for today's applications. It's good for many data structures, not just vectors – its processors can stay busy on varied types of data and data sizes.
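The MIMD idea, a different program on each processor with its own data, can be modeled with a short Python sketch. Threads stand in for processors here purely for illustration; the three programs and the `run_mimd` helper are hypothetical and are not taken from any MPPA toolchain.

```python
from concurrent.futures import ThreadPoolExecutor

# Three unrelated programs, each working on a different kind of data.
def count_words(text):
    return len(text.split())

def checksum(data):
    return sum(data) % 256

def running_max(values):
    out, best = [], None
    for v in values:
        best = v if best is None or v > best else best
        out.append(best)
    return out

# Each modeled processor gets its own (program, data) pair.
assignments = [
    (count_words, "a b c"),
    (checksum, bytes([200, 100])),
    (running_max, [3, 1, 4, 1, 5]),
]

def run_mimd(assignments):
    """Run every program on its own worker and collect results in order."""
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = [pool.submit(program, data) for program, data in assignments]
        return [f.result() for f in futures]

results = run_mimd(assignments)
```

If every entry in `assignments` used the same program, the sketch would degenerate into the SIMD case mentioned above.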
However, previous MIMD MPPA chips haven't been very practical to program. Some require the developer to deal explicitly with synchronization and to use hardware-style, timing-sensitive languages and tools. Others depend on 'system compilers' that are often regarded as open-ended research projects, as open-ended as artificial intelligence.
The Ambric chip enables a practical MIMD-parallel programming model, with innovative asynchronous channels that remove the global synchronization problem. Explicit task scheduling is simply not necessary. Complex global state machines don't have to be coded and debugged. Applications are built as block diagrams of objects, each written as single-threaded, sequential code in a standard high-level language. Such structures are reasonable and practical to develop, and they can handle even the most complex applications.
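The general style of this programming model can be sketched in Python, with the caveat that this is an analogy and not Ambric's actual language, tools, or channel hardware. Threads stand in for objects, and bounded `queue.Queue` instances stand in for channels. The object names (`source`, `scale`, `sink`) and the `DONE` end-of-stream marker are assumptions made for this illustration.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)

def source(out_ch, values):
    """Plain sequential code: emit each value, then signal end of stream."""
    for v in values:
        out_ch.put(v)          # blocks while the channel is full
    out_ch.put(DONE)

def scale(in_ch, out_ch, factor):
    """Plain sequential code: read a value, transform it, pass it on."""
    while True:
        v = in_ch.get()        # blocks until a value arrives
        if v is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(v * factor)

def sink(in_ch, results):
    """Plain sequential code: collect values until end of stream."""
    while True:
        v = in_ch.get()
        if v is DONE:
            return
        results.append(v)

def run_pipeline(values, factor=3):
    """Wire the block diagram: source -> scale -> sink."""
    a = queue.Queue(maxsize=1)  # channel between source and scale
    b = queue.Queue(maxsize=1)  # channel between scale and sink
    results = []
    objects = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for t in objects:
        t.start()
    for t in objects:
        t.join()
    return results

output = run_pipeline([1, 2, 3])
```

Each object is ordinary single-threaded code with no scheduler, locks, or global state machine. All synchronization comes from the channels: a writer stalls when the channel is full and a reader stalls when it is empty, so the objects pace one another locally.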