The Ring Array Processor (RAP): Algorithms and Architecture

We have designed and implemented a Ring Array Processor (RAP) for fast execution of our continuous speech recognition training algorithms, which are currently dominated by layered neural network calculations. The RAP is a multi-DSP system with a low-latency ring interconnection scheme built on programmable gate array technology and a significant amount of local memory per node (4-16 MBytes of dynamic RAM and 256 KBytes of fast static RAM). Theoretical peak performance is 128 MFlops per board, and test runs with the first working board show a sustained throughput of roughly 30-90 percent of this peak for the algorithms of current interest. This report describes the motivation for the RAP design and shows how the architecture matches the target algorithm. Technical reports from other members of the RAP team focus on the hardware and software specifics of the system.
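To make the idea of matching a ring of processors to layered network calculations concrete, here is a minimal simulated sketch of one layer's forward pass on a ring. It is an illustration under stated assumptions, not the RAP's actual DSP code: it assumes each node owns a contiguous block of the layer's output units (rows of the weight matrix) and initially holds one block of the input activations. On every step, each node accumulates partial sums using the input block it currently holds and then passes that block to its right-hand neighbor, so after one trip around the ring every node has seen the whole input. The function names (`partition`, `ring_layer_forward`) and the sigmoid nonlinearity are hypothetical choices for this example.

```python
import math


def partition(n, p):
    """Split range(n) into p contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def ring_layer_forward(weights, x, num_nodes):
    """Hypothetical simulation of y = sigmoid(W x) on a ring of num_nodes processors."""
    n_out, n_in = len(weights), len(x)
    out_blocks = partition(n_out, num_nodes)  # output units owned by each node
    in_blocks = partition(n_in, num_nodes)    # input slice each node starts with
    # Per-node partial sums for the output units that node owns.
    acc = [[0.0] * len(out_blocks[p]) for p in range(num_nodes)]
    # held[p] is the index of the input block node p currently holds.
    held = list(range(num_nodes))
    for _ in range(num_nodes):
        for p in range(num_nodes):
            # Node p only reads the columns belonging to the block it holds.
            cols = in_blocks[held[p]]
            for k, i in enumerate(out_blocks[p]):
                row = weights[i]
                acc[p][k] += sum(row[j] * x[j] for j in cols)
        # Each node passes its block to node p + 1, i.e. receives from node p - 1.
        held = [held[(p - 1) % num_nodes] for p in range(num_nodes)]
    # Concatenate the per-node outputs into the full activation vector.
    y = []
    for p in range(num_nodes):
        y.extend(sigmoid(a) for a in acc[p])
    return y
```

In this sketch the output block computed by each node is already the input block that node would contribute to the next layer, so layers can be chained without any global redistribution, only nearest-neighbor transfers around the ring.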

