Scalable HMM-based Inference Engine in LVCSR
Jike Chong
UC Berkeley Parallel Lab
Tuesday, March 31, 2009
12:30
Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this paper we explore a
design space for application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our
implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with
millions of states and arcs. The traversal is guided by a sequence of input audio features that continuously changes the data working
set at runtime. The challenge is not only to define a software architecture that exposes sufficient fine-grained application concurrency, but also to efficiently synchronize between an increasing number of concurrent tasks and to effectively utilize the parallelism opportunities in today's highly parallel processors. We explore two important parallelization challenges for graph traversal in the context of an inference engine: efficient synchronization between concurrent tasks and effective utilization of Single-Instruction-Multiple-Data (SIMD) parallelism. We propose two application-level implementation alternatives for each of the parallelization challenges and compose them to arrive at four unique
algorithm styles. We construct highly optimized implementations of the algorithm styles on two parallel platforms: an Intel Core i7
multicore processor and an NVIDIA GTX280 manycore processor. The highest-performing algorithm style varies with the implementation
platform. On a data set of 44 minutes of speech, we demonstrate substantial speedups of 3.4x on the Core i7 and 10.5x on the GTX280 compared to a highly optimized sequential implementation on the Core i7, without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.
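
One way to read the scalability claim is through Amdahl's law. The sketch below is illustrative arithmetic, not part of the talk. It assumes the quoted 2.5% sequential overhead can be treated as the serial fraction of a fixed-size workload, and the worker counts are arbitrary rather than the core counts of either platform. Under that assumption the speedup limit is 1 / 0.025 = 40x, well above the 3.4x and 10.5x reported, which is consistent with the remark about headroom on future platforms.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size workload."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(serial_fraction: float) -> float:
    """Speedup limit as the number of workers grows without bound."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


# Upper bound on sequential overhead quoted in the abstract.
# Assumption: it is interpreted here as Amdahl's serial fraction.
SERIAL_FRACTION = 0.025

if __name__ == "__main__":
    # Hypothetical worker counts chosen only to show the trend.
    for workers in (4, 16, 64, 256):
        speedup = amdahl_speedup(SERIAL_FRACTION, workers)
        print(f"{workers:4d} workers: {speedup:5.1f}x")
    print(f"limit: {amdahl_limit(SERIAL_FRACTION):.0f}x")
```

Because the abstract states the overhead is below 2.5%, the limit under this reading is at least 40x; memory bandwidth and synchronization costs, which Amdahl's law does not model, would lower the speedup actually achieved.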