A2P based Media Processor
Several key factors that determine the efficiency of SIMD systems are often overlooked: the overall bandwidth to memory, and the rearrangement of data within the very wide data word and the data structure being operated upon. The A2P SIMD plug-in includes a self-routing, non-blocking permutation network that allows data to be rearranged freely within the SIMD word. This is crucial for high performance in many parallel algorithms, including the "zig-zag" reordering used by many video compression algorithms.

All this compute power creates an enormous thirst for data, and the system must be capable of supplying data at a rate consistent with the performance goals. Through detailed modeling of the system under design, using the A2P profiling toolset, the required bandwidth can be identified and then provided by the rest of the system. This may involve very wide data channels to large on-chip and off-chip memory subsystems to move data in and out of the dual-port RAMs of the SIMD plug-in. The movement of data is often best controlled directly by the algorithm being executed, and the A2P supports a DMA mechanism that can schedule data movement in and out of the RAMs under a synchronization protocol from the CPU core.

The largest computational difficulty typical of SIMD systems is managing the dynamic range of a particular algorithm; often either the dynamic range or the computational speed is compromised for the sake of the other. The A2P SIMD plug-in, however, supports double-width accumulation within packed words so that, for example, 8-bit data can be accumulated into 16-bit accumulators. That is, in a 256-bit-wide engine, 32 8-bit values may be operated upon and accumulated into a double-width accumulator of 32 16-bit values, or 512 bits. Operations are also provided to scale these accumulators back to their original size or to repack the data into two words of double-size values.
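
The double-width accumulation described above can be modeled in plain scalar C to make the lane arithmetic concrete. This is only a sketch under stated assumptions, not the A2P instruction set: the type and function names (`Word8`, `Acc16`, `Word16`, `acc_add`, `acc_scale`, `acc_repack`) are hypothetical, the accumulator is assumed to wrap modulo 2^16, scaling is assumed to be a right shift with saturation to 8 bits, and the lane ordering used when repacking is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32 /* 256-bit word / 8-bit elements = 32 lanes */

/* Hypothetical types for illustration; they do not come from the A2P ISA. */
typedef struct { uint8_t  lane[LANES];     } Word8;  /* 256-bit word of 8-bit values  */
typedef struct { uint16_t lane[LANES];     } Acc16;  /* 512-bit double-width accumulator */
typedef struct { uint16_t lane[LANES / 2]; } Word16; /* 256-bit word of 16-bit values */

/* Add each 8-bit lane of x into the matching 16-bit accumulator lane.
   Assumption: the accumulator wraps modulo 2^16 if it overflows. */
void acc_add(Acc16 *acc, const Word8 *x)
{
    for (int i = 0; i < LANES; i++)
        acc->lane[i] = (uint16_t)(acc->lane[i] + x->lane[i]);
}

/* Scale the accumulator back to 8-bit lanes: shift right, then saturate.
   Assumption: saturation rather than truncation is the desired behavior. */
Word8 acc_scale(const Acc16 *acc, unsigned shift)
{
    assert(shift < 16); /* a shift of 16 or more would discard every bit */
    Word8 out;
    for (int i = 0; i < LANES; i++) {
        uint16_t v = (uint16_t)(acc->lane[i] >> shift);
        out.lane[i] = (uint8_t)(v > UINT8_MAX ? UINT8_MAX : v);
    }
    return out;
}

/* Repack the 512-bit accumulator into two 256-bit words of 16-bit values.
   Assumption: lanes 0..15 go to lo and lanes 16..31 go to hi. */
void acc_repack(const Acc16 *acc, Word16 *lo, Word16 *hi)
{
    for (int i = 0; i < LANES / 2; i++) {
        lo->lane[i] = acc->lane[i];
        hi->lane[i] = acc->lane[i + LANES / 2];
    }
}
```

For instance, accumulating four words whose lanes all hold 200 leaves 800 in every accumulator lane, a value that an 8-bit lane could not hold. Calling `acc_scale` with a shift of 2 then returns 200 per lane, while a shift of 0 saturates each lane to 255.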
Copyright © 2005 Advanced Architectures