Application | Machine-learning |
Technology | 65 nm |
Manufacturer | UMC |
Type | Semester Thesis |
Package | QFN56 |
Dimensions | 2626 μm × 1252 μm |
Gates | 1 MGE |
Voltage | 1.2 V |
Power | 740 mW @ 1.2 V, 700 MHz |
Clock | 700 MHz |
Origami is an Application-Specific Integrated Circuit (ASIC) designed to accelerate the computation of Convolutional Neural Networks (CNNs). It is intended to be used as a coprocessor, allowing the CPU to offload the time- and energy-intensive computation of the convolutions, which speeds up the processing of the network and increases the overall energy efficiency.
Its architecture computes 8 CNN output channels simultaneously from up to 8 input channels, using 64 filters. Each input image is convolved with 8 different filters (one for each of the 8 output channels), and the results for each output channel are accumulated across the input channels. By moving this channel summation onto the chip, Origami reduces the required I/O bandwidth and maximizes overall throughput.
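The channel accumulation described above can be illustrated with a minimal pure-Python sketch. It is not the chip's implementation. The helper names (`conv2d_valid`, `multi_channel_conv`), the 3×3 filter size, and the 5×5 image size are assumptions chosen for illustration; only the 8 input channels, 8 output channels, and 64 filters come from the text.

```python
N_IN, N_OUT = 8, 8   # channel counts from the text
K = 3                # assumed filter size (not stated in the text)
H, W = 5, 5          # assumed tiny image size, for illustration only


def conv2d_valid(image, kernel):
    """Slide one filter over one channel ('valid' region, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def multi_channel_conv(inputs, filters):
    """filters[o][c] is the filter linking input channel c to output channel o.

    Each output channel is the sum of the per-input-channel convolutions,
    i.e. the summation the text says is performed on the chip.
    """
    outputs = []
    for per_output_filters in filters:
        partials = [conv2d_valid(img, k)
                    for img, k in zip(inputs, per_output_filters)]
        # Accumulate the partial results element-wise across input channels.
        summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]
        outputs.append(summed)
    return outputs


# Dummy data: all-ones images and filters, so every result is easy to check.
inputs = [[[1] * W for _ in range(H)] for _ in range(N_IN)]
filters = [[[[1] * K for _ in range(K)] for _ in range(N_IN)]
           for _ in range(N_OUT)]
outputs = multi_channel_conv(inputs, filters)

# Compare how many values would have to leave the device in each case.
out_pixels = len(outputs[0]) * len(outputs[0][0])
values_summed_on_chip = N_OUT * out_pixels           # one stream per output channel
values_summed_off_chip = N_IN * N_OUT * out_pixels   # one stream per filter
print(len(filters) * len(filters[0]), "filters in total")
print(values_summed_off_chip // values_summed_on_chip, "x fewer output values")
```

In this simplified model, summing on the device means 8 output streams leave it instead of 64 partial results, an 8-fold reduction in output values. The real interface, number formats, and filter dimensions may differ.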
All data is streamed over a high-speed synchronous 12-bit parallel interface with separate input and output ports for maximum throughput. This allows the chip to reach a sustained performance of more than 270 GOp/s, which is fast enough to compute a medium-sized CNN on a Full HD video stream in real time.
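To put the throughput figure in perspective, the following back-of-the-envelope calculation estimates how many operations are available per pixel of a Full HD stream. The frame rate of 30 frames per second is an assumption, since the text does not state one; the 270 GOp/s figure is the sustained performance quoted above.

```python
SUSTAINED_OPS_PER_S = 270e9   # from the text: "more than 270 GOp/s"
WIDTH, HEIGHT = 1920, 1080    # Full HD resolution
FPS = 30                      # assumed frame rate (not stated in the text)

pixels_per_second = WIDTH * HEIGHT * FPS
ops_per_pixel = SUSTAINED_OPS_PER_S / pixels_per_second

print(f"{pixels_per_second:,} pixels per second")
print(f"about {ops_per_pixel:.0f} operations available per pixel")
```

Under these assumptions the budget is roughly 4,300 operations per pixel per frame. A higher frame rate shrinks that budget proportionally.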
Possible application scenarios include object recognition and scene labeling in mobile and embedded systems, such as autonomous vehicles and unmanned aerial vehicles, as well as smartphones and tablets.