Now this post will be for some serious stuff involving video compression. Early this year I decided to make a lossless compression IP core for my camera in case one day I make it for video. And because it’s for video, the compression has to be stream operable and real time. That means, you cannot save it to DDR ram and do random lookup during compression. JPEG needs to at least buffer 8 rows as it does compression on 8×8 blocks. Other complex algorithm such as H264 requires even larger transient memory for inter frame look up. Most of these lossy compression cores consume a lot of logic resource which my cheap Zynq 7010 doesn’t have, or not up to the performance when fitting into a small device. Also I would prefer lossless than lossy video stream.
There’s an algorithm every RAW image format uses but rarely implemented in common image format. NEF, CR2, DNG, you name it. It’s the Lossless JPEG defined in 1993. The process is very simple: use the neighboring pixels’ intensity to predict the current pixel you’d like to encode. In another word, let’s record the difference instead of the full intensity. It’s so simple yet powerful (up to 50% size reduction) because most of our images are continuous tone or lack high spatial frequency details. This method is called Differential pulse-code modulation (DPCM). A Huffman code is then attached in front to record the number of digits.
Sounds easy huh? But once I decided to get it parallel and high speed, the implementation will be very hard. All the later bits have to be shifted correctly for a contiguous bit stream. Timing is especially of concern when the number of potential bits gets large when data is running in high parallel. So I smash the process into 8 pipeline stages in locked steps. 6 pixels are processed simultaneously at each clock cycle. At 12 bit, the worst possible bit length will be 144. That is 12 for Huffman and 12 for differential code each. The result needs to go into a 64bit bus by concatenating with the leftover bits from the previous clock cycle. A double buffer is inserted between the concatenator and compressor. FIFOs are employed up and downstream of the compression core to relieve pressure on the AXI data bus.
Read more: The Making of a Cooled CMOS Camera