Computational Photography for Efficient Image and Video Acquisition
Room 2463 (Lifts 25-26), 2/F Academic Building, HKUST

Thesis Examination Committee

Prof Rachel Quan ZHANG, IEDA/HKUST (Chairperson)
Prof Amine BERMAK, ECE/HKUST (Thesis Supervisor)
Prof Pedro SANDER, CSE/HKUST (Thesis Co-supervisor)
Prof Hongbo FU, School of Creative Media, City University of Hong Kong (External Examiner)
Prof Bertram SHI, ECE/HKUST
Prof Huamin QU, CSE/HKUST



Image sensors have become ubiquitous, driven by growing demand from entertainment, mobile applications, the Internet of Things (IoT), and autonomous vehicles. Beyond improved image quality, power consumption is becoming a key design factor, especially for wireless sensor networks (WSNs), where sensors must be deployed at large scale. Substantial research has already been devoted to low-power CMOS image sensors. Unlike previous work that reduces power consumption through circuit techniques, this dissertation rethinks the imaging system and introduces methods that achieve extremely low-power image acquisition through computational photography.

First, we propose a lossy image compression algorithm called Microshift, which achieves state-of-the-art on-chip compression performance while remaining hardware friendly. To implement this algorithm, we propose a hardware architecture and validate it on an FPGA. Results from the ASIC design further validate its power efficiency: the sensor consumes as little as 59.7 pJ/(pixel·frame) while running at 1530 frames per second. To enable high-performance decompression, we propose two methods. One is based on Markov random field optimization and provides a PSNR above 34 dB for a 1.25 bit/pixel image.
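For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of how PSNR is conventionally computed for 8-bit images. It is not taken from the thesis: the function name `psnr`, the flat-list image representation, and the peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Return the peak signal-to-noise ratio in dB.

    Both images are flat sequences of pixel intensities of equal length.
    `peak` is the maximum possible pixel value (255 assumed for 8-bit data).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error between the two images.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical example: every pixel is off by 5 gray levels, so MSE = 25.
print(round(psnr([100, 100], [105, 95]), 2))
```

Under these assumptions, a uniform error of 5 gray levels on an 8-bit image corresponds to roughly 34.15 dB, which gives a sense of scale for the 34 dB threshold.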

Second, we propose DenResUnet to enhance bit-depth information so that the image sensor's ADCs can quantize fewer bits. DenResUnet makes extensive use of residual learning, which greatly improves perceptual visual quality. Furthermore, we develop an extension that decompresses Microshift images in real time. Extensive experiments demonstrate that high-quality results can be obtained even from 1 bit/pixel images.
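To make the bit-depth problem concrete, the sketch below shows what is lost when an ADC keeps fewer bits, together with a naive bin-center expansion back to 8 bits. This is only a baseline for intuition and is not DenResUnet, which is a learned network; the helper names `quantize` and `naive_expand` and the 8-bit full range are assumptions made for illustration.

```python
def quantize(value, bits, full_bits=8):
    """Keep only the top `bits` bits of an unsigned `full_bits`-bit sample."""
    if not 1 <= bits <= full_bits:
        raise ValueError("bits must be between 1 and full_bits")
    return value >> (full_bits - bits)


def naive_expand(code, bits, full_bits=8):
    """Map a low bit-depth code back to the center of its quantization bin.

    A naive, non-learned baseline: every value in a bin is restored to the
    same intensity, which is what produces visible banding in smooth regions.
    """
    if not 1 <= bits <= full_bits:
        raise ValueError("bits must be between 1 and full_bits")
    shift = full_bits - bits
    if shift == 0:
        return code
    return (code << shift) + (1 << (shift - 1))


# Hypothetical 8-bit samples reduced to 2 bits and expanded again.
samples = [0, 37, 128, 200, 255]
restored = [naive_expand(quantize(v, 2), 2) for v in samples]
print(restored)
```

With 2 bits, each bin spans 64 gray levels, so the naive expansion can be off by up to 32 levels. Recovering the detail lost inside each bin is the task a bit-depth enhancement method has to solve.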

Third, we propose to adaptively change the sensor sampling rate for aggressive power saving and to interpolate the intermediate frames computationally. We establish dense correspondence between two frames through halfway-domain optimization. To account for large displacements, sparse correspondences are jointly considered in the optimization. The method is validated on real-scene images and shows greater robustness to large displacements and image noise than other methods.

Finally, we propose to colorize videos by propagating color from a reference image to the subsequent frames. Since the image sensor only needs to capture one color frame, the transmission bandwidth can be greatly reduced. To reconstruct high-quality colorized videos, we propose ColorNet. To further improve perceptual quality, we adopt a generative adversarial network (GAN). Our method also produces consistent, improved video results. Our work outperforms previous methods both quantitatively and qualitatively, demonstrating photo-realistic reconstruction quality.
