Thesis Examination Committee
Prof Tao LIU, PHYS/HKUST (Chairperson)
Prof Chi Ying TSUI, ECE/HKUST (Thesis Supervisor)
Prof Chiu Sing CHOY, Electrical and Electronics Engineering, The Chinese University of Hong Kong (External Examiner)
Prof Wei ZHANG, ECE/HKUST
Prof Ross MURCH, ECE/HKUST
Prof Siu Wing CHENG, CSE/HKUST
With the development of artificial intelligence, deep learning has become the mainstream approach for a wide range of machine learning applications. However, its high computational complexity and large memory footprint make it challenging to deploy widely on energy-constrained embedded platforms. This thesis explores energy-efficient solutions for deep learning accelerators from both the algorithmic and the architectural perspectives.
First, we address the high computational complexity of deep learning algorithms. The conventional low rank approximation (LRA) is adopted to bypass redundant operations in the inference phase of deep learning. The single instruction multiple data (SIMD) architecture is modified to take advantage of the reduced operations, improving both throughput and energy efficiency. Second, we propose a novel end-to-end training algorithm that improves on the conventional LRA approach and achieves better computational efficiency. To improve hardware scalability and exploit the sparsity of deep learning algorithms, we propose SparseNN, a fully distributed architecture that uses a dedicated on-chip network. Third, we address the large memory footprint of deep learning algorithms by compressing the weights. To improve hardware efficiency, we enhance the conventional hashing compression technique by introducing another level of spatial locality. An FPGA prototype based on the SIMD architecture demonstrates that higher inference throughput can be achieved than with direct implementations on CPU and GPU. Fourth, we enhance SparseNN by combining three levels of sparsity: input activation sparsity, output activation sparsity and weight sparsity. A new compression algorithm, SparserNN, is presented to further reduce the computational complexity of neural networks.
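As a rough illustration of why a low rank approximation reduces inference work, the sketch below factorizes a weight matrix with a truncated SVD and compares multiply-accumulate (MAC) counts. This is a generic textbook construction under stated assumptions, not necessarily the LRA formulation used in the thesis; the helper names `low_rank_factors` and `mac_counts`, the layer size and the rank are all hypothetical choices made for the example.

```python
import numpy as np


def low_rank_factors(W, rank):
    """Return A (m x r) and B (r x n) such that A @ B approximates W.

    Uses a truncated SVD; this is an assumed, generic LRA construction.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale the kept left vectors by singular values
    B = Vt[:rank, :]
    return A, B


def mac_counts(m, n, rank):
    """MACs for a dense m x n layer versus its rank-r factorized form."""
    dense = m * n
    factored = rank * (m + n)
    return dense, factored


# Hypothetical layer: 128 inputs, 64 outputs, weights built to be exactly rank 8.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 128))
x = rng.standard_normal(128)

A, B = low_rank_factors(W, rank=8)
y_full = W @ x        # dense matrix-vector product
y_lra = A @ (B @ x)   # two smaller products through the rank-8 bottleneck

dense_macs, lra_macs = mac_counts(64, 128, 8)
print(dense_macs, lra_macs)
```

Because the example matrix is constructed to have rank 8, the factorized result matches the dense one up to floating-point error. For trained networks the weights are only approximately low rank, so the chosen rank trades accuracy against the number of operations saved.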