Thesis Examination Committee
Prof Yang LU, ECON/HKUST (Chairperson)
Prof Jiang XU, ECE/HKUST (Thesis Supervisor)
Prof Fung Yu YOUNG, Department of Computer Science and Engineering, The Chinese University of Hong Kong (External Examiner)
Prof Ross MURCH, ECE/HKUST
Prof Ming LIU, ECE/HKUST
Prof Kai CHEN, CSE/HKUST
Rack-scale computing systems are expected to meet the computation and energy requirements of big data and emerging large-scale applications. They need to efficiently coordinate both on-chip and off-chip resources across hundreds of multi-core processors and memory/storage devices. The intra-chip and inter-chip communication networks are critical to improving coordination efficiency and overall system performance. Optical interconnects are a promising way to address these challenges, owing to their advantages in bandwidth, latency, and energy consumption over electrical interconnects.
To this end, this dissertation investigates several design concerns of optical networks, including intra/inter-chip optical network architecture, path reservation, and control in multi-domain circuit switching. To take advantage of optical interconnects for both intra-chip and inter-chip communication and to close the performance gap between on-chip and off-chip networks, we propose a unified intra/inter-chip optical network architecture for a single server node and an intra/inter-chip optical network architecture for rack-scale computing systems (RSON). Inter-chip communication flows and circuit-switching control in optical networks can cause severe performance degradation if not properly designed, especially when multiple domains are involved in the communication. We propose a forward propagation strategy that runs the path reservation process in parallel with the application-level inter-chip connection setup for the underlying optical network fabric, which shortens the combined connection setup and path reservation procedure. We also propose a preemptive chain feedback (PCF) scheme to manage the reservation and release of network resources effectively. This scheme increases network resource utilization while minimizing overhead during path reservation. The proposed architecture and techniques are evaluated holistically using a cycle-accurate full-system simulator driven by statistical application models. Experimental results show that RSON with the PCF control scheme can greatly improve network throughput and reduce the energy consumption per unit of performance compared with the baseline architecture and control schemes.