Recent studies show that the serialized competition overhead of entering a critical section, rather than the execution of the critical section itself, is the dominant limit on the performance of multi-threaded shared-variable applications on NoC-based many-cores. We show that the invalidation-acknowledgement delay for cache coherence between the home node storing the critical section lock and the cores running the competing threads is the leading cause of high competition overhead in lock spinning, which occurs in various spin-lock primitives (such as the ticket lock, ABQL, and MCS lock) and in the spinning phase of the queue spin-lock (QSL) in advanced operating systems. To reduce this lock coherence overhead, we propose in-network packet generation (iNPG), which turns passive "normal" NoC routers that only transmit packets into active "big" routers that can also generate packets. Instead of performing all coherence maintenance at the home node, big routers deployed nearer to the competing threads generate packets to perform early invalidation-acknowledgement for failing threads before their requests reach the home node. This shortens the protocol round-trip delay and thus significantly reduces competition overhead in various locking primitives. We evaluate iNPG in Gem5 using PARSEC and SPEC OMP2012 programs with five different locking primitives. Compared to a state-of-the-art technique for accelerating critical section access, iNPG effectively reduces lock coherence overhead, expediting critical section access by 1.35x on average and 2.03x at maximum, and consequently improving the program Region-of-Interest (ROI) runtime by 7.8% on average and 14.7% at maximum.
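To make the lock-spinning pattern concrete, the following is a minimal software sketch of a ticket lock, one of the spin-lock primitives named above. It is an illustration only, not the implementation evaluated in this work: the class `TicketLock` and the helper `run_contended_counter` are hypothetical names introduced here, and the code says nothing about router behavior. It only shows where the competing threads spin on a shared variable whose update by the lock holder must be propagated to every waiting core.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal ticket lock (illustrative, not the paper's code).
class TicketLock {
public:
    void lock() {
        // Take the next ticket with one atomic read-modify-write.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Spin until our ticket is served. All waiters poll the same
        // variable, so under an invalidation-based coherence protocol each
        // unlock() invalidates the cached copy held by every spinning core.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Only the lock holder writes now_serving_, so load + store suffices.
        const unsigned next =
            now_serving_.load(std::memory_order_relaxed) + 1;
        now_serving_.store(next, std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Hypothetical helper: num_threads threads each increment a shared counter
// iters_per_thread times inside the critical section; returns the final count.
long run_contended_counter(int num_threads, int iters_per_thread) {
    TicketLock lock;
    long counter = 0;  // protected by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;  // the critical section is a single increment
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_contended_counter(4, 1000)` from a `main` function should return 4000 if mutual exclusion holds. The critical section here is deliberately tiny, so most of the time is spent in the spin loop of `lock()`, which is the competition phase whose coherence traffic this work targets.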
Zhonghai Lu is an associate professor with the School of Electrical Engineering and Computer Science (EECS), KTH Royal Institute of Technology, Stockholm, Sweden. He received the BSc degree in Radio & Electronics from Beijing Normal University, China, in 1989, and the MSc degree in System-on-Chip Design and the PhD degree in Electronic and Computer Systems from KTH in 2002 and 2007, respectively. From 1989 to 2000, he was an engineer in the area of Electronic and Embedded Systems. His research interests include interconnection networks, performance analysis, design automation, and real-time systems. He has published over 160 peer-reviewed journal and international conference papers, including more than 30 prestigious IEEE/ACM transaction papers. He received the Best Paper Award at NOCS 2015 and was nominated for the Best Paper Award at HPCA 2018 and ICCAD 2009. He also received EU HiPEAC Paper Awards in 2016 and 2018.