Neural networks (NNs) are widely employed in modern artificial intelligence (AI) systems due to their unprecedented capability in classification, recognition, and detection. However, the massive data movement between processing units and memory has proven to be the main bottleneck limiting the efficiency of NN-based hardware. Furthermore, the significant power demand of massive additions and multiplications limits their adoption in edge devices, and cost is another major concern at the edge. Therefore, an edge neural processing chip offering simultaneously low power, high performance, and low cost is urgently needed for the fast-growing AI-and-IoT (AIoT) market. In this talk, we will introduce an ultra-low-power neural processing SoC built on computing-in-memory technology. We have designed, fabricated, and tested this chip in a 40 nm eFlash technology. It resolves the data processing and communication bottlenecks in NNs with computing-in-memory, and it combines a classic digital solution with an analog computing-in-memory macro to achieve 12-bit high-precision computing. To enable a sub-mW system for AIoT applications, a RISC-V microprocessor with DSP instructions was designed with dynamic voltage and frequency scaling (DVFS) to adapt to various low-power and real-time computing tasks. The chip supports multiple NN types, including DNN, TDNN, and RNN, for different applications, e.g., smart voice and health monitoring.