A recent line of research on deep learning shows that the training of extremely wide neural networks can be characterized by a kernel function called the neural tangent kernel (NTK). However, it is known that this type of result does not perfectly match practice: NTK-based analyses require the network weights to stay very close to their initialization throughout training, and they cannot handle regularizers or gradient noise. In this talk, I will present a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit ``kernel-like'' behavior, which implies that the training loss converges linearly up to a certain accuracy. I will also discuss the generalization error of an infinitely wide two-layer neural network trained by noisy gradient descent with weight decay.
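The abstract does not spell out the training algorithm; as a rough illustration only, the sketch below shows one common form of noisy (Langevin-type) gradient descent with weight decay on a wide two-layer ReLU network with NTK-style scaling. All names and constants (m, eta, lam, beta) are illustrative assumptions, not the setup used in the talk.

```python
# Minimal sketch (not from the talk) of noisy gradient descent with weight decay
# on a two-layer ReLU network f(x) = (1/sqrt(m)) * a^T relu(W x).
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
n, d, m = 32, 5, 4096                  # samples, input dim, hidden width (very wide)
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                    # arbitrary target

W = rng.standard_normal((m, d))        # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m)    # fixed second layer, as in many NTK analyses

eta, lam, beta = 0.05, 1e-3, 1e4       # step size, weight decay, inverse temperature (assumed)

def forward(W):
    H = np.maximum(X @ W.T, 0.0)       # (n, m) ReLU features
    return H @ a / np.sqrt(m), H

for t in range(200):
    pred, H = forward(W)
    resid = pred - y                   # (n,)
    mask = (H > 0).astype(float)       # ReLU derivative
    # Gradient of 0.5 * mean squared error with respect to W
    grad = ((resid[:, None] * mask) * a[None, :]).T @ X / (n * np.sqrt(m))
    # Noisy gradient step with weight decay (Langevin-type update)
    noise = rng.standard_normal(W.shape) * np.sqrt(2 * eta / beta)
    W = W - eta * (grad + lam * W) + noise

print("final training loss:", 0.5 * np.mean((forward(W)[0] - y) ** 2))
```

Under this kind of update, the claim in the talk is that the iterates remain in a ``kernel-like'' regime despite the injected noise and the weight-decay regularizer, so the training loss decreases linearly until it reaches an accuracy floor set by the noise and regularization.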