ThunderKittens is a framework that makes it easy to write fast deep learning kernels in CUDA (and, soon, ROCm and others, too!). ThunderKittens is built around three key principles: Simplicity.