ThunderKittens is a framework that makes it easy to write fast deep learning kernels in CUDA (and, soon, ROCm and others, too!). It is built around three key principles, the first of which is simplicity.