Tag: triton
All the articles with the tag "triton".
-
Inside a Blackwell 2-CTA GEMM: A Gluon Kernel Tour
A line-by-line, interactive walk through a 2-CTA Gluon GEMM for NVIDIA Blackwell (B200): TMA, Tensor Memory, tcgen05 MMA, mbarriers, and software pipelining.
-
RMSNorm Backward: From Derivation to a Triton Kernel
Derive the RMSNorm backward pass step by step and implement it as a Triton kernel with PyTorch, plus numerics tips.