Tanmay D. Patil
Hey, I am an ML engineer from India who loves to work on MLOps and optimization.
Read the blog posts or check GitHub for more info.
Featured
-
Inside a Blackwell 2-CTA GEMM: A Gluon Kernel Tour
A line-by-line, interactive walk through a 2-CTA Gluon GEMM for NVIDIA Blackwell (B200): TMA, Tensor Memory, tcgen05 MMA, mbarriers, and software pipelining.
-
RMSNorm Backward: From Derivation to a Triton Kernel
Derive the RMSNorm backward pass step by step and implement a Triton kernel with PyTorch, plus numerics tips.