<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Tanmay Patil&apos;s Blog</title><description>A blog about programming, technology, and software development.</description><link>https://tanmaypatil123.github.io/</link><item><title>Inside a Blackwell 2-CTA GEMM: A Gluon Kernel Tour</title><link>https://tanmaypatil123.github.io/posts/2026/blackwell-2cta-gluon-gemm/</link><guid isPermaLink="true">https://tanmaypatil123.github.io/posts/2026/blackwell-2cta-gluon-gemm/</guid><description>A line-by-line, interactive walk through a 2-CTA Gluon GEMM for NVIDIA Blackwell (B200): TMA, Tensor Memory, tcgen05 MMA, mbarriers, and software pipelining.</description><pubDate>Wed, 17 Jun 2026 10:00:00 GMT</pubDate></item><item><title>RMSNorm Backward: From Derivation to a Triton Kernel</title><link>https://tanmaypatil123.github.io/posts/2025/rmsnorm-backward-derivation-triton-kernel/</link><guid isPermaLink="true">https://tanmaypatil123.github.io/posts/2025/rmsnorm-backward-derivation-triton-kernel/</guid><description>Derive the RMSNorm backward pass step by step and implement a Triton kernel with PyTorch, plus numerics tips.</description><pubDate>Sat, 01 Nov 2025 13:30:00 GMT</pubDate></item></channel></rss>