Hi, I’m a PhD student in Computer Science at Georgia Tech (SAIL Lab), advised by Prof. Alexey Tumanov. My research focuses on efficient LLM serving, communication/computation overlap, and hardware-aware scheduling across NVIDIA and AMD stacks. I’m also building Project Vajra, a next-generation AI inference system aimed at practical, scalable, and energy-aware deployment. This semester, I’ll be a graduate teaching assistant (GTA) for ML-Sys.
Previously, at AMD Research, I co-authored the paper Optimizing ML Concurrent Computation and Communication with GPU DMA Engines, working under the guidance of Shaizeen Aga, Suchita Pati, and Mahzabeen Islam. This work leveraged GPU SDMA engines to overlap computation and communication in ML workloads, yielding measurable throughput gains.
Before that, I completed my Master’s in Data Science at UC San Diego and operated Spark/Kubernetes infrastructure at Sahaj AI, honing my ability to deploy and scale ML models in fast-paced, reliability-critical environments. Through systems-level optimizations, robust distributed pipelines, and hardware-aware design, I aim to turn ML breakthroughs into impactful, real-world solutions at scale.
Open Source
[GitHub] Systems For ML Reading List
[GitHub] Academic project for CIFAR-100 classification.
[GitHub] Academic project for PASCAL VOC 2007 segmentation.
[GitHub] MNIST classification from scratch.