Hi, I’m currently a researcher at AMD Research, where I focus on optimizing large-scale machine learning systems by refining concurrent computation and communication on GPUs, often through techniques like offloading tasks to SDMA engines. Under the guidance of Shaizeen Aga, Suchita Pati, and Mahzabeen Islam, I co-authored the paper Optimizing ML Concurrent Computation and Communication with GPU DMA Engines, which bridges the gap between theoretical and real-world performance.
Previously, I completed my Master’s in Data Science at UC San Diego, which strengthened my foundation in distributed computing and systems-level optimization. Before that, at Sahaj AI, I managed Spark clusters and Kubernetes deployments, honing my ability to scale models in fast-paced, reliability-critical environments.
Looking ahead, I plan to design efficient, context-rich AI architectures that integrate extended user context while remaining computationally feasible and energy-efficient. By combining systems-level optimizations, robust distributed pipelines, and hardware-aware design, I aim to turn ML breakthroughs into impactful, real-world solutions at scale.
Open Source
[GitHub] Systems For ML Reading List
[GitHub] Academic project for CIFAR-100 classification.
[GitHub] Academic project for PASCAL VOC 2007 segmentation.
[GitHub] MNIST classification from scratch.