Work Experience
                    
                      Cartesia.AI, San Francisco - Research Intern  
                      Core Research and Modelling Team     May '25 - Aug '25  
                       
                        - Distilling Sonic, Cartesia's multimodal TTS foundation model, into compact variants: Turbo and Smol (on-device).
 
                        - Exploring distillation techniques including forward/reverse KL, patient teacher, and progressive distillation.
 
                        - Researching cross-architecture distillation from full attention to efficient decoding heads such as sliding-window attention (SWA) and Mamba.
 
                       
                    
                    
                      Samsung Electronics, South Korea - Research Engineer  
                      Generative AI Team     Sept '23 - Aug '24  
                       
                        - Fine-tuned Samsung Gauss-i using LoRA and DreamBooth to generate themed images for TV backgrounds.
 
                        - Generated text-to-image datasets using prompt engineering and SDXL, filtered using CLIPScore and OpenCV tools.
 
                        - Leveraged LLMs with Retrieval-Augmented Generation to improve user experience on Bixby (AI bot).
 
                        - Received the Best SW Project Award of the Year 2023/24, given to the top 2 projects in the GenAI division.
 
                       
                    
                    
                      Samsung Electronics, South Korea - Research Associate  
                      Visual Display, AI     May - Jul '22  
                       
                        - Designed supervised time-series and unsupervised clustering models to predict television replacement.
 
                        - Incorporated Key Performance Indicators from sensors with RFM values, improving performance by 6%.
 
                       
                    
                    
                      KnowDis, Delhi - Data Science Intern  
                      Query Search and Recommendation Engine     May - Jul '21  
                       
                        - Worked on a multi-class classification problem, fine-tuning a fastText model to generate embeddings.
 
                        - Implemented a framework using the Faiss and Annoy libraries to optimize nearest-neighbor search over query embeddings.
 
                        - Built a scalable architecture that processed a batch of queries in 55 seconds, compared to the baseline's 140 minutes.
 
                       
                    
                 
              
             
          
              
              
                Research
                
                    I'm passionate about machine learning, multimodal AI, computer vision, reinforcement learning, and graph neural networks. My research focuses on enhancing reasoning in Vision Language Models (VLMs) and building robust multimodal agents.  
                 
                
                    I have worked on diverse topics, including the adversarial robustness of multimodal agents, NeuroCUT (a GNN-based method for graph cut extraction), and retrieval-augmented generation (RAG) systems. My goal is to develop intelligent, reliable AI systems that improve human efficiency and enable new possibilities.
                 
           
          
            
            
              
                
                  Dissecting Adversarial Robustness of Multimodal LM Agents
                
                 
                Chen Wu,
                Rishi Shah,
                Jing Yu Koh,
                Ruslan Salakhutdinov,
                Daniel Fried,
                Aditi Raghunathan
                 
                ICLR'25 (Oral at NeurIPS 2024 Open World Agents Workshop), advised by Prof. Aditi Raghunathan
                 
                project page
                /
                arXiv
                
                
                  Explored the adversarial robustness of Vision-enabled Language Models (VLMs) in multimodal agent tasks, exposing vulnerabilities to imperceptible image perturbations (<5% of pixels) that enable targeted attacks with a 67% success rate.  
                  Proposed the Agent Robustness Evaluation (ARE) framework, which models agents as graphs to decompose robustness and pinpoint vulnerabilities.  
                  Released VisualWebArena-Adv, a benchmark of 200 adversarial tasks, revealing that evaluators and value functions reduce the attack success rate (ASR), while advanced agent techniques such as tree search amplify vulnerabilities.
                 
              
                
                  NEUROCUT: Neural Approach for Robust Graph Partitioning
                
                 
                Rishi Shah,
                Krishnanshu Jain,
                Sahil Manchanda,
                Sourav Medya,
                Sayan Ranu
                 
                KDD'24, advised by Prof. Sayan Ranu
                 
                GitHub
                /
                arXiv
                
                
                  Designed a novel GNN-based inductive architecture to find near-optimal solutions for cut-related NP-hard problems.  
                  Developed an RL-based auto-regressive framework using policy gradient methods to optimize graph partitions.  
                  The framework generalizes to any cut objective and specified number of partitions, outperforming all baselines.
                 
             
            
            
            
              
                
                  Packet Routing using Multi-Agent Reinforcement Learning
                
                 
                Aniket Modi,
                Rishi Shah,
                Krishnanshu Jain,
                Rajeev Shorey
                 
                COMSNETS'23, advised by Prof. Rajeev Shorey
                 
                GitHub
                /
                IEEE Page
                
                
                  Developed a Multi-Agent RL framework to optimize packet routing in UAV-based IoT networks.  
                  Designed and trained DDQNs with a novel cross-agent reward function to enhance collaboration among agents.  
                  Achieved a 48.7% throughput gain over baseline methods, demonstrating significant efficiency improvements.
                 
             
            
              
                
                  
                    Education
            
                    
                      Carnegie Mellon University, Pittsburgh  
                      Master's in Machine Learning, GPA - 4.29/4     Aug '24 - Dec '25  
                     
                    
                      - Machine Learning Department Top 1%: highest GPA among all Master's in Machine Learning students.
 
                      - Key Courses: Multimodal Machine Learning, Deep Learning Systems, Probabilistic Graphical Models
 
                     
            
                    
                      Indian Institute of Technology, Delhi  
                      Bachelor's in Computer Science and Engineering, GPA - 9.564     Jul '19 - May '23  
                     
                    
                      - All India Rank 75 in JEE Advanced 2019 among 200,000+ applicants from all over India.
 
                      - Computer Science Department Top 10% in the 5th, 6th, 7th, and 8th semesters, with an SGPA of 9.7+ at IITD.
 
                      - Key Courses: Data Structures and Algorithms, Probability and Stochastic Processes, Linear Algebra, Machine Learning, NLP, AI, Computer Vision, Parallel Programming, Operating Systems.
 
                      - Competitive Programming: Secured Country Rank 12, Institute Rank 1 in HashCode; Country Rank 47, Institute Rank 1 in ICPC Regionals; Candidate Master on Codeforces.
 
                     
            
                 
              
             
            
            
          
					
          
          
            
              
                 
                
                  Borrowed from the source code of Jon Barron's website. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.
                 
             
           
        
      
     