Rishi Shah

I'm a Master's student in Machine Learning at Carnegie Mellon University, working with Prof. Aditi Raghunathan on enhancing reasoning in Vision Language Models (VLMs). My research focuses on building robust multimodal agents.

Previously, I worked at Samsung HQ in South Korea, where I developed LoRA-tuned Stable Diffusion models and retrieval-augmented generation (RAG) systems. During my undergraduate studies at IIT-Delhi, I developed NeuroCUT, a novel GNN-based method for graph cut extraction, under the guidance of Prof. Sayan Ranu and Prof. Sourav Medya.

Email  /  CV  /  Scholar  /  Twitter  /  Github


Research

I'm passionate about machine learning, multimodal AI, computer vision, reinforcement learning, and graph neural networks. My research focuses on enhancing reasoning in Vision Language Models (VLMs) and building robust multimodal agents.

I have worked on diverse topics, including adversarial robustness of multimodal agents, GNN-based graph cut extraction (NeuroCUT), and retrieval-augmented generation (RAG) systems. My goal is to develop intelligent, reliable AI systems that improve human efficiency and enable new possibilities.

Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
ICLR'25 (Oral at NeurIPS 2024 Open World Agents Workshop), advised by Prof. Aditi Raghunathan
project page / arXiv

Explored the adversarial robustness of Vision Language Models (VLMs) in multimodal agent tasks, exposing vulnerabilities to imperceptible image perturbations (<5% pixels) that enable targeted attacks with a 67% success rate. Proposed the Agent Robustness Evaluation (ARE) framework, which models agents as graphs to decompose robustness and pinpoint vulnerabilities. Released VisualWebArena-Adv, a benchmark with 200 adversarial tasks, revealing that evaluators and value functions reduce the attack success rate (ASR), while advanced agent techniques like tree search amplify vulnerabilities.

NEUROCUT: Neural Approach for Robust Graph Partitioning
Rishi Shah, Krishnanshu Jain, Sahil Manchanda, Sourav Medya, Sayan Ranu
KDD'24, advised by Prof. Sayan Ranu
GitHub / arXiv

Designed a novel GNN-based inductive architecture to find near-optimal solutions for cut-related NP-hard problems. Developed an RL-based auto-regressive framework using policy gradient methods to optimize graph partitions. The framework generalizes to any cut objective and specified number of partitions, outperforming all baselines.

Packet Routing using Multi-Agent Reinforcement Learning
Aniket Modi, Rishi Shah, Krishnanshu Jain, Rajeev Shorey
COMSNETS'23, advised by Prof. Rajeev Shorey
GitHub / IEEE Page

Developed a Multi-Agent RL framework to optimize packet routing in UAV-based IoT networks. Designed and trained DDQNs with a novel cross-agent reward function to enhance collaboration among agents. Achieved a 48.7% throughput gain over baseline methods, demonstrating significant efficiency improvements.

Work Experience

Samsung Electronics, South Korea - Research Engineer, Generative AI Team     Sept '23 - Aug '24

  • Fine-tuned Samsung Gauss-i using LoRA and DreamBooth to generate themed images for TV backgrounds.
  • Generated text-to-image datasets using prompt engineering and SDXL, filtered using CLIPScore and OpenCV tools.
  • Leveraged LLMs with Retrieval-Augmented Generation to improve user experience on Bixby (AI bot).
  • Received the Best SW Project Award of the Year 2023/24, given to the top 2 projects in the GenAI division.

Samsung Electronics, South Korea - Research Associate, Visual Display, AI     May - Jul '22

  • Designed supervised Time Series and unsupervised Clustering models to predict television replacement.
  • Incorporated Key Performance Indicators from sensors with RFM values, improving performance by 6%.

KnowDis, Delhi - Data Science Intern, Query Search and Recommendation Engine     May - Jul '21

  • Worked on a multi-classification problem and fine-tuned a fastText model to generate embeddings.
  • Implemented a framework using the Faiss and Annoy libraries to optimize nearest-neighbor search over query embeddings.
  • Built a scalable architecture that processed a batch of queries in 55 seconds, compared to the baseline's 140 minutes.

Education

Carnegie Mellon University, Pittsburgh - Master's in Machine Learning, GPA - 4.25/4     Aug '24 - Dec '25

  • Machine Learning Department Top 1%, with the highest GPA among all Master's in Machine Learning students.
  • Key Courses: Multimodal Machine Learning, Deep Learning Systems, Probabilistic Graphical Models

Indian Institute of Technology, Delhi - Bachelor's in Computer Science and Engineering, GPA - 9.564     Jul '19 - May '23

  • All India Rank 75 in JEE Advanced 2019 among 200,000+ applicants from all over India.
  • Computer Science Department Top 10% in the 5th, 6th, 7th, and 8th semesters, with an SGPA of 9.7+ at IITD.
  • Key Courses: Data Structures and Algorithms, Probability and Stochastic Processes, Linear Algebra, Machine Learning, NLP, AI, Computer Vision, Parallel Programming, Operating Systems.
  • Competitive Programming: Secured Country Rank 12, Institute Rank 1 in HashCode; Country Rank 47, Institute Rank 1 in ICPC Regionals; Candidate Master on Codeforces.

Miscellanea

Teaching

Teaching Assistant, 11868 LLM Systems, Spring'25, CMU
Teaching Assistant, COL106 Data Structures and Algorithms, Spring'23, IITD
