Rishi Shah

I'm a Master's student in Machine Learning at Carnegie Mellon University, working with Prof. Aditi Raghunathan on enhancing reasoning in Vision Language Models (VLMs). My research focuses on building robust multimodal agents and advancing AI systems.

Previously, I worked at Samsung HQ in South Korea, where I developed LoRA-tuned Stable Diffusion models and retrieval-augmented generation (RAG) systems. During my undergraduate studies at IIT-Delhi, I developed NeuroCUT, a novel GNN-based method for graph cut extraction, under the guidance of Prof. Sayan Ranu and Prof. Sourav Medya.

Email / CV / Scholar / Twitter / GitHub


Research

I'm passionate about machine learning, multimodal AI, computer vision, reinforcement learning, and graph neural networks. My current work centers on enhancing reasoning in Vision Language Models (VLMs) and building robust multimodal agents.

I have worked on diverse topics, including the adversarial robustness of multimodal agents, NeuroCUT (a GNN-based method for graph cut extraction), and retrieval-augmented generation (RAG) systems. My goal is to develop intelligent, reliable AI systems that improve human efficiency and enable new possibilities.

Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
ICLR'25, advised by Prof. Aditi Raghunathan
project page / arXiv

Explored the adversarial robustness of Vision Language Models (VLMs) in multimodal agent tasks, exposing their vulnerability to imperceptible image perturbations (affecting less than 5% of pixels) that enable targeted attacks with a 67% success rate. Proposed the Agent Robustness Evaluation (ARE) framework, which models an agent as a graph, decomposing its robustness to pinpoint vulnerabilities. Released VisualWebArena-Adv, a benchmark of 200 adversarial tasks, and found that evaluators and value functions reduce the attack success rate (ASR), while advanced agent techniques such as tree search amplify vulnerabilities.

NeuroCUT: A Neural Approach for Robust Graph Partitioning
Rishi Shah, Krishnanshu Jain, Sahil Manchanda, Sayan Ranu, Sourav Medya
KDD'24, advised by Prof. Sayan Ranu
GitHub / arXiv

Designed a novel inductive GNN-based architecture to find near-optimal solutions for cut-related NP-hard problems. Developed an RL-based autoregressive framework that uses policy gradient methods to optimize graph partitions. The framework generalizes to any cut objective and any specified number of partitions, outperforming all baselines.

Packet Routing using Multi-Agent Reinforcement Learning
Aniket Modi, Rishi Shah, Krishnanshu Jain, Rajeev Shorey
COMSNETS'23, advised by Prof. Rajeev Shorey
GitHub / IEEE Page

Developed a multi-agent RL framework to optimize packet routing in UAV-based IoT networks. Designed and trained Double Deep Q-Networks (DDQNs) with a novel cross-agent reward function to encourage collaboration among agents, achieving a 48.7% throughput gain over baseline methods.

Work Experience

Samsung Electronics, South Korea - Research Engineer, Generative AI Team     Sept '23 - Aug '24

  • Fine-tuned Samsung Gauss-i using LoRA and DreamBooth to generate themed images for TV backgrounds.
  • Generated text-to-image datasets with prompt engineering and SDXL, then filtered them using CLIPScore and OpenCV.
  • Applied LLMs with retrieval-augmented generation (RAG) to improve the user experience of Bixby, Samsung's AI assistant.
  • Received the Best SW Project Award of the Year 2023/24, given to the top two projects in the GenAI division.

Samsung Electronics, South Korea - Research Associate, Visual Display, AI     May - Jul '22

  • Designed supervised time-series and unsupervised clustering models to predict television replacement.
  • Combined key performance indicators (KPIs) from sensors with RFM (recency, frequency, monetary) values, improving performance by 6%.

KnowDis, Delhi - Data Science Intern, Query Search and Recommendation Engine     May - Jul '21

  • Worked on a multi-class classification problem, fine-tuning a fastText model to generate embeddings.
  • Implemented a framework using the Faiss and Annoy libraries to speed up nearest-neighbor search over query embeddings.
  • Built a scalable architecture that processed a batch of queries in 55 seconds, compared with 140 minutes for the baseline.

Miscellanea

Teaching

Teaching Assistant, 11-868 LLM Systems, Spring'25, CMU
Teaching Assistant, COL106 Data Structures and Algorithms, Spring'23, IITD

Borrowed from Jon Barron's website source code.