Research
I'm passionate about machine learning, multimodal AI, computer vision, reinforcement learning, and graph neural networks. My research focuses on enhancing reasoning in Vision-Language Models (VLMs) and building robust multimodal agents.
I have worked on diverse topics, including the adversarial robustness of multimodal agents, NeuroCUT (a GNN-based method for graph partitioning), and retrieval-augmented generation (RAG) systems. My goal is to develop intelligent, reliable AI systems that improve human efficiency and enable new possibilities.
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu,
Rishi Shah,
Jing Yu Koh,
Ruslan Salakhutdinov,
Daniel Fried,
Aditi Raghunathan
ICLR'25, advised by Prof. Aditi Raghunathan
project page / arXiv
Studied the adversarial robustness of Vision-Language Model (VLM) agents on multimodal tasks, showing that imperceptible image perturbations affecting fewer than 5% of pixels enable targeted attacks with a 67% success rate.
Proposed the Agent Robustness Evaluation (ARE) framework, which models agents as graphs to decompose robustness and pinpoint vulnerabilities.
Released VisualWebArena-Adv, a benchmark of 200 adversarial tasks, and found that evaluators and value functions reduce the attack success rate (ASR), while advanced agent techniques such as tree search amplify vulnerabilities.
NeuroCUT: A Neural Approach for Robust Graph Partitioning
Rishi Shah,
Krishnanshu Jain,
Sahil Manchanda,
Sayan Ranu,
Sourav Medya
KDD’24, advised by Prof. Sayan Ranu
GitHub / arXiv
Designed a novel inductive GNN-based architecture that finds near-optimal solutions to NP-hard graph-cut problems.
Developed an RL-based autoregressive framework that optimizes graph partitions using policy-gradient methods.
The framework generalizes to arbitrary cut objectives and any specified number of partitions, outperforming all evaluated baselines.
Packet Routing using Multi-Agent Reinforcement Learning
Aniket Modi,
Rishi Shah,
Krishnanshu Jain,
Rajeev Shorey
COMSNETS’23, advised by Prof. Rajeev Shorey
GitHub / IEEE Page
Developed a multi-agent RL framework to optimize packet routing in UAV-based IoT networks.
Designed and trained Double Deep Q-Networks (DDQNs) with a novel cross-agent reward function to improve collaboration among agents.
Achieved a 48.7% throughput gain over baseline methods.
Work Experience
Samsung Electronics, South Korea - Research Engineer
Generative AI Team (Sept '23 - Aug '24)
- Fine-tuned Samsung Gauss-i using LoRA and DreamBooth to generate themed images for TV backgrounds.
- Generated text-to-image datasets with prompt engineering and SDXL, then filtered them using CLIPScore and OpenCV.
- Applied LLMs with retrieval-augmented generation (RAG) to improve the user experience of Bixby, Samsung's AI assistant.
- Received the Best SW Project Award of the Year 2023/24, given to the top two projects in the GenAI division.
Samsung Electronics, South Korea - Research Associate
Visual Display, AI (May - Jul '22)
- Designed supervised time-series and unsupervised clustering models to predict television replacement.
- Combined sensor-derived key performance indicators (KPIs) with RFM (recency, frequency, monetary) values, improving model performance by 6%.
KnowDis, Delhi - Data Science Intern
Query Search and Recommendation Engine (May - Jul '21)
- Worked on a multi-class classification problem and fine-tuned a fastText model to generate embeddings.
- Implemented a framework using the Faiss and Annoy libraries to speed up nearest-neighbor search over query embeddings.
- Built a scalable architecture that processed a batch of queries in 55 seconds, compared with 140 minutes for the baseline.
Borrowed from Jon Barron's website source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.