Anurag Ghosh

I'm an AI Researcher specializing in vision foundation models and robotics, passionate about building full-stack embodied systems, from novel algorithms published at CVPR, NeurIPS, and ICCV, to deployed systems impacting millions of users.

Currently, I'm a final-year Robotics PhD student at Carnegie Mellon, advised by Srinivasa Narasimhan. I work on open-world perception and planning. During my Master's here, I was co-advised by Srinivasa Narasimhan and Christoph Mertz.

Earlier, I spent a few wonderful years at Microsoft Research working at the intersection of scalable perception and distributed systems. In a different era, I studied at IIIT Hyderabad and worked on structured perception from broadcast sports videos.

Here's my Resume.

Open to opportunities: I'm on the industry job market. If you think we'd be a good fit, let's chat!

Research

I love building intelligent systems that delight millions and, hopefully, billions of users. I'm broadly interested in closing the perception-action loop for robotics in our extremely messy and beautiful open physical world, while ensuring that the systems we build are efficient, frugal, accessible, and deployable at scale.

RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time

Anurag Ghosh, Srinivasa Narasimhan, Manmohan Chandraker, Francesco Pittaluga

A real-time Language-Action planner combining grounded rule-based reasoning with Large Language Models (LLMs) to handle complex, long-tailed autonomous driving scenarios.

ROADWork Dataset

ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Anurag Ghosh, Shen Zheng, Robert Tamburo, Khiem Vuong, Juan R. Alvarez Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa Narasimhan

International Conference on Computer Vision (ICCV), 2025

A comprehensive open-source dataset and benchmark for studying long-tail, complex autonomous driving scenarios in work zones.

AerialMegaDepth

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Khiem Vuong, Anurag Ghosh, Deva Ramanan*, Srinivasa Narasimhan*, Shubham Tulsiani*

Conference on Computer Vision and Pattern Recognition (CVPR), 2025

A scalable data generation framework that combines mesh-renderings with real images, enabling robust 3D reconstruction across extreme viewpoint variations (e.g., aerial-ground).

Saliency Guided Image Warping

Saliency Guided Image Warping for Unsupervised Domain Adaptation

Shen Zheng★, Anurag Ghosh★, Srinivasa Narasimhan

Winter Conference on Applications of Computer Vision (WACV), 2025

An unsupervised domain adaptation approach that oversamples salient regions via in-place image warping during self-distillation. It improves model robustness across diverse geographies, lighting, and weather conditions—facilitating reliable real-world deployment.

Two-Plane Perspective Prior

Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection

Anurag Ghosh, N Dinesh Reddy, Christoph Mertz, Srinivasa Narasimhan

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

A learnable, geometry-aware image resampling approach that incorporates 3D scene priors (ground and sky planes) into the perception pipeline, significantly improving both the efficiency and accuracy of detecting small, far-away objects.

Chanakya

Chanakya: Learning Runtime Decisions for Adaptive Real-Time Perception

Anurag Ghosh, Vaibhav Balloli, Akshay Nambi, Aditya Singh, Tanuja Ganu

Conference on Neural Information Processing Systems (NeurIPS), 2023

Honorable Mention, Streaming Perception Challenge, CVPR 2021.

An RL-based control policy that observes scene and system state to make adaptive runtime decisions (model selection, resolution, compute allocation). It jointly optimizes accuracy and latency under strict real-time constraints for both server and on-device hardware.

REACT

REACT: Streaming Video Analytics On The Edge With Asynchronous Cloud Support

Anurag Ghosh, Srinivasan Iyengar, Stephen Lee, Anuj Rathore, Venkat Padmanabhan

International Conference on Internet of Things Design and Implementation (IoTDI), 2023

A video analytics framework that asynchronously fuses on-device and cloud predictions, improving object detection accuracy by up to 50% over device-only or cloud-only approaches.

Holistic Energy Awareness for Intelligent Drones

Srinivasan Iyengar, Ravi Raj Saxena, Joydeep Pal, Bhawana Chhaglani, Anurag Ghosh, Venkat Padmanabhan, Prabhakar T. Venkata

International Conference on Systems for Energy-Efficient Built Environments (BuildSys), 2021

Best Paper Runner-Up

A holistic energy management framework for intelligent drones that jointly optimizes compute, communication, and flight energy. Also appeared in Transactions on Sensor Networks.

HAMS Driver License Testing

Smartphone-based Driver License Testing

Watch Microsoft CEO Satya Nadella explain the project!

Read about our work in the PM Awards Innovations Coffee Table Book! (Extracted here)

Deployed in multiple states and 10+ cities in India, automatically testing hundreds of thousands of drivers at low cost with >99% accuracy (tests verified by a human operator). See Overview and Dashboard.

A deployed nationwide automated driver's license testing platform (100K+ users) replacing expensive pole-mounted infrastructure with a $500 smartphone. Implements robust monocular 3D localization (SfM/SLAM), IMU-based jerk recognition, and multi-modal perception for accurate trajectory estimation and driver state monitoring.

Relevant Publications

Smartphone-based Driver License Testing
Anurag Ghosh, Vijay Lingam, Ishit Mehta, Akshay Nambi, Venkat Padmanabhan, Satish Sangameswaran
Conference on Embedded Networked Sensor Systems (SenSys Demo), 2019

ALT: Towards Automating Driver License Testing using Smartphones
Akshay Nambi, Ishit Mehta, Anurag Ghosh, Vijay Lingam, Venkat Padmanabhan
Conference on Embedded Networked Sensor Systems (SenSys), 2019

Racket Sports Analysis: Federer, Nadal, Djokovic

Analyzing Racket Sports From Broadcast Videos

Anurag Ghosh

IIIT Hyderabad (Master's Thesis), 2019

A geometry-aware perception system for broadcast racket sports. Automatically performs court-mapping, player tracking, and shot classification from monocular TV footage to quantitatively compare the playing styles of Federer, Nadal, and Djokovic.

Broadcast Badminton Analysis

Towards Structured Analysis of Broadcast Badminton Videos

Anurag Ghosh, Suriya Singh, C.V. Jawahar

Winter Conference on Applications of Computer Vision (WACV), 2018

Piloted with ESPN/Star Sports at the Premier Badminton League, watched by tens of millions in Southeast Asia.

An end-to-end framework for real-time player analysis from live broadcast badminton videos using only visual cues — computing on-court distance, speed, and heatmaps without multi-modal sensors.

SmartTennisTV

SmartTennisTV: An Automatic Indexing System for Tennis

Anurag Ghosh, C.V. Jawahar

National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017

Best Paper Award

An automatic system for indexing and retrieving key events from broadcast tennis videos, enabling structured query-based access to match highlights.

Signals Matter

Signals Matter: Understanding Popularity and Impact on Stack Overflow

Arpit Merchant, Daksh Shah, Gurpreet Singh Bhatia, Anurag Ghosh, Ponnurangam Kumaraguru

The Web Conference (WWW), 2019

Dynamic Narratives Heritage Tour

Dynamic Narratives for Heritage Tour

Anurag Ghosh★, Yash Patel★, Mohak Sukhwani, C.V. Jawahar

VisArt Workshop, European Conference on Computer Vision (ECCV), 2016

Storytelling from visual inputs before it was cool, building dynamic text narratives from egocentric heritage site videos.

Press

Automated Driver License Testing

Badminton Analytics