Anurag Ghosh
I'm an AI Researcher specializing in vision foundation models and robotics, passionate about building full-stack embodied systems, from novel algorithms published at CVPR, NeurIPS, and ICCV, to deployed systems impacting millions of users.
Currently, I'm a final-year Robotics PhD student at Carnegie Mellon, advised by Srinivasa Narasimhan. I work on open-world perception and planning. During my Master's here, I was co-advised by Srinivasa Narasimhan and Christoph Mertz.
Earlier, I spent a few wonderful years at Microsoft Research working at the intersection of scalable perception and distributed systems. In a different era, I studied at IIIT Hyderabad and worked on structured perception from broadcast sports videos.
Here's my Resume.
Research
I love building intelligent systems that delight millions, and hopefully billions, of users. I'm broadly interested in closing the perception-action loop for robotics in our extremely messy and beautiful open physical world, while ensuring that the systems we build are efficient, frugal, accessible, and deployable at scale.
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
A scalable data generation framework that combines mesh renderings with real images, enabling robust 3D reconstruction across extreme viewpoint variations (e.g., aerial-ground).
Saliency Guided Image Warping for Unsupervised Domain Adaptation
Winter Conference on Applications of Computer Vision (WACV), 2025
An unsupervised domain adaptation approach that oversamples salient regions via in-place image warping during self-distillation. It improves model robustness across diverse geographies, lighting, and weather conditions, facilitating reliable real-world deployment.
Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
Conference on Computer Vision and Pattern Recognition (CVPR), 2023
A learnable, geometry-aware image resampling approach that incorporates 3D scene priors (ground and sky planes) into the perception pipeline, significantly improving both the efficiency and accuracy of detecting small, far-away objects.
Chanakya: Learning Runtime Decisions for Adaptive Real-Time Perception
Conference on Neural Information Processing Systems (NeurIPS), 2023
Honorable Mention, Streaming Perception Challenge, CVPR 2021.
An RL-based control policy that observes scene and system state to make adaptive runtime decisions (model selection, resolution, compute allocation). It jointly optimizes accuracy and latency under strict real-time constraints for both server and on-device hardware.
REACT: Streaming Video Analytics On The Edge With Asynchronous Cloud Support
International Conference on Internet of Things Design and Implementation (IoTDI), 2023
A video analytics framework that asynchronously fuses on-device and cloud predictions, improving object detection accuracy by up to 50% over device-only or cloud-only approaches.
Holistic Energy Awareness for Intelligent Drones
International Conference on Systems for Energy-Efficient Built Environments (BuildSys), 2021
Best Paper Runner-Up
A holistic energy management framework for intelligent drones that jointly optimizes compute, communication, and flight energy. Also appeared in Transactions on Sensor Networks.
Smartphone-based Driver License Testing
Watch Microsoft CEO Satya Nadella explain the project!
Read about our work in the PM Awards Innovations Coffee Table Book! (Extracted here)
Deployed in multiple states and 10+ cities in India, automatically testing hundreds of thousands of drivers at low cost with >99% accuracy (tests verified by a human operator). See Overview and Dashboard.
A deployed nationwide automated driver's license testing platform (100K+ users) replacing expensive pole-mounted infrastructure with a $500 smartphone. Implements robust monocular 3D localization (SfM/SLAM), IMU-based jerk recognition, and multi-modal perception for accurate trajectory estimation and driver state monitoring.
Relevant Publications
Anurag Ghosh, Vijay Lingam, Ishit Mehta, Akshay Nambi, Venkat Padmanabhan, Satish Sangameswaran
Conference on Embedded Networked Sensor Systems (SenSys Demo), 2019
Akshay Nambi, Ishit Mehta, Anurag Ghosh, Vijay Lingam, Venkat Padmanabhan
Conference on Embedded Networked Sensor Systems (SenSys), 2019
Analyzing Racket Sports From Broadcast Videos
IIIT Hyderabad (Master's Thesis), 2019
A geometry-aware perception system for broadcast racket sports. Automatically performs court-mapping, player tracking, and shot classification from monocular TV footage to quantitatively compare the playing styles of Federer, Nadal, and Djokovic.
Towards Structured Analysis of Broadcast Badminton Videos
Winter Conference on Applications of Computer Vision (WACV), 2018
Piloted with ESPN/Star Sports at the Premier Badminton League, watched by tens of millions in South East Asia.
An end-to-end framework for real-time player analysis from live broadcast badminton videos using only visual cues, computing on-court distance, speed, and heatmaps without multi-modal sensors.
SmartTennisTV: An Automatic Indexing System for Tennis
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017
Best Paper Award
An automatic system for indexing and retrieving key events from broadcast tennis videos, enabling structured query-based access to match highlights.
Signals Matter: Understanding Popularity and Impact on Stack Overflow
The Web Conference (WWW), 2019
Dynamic Narratives for Heritage Tour
VisArt Workshop, European Conference on Computer Vision (ECCV), 2016
Storytelling from visual inputs before it was cool, building dynamic text narratives from egocentric heritage site videos.
Press
Interesting/Inspiring Links
- Making computer vision systems that work: Boujou, Kinect, HoloLens
- A New Kind of Science - A 15 Year View
- How I ran the length of every street in Pittsburgh: PAC TOM
- Frugal Innovations for a Developing World
- Hints and Principles for Computer System Design
- The Advent of Actionable Tennis Analytics
- Automatic Pool Stick vs Strangers