Anirudh S Chakravarthy
I'm a Senior Machine Learning Engineer at Cruise, working on multi-task learning and large models for long-tail recognition in autonomous driving.
I received my Master of Science in Computer Vision (MSCV) from the Robotics Institute, Carnegie Mellon University (CMU), where I was advised by Prof. Deva Ramanan.
My research at CMU focused on extending panoptic segmentation into the open world and discovering novel objects without explicit supervision.
Want to chat? Feel free to reach out!
Email | Resume | Google Scholar | LinkedIn
Education
- M.S. in Computer Vision, 2022, Carnegie Mellon University, USA
- B.E. in Computer Science, 2021, BITS Pilani, India
|
|
Lidar Panoptic Segmentation in an Open World
Anirudh Chakravarthy, Meghana Reddy Ganesina, Peiyun Hu, Laura Leal-Taixé, Shu Kong, Deva Ramanan, Aljosa Osep
International Journal of Computer Vision, 2024
[pdf] [arxiv] [project page] [code]
Current Lidar Panoptic Segmentation (LPS) methods make an unrealistic assumption that the semantic class vocabulary is fixed in the real world, but in fact, class ontologies usually evolve over time as robots encounter instances of novel classes. To address this unrealistic assumption, we study LPS in the Open World (LiPSOW).
|
|
PROFIT: A Specialized Optimizer for Deep Fine Tuning
Anirudh Chakravarthy, Shuai Kyle Zheng, Xin Huang, Sachithra Hemachandra, Xiao Zhang, Yuning Chai, Zhao Chen
Under Review at ICML, 2025
[arxiv]
Fine-tuning pre-trained models has become invaluable in computer vision and robotics. We present PROFIT, one of the first optimizers specifically designed for incrementally fine-tuning converged models on new tasks or datasets.
|
|
YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset
Donglai Wei, Siddhant Kharbanda, Sarthak Arora, Roshan Roy, Nishant Jain, Akash Palrecha, Tanav Shah, Shray Mathur, Ritik Mathur, Abhijay Kemkar, Anirudh Chakravarthy, Zudi Lin, Won-Dong Jang, Yansong Tang, Song Bai, James Tompkin, Philip H.S. Torr, Hanspeter Pfister
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[project page] [pdf]
We introduce a new dataset and benchmark, YouMVOS, for multi-shot video object segmentation.
|
|
Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation
Anirudh Chakravarthy, Won-Dong Jang, Zudi Lin, Donglai Wei, Song Bai, Hanspeter Pfister
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021
[pdf] [arxiv] [code]
We identify mask quality as a bottleneck for video instance segmentation. To overcome this, we propose an attention-based network that propagates missing object instances across frames. Our method significantly outperforms previous state-of-the-art algorithms that use the Mask R-CNN backbone, achieving 36.0% mAP on the YouTube-VIS benchmark.
|
|
MRSCAtt: A Spatio-Channel Attention-Guided Network for Mars Rover Image Classification
Anirudh Chakravarthy*, Roshan Roy*, Praveen Ravirathinam*
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021
[pdf] [code]
We propose a network, MRSCAtt (Mars Rover Spatial and Channel Attention), which jointly uses spatial and channel attention to accurately classify images. Evaluated on images taken by NASA's Curiosity rover on Mars, our approach achieves state-of-the-art results with 81.53% test-set accuracy on the MSL Surface Dataset.
|
|
Cruise LLC
Machine Learning Engineer Feb 2023 - Present
Large Models for Long-tail Perception
|
|
Cruise LLC
Machine Learning Engineer Intern May 2022 - Aug 2022
Multi-task Learning for Long-tail Perception
|
|
Visual Computing Group, Harvard University
Research Intern May 2020 - July 2021
Video Instance Segmentation
|
|
Computer Vision and Robotics Lab, University of Illinois Urbana-Champaign
Research Intern May 2020 - Dec 2020
Vital Parameter Estimation
|
|
Self-Supervised Camera Pose Estimation with Geometric Consistency
[pdf] [code]
Existing camera pose estimation methods make use of ground-truth odometry as supervision, which may be expensive to obtain. In this work, we train a transformer-based pose estimation network in a self-supervised manner, leveraging advances in monocular depth estimation.
|
|
Is Monocular Vision Sufficient for Multi-View Visual Odometry?
[pdf] [code]
For visual localization, a robot often carries multiple cameras that can be used to infer odometry (known as multi-view visual odometry). Existing works either rely heavily on scene geometry or use complicated networks, which poses challenges for real-world generalization. In this work, we aim to develop simple yet strong baselines for multi-view visual odometry by fusing estimates from monocular visual odometry.
|
|
Constrained Humanification: Improving Multi-Person Reconstruction Using Temporal Constraints
[pdf] [code]
Multi-person 3D reconstruction is challenging, yet no prior work aims to disambiguate inter-person occlusions using temporal information. Motivated by this, we leverage optical flow as a cue to improve 3D human pose estimation in crowded scenes.
|
|
Latent Space Robustness of Generative Models
[project page] [code]
Generative models such as StyleGAN have shown very promising results. However, when using such GANs for face generation, we often encounter non-photorealistic generations (e.g., artifacts or outputs that are not face-like). In this project, we aim to formally establish the existence of such failure modes in GANs.
|