Saketh Rambhatla

I am a Research Scientist at Meta AI. Previously, I was a postdoctoral researcher working with Dr. Ishan Misra. I obtained my Ph.D. in ECE at the University of Maryland (UMD), College Park, where I worked on in-the-wild visual understanding with Dr. Rama Chellappa and Dr. Abhinav Shrivastava.

I completed my bachelor's and master's at the Indian Institute of Technology, Kharagpur.

Email  /  CV  /  Google Scholar  

Research

At Meta, I spend most of my time working on Generative AI. My research has resulted in state-of-the-art image and video generation models and received wide media coverage. During my Ph.D., I worked on object tracking, person re-identification, object detection and discovery, and multi-modal inconsistency detection.

Publications

Moviegen: A cast of media foundation models
GenAI, Meta

A cast of foundation models that can generate HD videos and synchronized audio. Enables additional capabilities like precise instruction-based video editing and generation of personalized videos.

Trajectory-aligned Space-time tokens for Few-Shot Action Recognition
ECCV, 2024

Leveraging point trajectories and self-supervised representations for few-shot action recognition.

SelfEval: Leveraging the discriminative nature of generative models for evaluation.
Sai Saketh Rambhatla, Ishan Misra
Under Submission

Repurpose generative models as discriminative models to evaluate generative performance.

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning.
ECCV, 2024

A state-of-the-art diffusion-based text-to-video generation method.

InstanceDiffusion: Instance-level Control for Image Generation.
CVPR, 2024

A novel and effective method to enable precise instance-level control for text-to-image generation.

UVIS: Unsupervised Video Instance Segmentation.
CVPR Workshops, 2024

Unsupervised Video Instance Segmentation without any video annotations or densely labeled pre-training.

MOST: Multiple Object localization with Self-supervised Transformers for object discovery.
ICCV, 2023 (Oral Presentation)

Localize multiple objects in real-world images in an unsupervised fashion, without any training, using self-supervised transformers.

SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining.
ICCV, 2023

Improve the performance of detectors trained with missing annotations by posing the problem as region-based semi-supervised learning.

The Pursuit of Knowledge: Discovering and Localizing new concepts using Dual Memory
Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava
ICCV, 2021

Equip machines with capabilities to automatically discover and learn models for new categories from a large unlabeled dataset.

Towards Discovery and Attribution of Open-world GAN Generated Images.
Saksham Suri*, Sharath Girish*, Sai Saketh Rambhatla, Abhinav Shrivastava
ICCV, 2021

Automatically discover and attribute open-world GAN images.

Self-Denoising neural networks for few shot learning
Steven Schwarcz, Sai Saketh Rambhatla, Rama Chellappa

Novel architecture based on denoising auto-encoders to improve few shot learning.

An Empirical analysis of Boosting Deep Networks
Sai Saketh Rambhatla, Michael Jones, Rama Chellappa
IJCNN, 2022

Empirical evidence that a single large neural network is usually more accurate than a boosted ensemble of neural networks with the same total number of parameters.

Towards Accurate Visual and Natural language-based vehicle retrieval systems.
Pirazh Khorramshahi*, Sai Saketh Rambhatla*, Rama Chellappa
NVIDIA AI City Challenge, CVPR Workshops, 2021

Proposed a real-time system for image-based vehicle re-identification and natural language-based vehicle retrieval.

Towards Real-Time Systems for Vehicle Re-Identification, Multi-Camera Tracking, and Anomaly Detection.
NVIDIA AI City Challenge, CVPR Workshops, 2020

Proposed a real-time system for vehicle re-identification, multi-camera tracking, and anomaly detection in a network of traffic cameras.

Detecting Human-Object Interactions via Functional Generalization.
AAAI, 2020

Humans interact with functionally similar objects in a similar manner.

Spatial Priming for Detecting Human-Object Interactions
arXiv

A method for exploiting the spatial layout information of a human and an object for detecting HOIs in images.

A Dual-Path Model With Adaptive Attention for Vehicle Re-Identification
ICCV, 2019 (Oral Presentation)

Proposed a novel dual-path adaptive attention model for vehicle re-identification.

Body Part Alignment and Temporal Attention Pooling for Video-Based Person ReIdentification
Sai Saketh Rambhatla, Michael Jones
BMVC, 2019

Training deep networks with the ability to align features achieves state-of-the-art performance on person re-identification.



Service

Reviewer, International Journal of Computer Vision

Reviewer, Pattern Recognition Letters

Reviewer, IEEE Access

Reviewer, ECCV 2020, 2022, 2024

Reviewer, ICCV 2021, 2023

Reviewer, NeurIPS 2023

Reviewer, ICLR 2023, 2024

Reviewer, ICML 2024

Reviewer, CVPR 2021, 2022, 2023, 2024

Reviewer, AAAI 2021, 2022


Graduate Teaching Assistant, ENEE222 Fall 2016

Graduate Teaching Assistant, ENEE324 Spring 2017

Graduate Teaching Assistant, ENEE425 Fall 2017

Graduate Teaching Assistant, ENEE630 Fall 2017

