The Pursuit of Knowledge:
Discovering and Localizing novel concepts using Dual Memory

ICCV 2021

 Saketh Rambhatla Rama Chellappa Abhinav Shrivastava

 [Paper] [Supplementary] [Poster] [Workshop Challenge]

 A detector trained on the 20 VOC (known) classes struggles on out-of-distribution images (e.g., from COCO) in the presence of novel (unknown) objects (e.g., bear). Our discovery and localization framework builds on this detector and can reliably localize and group semantically meaningful "patterns" in challenging images containing both known and novel objects. Novel objects belonging to the same class are assigned the same bounding-box color. Best viewed in color.

# Abstract

 We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset. While existing methods show results on datasets with less cluttered scenes and fewer object instances per image, we present our results on the challenging COCO dataset. Moreover, we argue that, rather than discovering new categories from scratch, discovery algorithms can benefit from identifying what is already known and focusing their attention on the unknown. We propose a method that exploits prior knowledge about certain object types to discover new categories by leveraging two memory modules, namely Working and Semantic memory. We show the performance of our detector on the COCO minival dataset to demonstrate its in-the-wild capabilities.

# Approach Overview

 Our system operates sequentially, processing one image at a time. First, the Encoding module processes the image and outputs candidate regions and features. The Retrieval module then assigns each region to either Semantic or Working memory. In the demonstration, after the first iteration, two regions (one human and a car) have been assigned to the two slots of the Semantic memory, and the remaining regions have been assigned to four different slots (Slots 1-4) in the Working memory. In the next iteration, the Retrieval module assigns four regions (two humans and two cars) to the Semantic memory, while the remaining regions are assigned to two previously created slots (Slots 1, 3) and three new slots (Slots 5-7). Note that the Retrieval module can either populate an existing slot in the Working memory or create new slots if necessary. This capability eliminates the need to know the number of "unknown" objects a priori. More details about the approach can be found here.
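The retrieval step above can be sketched as follows. This is a minimal, illustrative sketch, not the paper's implementation: the cosine-similarity matching, the two thresholds (`sem_thresh`, `new_slot_thresh`), and the running-mean slot update are all assumptions made for clarity.

```python
import numpy as np

def retrieve(region_feats, semantic_protos, working_slots,
             sem_thresh=0.7, new_slot_thresh=0.5):
    """Assign each region feature to Semantic memory (known classes),
    an existing Working-memory slot, or a newly created slot.

    region_feats: list of feature vectors for candidate regions
    semantic_protos: (K, D) array of unit-norm known-class prototypes
    working_slots: mutable list of unit-norm discovered-pattern prototypes
    """
    assignments = []
    for f in region_feats:
        f = f / np.linalg.norm(f)
        # Similarity to known-class prototypes (Semantic memory).
        sem_sims = semantic_protos @ f
        if sem_sims.max() >= sem_thresh:
            assignments.append(("semantic", int(sem_sims.argmax())))
            continue
        # Similarity to previously discovered patterns (Working memory).
        if working_slots:
            slot_sims = np.stack(working_slots) @ f
            best = int(slot_sims.argmax())
            if slot_sims[best] >= new_slot_thresh:
                # Running-mean update of the matched slot prototype.
                working_slots[best] = working_slots[best] + 0.1 * (f - working_slots[best])
                working_slots[best] /= np.linalg.norm(working_slots[best])
                assignments.append(("working", best))
                continue
        # No match anywhere: create a new Working-memory slot.
        working_slots.append(f)
        assignments.append(("working", len(working_slots) - 1))
    return assignments
```

Because a new slot is created whenever no existing prototype matches, the number of unknown categories never needs to be fixed in advance.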

# Qualitative results on Object Discovery

 Concepts discovered by our method in the COCO 2014 train set that can be evaluated using ground-truth annotations.

 Concepts discovered by our method in the COCO 2014 train set that cannot be evaluated using ground-truth annotations. Check this out for more qualitative results.

# Qualitative results on Object Detection

 To demonstrate the performance of our approach on unseen data and its practical utility, we evaluate detectors obtained from our approach on COCO minival. The discovered categories exhibit considerable intra-class variation. We achieve the highest AP of 17.38% for the bear class and the lowest AP of 0.08% for the traffic light class.

# Workshop and Challenge Information

 The discovery setup and evaluation protocol described in this paper will be hosted as a challenge at the Visual Perception and Learning in an Open World workshop (CVPR 2022). For more details of the challenge, visit this doc. Baseline code to perform discovery is available here. Teams can submit an entry to the leaderboard by emailing their results to anubhav[AT]umd[DOT]edu or pulkit[AT]umd[DOT]edu with the subject "[VPLOW-CHALLENGE-SUBMISSION]; Team Name: ". The leaderboard will be updated every day at 11pm ET.

 Clarification of the ranking metric: teams will be ranked by the Normalized AuC metric, given by $$\mathrm{AuC} \cdot e^{-C/N}$$, where $$\mathrm{AuC}$$ is the area under the curve of cumulative purity and coverage, $$C$$ is the number of clusters generated by a method, and $$N$$ is the total number of annotations in the COCO 2014 train set. The plain AuC does not penalize over-clustering; the Normalized AuC penalizes methods that over-cluster.
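The ranking metric is straightforward to compute once the AuC is known. A minimal sketch (the function name and argument names are our own):

```python
import math

def normalized_auc(auc, num_clusters, num_annotations):
    """Normalized AuC = AuC * exp(-C / N).

    auc: area under the cumulative purity-coverage curve
    num_clusters (C): number of clusters produced by the method
    num_annotations (N): total annotations in the COCO 2014 train set

    The exponential factor shrinks toward 0 as C grows relative to N,
    so methods that over-cluster are penalized.
    """
    return auc * math.exp(-num_clusters / num_annotations)
```

For a fixed AuC, producing more clusters strictly lowers the score, which is exactly the over-clustering penalty described above.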