Created: February 14, 2024

Tags: object discovery, unsupervised learning, object-centric learning, neuroscience

Link: https://arxiv.org/pdf/2204.02075.pdf

Status: Reading

What?

Discovering objects in an unsupervised manner is crucial for solving the binding problem, i.e., arriving at the human-like understanding of the world in terms of objects.

Following a coding scheme theorized to underlie object representations in biological neurons, the model's complex-valued activations carry two messages: the magnitudes express the presence of a feature, while the relative phase differences between neurons express which features should be bound together into a joint object representation.
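A tiny NumPy illustration (my own, not from the paper; all values are made up) of how a single complex activation can carry both messages at once:

```python
import numpy as np

# Each complex activation z = m * exp(i * phi) carries two messages:
#   |z|  -> how strongly the neuron's feature is present
#   phase differences between neurons -> whether their features belong together
z1 = 0.9 * np.exp(1j * 0.30)  # feature A, strongly present
z2 = 0.8 * np.exp(1j * 0.35)  # feature B, strongly present, phase close to z1
z3 = 0.7 * np.exp(1j * 2.50)  # feature C, present but out of phase with z1/z2

presence = np.abs([z1, z2, z3])                 # magnitudes: feature presence
gap_12 = abs(np.angle(z1 * np.conj(z2)))        # small phase gap -> bind A and B
gap_13 = abs(np.angle(z1 * np.conj(z3)))        # large phase gap -> keep A and C apart
print(presence, gap_12, gap_13)
```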

Why?

Slot-based models, although competitive and increasingly applicable in real-world systems, still have some issues. First, slot-based methods require elaborate structural biases and intricate training schemes to achieve a good separation of object features into slots. Second, they limit the information flow and expressiveness of the model, which leads to failure cases for complex objects, e.g. textured ones.

The question is: “can we find a simpler alternative to slot-based methods?”

How?

Inspired by the temporal correlation hypothesis (similar in spirit to spiking NNs, maybe?), the paper introduces a new complex-valued model for object discovery. In short,

The brain binds information from different neurons by synchronizing their firing patterns, while desynchronized firing patterns represent information that should be processed separately.

Two types of messages that a neuron sends:

  1. The rate code (discharge frequency) of a neuron encodes whether the particular feature it is tuned to / specialized for is present or not. Real-valued activations (the usual activations we know) in standard NNs can be interpreted as a technical implementation of this message.
  2. The relative timing between two neurons’ spikes encodes whether the features of these neurons should be bound together or not. When firing in synchrony, the features they represent are evaluated jointly by the target neuron, and are thus combined in a flexible and dynamic way.

This second type of message is not well explored in current NNs, and it is exactly what this paper explores via complex-valued activations.

Remember, complex numbers have magnitude and phase
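Rough PyTorch sketch of the general idea (my own understanding, not the paper's exact layer; the class name and shapes are illustrative): apply ordinary real-valued weights to a complex activation, then put the nonlinearity on the magnitude only and keep the phase, so the "presence" message gets transformed while the "binding" message is carried along.

```python
import math
import torch
import torch.nn as nn

class ComplexLinearSketch(nn.Module):
    """Sketch of a layer acting on complex activations (not the paper's exact layer)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out, bias=False)

    def forward(self, z):                                 # z: complex, shape (batch, d_in)
        psi = self.fc(z.real) + 1j * self.fc(z.imag)      # same real weights on both parts
        magnitude = torch.relu(torch.abs(psi))            # nonlinearity acts on the magnitude
        phase = torch.angle(psi)                          # phase is passed through untouched
        return magnitude * torch.exp(1j * phase)

layer = ComplexLinearSketch(8, 4)
z = torch.rand(2, 8) * torch.exp(1j * 2 * math.pi * torch.rand(2, 8))
out = layer(z)                                            # complex activations of shape (2, 4)
```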

Creating discrete object assignments from continuous phase values

How to extract object-wise representations from the latent space?
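A hedged sketch of how the "continuous phases → discrete objects" step could look (my interpretation of the gist; the paper's exact procedure and any magnitude weighting may differ): embed each output pixel's phase on the unit circle so that circular distance is respected, then cluster with k-means, and treat each cluster id as one object mask.

```python
import numpy as np
from sklearn.cluster import KMeans

def phases_to_objects(phases, num_objects):
    """phases: (H, W) array of per-pixel output phases in radians (illustrative helper)."""
    points = np.stack([np.cos(phases), np.sin(phases)], axis=-1)   # (H, W, 2) unit-circle embedding
    flat = points.reshape(-1, 2)
    labels = KMeans(n_clusters=num_objects, n_init=10).fit_predict(flat)
    return labels.reshape(phases.shape)                            # (H, W) integer object assignment

# Toy example: two regions with clearly separated phases split into two "objects".
phases = np.zeros((4, 4))
phases[:, 2:] = np.pi
print(phases_to_objects(phases, num_objects=2))
```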