Research
Our research teams investigate object-centric AI and computer vision applied to fashion — building systems that perceive, understand, and generate clothing the way humans do: as structured compositions of meaningful parts.
Object-Centric Representation
Building structured, compositional world models that understand fashion as a scene of discrete, meaningful objects.
Our core research direction is object-centric representation learning: teaching neural networks to decompose complex fashion imagery into structured, interpretable object slots. Rather than treating an outfit as a single monolithic feature vector, our models learn to bind garments, accessories, and attributes such as texture and color to separate representational slots. This compositional structure enables systematic generalization: models trained on seen combinations can reason about unseen ones, a critical requirement for the long-tail complexity of fashion.
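The mechanism at the heart of this decomposition is easiest to see in code. The sketch below shows a minimal Slot Attention update in the style of Locatello et al. (2020), in which a small set of slots competes via softmax attention to explain the features of an image; the dimensions, slot count, and hyperparameters are illustrative assumptions, not our production configuration.

```python
# Minimal Slot Attention sketch (after Locatello et al., 2020).
# All sizes here are illustrative, not our production settings.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots: int, dim: int, iters: int = 3, eps: float = 1e-8):
        super().__init__()
        self.num_slots, self.iters, self.eps = num_slots, iters, eps
        self.scale = dim ** -0.5
        # Slots are sampled from a learned Gaussian at the start of each forward pass.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, num_features, dim), e.g. a flattened CNN/ViT feature map.
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            b, self.num_slots, d, device=inputs.device)
        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the slot axis: slots compete for each input feature.
            attn = torch.einsum('bkd,bnd->bkn', q, k) * self.scale
            attn = attn.softmax(dim=1) + self.eps
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean over inputs
            updates = torch.einsum('bkn,bnd->bkd', attn, v)
            slots = self.gru(updates.reshape(-1, d),
                             slots_prev.reshape(-1, d)).reshape(b, -1, d)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots  # (batch, num_slots, dim): one slot per discovered part

# e.g. decompose a 16x16 feature map into 7 garment/part slots
slots = SlotAttention(num_slots=7, dim=64)(torch.randn(2, 256, 64))
```

The softmax over slots, rather than over inputs, is what forces competition: every image feature must be claimed by some slot, which drives the slots to specialize to distinct garments and parts.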
Focus Areas
- Slot Attention and iterative routing for garment decomposition
- Unsupervised object discovery in cluttered fashion scenes
- Compositional generation and disentangled latent spaces
- Cross-image object correspondence and part matching
Fine-Grained Visual Understanding
Pushing the limits of what vision models can recognize and describe in fashion imagery.
Fine-grained recognition in fashion is exceptionally challenging due to subtle inter-class differences, extreme intra-class variation, and long-tail class distributions. Our team develops specialized architectures and training strategies, including hierarchical attention, part-aware pooling, and contrastive objectives, to discriminate reliably between visually similar garments across diverse real-world imaging conditions.
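As one concrete illustration, the sketch below combines part-aware pooling with hierarchical classification heads (category → style → detail, mirroring the focus areas below). The module name, part count, and class counts are hypothetical, chosen only to make the idea runnable.

```python
# Illustrative sketch: part-aware pooling feeding hierarchical heads.
# Part and class counts are made up for the example.
import torch
import torch.nn as nn

class PartAwareClassifier(nn.Module):
    def __init__(self, dim=256, parts=4, n_cat=50, n_style=200, n_detail=1000):
        super().__init__()
        # One attention map per part: features are pooled into part descriptors
        # instead of a single global average, preserving local garment cues.
        self.part_attn = nn.Conv2d(dim, parts, kernel_size=1)
        self.head_cat = nn.Linear(dim * parts, n_cat)
        # Coarser predictions condition the finer heads.
        self.head_style = nn.Linear(dim * parts + n_cat, n_style)
        self.head_detail = nn.Linear(dim * parts + n_style, n_detail)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, dim, H, W) from any CNN/ViT backbone.
        b, d, h, w = feats.shape
        attn = self.part_attn(feats).flatten(2).softmax(dim=-1)  # (b, parts, HW)
        pooled = torch.einsum('bph,bdh->bpd', attn, feats.flatten(2))
        x = pooled.flatten(1)                                    # (b, parts*dim)
        cat = self.head_cat(x)
        style = self.head_style(torch.cat([x, cat.softmax(-1)], dim=-1))
        detail = self.head_detail(torch.cat([x, style.softmax(-1)], dim=-1))
        return cat, style, detail

cat, style, detail = PartAwareClassifier()(torch.randn(2, 256, 14, 14))
```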
Focus Areas
- Part-aware and region-guided attention mechanisms
- Hierarchical attribute classification (category → style → detail)
- Few-shot and zero-shot recognition across fashion domains
- Robust recognition under pose, lighting, and occlusion variance
Generative Fashion AI
Controllable image synthesis grounded in structured object representations.
We research generative models that leverage object-centric representations to enable fine-grained, controllable fashion synthesis. By conditioning diffusion and flow-matching models on structured object slots rather than global embeddings, we achieve precise attribute-level editing: changing the sleeve style without altering the collar, or swapping a texture while preserving the silhouette. These are capabilities that generative models conditioned on unstructured global embeddings fundamentally lack.
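A hedged sketch of the conditioning mechanism: in a slot-conditioned denoiser, cross-attention reads its keys and values from the object slots rather than from a single global embedding, so replacing one slot changes one attribute while the rest of the conditioning signal stays fixed. The layer name, shapes, and the index of the "sleeve" slot below are illustrative assumptions, not our actual architecture.

```python
# Illustrative slot-conditioned cross-attention for a diffusion denoiser.
import torch
import torch.nn as nn

class SlotCrossAttention(nn.Module):
    def __init__(self, dim: int, slot_dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)        # queries from noisy image tokens
        self.to_k = nn.Linear(slot_dim, dim)   # keys/values from object slots
        self.to_v = nn.Linear(slot_dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, slots: torch.Tensor) -> torch.Tensor:
        # x: (b, n_tokens, dim) denoiser activations; slots: (b, n_slots, slot_dim)
        q, k, v = self.to_q(x), self.to_k(slots), self.to_v(slots)
        attn = (torch.einsum('bnd,bkd->bnk', q, k) * self.scale).softmax(dim=-1)
        return x + self.proj(torch.einsum('bnk,bkd->bnd', attn, v))  # residual

# Editing: swap the slot bound to one attribute, keep the others fixed.
x, slots = torch.randn(1, 1024, 320), torch.randn(1, 7, 64)
layer = SlotCrossAttention(dim=320, slot_dim=64)
edited = slots.clone()
edited[:, 2] = torch.randn(64)  # e.g. replace the hypothetical "sleeve" slot only
out_orig, out_edit = layer(x, slots), layer(x, edited)
```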
Focus Areas
- Slot-conditioned diffusion models for garment editing
- Virtual try-on via object-level appearance transfer
- Compositional outfit generation and style mixing
- Controllable texture and pattern synthesis
Multimodal & Language-Grounded Vision
Connecting natural language to structured visual representations of fashion.
Fashion understanding requires bridging visual and linguistic modalities. Our multimodal research focuses on grounding natural language descriptions to specific object slots, enabling applications like language-guided search, detailed product captioning, and conversational outfit recommendation. We build on large vision-language models while introducing object-centric bottlenecks that enforce compositional alignment between text and image regions.
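One way to realize such an object-centric bottleneck is a late-interaction similarity in which every text token must ground to its best-matching slot, in the spirit of FILIP- or ColBERT-style token matching. The function below is an illustrative sketch under that assumption, not our actual training objective, and all shapes are made up.

```python
# Sketch of compositional text-slot alignment: each text token is matched to
# its best slot, and the image-text score is the mean over tokens.
import torch
import torch.nn.functional as F

def slot_grounded_similarity(text_tokens: torch.Tensor,
                             slots: torch.Tensor) -> torch.Tensor:
    # text_tokens: (b, n_tok, d), slots: (b, n_slots, d)
    t = F.normalize(text_tokens, dim=-1)
    s = F.normalize(slots, dim=-1)
    sim = torch.einsum('btd,bkd->btk', t, s)    # token-to-slot cosine similarities
    # Each token ("ruffled", "collar", ...) grounds to its best-matching slot.
    return sim.max(dim=-1).values.mean(dim=-1)  # (b,) image-text score

scores = slot_grounded_similarity(torch.randn(4, 12, 256), torch.randn(4, 7, 256))
```

Because the score is built from per-token maxima over slots, a caption can only score well if each of its words is explained by some slot, which is what enforces compositional alignment between text and image regions.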
Focus Areas
- Slot-grounded vision-language pre-training
- Dense captioning of garment attributes and styling details
- Language-guided part-level image editing
- Compositional text-to-outfit retrieval
Interested in our research?
We publish our findings and release open-source tools. Follow our latest updates in the News section.