OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Giuseppe Cartella, Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

Research output: Chapter in book › Conference contribution › Peer-reviewed

1 Citation (Scopus)

Abstract

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined a low generalizable supervised learning approach or more reusable CLIP-based techniques while, however, training on closed source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that only adopts open-source fashion data stemming from diverse domains, and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods both in terms of accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.
Original language: English
Title of host publication: Image Analysis And Processing, Iciap 2023, Pt I
Editors: GL Foresti, A Fusiello, E Hancock
Publisher: Springer Nature
Pages: 245-256
Number of pages: 12
Volume: 14233
ISBN (electronic): 978-3-031-43148-7
ISBN (print): 978-3-031-43147-0
DOIs
Publication status: Published - 2023
Externally published
Event: 22nd International Conference on Image Analysis and Processing (ICIAP) - Udine, Italy
Duration: 11 Sep 2023 → 15 Sep 2023

Publication series

Name: Lecture Notes In Computer Science

Conference

Conference: 22nd International Conference on Image Analysis and Processing (ICIAP)
Country/Territory: Italy
City: Udine
Period: 11/09/23 → 15/09/23
