OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Giuseppe Cartella, Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

Research output: Book chapter › Conference contribution › peer-review


Abstract

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either relied on supervised learning approaches with limited generalization or on more reusable CLIP-based techniques that were, however, trained on closed-source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that adopts only open-source fashion data stemming from diverse domains and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods in terms of both accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.
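The vision-and-language contrastive learning objective described above is, in CLIP-style training, typically the symmetric InfoNCE loss over a batch of matched image-caption pairs. The following is a minimal NumPy sketch of that standard objective, not the authors' actual implementation; the function name and temperature value are illustrative:

```python
import numpy as np

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss used in CLIP-style contrastive pre-training.

    image_embs, text_embs: (batch, dim) arrays where row i of each is a
    matched pair (e.g. a fashion product photo and its description).
    """
    # L2-normalize so dot products become cosine similarities
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    labels = np.arange(len(logits))     # matching pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image-to-text and text-to-image cross-entropies
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each image embedding toward its paired caption embedding while pushing it away from the other captions in the batch, which is what enables zero-shot transfer to tagging and retrieval tasks.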
Original language: English
Title of host publication: Image Analysis and Processing – ICIAP 2023, Part I
Editors: G. L. Foresti, A. Fusiello, E. Hancock
Publisher: Springer Nature
Pages: 245-256
Number of pages: 12
Volume: 14233
ISBN (Electronic): 978-3-031-43148-7
ISBN (Print): 978-3-031-43147-0
DOIs
Publication status: Published - 2023
Externally published: Yes
Event: 22nd International Conference on Image Analysis and Processing (ICIAP) - Udine, Italy
Duration: 11 Sept 2023 – 15 Sept 2023

Publication series

Name: Lecture Notes in Computer Science

Conference

Conference: 22nd International Conference on Image Analysis and Processing (ICIAP)
Country/Territory: Italy
City: Udine
Period: 11/09/23 – 15/09/23

Keywords

  • Fashion Domain
  • Open-Source Datasets
  • Vision-and-Language Pre-Training
