Label embedding for text recognition

Jose A. Rodriguez-Serrano, Florent Perronnin

Research output: Contribution to conference, Contribution, Peer-reviewed

57 Citations (Scopus)


The standard approach to recognizing text in images consists in first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields (CRF). This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM (SSVM) framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents the following advantages: it does not require costly pre- or post-processing operations, it allows for the recognition of never-seen-before words and the recognition process is efficient. Experiments are performed on two challenging datasets (one of license plates and one of scene text) and show that the proposed method is competitive with standard bottom-up approaches to text recognition.
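The retrieval view described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not specify how labels are embedded, so `embed_label` uses a simple per-region character histogram; `recognise` and `ranking_hinge` are hypothetical helper names; and the image embedding, which in practice would come from a learned projection of image features, is replaced here by a stand-in vector.

```python
import math
import string

# Assumed character set for the labels (not taken from the source).
ALPHABET = string.ascii_lowercase + string.digits


def embed_label(word, levels=2):
    """Hypothetical label embedding built from per-region character histograms.

    At level k the word is split into 2**k equal regions, and each character
    contributes to a region in proportion to its overlap with that region.
    The vector is L2-normalised, so for unit-norm vectors the largest dot
    product corresponds to the smallest Euclidean distance.
    """
    word = word.lower()
    n = len(word)
    vec = []
    for level in range(levels):
        regions = 2 ** level
        for r in range(regions):
            lo, hi = r / regions, (r + 1) / regions
            hist = [0.0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                # Share of character i's span [i/n, (i+1)/n] inside the region.
                overlap = min(hi, (i + 1) / n) - max(lo, i / n)
                if overlap > 0:
                    hist[ALPHABET.index(ch)] += overlap * n
            vec.extend(hist)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recognise(image_embedding, lexicon, levels=2):
    """Recognition as retrieval: return the most similar label in the lexicon."""
    return max(lexicon, key=lambda w: dot(image_embedding, embed_label(w, levels)))


def ranking_hinge(image_embedding, true_word, wrong_word, margin=0.1):
    """Margin violation when a non-matching label scores too close to the match.

    A simplified stand-in for the kind of constraint a structured SVM enforces;
    the margin value is an arbitrary assumption.
    """
    s_true = dot(image_embedding, embed_label(true_word))
    s_wrong = dot(image_embedding, embed_label(wrong_word))
    return max(0.0, margin + s_wrong - s_true)


# Stand-in for the embedding of a word image showing "plate".
image_embedding = embed_label("plate")
lexicon = ["plate", "petal", "place"]
best = recognise(image_embedding, lexicon)
```

In this toy setting `recognise` returns `"plate"`, even though `"petal"` uses exactly the same characters, because the level-1 histograms encode where each character sits in the word. Since any string can be passed to `embed_label`, the lexicon may contain words that never appeared during training, which is the property the abstract highlights.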

Original language: English
Publication status: Published - 2013
Externally published: Yes
Event: 2013 24th British Machine Vision Conference, BMVC 2013 - Bristol, United Kingdom
Duration: 9 Sep 2013 to 13 Sep 2013


Conference: 2013 24th British Machine Vision Conference, BMVC 2013
Country/Territory: United Kingdom

