On the use of semantic blocking techniques for data cleansing and integration

J. Nin*, Victor Muntés-Mulero, Norbert Martínez-Bazan, Josep L. Larriba-Pey

*Corresponding author for this work

Research output: Book chapter, Conference contribution, Peer-reviewed

19 Citations (Scopus)

Abstract

Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons. Both approaches reduce computation time but very often decrease the quality of the results. However, the real bottleneck of RL is the post-process, in which experts must review the results and decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g. foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.
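To make the idea of blocking on a relationship concrete, here is a minimal sketch. It is not the method of the paper, whose details are not given in this abstract. It assumes a toy table whose rows carry a hypothetical foreign key `publisher_id`, and it only compares records that share the same foreign-key value. The record layout, the field names, and the helper functions are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (record_id, name, publisher_id).
# publisher_id stands in for a foreign key into another table (assumption).
records = [
    (1, "J. Smith", 10),
    (2, "John Smith", 10),
    (3, "A. Jones", 20),
    (4, "Alice Jones", 20),
    (5, "J. Smyth", 30),
]

def blocks_by_foreign_key(rows):
    """Group record ids by foreign-key value, one block per key."""
    blocks = defaultdict(list)
    for record_id, _name, fk in rows:
        blocks[fk].append(record_id)
    return dict(blocks)

def candidate_pairs(blocks):
    """Yield record-id pairs that share a block; only these are compared."""
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)

blocks = blocks_by_foreign_key(records)
pairs = list(candidate_pairs(blocks))

# Baseline for comparison: every record against every other record.
all_pairs = list(combinations([r[0] for r in records], 2))

print(f"{len(pairs)} candidate pairs after blocking, {len(all_pairs)} without")
```

In this toy data, blocking leaves far fewer pairs for an expert to review than the exhaustive comparison. It also shows the usual trade-off: record 5 ("J. Smyth") is never compared with record 1, because the two rows point to different foreign-key values.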

Original language: English
Title of host publication: 11th International Database Engineering and Applications Symposium Proceedings, IDEAS
Pages: 190-198
Number of pages: 9
Publication status: Published - 2007
Externally published
Event: 11th International Database Engineering and Applications Symposium - IDEAS'2007 - Banff, AB, Canada
Duration: 6 Sep 2007 to 8 Sep 2007

Publication series

Name: Proceedings of the International Database Engineering and Applications Symposium, IDEAS
ISSN (print): 1098-8068

Conference

Conference: 11th International Database Engineering and Applications Symposium - IDEAS'2007
Country/Territory: Canada
City: Banff, AB
Period: 6/09/07 to 8/09/07
