TY - GEN
T1 - Semantic blocking for Record Linkage
AU - Nin, Jordi
AU - Muntés-Mulero, Víctor
AU - Martínez-Bazan, Norbert
AU - Larriba-Pey, Josep L.
PY - 2007
Y1 - 2007
AB - Record Linkage (RL) is an important component of data cleaning and integration and data processing in general. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or reducing the number of attribute comparisons, which reduces the computational time, but increases the amount of error. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts that decide which pairs or groups of records are real links and which are false hits. In this paper we show that exploiting the semantic relationships (e.g. foreign key), established between one or more data sources, makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.
KW - Blocking algorithms
KW - Data cleansing
KW - Data integration
KW - Data processing
KW - Record Linkage
KW - Semantic information
UR - http://www.scopus.com/inward/record.url?scp=84878050721&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878050721
SN - 9781586037987
T3 - Frontiers in Artificial Intelligence and Applications
SP - 141
EP - 149
BT - Artificial Intelligence Research and Development
T2 - 10th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2007
Y2 - 25 October 2007 through 26 October 2007
ER -