TY - GEN
T1 - On the use of semantic blocking techniques for data cleansing and integration
AU - Nin, J.
AU - Muntés-Mulero, Victor
AU - Martínez-Bazan, Norbert
AU - Larriba-Pey, Josep L.
PY - 2007
Y1 - 2007
N2 - Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons. This reduces computational time but very often decreases the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.
AB - Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons. This reduces computational time but very often decreases the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.
KW - Blocking algorithms
KW - Data cleansing
KW - Data integration
KW - Record linkage
KW - Semantic information
UR - http://www.scopus.com/inward/record.url?scp=47949115568&partnerID=8YFLogxK
U2 - 10.1109/IDEAS.2007.4318104
DO - 10.1109/IDEAS.2007.4318104
M3 - Conference contribution
AN - SCOPUS:47949115568
SN - 076952947X
SN - 9780769529479
T3 - Proceedings of the International Database Engineering and Applications Symposium, IDEAS
SP - 190
EP - 198
BT - 11th International Database Engineering and Applications Symposium Proceedings, IDEAS
T2 - 11th International Database Engineering and Applications Symposium - IDEAS'2007
Y2 - 6 September 2007 through 8 September 2007
ER -