TY - JOUR
T1 - Analysis of the univariate microaggregation disclosure risk
AU - Nin, J.
AU - Torra, Vicenç
N1 - Funding Information:
Jordi Nin, Ph.D.: He received his BSc in 2004, MSc in 2007, and Ph.D. in 2008, all in Computer Science. He is a postdoctoral researcher at the Artificial Intelligence Research Institute (IIIA-CSIC) near Barcelona, Catalonia, Spain. His fields of interest are privacy technologies, machine learning and soft computing tools. He has been involved in several research projects funded by the Catalan and Spanish governments and the European Community. His research has been published in specialized journals and major conferences (around 30 papers).
Partial support by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02) and by the Government of Catalunya (grant 2005-SGR-00093) is acknowledged. Jordi Nin thanks the Spanish National Research Council (CSIC) for his I3P grant.
PY - 2009/5
Y1 - 2009/5
AB - Microaggregation is a protection method used by statistical agencies to limit the disclosure risk of confidential information. Formally, microaggregation assigns each original datum to a small cluster and then replaces it with the centroid of that cluster. As clusters contain at least k records, microaggregation can be considered as preserving k-anonymity. Nevertheless, this holds only when multivariate microaggregation is applied and, moreover, when all variables are microaggregated at the same time. When different variables are protected using univariate microaggregation, k-anonymity is only ensured at the variable level. Therefore, the real k-anonymity decreases for most of the records, and a privacy leak becomes possible. For this reason, the analysis of the disclosure risk is still meaningful in microaggregation. This paper proposes a new record linkage method for univariate microaggregation based on finding the optimal alignment between the original and the protected sorted variables. We show that our method, which uses a DTW distance to compute the optimal alignment, provides the intruder with enough information in many cases to decide whether the link is correct or not. Note that standard record linkage methods never ensure the correctness of the linkage. Furthermore, we present some experiments using two well-known data sets, which show that our method yields better results (a larger number of correct links) than the best standard record linkage method.
KW - DTW Distance
KW - Microaggregation
KW - Privacy Preserving Data Mining
KW - Privacy on Statistical Databases
KW - Record Linkage
UR - http://www.scopus.com/inward/record.url?scp=68949158563&partnerID=8YFLogxK
U2 - 10.1007/s00354-007-0061-1
DO - 10.1007/s00354-007-0061-1
M3 - Article
AN - SCOPUS:68949158563
SN - 0288-3635
VL - 27
SP - 197
EP - 214
JO - New Generation Computing
JF - New Generation Computing
IS - 3
ER -