TY - GEN
T1 - Attribute selection in multivariate microaggregation
AU - Nin, J.
AU - Herranz, Javier
AU - Torra, Vicenç
PY - 2008
Y1 - 2008
N2 - Microaggregation is one of the most widely used microdata protection methods. The idea is to build clusters of at least k original records and then replace each record with the centroid of its cluster. When the number of attributes of the dataset is large, a common practice is to split the dataset into smaller blocks of attributes. Microaggregation is then applied successively and independently to each block. In this way, the effect of the noise introduced by microaggregation is reduced, but at the cost of losing the k-anonymity property. The goal of this work is to show that, besides the specific microaggregation method employed, the value of the parameter k, and the number of blocks into which the dataset is split, another factor can influence the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher and, therefore, the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments on real datasets.
AB - Microaggregation is one of the most widely used microdata protection methods. The idea is to build clusters of at least k original records and then replace each record with the centroid of its cluster. When the number of attributes of the dataset is large, a common practice is to split the dataset into smaller blocks of attributes. Microaggregation is then applied successively and independently to each block. In this way, the effect of the noise introduced by microaggregation is reduced, but at the cost of losing the k-anonymity property. The goal of this work is to show that, besides the specific microaggregation method employed, the value of the parameter k, and the number of blocks into which the dataset is split, another factor can influence the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher and, therefore, the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments on real datasets.
KW - attribute selection
KW - microaggregation
KW - statistical disclosure control
UR - http://www.scopus.com/inward/record.url?scp=63749093602&partnerID=8YFLogxK
U2 - 10.1145/1379287.1379299
DO - 10.1145/1379287.1379299
M3 - Conference contribution
AN - SCOPUS:63749093602
SN - 9781595939654
T3 - ACM International Conference Proceeding Series
SP - 51
EP - 60
BT - PAIS'08 - Post-proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society, Co-located with EDBT 2008
T2 - 1st International Workshop on Privacy and Anonymity in the Information Society, PAIS 2008, Collocated with 11th International Conference on Extending Database Technology, EDBT 2008
Y2 - 29 March 2008 through 29 March 2008
ER -