TY - JOUR
T1 - How to group attributes in multivariate microaggregation
AU - Nin, Jordi
AU - Herranz, Javier
AU - Torra, Vicenç
N1 - Funding Information: Partial support by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02) and by the Government of Catalonia (grant 2005-SGR-00093) is acknowledged. Jordi Nin wants to thank the Spanish Council for Scientific Research (CSIC) for his I3P grant.
PY - 2008/4
Y1 - 2008/4
N2 - Microaggregation is one of the most widely used microdata protection methods. It builds clusters of at least k original records, and then replaces each record with the centroid of its cluster. When the dataset has a large number of attributes, one usually splits it into smaller blocks of attributes, and then applies microaggregation to each block, successively and independently. In this way, the effect of the noise introduced by microaggregation is reduced, at the cost of losing the k-anonymity property. In this work we show that, besides the specific microaggregation method, the value of the parameter k and the number of blocks into which the dataset is split, there is another factor that influences the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher, and so the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments on real datasets.
AB - Microaggregation is one of the most widely used microdata protection methods. It builds clusters of at least k original records, and then replaces each record with the centroid of its cluster. When the dataset has a large number of attributes, one usually splits it into smaller blocks of attributes, and then applies microaggregation to each block, successively and independently. In this way, the effect of the noise introduced by microaggregation is reduced, at the cost of losing the k-anonymity property. In this work we show that, besides the specific microaggregation method, the value of the parameter k and the number of blocks into which the dataset is split, there is another factor that influences the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher, and so the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments on real datasets.
KW - Attribute selection
KW - Microaggregation
KW - Statistical disclosure control
UR - http://www.scopus.com/inward/record.url?scp=44349134440&partnerID=8YFLogxK
U2 - 10.1142/S0218488508005285
DO - 10.1142/S0218488508005285
M3 - Article
AN - SCOPUS:44349134440
SN - 0218-4885
VL - 16
SP - 121
EP - 138
JO - International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
JF - International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
IS - SUPPL. 1
ER -