TY - JOUR
T1 - Classifying data from protected statistical datasets
AU - Herranz, Javier
AU - Matwin, Stan
AU - Nin, Jordi
AU - Torra, Vicenç
N1 - Funding Information:
Partial support by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02) is acknowledged. Javier Herranz holds a Ramón y Cajal grant, partially funded by the European Social Fund (ESF), from the Spanish MICINN Ministry. Jordi Nin is supported by the European Community through the 7th Framework Programme Marie Curie Intra-European fellowship, contract No. 235226. Stan Matwin's research is supported in part by the Natural Sciences and Engineering Research Council of Canada.
Funding Information:
Jordi Nin (Barcelona, Catalonia, 1979; BSc 2004, MSc 2007, PhD 2008, all in Computer Science) is a post-doctoral researcher at the Laboratoire d’Analyse et d’Architecture des Systèmes (LAAS-CNRS), Toulouse, France. His fields of interest are privacy technologies, machine learning, and soft computing tools. He has been involved in several research projects funded by the Catalan and Spanish governments and the European Community. His research has been published in specialized journals and major conferences (around 50 papers).
PY - 2010
Y1 - 2010/11
AB - Statistical Disclosure Control (SDC) has been an active research area in recent years. The goal is to transform an original dataset X into a protected one X′, such that X′ does not reveal any relation between confidential and (quasi-)identifier attributes, and such that X′ can be used to compute reliable statistical information about X. Many specific protection methods have been proposed and analyzed with respect to the levels of privacy and utility that they offer. However, when measuring utility, only differences between the statistical values of X and X′ are considered. This would suggest that datasets protected by SDC methods can be used only for statistical purposes. We show in this paper that this is not the case: a protected dataset X′ can be used to construct good classifiers for future data. To do so, we describe an extensive set of experiments that we have run with different SDC protection methods and different (real) datasets. In general, the resulting classifiers are very good, which is good news for both the SDC and the privacy-preserving data mining (PPDM) communities. In particular, our results question the necessity of some specific protection methods that have appeared in the PPDM literature with the clear goal of providing good classification.
KW - Classification methods
KW - Disclosure risk
KW - Information loss
KW - Statistical disclosure control
KW - WEKA experiments
UR - http://www.scopus.com/inward/record.url?scp=78449301618&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2010.05.005
DO - 10.1016/j.cose.2010.05.005
M3 - Article
AN - SCOPUS:78449301618
SN - 0167-4048
VL - 29
SP - 875
EP - 890
JO - Computers and Security
JF - Computers and Security
IS - 8
ER -