TY - GEN
T1 - Ordered data set vectorization for linear regression on data privacy
AU - Medrano-Gracia, Pau
AU - Pont-Tuset, Jordi
AU - Nin, Jordi
AU - Muntés-Mulero, Victor
PY - 2007
Y1 - 2007
N2 - Many situations demand publishing data without revealing the confidential information in it. Among the data protection methods proposed in the literature, those based on linear regression are widely used for numerical data. The main objective of these methods is to minimize both the disclosure risk (DR) and the information loss (IL). However, most of these techniques try to protect the nonconfidential attributes based on the values of the confidential attributes in the data set. In this situation, when these two sets of attributes are strongly correlated, the possibility that an intruder reveals confidential data increases, making these methods unsuitable for many typical scenarios. In this paper we propose a new family of methods, called LiROP-k methods, that are based on linear regression and avoid the problems derived from the correlation between attributes in the data set. We propose the vectorization, sorting and partitioning of all values in the attributes to be protected in the data set, breaking the semantics of these attributes inside the record. We present two different protection methods: a synthetic protection method called LiROPs-k and a perturbative method called LiROPp-k. We show that, when the attributes in the data set are highly correlated, our methods present lower DR than other protection methods based on linear regression.
AB - Many situations demand publishing data without revealing the confidential information in it. Among the data protection methods proposed in the literature, those based on linear regression are widely used for numerical data. The main objective of these methods is to minimize both the disclosure risk (DR) and the information loss (IL). However, most of these techniques try to protect the nonconfidential attributes based on the values of the confidential attributes in the data set. In this situation, when these two sets of attributes are strongly correlated, the possibility that an intruder reveals confidential data increases, making these methods unsuitable for many typical scenarios. In this paper we propose a new family of methods, called LiROP-k methods, that are based on linear regression and avoid the problems derived from the correlation between attributes in the data set. We propose the vectorization, sorting and partitioning of all values in the attributes to be protected in the data set, breaking the semantics of these attributes inside the record. We present two different protection methods: a synthetic protection method called LiROPs-k and a perturbative method called LiROPp-k. We show that, when the attributes in the data set are highly correlated, our methods present lower DR than other protection methods based on linear regression.
KW - Linear regression masking methods
KW - Privacy in statistical databases
KW - Privacy preserving data mining
KW - Statistical disclosure risk
UR - http://www.scopus.com/inward/record.url?scp=37249091183&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73729-2_34
DO - 10.1007/978-3-540-73729-2_34
M3 - Conference contribution
AN - SCOPUS:37249091183
SN - 9783540737285
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 361
EP - 372
BT - Modeling Decisions for Artificial Intelligence - 4th International Conference, MDAI 2007, Proceedings
PB - Springer Verlag
T2 - 4th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2007
Y2 - 16 August 2007 through 18 August 2007
ER -