TY - GEN
T1 - Biogeographical Ancestry Inference from Genotype
T2 - 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020
AU - Qu, Yue
AU - Tran, Dat
AU - Martinez-Marroquin, Elisa
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/1
Y1 - 2020/12/1
AB - Biogeographical ancestry (BGA) information can provide supporting evidence in epidemiology and investigative leads in forensics. Several sets of ancestral informative markers (AIMs) have been proposed to facilitate BGA inference. Although a small set of markers improves efficiency, it is limited in its ability to balance different populations and to differentiate sub-populations. Genome-wide single nucleotide polymorphisms (SNPs) provide much more comprehensive information about an individual's ancestry. In this paper, we study BGA inference when abundant genome-wide, high-density data are available. We studied 1043 individuals from 7 continental populations of the Human Genome Diversity Panel at 32212 genome-wide autosomal SNP loci. We inferred the population structure and compared the BGA inference accuracy obtained with AIMs and with genome-wide SNPs, using three widely used genetic sequence analysis algorithms. Our results show that genome-wide SNPs reveal population structure with more clearly separated clusters and provide more accurate BGA inference, confirming the rich information carried by genome-wide SNPs. The findings give a clearer picture of an individual's candidate ancestral population groups and may support BGA inference at a finer population scale.
KW - Biogeographical ancestry (BGA)
KW - Convolutional neural network (CNN)
KW - Genome-wide analysis
KW - Hidden Markov Model (HMM)
KW - Support Vector Machine (SVM)
UR - http://www.scopus.com/inward/record.url?scp=85099677180&partnerID=8YFLogxK
U2 - 10.1109/SSCI47803.2020.9308171
DO - 10.1109/SSCI47803.2020.9308171
M3 - Conference contribution
AN - SCOPUS:85099677180
T3 - 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020
SP - 64
EP - 70
BT - 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 December 2020 through 4 December 2020
ER -