Abstract: The assignment of an individual to its true population of origin using a low-density panel of discriminant SNP markers is one of the most important practical applications of genomic data. The aim of this study was to evaluate the potential of different Artificial Neural Network (ANN) approaches, including Deep Neural Networks (DNN) and the Garson and Olden methods, for selecting informative SNP markers from high-throughput genotyping data that could trace the true breed of unknown samples. A total of 795 animals from 37 breeds, genotyped with the Illumina SNP 50k Bead chip, were used in the current study, and principal component analysis (PCA), log-likelihood ratios (LLR) and Neighbor-Joining (NJ) were applied to assess the performance of the different assignment methods. The results revealed that the DNN, Garson, and Olden methods were able to assign individuals to their true populations with 4270, 4937, and 7999 SNP markers, respectively. PCA was used to examine how the animals were allocated to groups using all genotyped markers available on the 50k Bead chip and using the subsets of SNP markers identified by the different methods. The results indicated that all SNP panels were able to assign individuals to their true breeds. The success percentage of genetic assignment, assessed at different LLR levels, showed that a 70% success rate was achieved with 110, 208, and 178 markers for the DNN, Garson, and Olden methods, respectively. The results also showed that DNN performed better than the other two approaches, achieving 93% accuracy at the most stringent threshold. Finally, the identified SNPs were successfully applied to independent out-group breeds comprising 120 individuals from eight breeds, and the results indicated that these markers correctly allocated all unknown samples to their true population of origin.
Furthermore, the NJ tree of allele-sharing distances on the validation dataset showed that the DNN has a high potential for feature selection. In general, the results of this study indicated that the DNN technique represents an efficient strategy for selecting a reduced pool of highly discriminant markers for assigning individuals to the true population of origin.
The Equine Research Bank provides access to a large database of publicly available scientific literature. Inclusion in the Research Bank does not imply endorsement of study methods or findings by Mad Barn.
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
This study aims to identify horse breeds accurately using genomic data and machine learning models such as Deep Neural Networks. It reports how efficiently these models select relevant SNP markers that can be used for precise breed identification.
Study Objective and Data
The goal of this research was to assess how well several Artificial Neural Network (ANN) approaches, namely a Deep Neural Network (DNN) and the Garson and Olden methods for ranking the inputs of a trained network, select informative SNP (single nucleotide polymorphism) markers from high-throughput genotyping data.
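Both Garson's algorithm and Olden's connection-weights method score each input (here, each SNP) from the weights of a trained network. The sketch below is an illustration, not the study's code: it assumes a network with one hidden layer and a single output neuron (for a multi-breed classifier the scores could be computed per output), and the function names, including the `top_k_markers` helper for building a reduced panel, are hypothetical.

```python
import numpy as np


def garson_importance(w_ih: np.ndarray, w_ho: np.ndarray) -> np.ndarray:
    """Garson's algorithm: relative importance of each input, summing to 1.

    w_ih: input-to-hidden weights, shape (n_inputs, n_hidden)
    w_ho: hidden-to-output weights for one output neuron, shape (n_hidden,)
    """
    # Contribution of input i through hidden neuron j: |w_ij| * |v_j|
    contrib = np.abs(w_ih) * np.abs(w_ho)[np.newaxis, :]
    # Share of each input within each hidden neuron (each column sums to 1)
    share = contrib / contrib.sum(axis=0, keepdims=True)
    # Add the shares over hidden neurons, then rescale so all inputs sum to 1
    total = share.sum(axis=1)
    return total / total.sum()


def olden_importance(w_ih: np.ndarray, w_ho: np.ndarray) -> np.ndarray:
    """Olden's connection-weights method: signed importance of each input."""
    # Sum over hidden neurons of (input-to-hidden weight * hidden-to-output weight)
    return w_ih @ w_ho


def top_k_markers(scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of the k SNPs with the largest |score|."""
    return np.argsort(-np.abs(scores))[:k]


# Toy network: 3 SNP inputs, 2 hidden neurons, 1 output
w_ih = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
w_ho = np.array([1.0, 2.0])
print(garson_importance(w_ih, w_ho))  # approx. [0.333 0.417 0.25]
print(olden_importance(w_ih, w_ho))   # [2. 3. 2.]
```

Garson's scores are non-negative and sum to 1 because they use absolute weights, whereas Olden's keep the sign of each input-hidden-output path; ranking Olden scores by absolute value is an assumption made here so both methods can feed the same top-k selection.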
A total of 795 horses from 37 different breeds were genotyped using the Illumina SNP 50k Bead chip for this investigation.
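Each of these methods works on numbers rather than genotype calls. A common convention, assumed here and not taken from the study, is to code every SNP as the count of one allele (0, 1 or 2) and to fill a failed call with that SNP's mean dosage. The `DOSAGE` table, the "--" missing-call symbol and `encode_genotypes` are hypothetical.

```python
import numpy as np

# Assumed coding: count of the B allele in Illumina-style A/B genotype calls
DOSAGE = {"AA": 0.0, "AB": 1.0, "BB": 2.0}


def encode_genotypes(calls: list[list[str]]) -> np.ndarray:
    """Convert an (animals x SNPs) table of genotype calls to a dosage matrix.

    Any call not in DOSAGE (for example "--" for a failed call) becomes NaN
    and is then replaced by that SNP's mean dosage over the called animals.
    """
    geno = np.array([[DOSAGE.get(c, np.nan) for c in row] for row in calls])
    col_means = np.nanmean(geno, axis=0)
    missing = np.isnan(geno)
    # The column index of each missing cell picks the matching SNP mean
    geno[missing] = col_means[np.nonzero(missing)[1]]
    return geno


calls = [["AA", "AB", "BB"], ["AB", "--", "BB"], ["BB", "AB", "AA"]]
print(encode_genotypes(calls))
```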
Implementation and Results
The researchers applied Principal Component Analysis (PCA), log-likelihood ratios (LLR), and Neighbor-Joining (NJ) to examine the efficiency of the different assignment methods.
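In this kind of analysis PCA serves mainly as a visual check: animals are projected onto the first principal axes of the genotype matrix to see whether breeds form separate clusters, first with all chip markers and then with each reduced panel. A minimal NumPy sketch, assuming a complete 0/1/2 dosage matrix and per-SNP standardization (a common choice; the study's exact preprocessing may differ); `genotype_pca` is a hypothetical name.

```python
import numpy as np


def genotype_pca(geno: np.ndarray, n_components: int = 2):
    """PCA of an (animals x SNPs) 0/1/2 dosage matrix via SVD.

    Returns the animals' scores on the first n_components axes and the
    share of total variance each of those axes explains.
    """
    centred = geno - geno.mean(axis=0)
    # Scale each SNP to unit variance; monomorphic SNPs (sd = 0) stay all-zero
    sd = centred.std(axis=0)
    sd[sd == 0] = 1.0
    standardized = centred / sd
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Toy data: two pairs of animals with opposite genotypes at most SNPs
geno = np.array([[0, 0, 0], [0, 0, 1], [2, 2, 2], [2, 2, 1]], dtype=float)
scores, ratio = genotype_pca(geno)
print(np.round(scores[:, 0], 2), np.round(ratio, 2))
```

The sign of each principal axis returned by an SVD is arbitrary, so only the relative positions of animals are meaningful; in the toy data the first two animals fall on the opposite side of PC1 from the last two.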
The DNN, Garson, and Olden methods were able to assign individuals to their true populations with panels of 4270, 4937, and 7999 SNP markers, respectively.
The success of genetic assignment for each method was measured at different LLR thresholds; a 70% success rate was achieved with 110, 208, and 178 SNP markers for the DNN, Garson, and Olden methods, respectively.
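A likelihood-based assignment test of the kind these LLR thresholds refer to can be sketched as follows: estimate each breed's allele frequencies from reference animals, compute the Hardy-Weinberg likelihood of an unknown animal's genotypes under every breed, and accept the top breed only when its log10 likelihood exceeds the runner-up's by at least a threshold. The frequency clipping, the best-versus-second-best LLR definition and the success-rate formula (correct and above threshold, divided by all animals tested) are assumptions made for illustration; the study's exact settings may differ, and all function names are hypothetical.

```python
import numpy as np


def allele_freqs(geno: np.ndarray, breeds: np.ndarray, eps: float = 0.01):
    """Per-breed B-allele frequencies from a reference 0/1/2 dosage matrix.

    Frequencies are clipped to [eps, 1 - eps] (an assumed correction) so that
    an allele unseen in a breed's reference sample does not produce log(0).
    """
    labels = np.unique(breeds)
    freqs = np.array([geno[breeds == b].mean(axis=0) / 2 for b in labels])
    return labels, np.clip(freqs, eps, 1 - eps)


def log10_likelihoods(geno: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """log10 P(genotypes | breed) under Hardy-Weinberg; shape (animals, breeds)."""
    p = freqs[np.newaxis, :, :]  # (1, breeds, SNPs)
    g = geno[:, np.newaxis, :]   # (animals, 1, SNPs)
    # Genotype probabilities (1-p)^2, 2p(1-p), p^2 for dosages 0, 1, 2
    probs = np.where(g == 0, (1 - p) ** 2, np.where(g == 1, 2 * p * (1 - p), p**2))
    # Sum over SNPs, treating markers as independent
    return np.log10(probs).sum(axis=2)


def assign(geno: np.ndarray, labels: np.ndarray, freqs: np.ndarray, threshold: float):
    """Most likely breed per animal and whether its LLR reaches the threshold."""
    ll = log10_likelihoods(geno, freqs)
    order = np.argsort(ll, axis=1)
    best, second = order[:, -1], order[:, -2]
    rows = np.arange(ll.shape[0])
    # LLR: log10 likelihood of the best breed minus that of the runner-up
    llr = ll[rows, best] - ll[rows, second]
    return labels[best], llr >= threshold


ref = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [2, 2, 2, 2], [2, 2, 2, 1]], dtype=float)
breeds = np.array(["A", "A", "B", "B"])
labels, freqs = allele_freqs(ref, breeds)
unknown = np.array([[0, 0, 0, 0], [2, 2, 2, 2], [1, 1, 1, 1]], dtype=float)
pred, confident = assign(unknown, labels, freqs, threshold=2.0)
truth = np.array(["A", "B", "A"])
# Success rate: correctly assigned AND above the LLR threshold, over all animals
print(np.mean((pred == truth) & confident))
```

Raising `threshold` makes the test more stringent: fewer animals pass, but those that do are assigned with higher confidence, which is why success rates are reported per LLR level.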
The DNN method outperformed the others, achieving a 93% accuracy rate at the most stringent threshold.
Conclusion and Further Applications
The selected SNPs were then applied to independent out-group breeds comprising 120 individuals from eight breeds, and they correctly allocated all of these unknown samples to their true population of origin.
On this validation dataset, a Neighbor-Joining (NJ) tree of allele-sharing distances further highlighted the DNN model's high potential for feature selection.
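The allele-sharing distance (ASD) between two animals is one minus the proportion of alleles they share identical-by-state, averaged over SNPs, and an NJ tree built from the ASD matrix shows whether animals of the same breed cluster together. A NumPy/plain-Python sketch of both steps, assuming 0/1/2 dosages with no missing data; the function names are hypothetical, and a real analysis would normally rely on an established phylogenetics package.

```python
import numpy as np


def allele_sharing_distance(geno: np.ndarray) -> np.ndarray:
    """Pairwise ASD for an (animals x SNPs) 0/1/2 dosage matrix.

    Builds an animals x animals x SNPs array, so it suits small panels only.
    """
    # Alleles shared identical-by-state at each SNP: 2 - |g_a - g_b|
    shared = 2 - np.abs(geno[:, np.newaxis, :] - geno[np.newaxis, :, :])
    # ASD = 1 - (mean shared alleles per SNP) / 2
    return 1 - shared.mean(axis=2) / 2


def neighbor_joining(dist, names) -> str:
    """Saitou-Nei neighbor joining; returns a Newick string with branch lengths."""
    d = [[float(x) for x in row] for row in dist]
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Q-criterion: join the pair (i, j) minimising (n - 2) * d_ij - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[pos]] for pos, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # NJ trees are unrooted; placing the root mid-way on the last edge is arbitrary
    half = d[0][1] / 2
    return f"({nodes[0]}:{half:.4f},{nodes[1]}:{half:.4f});"


dist = [[0, 3, 5, 3], [3, 0, 6, 4], [5, 6, 0, 4], [3, 4, 4, 0]]
print(neighbor_joining(dist, ["A", "B", "C", "D"]))
# ((A:1.0000,B:2.0000):0.5000,(C:3.0000,D:1.0000):0.5000);
```

In the toy call the distances are additive on the tree ((A,B),(C,D)), so neighbor joining recovers both the topology and the branch lengths exactly; with real ASD matrices the tree is an approximation of the underlying relationships.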
In conclusion, the study showed that the DNN technique is an efficient strategy for selecting a reduced set of highly discriminant markers for assigning individuals to their correct population of origin.
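This summary does not describe how the DNN itself ranked SNPs. One widely used, model-agnostic way to score the inputs of any trained classifier, including a DNN, is permutation importance: shuffle one SNP across animals and measure how much accuracy drops. The sketch below uses that swapped-in technique, which may differ from the authors' procedure; `snp_permutation_importance` and the `toy_predict` rule standing in for a trained network are hypothetical.

```python
from typing import Callable

import numpy as np


def snp_permutation_importance(predict: Callable[[np.ndarray], np.ndarray],
                               geno: np.ndarray, labels: np.ndarray,
                               n_repeats: int = 5, seed: int = 0) -> np.ndarray:
    """Mean drop in accuracy when each SNP column is shuffled across animals."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(geno) == labels)
    drops = np.zeros(geno.shape[1])
    for j in range(geno.shape[1]):
        for _ in range(n_repeats):
            shuffled = geno.copy()
            # Break the link between SNP j and breed while keeping its genotype counts
            shuffled[:, j] = rng.permutation(shuffled[:, j])
            drops[j] += baseline - np.mean(predict(shuffled) == labels)
    return drops / n_repeats


def toy_predict(X: np.ndarray) -> np.ndarray:
    """Stand-in for a trained DNN: calls breed "B" whenever SNP 0 has dosage 2."""
    return np.where(X[:, 0] > 1, "B", "A")


geno = np.column_stack([np.repeat([0.0, 2.0], 5), np.tile([0.0, 1.0, 2.0, 1.0, 0.0], 2)])
labels = np.repeat(["A", "B"], 5)
print(snp_permutation_importance(toy_predict, geno, labels))
```

SNPs whose shuffling costs the most accuracy would rank highest, and a reduced panel could then be taken from the top of that ranking.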
Cite This Article
APA
Manzoori S, Farahani AHK, Moradi MH, Kazemi-Bonchenari M. (2023). Detecting SNP markers discriminating horse breeds by deep learning. Sci Rep, 13(1), 11592. https://doi.org/10.1038/s41598-023-38601-z