Detecting SNP markers discriminating horse breeds by deep learning.
Abstract: Assigning an individual to its true population of origin using a small panel of discriminant SNP markers is one of the most important practical applications of genomic data. The aim of this study was to evaluate the potential of different Artificial Neural Network (ANN) approaches, comprising Deep Neural Networks (DNN) and the Garson and Olden methods, for selecting informative SNP markers from high-throughput genotyping data that would be able to trace the true breed of unknown samples. A total of 795 animals from 37 breeds, genotyped using the Illumina SNP 50k Bead chip, were used in the current study, and principal component analysis (PCA), log-likelihood ratios (LLR) and Neighbor-Joining (NJ) were applied to assess the performance of the different assignment methods. The results revealed that the DNN, Garson, and Olden methods are able to assign individuals to their true populations with 4270, 4937, and 7999 SNP markers, respectively. PCA was used to determine how the animals were allocated to groups using all genotyped markers available on the 50k Bead chip and using the subsets of SNP markers identified by the different methods. The results indicated that all SNP panels are able to assign individuals to their true breeds. The success percentage of genetic assignment, assessed at different levels of LLR, showed that a success rate of 70% was obtained by the three methods with 110, 208, and 178 markers for the DNN, Garson, and Olden methods, respectively. The results also showed that DNN performed better than the other two approaches, achieving 93% accuracy at the most stringent threshold. Finally, the identified SNPs were successfully used in independent out-group breeds consisting of 120 individuals from eight breeds, and the results indicated that these markers are able to correctly allocate all unknown samples to their true population of origin. Furthermore, the NJ tree of allele-sharing distances on the validation dataset showed that the DNN has a high potential for feature selection. In general, the results of this study indicated that the DNN technique represents an efficient strategy for selecting a reduced pool of highly discriminant markers for assigning individuals to their true population of origin.
© 2023. The Author(s).
Publication Date: 2023-07-18
PubMed ID: 37464049
PubMed Central: PMC10354035
DOI: 10.1038/s41598-023-38601-z
- Journal Article
Summary
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
This study aims to identify horse breeds accurately from genomic data using machine learning models such as Deep Neural Networks. It reports how efficiently these models select the SNP markers most useful for breed identification.
Study Objective and Data
- The goal of this research was to assess how well different Artificial Neural Network (ANN) approaches, namely Deep Neural Networks (DNN) and the Garson and Olden methods, select informative single nucleotide polymorphism (SNP) markers from high-throughput genotyping data.
- A total of 795 horses from 37 different breeds were genotyped using the Illumina SNP 50k Bead chip for this investigation.
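The Garson and Olden methods both rank input variables (here, SNPs) by the connection weights of a trained network. The sketch below is a simplified illustration and not the study's code: it assumes a network with one hidden layer and a single output, uses one common statement of Garson's algorithm (absolute weights, normalized per hidden unit) and Olden's signed product of weights, and the toy weight values are made up.

```python
def garson_importance(w_in, w_out):
    """Relative importance of each input from absolute connection weights.

    w_in[i][j] is the weight from input i to hidden unit j;
    w_out[j] is the weight from hidden unit j to the single output.
    """
    n_inputs, n_hidden = len(w_in), len(w_out)
    raw = [0.0] * n_inputs
    for j in range(n_hidden):
        column_total = sum(abs(w_in[i][j]) for i in range(n_inputs))
        if column_total == 0:
            continue  # a hidden unit with no incoming weight contributes nothing
        for i in range(n_inputs):
            # Share of hidden unit j owed to input i, scaled by |output weight|.
            raw[i] += abs(w_in[i][j]) / column_total * abs(w_out[j])
    total = sum(raw)
    return [r / total if total else 0.0 for r in raw]


def olden_importance(w_in, w_out):
    """Signed importance: sum over hidden units of input weight * output weight."""
    return [
        sum(w_in[i][j] * w_out[j] for j in range(len(w_out)))
        for i in range(len(w_in))
    ]


# Hypothetical weights: 3 inputs (SNPs), 2 hidden units, 1 output.
toy_w_in = [[2.0, 1.0], [-1.0, 1.0], [1.0, -2.0]]
toy_w_out = [1.0, -0.5]

garson_scores = garson_importance(toy_w_in, toy_w_out)
olden_scores = olden_importance(toy_w_in, toy_w_out)
print("Garson:", garson_scores)
print("Olden: ", olden_scores)
```

Garson scores are non-negative and sum to 1, so they give magnitude only. Olden scores keep the sign, so opposing contributions through different hidden units can cancel. Ranking SNPs by either score and keeping the top ones is one way such a score could drive marker selection.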
Implementation and Results
- The researchers applied Principal Component Analysis (PCA), log-likelihood ratios (LLR), and Neighbor-Joining (NJ) to examine the efficiency of the different assignment methods.
- The DNN, Garson, and Olden methods were able to assign individuals to their true populations using 4270, 4937, and 7999 SNP markers, respectively.
- Assignment success was evaluated at different LLR levels; a 70% success rate was reached with panels of 110, 208, and 178 SNP markers for the DNN, Garson, and Olden methods, respectively.
- The DNN method outperformed the others, achieving a 93% accuracy rate at the most stringent threshold.
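A log-likelihood ratio threshold for assignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: genotypes are coded as 0, 1, or 2 copies of a reference allele, genotype probabilities follow Hardy-Weinberg proportions from per-breed allele frequencies, and the LLR is taken as the log10 likelihood of the best breed minus that of the second best. The breed names, frequencies, and the `eps` clamp are hypothetical.

```python
import math


def genotype_log10_likelihood(genotype, allele_freqs, eps=1e-3):
    """log10 likelihood of a multilocus genotype given one breed's allele frequencies."""
    total = 0.0
    for copies, p in zip(genotype, allele_freqs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) for fixed alleles
        q = 1.0 - p
        # Hardy-Weinberg probabilities for 0, 1, or 2 reference alleles.
        prob = (q * q, 2.0 * p * q, p * p)[copies]
        total += math.log10(prob)
    return total


def assign_with_llr(genotype, breed_freqs, threshold=0.0):
    """Return (breed or None, LLR); the breed is reported only if LLR >= threshold."""
    scored = sorted(
        ((genotype_log10_likelihood(genotype, freqs), breed)
         for breed, freqs in breed_freqs.items()),
        reverse=True,
    )
    (best_ll, best_breed), (second_ll, _) = scored[0], scored[1]
    llr = best_ll - second_ll
    return (best_breed if llr >= threshold else None), llr


# Hypothetical reference allele frequencies at three SNPs for two breeds.
toy_breed_freqs = {"BreedA": [0.9, 0.1, 0.8], "BreedB": [0.1, 0.9, 0.2]}
toy_genotype = [2, 0, 2]

assigned, llr = assign_with_llr(toy_genotype, toy_breed_freqs, threshold=2.0)
unassigned, _ = assign_with_llr(toy_genotype, toy_breed_freqs, threshold=6.0)
print(assigned, round(llr, 2), unassigned)
```

Raising the threshold trades assignment rate for confidence: fewer animals are assigned, but each assignment is supported by a larger likelihood gap.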
Conclusion and Further Applications
- The identified SNPs were then applied to independent out-group breeds consisting of 120 individuals from eight breeds, where they correctly allocated all unknown samples to their true population of origin.
- From the validation dataset, it was also found that the DNN model has a high potential for feature selection as highlighted in the NJ tree of allele-sharing distances.
- In conclusion, the study showed that the DNN technique is an efficient strategy for selecting a reduced set of highly discriminant markers for assigning individuals to their correct population of origin.
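The allele-sharing distances behind an NJ tree can be illustrated with a short sketch. It assumes one common definition, which may differ in detail from the one used in the study: genotypes are coded as 0, 1, or 2 copies of a reference allele, two individuals share 2 - |g1 - g2| alleles at a locus, and the distance is one minus the average proportion of shared alleles. The sample genotypes are made up, and building the tree from the matrix is left out.

```python
def allele_sharing_distance(g1, g2):
    """One minus the mean proportion of alleles shared across loci (0/1/2 coding)."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotypes must be non-empty and of equal length")
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - shared / (2 * len(g1))


def distance_matrix(genotypes):
    """Symmetric pairwise distance matrix, the usual input to a tree-building step."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = allele_sharing_distance(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical genotypes for three animals at four SNPs.
toy_samples = [[0, 1, 2, 2], [0, 1, 2, 1], [2, 1, 0, 0]]
toy_distances = distance_matrix(toy_samples)
for row in toy_distances:
    print(row)
```

In this toy data the first two animals differ by one allele and sit close together, while the third is distant from both, which is the kind of structure a tree built on a good marker panel should separate by breed.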
Cite This Article
Manzoori S, Farahani AHK, Moradi MH, Kazemi-Bonchenari M. (2023). Detecting SNP markers discriminating horse breeds by deep learning. Sci Rep, 13(1), 11592. https://doi.org/10.1038/s41598-023-38601-z
Researcher Affiliations
- Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.
- Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran. a-farahani@araku.ac.ir.
- Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.
- Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.
MeSH Terms
- Horses / genetics
- Animals
- Polymorphism, Single Nucleotide
- Deep Learning
- Plant Breeding
- Genotype
- Alleles
Conflict of Interest Statement
The authors declare no competing interests.
Citations
This article has been cited 3 times.
- Li C, Xu S, Li D, Hu X, Jia B. A Boruta-SMOTE Integrated Approach for Rapid Donkey Breed Classification Using SNP Data: Addressing High-Dimensionality and Small Sample Challenges. Biochem Genet 2026 Jan 14.
- Kanaka KK, Ganguly I, Singh S, Kuralkar SV, Dixit S, Sukhija N, Goli RC. RASEL: An Ensemble Model for Selection of Core SNPs and Its Application for Identification and Classification of Cattle Breeds. Biochem Genet 2025 Aug 22.
- Degen B, Yanbaev Y, Müller NA. Machine learning techniques for continuous genetic assignment of geographic origin of forest trees. PLoS One 2025;20(6):e0324994.