Scientific Reports 2023; 13(1): 11592; doi: 10.1038/s41598-023-38601-z

Detecting SNP markers discriminating horse breeds by deep learning.

Abstract: The assignment of an individual to its true population of origin using a low-density panel of discriminant SNP markers is one of the most important practical applications of genomic data. The aim of this study was to evaluate the potential of different Artificial Neural Network (ANN) approaches, comprising Deep Neural Networks (DNN) and the Garson and Olden methods, for feature selection of informative SNP markers from high-throughput genotyping data that would be able to trace the true breed of unknown samples. A total of 795 animals from 37 breeds, genotyped using the Illumina SNP 50k Bead chip, were used in the current study, and principal component analysis (PCA), log-likelihood ratios (LLR), and Neighbor-Joining (NJ) were applied to assess the performance of the different assignment methods. The results revealed that the DNN, Garson, and Olden methods are able to assign individuals to their true populations with 4270, 4937, and 7999 SNP markers, respectively. PCA was used to determine how the animals were allocated to groups using all genotyped markers available on the 50k Bead chip and the subsets of SNP markers identified by the different methods. The results indicated that all SNP panels are able to assign individuals to their true breeds. Assessing the success of genetic assignment for each method at different LLR thresholds showed that a success rate of 70% was reached with 110, 208, and 178 markers for the DNN, Garson, and Olden methods, respectively. The results also showed that the DNN performed better than the other two approaches, achieving 93% accuracy at the most stringent threshold. Finally, the identified SNPs were successfully used in independent out-group breeds consisting of 120 individuals from eight breeds, and the results indicated that these markers are able to correctly allocate all unknown samples to their true population of origin. Furthermore, the NJ tree of allele-sharing distances on the validation dataset showed that the DNN has a high potential for feature selection. In general, the results of this study indicated that the DNN technique represents an efficient strategy for selecting a reduced pool of highly discriminant markers for assigning individuals to their true population of origin.
Publication Date: 2023-07-18 | PubMed ID: 37464049 | PubMed Central: PMC10354035 | DOI: 10.1038/s41598-023-38601-z
The Equine Research Bank provides access to a large database of publicly available scientific literature. Inclusion in the Research Bank does not imply endorsement of study methods or findings by Mad Barn.
  • Journal Article

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

This study seeks to identify the breed of a horse accurately from genomic data using machine learning models such as Deep Neural Networks. It reports how efficiently these models select informative SNP markers that can be used for precise breed identification.

Study Objective and Data

  • The goal of this research was to assess how well different Artificial Neural Network (ANN) approaches, specifically Deep Neural Networks (DNN) and the Garson and Olden methods, select informative single nucleotide polymorphism (SNP) markers from high-throughput genotyping data.
  • A total of 795 horses from 37 different breeds were genotyped using the Illumina SNP 50k Bead chip for this investigation.
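
As a rough illustration of what the Garson and Olden methods compute, the sketch below scores input variables (here, SNPs) from the connection weights of a trained network. It assumes a network with one hidden layer and a single output, uses one common formulation of Garson's algorithm and of Olden's connection-weight method, and is not the authors' pipeline; the names `garson_importance`, `olden_importance`, `W`, and `v` are hypothetical.

```python
import numpy as np

def garson_importance(W, v):
    """Garson-style relative importance (one common formulation).

    W: (n_inputs, n_hidden) input-to-hidden weights (assumed layout).
    v: (n_hidden,) hidden-to-output weights for a single output.
    Returns non-negative scores that sum to 1.
    """
    W = np.abs(np.asarray(W, dtype=float))
    v = np.abs(np.asarray(v, dtype=float))
    # Share of each hidden neuron's incoming weight owned by each input.
    share = W / W.sum(axis=0, keepdims=True)
    # Weight each share by the strength of the hidden-to-output link.
    contrib = (share * v).sum(axis=1)
    return contrib / contrib.sum()

def olden_importance(W, v):
    """Olden-style connection-weight importance.

    Sums the products of raw (signed) weights along each
    input -> hidden -> output path, so the sign is preserved.
    """
    return np.asarray(W, dtype=float) @ np.asarray(v, dtype=float)

# Toy weights for 2 inputs and 2 hidden neurons (made-up values).
W = np.array([[1.0, -2.0],
              [3.0,  2.0]])
v = np.array([1.0, -1.0])

garson_scores = garson_importance(W, v)
olden_scores = olden_importance(W, v)
```

Ranking SNPs by such scores and keeping the top-ranked ones is one way a reduced marker panel could be derived. Garson-style scores ignore the sign of the weights, while Olden-style scores keep it.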

Implementation and Results

  • The researchers applied Principal Component Analysis (PCA), log-likelihood ratios (LLR), and Neighbor-Joining (NJ) to examine the efficiency of the different assignment methods.
  • The DNN, Garson, and Olden methods were able to assign individuals to their true populations using 4270, 4937, and 7999 SNP markers, respectively.
  • Assignment success was evaluated at different LLR thresholds; a 70% success rate was reached with panels of 110, 208, and 178 SNP markers for the DNN, Garson, and Olden methods, respectively.
  • The DNN method outperformed the others, achieving a 93% accuracy rate at the most stringent threshold.
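
The LLR-based assignment described above can be sketched as follows. This is a minimal example, assuming genotypes coded as 0/1/2 copies of a reference allele, known per-breed allele frequencies, Hardy-Weinberg proportions, and independent loci; the exact likelihood model and thresholds used in the study are not given here, and the names `breed_log_likelihoods`, `assign_breed`, `eps`, and `threshold` are hypothetical.

```python
import numpy as np

def breed_log_likelihoods(genotype, allele_freqs, eps=1e-3):
    """log10 likelihood of one genotype vector under each breed.

    genotype: (n_snps,) values 0, 1 or 2 (copies of the reference allele).
    allele_freqs: (n_breeds, n_snps) reference-allele frequencies.
    eps keeps frequencies away from 0 and 1 so the logs stay finite.
    """
    g = np.asarray(genotype)
    p = np.clip(np.asarray(allele_freqs, dtype=float), eps, 1 - eps)
    # Hardy-Weinberg: P(0) = (1-p)^2, P(1) = 2p(1-p), P(2) = p^2
    log_coef = np.where(g == 1, np.log10(2.0), 0.0)
    log_prob = log_coef + g * np.log10(p) + (2 - g) * np.log10(1 - p)
    return log_prob.sum(axis=1)

def assign_breed(genotype, allele_freqs, breeds, threshold=1.0):
    """Assign to the most likely breed only if it beats the runner-up
    by at least `threshold` log10 units; otherwise return None."""
    ll = breed_log_likelihoods(genotype, allele_freqs)
    order = np.argsort(ll)[::-1]
    llr = ll[order[0]] - ll[order[1]]
    return (breeds[order[0]] if llr >= threshold else None), llr

# Toy data: two breeds, three SNPs (made-up frequencies).
freqs = np.array([[0.9, 0.9, 0.1],
                  [0.1, 0.2, 0.8]])
breeds = ["A", "B"]
sample = [2, 2, 0]

breed, llr = assign_breed(sample, freqs, breeds, threshold=1.0)
strict_breed, _ = assign_breed(sample, freqs, breeds, threshold=10.0)
```

Raising `threshold` makes the rule more stringent: fewer animals are assigned, but each assignment is better supported. That trade-off is what a success rate reported "at the most stringent threshold" reflects.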

Conclusion and Further Applications

  • The selected SNPs were validated on independent out-group breeds consisting of 120 individuals from eight breeds, where they correctly allocated all unknown samples to their true population of origin.
  • The NJ tree of allele-sharing distances built on the validation dataset also indicated that the DNN has high potential for feature selection.
  • In conclusion, the study showed that the DNN technique is an efficient strategy for selecting a reduced, highly discriminant set of markers for assigning individuals to their correct population of origin.
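
The allele-sharing distances underlying the NJ tree can be sketched as below. The example assumes genotypes coded as 0/1/2 and uses one common definition (one minus the average proportion of alleles shared identical-by-state per locus); the study's exact distance definition and tree-building software are not stated here, and the function name is hypothetical. The resulting matrix would then be passed to a Neighbor-Joining implementation, which is not shown.

```python
import numpy as np

def allele_sharing_distance(genotypes):
    """Pairwise allele-sharing distance between individuals.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    With this coding, two individuals share 2 - |g1 - g2| alleles at a
    locus, so the distance is mean(|g1 - g2|) / 2, ranging from 0 to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    # Broadcast to (n, n, n_snps); fine for small panels, memory-heavy
    # for many individuals and markers.
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return diff.mean(axis=2) / 2.0

# Toy genotypes: individuals 0 and 1 are identical, 2 is divergent.
G = np.array([[0, 1, 2],
              [0, 1, 2],
              [2, 1, 0]])
D = allele_sharing_distance(G)
```

In a tree built from such a matrix, animals of the same breed should cluster together if the marker panel is discriminant.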

Cite This Article

APA
Manzoori S, Farahani AHK, Moradi MH, Kazemi-Bonchenari M. (2023). Detecting SNP markers discriminating horse breeds by deep learning. Sci Rep, 13(1), 11592. https://doi.org/10.1038/s41598-023-38601-z

Publication

ISSN: 2045-2322
NlmUniqueID: 101563288
Country: England
Language: English
Volume: 13
Issue: 1
Pages: 11592

Researcher Affiliations

Manzoori, Siavash
  • Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.
Farahani, Amir Hossein Khaltabadi
  • Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran. a-farahani@araku.ac.ir.
Moradi, Mohammad Hossein
  • Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.
Kazemi-Bonchenari, Mehdi
  • Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.

MeSH Terms

  • Horses / genetics
  • Animals
  • Polymorphism, Single Nucleotide
  • Deep Learning
  • Plant Breeding
  • Genotype
  • Alleles

Conflict of Interest Statement

The authors declare no competing interests.

Citations

This article has been cited 3 times.
  1. Li C, Xu S, Li D, Hu X, Jia B. A Boruta-SMOTE Integrated Approach for Rapid Donkey Breed Classification Using SNP Data: Addressing High-Dimensionality and Small Sample Challenges. Biochem Genet 2026 Jan 14.
    doi: 10.1007/s10528-025-11316-8; pubmed: 41533190
  2. Kanaka KK, Ganguly I, Singh S, Kuralkar SV, Dixit S, Sukhija N, Goli RC. RASEL: An Ensemble Model for Selection of Core SNPs and Its Application for Identification and Classification of Cattle Breeds. Biochem Genet 2025 Aug 22.
    doi: 10.1007/s10528-025-11230-z; pubmed: 40844696
  3. Degen B, Yanbaev Y, Müller NA. Machine learning techniques for continuous genetic assignment of geographic origin of forest trees. PLoS One 2025;20(6):e0324994.
    doi: 10.1371/journal.pone.0324994; pubmed: 40478860