Scientific Reports 2023; 13(1): 11592; doi: 10.1038/s41598-023-38601-z

Detecting SNP markers discriminating horse breeds by deep learning.

Manzoori, S · Farahani, AHK · Moradi, MH · Kazemi-Bonchenari, M

Abstract: The assignment of an individual to its true population of origin using a low-density panel of discriminant SNP markers is one of the most important practical applications of genomic data. The aim of this study was to evaluate the potential of different Artificial Neural Network (ANN) approaches, consisting of Deep Neural Networks (DNN) and the Garson and Olden methods, for selecting informative SNP markers from high-throughput genotyping data that are able to trace the true breed of unknown samples. A total of 795 animals from 37 breeds, genotyped using the Illumina SNP 50k Bead chip, were used in the current study, and principal component analysis (PCA), log-likelihood ratios (LLR) and Neighbor-Joining (NJ) were applied to assess the performance of the different assignment methods. The results revealed that the DNN, Garson, and Olden methods are able to assign individuals to their true populations with 4270, 4937, and 7999 SNP markers, respectively. PCA was used to determine how the animals were allocated to groups using all genotyped markers available on the 50k Bead chip and the subsets of SNP markers identified by the different methods. The results indicated that all SNP panels are able to assign individuals to their true breeds. The success percentage of genetic assignment for the different methods, assessed at different LLR levels, showed that a success rate of 70% was obtained by the three methods with 110, 208, and 178 markers for the DNN, Garson, and Olden methods, respectively. The results also showed that DNN performed better than the other two approaches, achieving 93% accuracy at the most stringent threshold. Finally, the identified SNPs were successfully applied to independent out-group breeds consisting of 120 individuals from eight breeds, and the results indicated that these markers are able to correctly allocate all unknown samples to their true population of origin.
Furthermore, the NJ tree of allele-sharing distances on the validation dataset showed that the DNN has high potential for feature selection. In general, the results of this study indicated that the DNN technique is an efficient strategy for selecting a reduced pool of highly discriminant markers for assigning individuals to their true population of origin.
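The abstract uses PCA to check whether animals still fall into their breed groups when only a selected SNP subset is used. The summary does not state how genotypes were coded or which software performed the PCA, so the following is only a minimal sketch under assumptions: genotypes are coded as 0, 1 or 2 copies of one allele, each SNP column is mean-centred, and the first principal component is obtained by power iteration on the animal-by-animal matrix. The function names and toy genotypes are hypothetical.

```python
"""Sketch: PC1 scores for a 0/1/2 SNP genotype matrix (hypothetical toy data)."""
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's mean genotype so every column has mean zero."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_ind for j in range(n_snp)]
    return [[row[j] - means[j] for j in range(n_snp)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200, seed=0):
    """Return each animal's score on the first principal component.

    Power iteration on K = Xc Xc^T (animals x animals). If u is the leading
    unit eigenvector of K with eigenvalue lam, the PC1 scores Xc v1 equal
    u * sqrt(lam), up to an arbitrary sign.
    """
    xc = center_columns(genotypes)
    n = len(xc)
    k = [[sum(a * b for a, b in zip(xc[i], xc[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    lam = 0.0
    for _ in range(iterations):
        w = [sum(k[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no genetic variation at all: every score is zero
            return [0.0] * n
        u = [x / norm for x in w]
        lam = norm  # ||K u|| approaches the leading eigenvalue as u converges
    return [x * math.sqrt(lam) for x in u]


if __name__ == "__main__":
    # Hypothetical toy panel: rows are animals, columns are SNPs.
    toy = [
        [0, 0, 2, 2, 1],  # breed A
        [0, 1, 2, 2, 1],  # breed A
        [0, 0, 2, 1, 1],  # breed A
        [2, 2, 0, 0, 1],  # breed B
        [2, 1, 0, 0, 1],  # breed B
        [2, 2, 0, 1, 1],  # breed B
    ]
    for row, score in zip(toy, first_pc_scores(toy)):
        print(row, round(score, 3))
```

Only the relative position of animals on PC1 matters; its sign is arbitrary. For hundreds of animals and thousands of SNPs a linear-algebra library would be the practical choice; the stdlib loop is kept only so the sketch runs without dependencies.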

Publication Date: 2023-07-18 · PubMed ID: 37464049 · PubMed Central: PMC10354035 · DOI: 10.1038/s41598-023-38601-z

The Equine Research Bank provides access to a large database of publicly available scientific literature. Inclusion in the Research Bank does not imply endorsement of study methods or findings by Mad Barn.

Journal Article

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

This research study seeks to accurately identify the breed of horses using genomic data and machine learning models such as Deep Neural Networks. The study offers details on how efficiently these models select relevant SNP markers that can be used for precise breed identification.

Study Objective and Data

The goal of this research was to assess the capabilities of Artificial Neural Network (ANN) approaches, specifically a Deep Neural Network (DNN) and the Garson and Olden methods, in selecting relevant SNP (single nucleotide polymorphism) markers from high-throughput genotyping data.
A total of 795 horses from 37 different breeds were genotyped using the Illumina SNP 50k Bead chip for this investigation.
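The Garson and Olden methods score each input (here, each SNP) from a trained network's connection weights. The summary does not describe the network architecture, so this is a minimal sketch of the classic formulation for one hidden layer and a single output: Garson's method sums the absolute products of input-hidden and hidden-output weights after normalising them within each hidden neuron, while Olden's method sums the signed products. The weights, function names and the `top_k_inputs` helper are hypothetical; for a multi-class breed classifier the calculation would, as an assumption, be repeated per output node. The reference list includes the R package NeuralNetTools, whose `garson()` and `olden()` functions implement these measures.

```python
"""Sketch: Garson and Olden input importance for a one-hidden-layer, one-output MLP."""


def garson_importance(w_in_hidden, w_hidden_out):
    """Relative importance (sums to 1) from absolute weight products.

    w_in_hidden[i][j]: weight from input i (a SNP) to hidden neuron j.
    w_hidden_out[j]: weight from hidden neuron j to the single output.
    """
    n_in, n_hid = len(w_in_hidden), len(w_hidden_out)
    # |w_ij * v_j|: magnitude of the path from input i through hidden neuron j
    contrib = [[abs(w_in_hidden[i][j] * w_hidden_out[j]) for j in range(n_hid)]
               for i in range(n_in)]
    # Normalise within each hidden neuron so each neuron's column sums to 1
    col_sums = [sum(contrib[i][j] for i in range(n_in)) for j in range(n_hid)]
    shares = [sum(contrib[i][j] / col_sums[j] for j in range(n_hid) if col_sums[j] > 0)
              for i in range(n_in)]
    total = sum(shares)
    if total == 0:  # all weights zero: no input carries any importance
        return [0.0] * n_in
    return [s / total for s in shares]


def olden_importance(w_in_hidden, w_hidden_out):
    """Signed importance: sum over hidden neurons of w_ij * v_j."""
    return [sum(w * v for w, v in zip(row, w_hidden_out)) for row in w_in_hidden]


def top_k_inputs(scores, k):
    """Hypothetical helper: indices of the k inputs with largest |importance|."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Hypothetical trained weights: 4 SNP inputs, 2 hidden neurons.
    w_ih = [[2.0, 0.5], [-1.0, 1.0], [0.1, -0.2], [0.0, 0.05]]
    w_ho = [1.5, -2.0]
    g = garson_importance(w_ih, w_ho)
    o = olden_importance(w_ih, w_ho)
    print("Garson:", [round(x, 3) for x in g], "top-2:", top_k_inputs(g, 2))
    print("Olden:", [round(x, 3) for x in o], "top-2:", top_k_inputs(o, 2))
```

With these toy weights the two methods order the top two SNPs differently: Olden's signed sums let opposing paths cancel or reinforce, whereas Garson's absolute values cannot, which is one reason the two methods can select panels of different sizes.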

Implementation and Results

The researchers applied principal component analysis (PCA), log-likelihood ratios (LLR), and Neighbor-Joining (NJ) to evaluate the performance of the different assignment methods.
The DNN, Garson, and Olden methods identified 4270, 4937, and 7999 SNP markers, respectively, that could assign individuals to their true populations.
Assignment success for each method was evaluated at different LLR thresholds; a 70% success rate was reached with 110, 208, and 178 SNP markers for the DNN, Garson, and Olden methods, respectively.
The DNN method outperformed the others, achieving a 93% accuracy rate at the most stringent threshold.
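Likelihood-based assignment tests, such as those of Paetkau et al. (1995) and Rannala and Mountain (1997) in the reference list, compare how probable an animal's multilocus genotype is under each candidate breed. The summary gives neither the exact likelihood model nor the threshold values, so the sketch below rests on assumptions: Hardy-Weinberg genotype probabilities from breed allele frequencies, independent loci, base-10 logarithms, a hypothetical frequency floor `EPS` to avoid log(0), and an animal counted as a success only when it is assigned to its true breed with an LLR (best minus second-best breed) at or above the threshold. Breed names, frequencies and genotypes are invented for illustration.

```python
"""Sketch: likelihood-based breed assignment and LLR success rate (toy data)."""
import math

EPS = 1e-3  # hypothetical floor so fixed alleles (p = 0 or 1) do not give log(0)


def genotype_log10_likelihood(genotype, freqs):
    """log10 P(genotype | breed) under Hardy-Weinberg, loci assumed independent.

    genotype[k] in {0, 1, 2} counts copies of the allele whose frequency is freqs[k].
    """
    total = 0.0
    for g, p in zip(genotype, freqs):
        p = min(max(p, EPS), 1.0 - EPS)
        total += math.log10(math.comb(2, g) * p ** g * (1.0 - p) ** (2 - g))
    return total


def assign(genotype, breed_freqs):
    """Return (most likely breed, LLR of best vs. second-best breed)."""
    ranked = sorted(
        ((genotype_log10_likelihood(genotype, f), breed) for breed, f in breed_freqs.items()),
        reverse=True,
    )
    (best_ll, best_breed), (second_ll, _) = ranked[0], ranked[1]
    return best_breed, best_ll - second_ll


def success_rate(samples, breed_freqs, threshold):
    """Share of (true_breed, genotype) samples assigned correctly with LLR >= threshold."""
    hits = 0
    for true_breed, genotype in samples:
        breed, llr = assign(genotype, breed_freqs)
        if breed == true_breed and llr >= threshold:
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    freqs = {"A": [0.9, 0.8, 0.1], "B": [0.1, 0.2, 0.9]}  # hypothetical breeds
    samples = [("A", [2, 2, 0]), ("B", [0, 0, 2]), ("A", [2, 1, 0]), ("B", [1, 2, 1])]
    for t in (0.0, 2.0, 4.0, 6.0):
        print(f"LLR >= {t}: success rate {success_rate(samples, freqs, t):.2f}")
```

Raising `threshold` only excludes assignments, so in this sketch the success rate can stay the same or fall as the threshold becomes more stringent, never rise.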

Conclusion and Further Applications

The identified SNPs were successfully applied to independent out-group breeds comprising 120 individuals from eight breeds, correctly allocating all unknown samples to their true population of origin.
The NJ tree of allele-sharing distances built on the validation dataset further indicated that the DNN has high potential for feature selection.
In conclusion, the study showed that the DNN technique is an efficient strategy for selecting a reduced set of highly discriminant markers for assigning individuals to their correct population of origin.
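The validation step clusters animals in an NJ tree built from allele-sharing distances. The summary does not name the software or give the formula, so this is a minimal sketch assuming biallelic SNPs coded 0/1/2: the proportion of shared alleles at a locus is (2 - |g1 - g2|) / 2, the distance is one minus its average over loci, and the tree is joined with the textbook Saitou-Nei Q-criterion. Branch lengths are omitted and only the topology is returned as nested tuples; animal IDs and genotypes are hypothetical.

```python
"""Sketch: allele-sharing distances and a neighbor-joining topology (toy data)."""


def allele_sharing_distance(g1, g2):
    """1 - proportion of shared alleles; genotypes are 0/1/2 allele counts."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - shared / (2 * len(g1))


def distance_matrix(genotypes):
    """Pairwise allele-sharing distances between all animals."""
    n = len(genotypes)
    return [[allele_sharing_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


def neighbor_joining(labels, dist):
    """Saitou-Nei neighbor joining; returns the unrooted topology as nested tuples."""
    nodes = list(labels)
    d = {(a, b): dist[i][j] for i, a in enumerate(labels) for j, b in enumerate(labels)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}  # net divergence
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]  # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)  # join the pair minimising Q into a new internal node
        for c in nodes:
            if c != a and c != b:
                d[(new, c)] = d[(c, new)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        d[(new, new)] = 0.0
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


if __name__ == "__main__":
    # Hypothetical validation animals: x* and y* come from two different breeds.
    geno = {"x1": [0, 0, 2, 2], "x2": [0, 1, 2, 2], "y1": [2, 2, 0, 0], "y2": [2, 2, 0, 1]}
    labels = list(geno)
    dm = distance_matrix([geno[k] for k in labels])
    print(neighbor_joining(labels, dm))
```

A dedicated phylogenetics package would normally be used for real data; the explicit loops are kept here only to make the Q-criterion and the distance update visible.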

Cite This Article

APA

Manzoori S, Farahani AHK, Moradi MH, Kazemi-Bonchenari M. (2023). Detecting SNP markers discriminating horse breeds by deep learning. Sci Rep, 13(1), 11592. https://doi.org/10.1038/s41598-023-38601-z

Publication

Scientific reports

ISSN: 2045-2322

NlmUniqueID: 101563288

Country: England

Language: English

Volume: 13

Issue: 1

Pages: 11592

Researcher Affiliations

Manzoori, Siavash

Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.

Farahani, Amir Hossein Khaltabadi

Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran. a-farahani@araku.ac.ir.

Moradi, Mohammad Hossein

Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.

Kazemi-Bonchenari, Mehdi

Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.

MeSH Terms

Horses / genetics
Animals
Polymorphism, Single Nucleotide
Deep Learning
Plant Breeding
Genotype
Alleles

Conflict of Interest Statement

The authors declare no competing interests.

References

This article includes 52 references

Heather JM, Chain B. The sequence of sequencers: The history of sequencing DNA. Genomics 2016 Jan;107(1):1-8.
doi: 10.1016/j.ygeno.2015.11.003 · PMC: PMC4727787 · PubMed: 26554401
Dimauro C. Selection of discriminant SNP markers for breed and geographic assignment of Italian sheep. Small Ruminant Res 2015;128:27–33.
doi: 10.1016/j.smallrumres.2015.05.001
Ganal MW, Altmann T, Röder MS. SNP identification in crop plants. Curr Opin Plant Biol 2009 Apr;12(2):211-7.
doi: 10.1016/j.pbi.2008.12.009 · PubMed: 19186095
Paschou P, Ziv E, Burchard EG, Choudhry S, Rodriguez-Cintron W, Mahoney MW, Drineas P. PCA-correlated SNPs for structure identification in worldwide human populations. PLoS Genet 2007 Sep;3(9):1672-86.
doi: 10.1371/journal.pgen.0030160 · PMC: PMC1988848 · PubMed: 17892327
Gautier M, Flori L, Riebler A, Jaffrézic F, Laloé D, Gut I, Moazami-Goudarzi K, Foulley JL. A whole genome Bayesian scan for adaptive genetic divergence in West African cattle. BMC Genomics 2009 Nov 21;10:550.
doi: 10.1186/1471-2164-10-550 · PMC: PMC2784811 · PubMed: 19930592
Dimauro C, Cellesi M, Steri R, Gaspa G, Sorbolini S, Stella A, Macciotta NP. Use of the canonical discriminant analysis to select SNP markers for bovine breed assignment and traceability purposes. Anim Genet 2013 Aug;44(4):377-82.
doi: 10.1111/age.12021 · PubMed: 23347105
Petersen JL, Mickelson JR, Cothran EG, Andersson LS, Axelsson J, Bailey E, Bannasch D, Binns MM, Borges AS, Brama P, da Câmara Machado A, Distl O, Felicetti M, Fox-Clipsham L, Graves KT, Guérin G, Haase B, Hasegawa T, Hemmann K, Hill EW, Leeb T, Lindgren G, Lohi H, Lopes MS, McGivney BA, Mikko S, Orr N, Penedo MC, Piercy RJ, Raekallio M, Rieder S, Røed KH, Silvestrelli M, Swinburne J, Tozaki T, Vaudin M, M Wade C, McCue ME. Genetic diversity in the modern horse illustrated from genome-wide SNP data. PLoS One 2013;8(1):e54997.
doi: 10.1371/journal.pone.0054997 · PMC: PMC3559798 · PubMed: 23383025
Boutorh A, Guessoum A. Complex diseases SNP selection and classification by hybrid association rule mining and artificial neural network-based evolutionary algorithms. Eng. Appl. Artif. Intell. 2016;51:58–70.
doi: 10.1016/j.engappai.2016.01.004
Lewis J, Abas Z, Dadousis C, Lykidis D, Paschou P, Drineas P. Tracing cattle breeds with principal components analysis ancestry informative SNPs. PLoS One 2011 Apr 7;6(4):e18007.
doi: 10.1371/journal.pone.0018007 · PMC: PMC3072384 · PubMed: 21490966
Meenachi L, Ramakrishnan S. Metaheuristic search based feature selection methods for classification of cancer. Pattern Recogn. 2021;119:108079.
doi: 10.1016/j.patcog.2021.108079
Paul D, Saha S, Mathew J. Fusion of evolvable genome structure and multi-objective optimization for subspace clustering. Pattern Recogn. 2019;95:58–71.
doi: 10.1016/j.patcog.2019.05.033
He J, Zelikovsky A. In: The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2840–2843 (IEEE).
Arbib MA. The Handbook of Brain Theory and Neural Networks. MIT Press, 2003.
Dean J. Large scale distributed deep networks. Advances in Neural Information Processing Systems 25 (2012).
Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinform 2017 Sep 1;18(5):851-869.
doi: 10.1093/bib/bbw068 · PubMed: 27473064
Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018 Mar 1;34(5):760-769.
doi: 10.1093/bioinformatics/btx680 · PMC: PMC6030869 · PubMed: 29069344
Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2019 Aug 15;35(16):2766-2773.
doi: 10.1093/bioinformatics/bty1051 · PMC: PMC6691328 · PubMed: 30601936
Di Lena P, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012 Oct 1;28(19):2449-57.
doi: 10.1093/bioinformatics/bts475 · PMC: PMC3463120 · PubMed: 22847931
Sanzogni L, Kerr D. Milk production estimates using feed forward artificial neural networks. Comput. Electron. Agric. 2001;32:21–30.
doi: 10.1016/S0168-1699(01)00151-X
Torres M, Hervás C, Amador F. Approximating the sheep milk production curve through the use of artificial neural networks and genetic algorithms. Comput. Oper. Res. 2005;32:2653–2670.
doi: 10.1016/j.cor.2004.06.025
Fernández C, Soria E, Martin J, Serrano AJ. Neural networks for animal science applications: Two case studies. Expert Syst. Appl. 2006;31:444–450.
doi: 10.1016/j.eswa.2005.09.086
Ince D, Sofu A. Estimation of lactation milk yield of Awassi sheep with artificial neural network modeling. Small Ruminant Res. 2013;113:15–19.
doi: 10.1016/j.smallrumres.2013.01.013
Gevrey M, Dimopoulos I, Lek S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 2003;160:249–264.
doi: 10.1016/S0304-3800(02)00257-0
Olden JD, Joy MK, Death RG. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol. Model. 2004;178:389–397.
doi: 10.1016/j.ecolmodel.2004.03.013
Ibrahim O. A comparison of methods for assessing the relative importance of input variables in artificial neural networks. J. Appl. Sci. Res. 2013;9:5692–5700.
Fischer A. How to determine the unique contributions of input-variables to the nonlinear regression function of a multilayer perceptron. Ecol. Model. 2015;309:60–63.
doi: 10.1016/j.ecolmodel.2015.04.015
Kemp SJ, Zaradic P, Hansen F. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecol. Model. 2007;204:326–334.
doi: 10.1016/j.ecolmodel.2007.01.009
Paliwal M, Kumar UA. Assessing the contribution of variables in feed forward neural network. Appl. Soft Comput. 2011;11:3690–3696.
doi: 10.1016/j.asoc.2011.01.040
De Oña J, Garrido C. Extracting the contribution of independent variables in neural network models: A new approach to handle instability. Neural Comput. Appl. 2014;25:859–869.
doi: 10.1007/s00521-014-1573-5
Ringnér M. What is principal component analysis? Nat Biotechnol 2008 Mar;26(3):303-4.
doi: 10.1038/nbt0308-303 · PubMed: 18327243
Paetkau D, Calvert W, Stirling I, Strobeck C. Microsatellite analysis of population structure in Canadian polar bears. Mol Ecol 1995 Jun;4(3):347-54.
doi: 10.1111/j.1365-294X.1995.tb00227.x · PubMed: 7663752
Maudet C, Luikart G, Taberlet P. Genetic diversity and assignment tests among seven French cattle breeds based on microsatellite DNA analysis. J Anim Sci 2002 Apr;80(4):942-50.
doi: 10.2527/2002.804942x · PubMed: 12002331
Ciampolini R, Cetica V, Ciani E, Mazzanti E, Fosella X, Marroni F, Biagetti M, Sebastiani C, Papa P, Filippini G, Cianci D, Presciuttini S. Statistical analysis of individual assignment tests among four cattle breeds using fifteen STR loci. J Anim Sci 2006 Jan;84(1):11-9.
doi: 10.2527/2006.84111x · PubMed: 16361486
Negrini R, Nijman IJ, Milanesi E, Moazami-Goudarzi K, Williams JL, Erhardt G, Dunner S, Rodellar C, Valentini A, Bradley DG, Olsaker I, Kantanen J, Ajmone-Marsan P, Lenstra JA. Differentiation of European cattle by AFLP fingerprinting. Anim Genet 2007 Feb;38(1):60-6.
doi: 10.1111/j.1365-2052.2007.01554.x · PubMed: 17257190
Negrini R, Milanesi E, Colli L, Pellecchia M, Nicoloso L, Crepaldi P, Lenstra JA, Ajmone-Marsan P. Breed assignment of Italian cattle using biallelic AFLP markers. Anim Genet 2007 Apr;38(2):147-53.
doi: 10.1111/j.1365-2052.2007.01573.x · PubMed: 17326802
McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Dias Neto E, Gill CA, Gao C, Mannen H, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS. An assessment of population structure in eight breeds of cattle using a whole genome SNP panel. BMC Genet 2008 May 20;9:37.
doi: 10.1186/1471-2156-9-37 · PMC: PMC2408608 · PubMed: 18492244
Negrini R, Nicoloso L, Crepaldi P, Milanesi E, Colli L, Chegdani F, Pariset L, Dunner S, Leveziel H, Williams JL, Ajmone Marsan P. Assessing SNP markers for assigning individuals to cattle populations. Anim Genet 2009 Feb;40(1):18-26.
doi: 10.1111/j.1365-2052.2008.01800.x · PubMed: 19016674
Wilkinson S, Wiener P, Archibald AL, Law A, Schnabel RD, McKay SD, Taylor JF, Ogden R. Evaluation of approaches for identifying population informative markers from high density SNP chips. BMC Genet 2011 May 13;12:45.
doi: 10.1186/1471-2156-12-45 · PMC: PMC3118130 · PubMed: 21569514
Milne L. In: AI-Conference, 571 (World Scientific Publishing).
Li B, Zhang N, Wang YG, George AW, Reverter A, Li Y. Genomic Prediction of Breeding Values Using a Subset of SNPs Identified by Three Machine Learning Methods. Front Genet 2018;9:237.
doi: 10.3389/fgene.2018.00237 · PMC: PMC6039760 · PubMed: 30023001
Schaefer RJ, Schubert M, Bailey E, Bannasch DL, Barrey E, Bar-Gal GK, Brem G, Brooks SA, Distl O, Fries R, Finno CJ, Gerber V, Haase B, Jagannathan V, Kalbfleisch T, Leeb T, Lindgren G, Lopes MS, Mach N, da Câmara Machado A, MacLeod JN, McCoy A, Metzger J, Penedo C, Polani S, Rieder S, Tammen I, Tetens J, Thaller G, Verini-Supplizi A, Wade CM, Wallner B, Orlando L, Mickelson JR, McCue ME. Developing a 670k genotyping array to tag ~2M SNPs across 24 horse breeds. BMC Genomics 2017 Jul 27;18(1):565.
doi: 10.1186/s12864-017-3943-8 · PMC: PMC5530493 · PubMed: 28750625
Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533–536.
doi: 10.1038/323533a0
Cilimkovic M. Neural networks and back propagation algorithm. Institute of Technology Blanchardstown, Blanchardstown Road North, Dublin 15 (2015).
Fritsch S, Guenther F. neuralnet: Training of Neural Networks. https://journal.r-project.org/archive/2010/RJ-2010-006/index.html (2016).
Beck MW. NeuralNetTools: Visualization and Analysis Tools for Neural Networks. J Stat Softw 2018;85(11):1-20.
doi: 10.18637/jss.v085.i11 · PMC: PMC6262849 · PubMed: 30505247
R Core Team. R: A Language and Environment for Statistical Computing. https://www.R-project.org/ (2017).
Garson GD. Interpreting neural-network connection weights. AI Expert 1991;6:46–51.
Goh ATC. Back-propagation neural networks for modeling complex systems. Artif. Intell. Eng. 1995;9:143–151.
doi: 10.1016/0954-1810(94)00011-S
Olden JD, Jackson DA. Illuminating the “black box”: A randomization approach for understanding variable contributions in artificial neural networks. Ecol. Model. 2002;154:135–150.
doi: 10.1016/S0304-3800(02)00064-9
Sheela KG, Deepa SN. Review on methods to fix number of hidden neurons in neural networks. Math. Probl. Eng. 2013;2013:11.
doi: 10.1155/2013/425740
Rannala B, Mountain JL. Detecting immigration by using multilocus genotypes. Proc Natl Acad Sci U S A 1997 Aug 19;94(17):9197-201.
doi: 10.1073/pnas.94.17.9197 · PMC: PMC23111 · PubMed: 9256459
Cornuet JM, Piry S, Luikart G, Estoup A, Solignac M. New methods employing multilocus genotypes to select or exclude populations as origins of individuals. Genetics 1999 Dec;153(4):1989-2000.
doi: 10.1093/genetics/153.4.1989 · PMC: PMC1460843 · PubMed: 10581301

Citations

This article has been cited 3 times.

Li C, Xu S, Li D, Hu X, Jia B. A Boruta-SMOTE Integrated Approach for Rapid Donkey Breed Classification Using SNP Data: Addressing High-Dimensionality and Small Sample Challenges. Biochem Genet 2026 Jan 14.
doi: 10.1007/s10528-025-11316-8 · PubMed: 41533190
Kanaka KK, Ganguly I, Singh S, Kuralkar SV, Dixit S, Sukhija N, Goli RC. RASEL: An Ensemble Model for Selection of Core SNPs and Its Application for Identification and Classification of Cattle Breeds. Biochem Genet 2025 Aug 22.
doi: 10.1007/s10528-025-11230-z · PubMed: 40844696
Degen B, Yanbaev Y, Müller NA. Machine learning techniques for continuous genetic assignment of geographic origin of forest trees. PLoS One 2025;20(6):e0324994.
doi: 10.1371/journal.pone.0324994 · PubMed: 40478860
