Journal of applied genetics 2019; 60(2); 187-198; doi: 10.1007/s13353-019-00495-x

Comparing assignment-based approaches to breed identification within a large set of horses.

Abstract: Considering the extensive data sets and statistical techniques, animal breeding embodies a branch of machine learning that has a constantly increasing impact on breeding. In our study, information regarding the potential of machine learning and data mining within a large set of horses and breeds is presented. The individual assignment methods and factors influencing the success rate of the procedure are compared at the Czech population scale. The fixation index values ranged from 0.057 (HMS1) to 0.144 (HTG6), and the overall genetic differentiation amounted to 8.9% among the breeds. The highest genetic divergence (FST = 0.378) was established between the Friesian and Equus przewalskii; the highest degree of gene migration was obtained between the Czech and Bavarian Warmblood (Nm = 14,302); and the overall global heterozygote deficit across the populations was 10.4%. The eight standard methods (Bayesian, frequency, and distance) using GeneClass software and almost all mainstream classification algorithms (Bayes Net, Naive Bayes, IB1, IB5, KStar, JRip, J48, Random Forest, Random Tree, PART, MLP, and SVM) from the WEKA machine learning workbench were compared by utilizing 314,874 real allelic data sets. The Bayesian method (GeneClass, 89.9%) and Bayesian network algorithm (WEKA, 84.8%) outperformed the other techniques. The breed genomic prediction accuracy reached the highest value in the cold-blooded horses. The overall proportion of individuals correctly assigned to a population depended mainly on the breed number and genetic divergence. These statistical tools could be used to assess breed traceability systems, and they exhibit the potential to assist managers in decision-making as regards breeding and registration.
Publication Date: 2019-04-08; PubMed ID: 30963515; DOI: 10.1007/s13353-019-00495-x
The Equine Research Bank provides access to a large database of publicly available scientific literature. Inclusion in the Research Bank does not imply endorsement of study methods or findings by Mad Barn.
  • Journal Article

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

The research article examines the potential of machine learning and data mining for breed identification in a large set of horses. It compares individual assignment methods and identifies the factors that affect their success rate at the scale of the Czech horse population.

Objective of the Study

  • The study aims to assess how machine learning and data mining could be employed in breed identification among a large population of horses.
  • It seeks to compare different methods for assignment and establish the factors that affect the success rate of these methods in the context of the Czech horse population.

Methodology

  • The researchers compared eight standard assignment methods (Bayesian, frequency-based, and distance-based) run in GeneClass software with a broad set of classification algorithms from the WEKA machine learning workbench: Bayes Net, Naive Bayes, IB1, IB5, KStar, JRip, J48, Random Forest, Random Tree, PART, MLP, and SVM.
  • The comparison was done using 314,874 real allelic data sets.
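To make the idea of a frequency-based assignment test concrete, the following is a minimal sketch in Python. It is not GeneClass's or WEKA's implementation, and it does not reproduce the study's pipeline. It assumes Hardy-Weinberg proportions within each breed, and it uses invented allele sizes, hypothetical breed labels (`breed_A`, `breed_B`), and an arbitrary fallback frequency (`MISSING_FREQ`) for alleles not seen in a reference sample. Only the locus names HMS1 and HTG6 come from the abstract.

```python
from collections import Counter, defaultdict
from math import log

# Assumption: arbitrary small frequency for alleles absent from a breed's reference sample.
MISSING_FREQ = 0.01


def allele_frequencies(genotypes):
    """Estimate per-locus allele frequencies from diploid genotypes.

    Each genotype is a dict mapping locus name -> (allele_1, allele_2).
    """
    counts = defaultdict(Counter)
    for genotype in genotypes:
        for locus, alleles in genotype.items():
            counts[locus].update(alleles)  # counts both alleles of the pair
    return {
        locus: {allele: n / sum(c.values()) for allele, n in c.items()}
        for locus, c in counts.items()
    }


def log_likelihood(genotype, freqs):
    """Log-likelihood of one genotype under Hardy-Weinberg proportions."""
    total = 0.0
    for locus, (a, b) in genotype.items():
        p = freqs.get(locus, {}).get(a, MISSING_FREQ)
        q = freqs.get(locus, {}).get(b, MISSING_FREQ)
        # Homozygote probability is p^2, heterozygote probability is 2pq.
        total += log(p * p if a == b else 2 * p * q)
    return total


def assign(genotype, reference):
    """Return the breed whose allele frequencies make the genotype most likely."""
    freqs = {breed: allele_frequencies(gs) for breed, gs in reference.items()}
    return max(freqs, key=lambda breed: log_likelihood(genotype, freqs[breed]))


# Hypothetical reference genotypes; allele sizes are invented for illustration.
reference = {
    "breed_A": [
        {"HMS1": (170, 172), "HTG6": (84, 84)},
        {"HMS1": (170, 170), "HTG6": (84, 86)},
    ],
    "breed_B": [
        {"HMS1": (176, 178), "HTG6": (94, 96)},
        {"HMS1": (178, 178), "HTG6": (94, 94)},
    ],
}
query = {"HMS1": (170, 172), "HTG6": (84, 86)}
print(assign(query, reference))
```

In this toy case the query shares all of its alleles with `breed_A`, so that breed receives the higher likelihood. A success rate like those reported in the study would be the share of individuals whose assigned breed matches their registered breed.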

Findings

  • Fixation index values per locus ranged from 0.057 (HMS1) to 0.144 (HTG6), and the overall genetic differentiation among the breeds was 8.9%.
  • The highest genetic divergence (FST = 0.378) was found between the Friesian breed and Equus przewalskii (Przewalski's horse, a separate species rather than a breed). The highest degree of gene migration (Nm = 14,302) was found between the Czech Warmblood and Bavarian Warmblood breeds.
  • They found the overall global heterozygote deficit across the populations to be 10.4%.
  • Of all the methods and algorithms tested, the Bayesian method (GeneClass, 89.9%) and Bayesian network algorithm (WEKA, 84.8%) outperformed the others. The accuracy of breed genomic prediction was highest in the cold-blooded horses.
  • The overall proportion of individuals correctly assigned to a population mainly depended on the breed number and genetic divergence.
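The quantities in these findings are linked by standard population-genetics formulas, sketched below in Python. This is an illustration under stated assumptions, not the authors' calculation: the summary does not say how Nm or the heterozygote deficit were computed. The sketch assumes Wright's island-model approximation, Nm ≈ (1 − FST) / (4 · FST), and a deficit defined as 1 − Ho/He. The function names and the heterozygosity values in the usage lines are hypothetical.

```python
def nm_from_fst(fst):
    """Gene flow (Nm) implied by FST under Wright's island-model approximation."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


def fst_from_nm(nm):
    """Inverse of the same approximation: FST = 1 / (4 * Nm + 1)."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1 / (4 * nm + 1)


def heterozygote_deficit(h_observed, h_expected):
    """Proportional shortfall of observed versus expected heterozygosity."""
    if h_expected <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1 - h_observed / h_expected


# FST reported for the Friesian versus Equus przewalskii comparison.
print(round(nm_from_fst(0.378), 3))

# Hypothetical heterozygosities, chosen only to illustrate a 10% deficit.
print(round(heterozygote_deficit(0.63, 0.70), 3))
```

Under this approximation, low FST corresponds to high gene flow and vice versa, which matches the pattern in the findings: the most divergent pair has the largest FST, and the pair with the most gene migration is a pair of closely related warmblood breeds.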

Implications of the Study

  • The findings of the study show that the statistical tools compared in the research can be used to assess breed traceability systems. They also have potential applications in assisting managers in making decisions regarding breeding and registration.
  • The study may encourage wider use of machine learning and data mining in animal breeding.

Cite This Article

APA
Putnová L, Štohl R. (2019). Comparing assignment-based approaches to breed identification within a large set of horses. J Appl Genet, 60(2), 187-198. https://doi.org/10.1007/s13353-019-00495-x

Publication

ISSN: 2190-3883
NlmUniqueID: 9514582
Country: England
Language: English
Volume: 60
Issue: 2
Pages: 187-198

Researcher Affiliations

Putnová, Lenka
  • Laboratory of Agrogenomics, Department of Morphology, Physiology and Animal Genetics, Faculty of Agronomy, Mendel University in Brno, Zemědělská 1665/1, 613 00, Brno, Czech Republic. putnova@email.cz.
Štohl, Radek
  • Department of Control and Instrumentation, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technická 3082/12, 616 00, Brno, Czech Republic.

MeSH Terms

  • Algorithms
  • Alleles
  • Animals
  • Breeding
  • Gene Frequency
  • Genetic Variation
  • Genomics
  • Genotype
  • Heterozygote
  • Horses / classification
  • Horses / genetics
  • Microsatellite Repeats / genetics
  • Software
  • Species Specificity

Grant Funding

  • QH92277 / Národní Agentura pro Zemědělský Výzkum
  • LO1210 / Ministerstvo Školství, Mládeže a Tělovýchovy
  • 2108 / Mendelova Univerzita v Brně

References

This article includes 22 references
  1. Genetics. 1999 Dec;153(4):1989-2000
    pubmed: 10581301
  2. Anim Genet. 2002 Aug;33(4):264-70
    pubmed: 12139505
  3. Anim Genet. 2003 Aug;34(4):297-301
    pubmed: 12873219
  4. J Hered. 2004 Nov-Dec;95(6):536-9
    pubmed: 15475402
  5. Bioinformatics. 2005 May 1;21(9):2128-9
    pubmed: 15705655
  6. Mol Ecol. 2006 Oct;15(11):3157-73
    pubmed: 16968262
  7. Mol Ecol. 2007 Mar;16(5):1099-106
    pubmed: 17305863
  8. Mol Ecol Resour. 2008 Jan;8(1):103-6
    pubmed: 21585727
  9. Anim Genet. 2011 Dec;42(6):627-33
    pubmed: 22035004
  10. Meat Sci. 2008 Oct;80(2):389-95
    pubmed: 22063344
  11. BMC Genet. 2013 Dec 09;14:118
    pubmed: 24320218
  12. Anim Genet. 2014 Dec;45(6):898-902
    pubmed: 25183434
  13. J Anim Breed Genet. 2017 Apr;134(2):85-86
    pubmed: 28297136
  14. Evolution. 1984 Nov;38(6):1358-1370
    pubmed: 28563791
  15. J Anim Breed Genet. 2018 Feb;135(1):73-83
    pubmed: 29345072
  16. Proc Natl Acad Sci U S A. 1973 Dec;70(12):3321-3
    pubmed: 4519626
  17. Am J Hum Genet. 1967 May;19(3 Pt 1):233-57
    pubmed: 6026583
  18. J Mol Evol. 1983;19(2):153-70
    pubmed: 6571220
  19. Proc Natl Acad Sci U S A. 1995 Jul 18;92(15):6723-7
    pubmed: 7624310
  20. Mol Ecol. 1995 Jun;4(3):347-54
    pubmed: 7663752
  21. Proc Natl Acad Sci U S A. 1997 Aug 19;94(17):9197-201
    pubmed: 9256459
  22. Anim Genet. 1997 Dec;28(6):397-400
    pubmed: 9616104

Citations

This article has been cited 7 times.
  1. Jafari H, Abebe BK, Cong L, Ahmed Z, Zhaofei W, Sun M, Muhatai G, Chuzhao L, Dang R. Review: Genomic insights into the adaptive traits and stress resistance in modern horses. Stress Biol 2026 Jan 12;6(1):5.
    doi: 10.1007/s44154-025-00274-1; pubmed: 41521281
  2. Toky RFM, Sukhamsri S, Medhasi S, Budi T, Panthum T, Singchat W, Srikulnath K. High-Accuracy Chicken Breed Identification Using Microsatellite Genotype Data and AutoGluon Framework. Biology (Basel) 2025 Dec 22;15(1).
    doi: 10.3390/biology15010021; pubmed: 41514862
  3. Liang H, He Y, Si J, Su X, Wang X, Mao H, Zhang Y. Machine learning-based discovery of informative SNPs for population assignment through whole genome sequencing. BMC Genomics 2025 Nov 18;26(1):1119.
    doi: 10.1186/s12864-025-12322-1; pubmed: 41257573
  4. Zhang Z, Fang Z, Du Y, He Y, Qian C, Ye W, Zhang N, Zhang J, Ding X. A deep learning strategy for accurate identification of purebred and hybrid pigs across SNP chips. J Anim Sci Biotechnol 2025 Aug 14;16(1):116.
    doi: 10.1186/s40104-025-01249-y; pubmed: 40813701
  5. Jasielczuk I, Gurgul A, Szmatoła T, Radko A, Majewska A, Sosin E, Litwińczuk Z, Rubiś D, Ząbek T. The use of SNP markers for cattle breed identification. J Appl Genet 2024 Sep;65(3):575-589.
    doi: 10.1007/s13353-024-00857-0; pubmed: 38568414
  6. Reinoso-Peláez EL, Gianola D, González-Recio O. Genome-Enabled Prediction Methods Based on Machine Learning. Methods Mol Biol 2022;2467:189-218.
    doi: 10.1007/978-1-0716-2205-6_7; pubmed: 35451777
  7. Askarov A, Kuznetsova A, Gusmanov R, Askarova A, Kovshov V. Cost-effective horse breeding in the Republic of Bashkortostan, Russia. Vet World 2020 Oct;13(10):2039-2045.