Genetics, selection, evolution : GSE. 2016; 48: 13. doi: 10.1186/s12711-016-0192-2

Genomic prediction of unordered categorical traits: an application to subpopulation assignment in German Warmblood horses.

Abstract: Categorical traits without ordinal representation of classes do not qualify for threshold models. Alternatively, the multinomial problem can be assessed by a sequence of independent binary contrasts using schemes such as one-vs-all or one-vs-one. Class probabilities can be arrived at by normalization or pair-wise coupling strategies. We assessed the predictive ability of whole-genome regression models and support vector machines for the classification of horses into four German Warmblood breeds. Results: Prediction accuracies of leave-one-out cross-validation were high and ranged from 0.75 to 0.97 depending on the binary classifier and breeds incorporated in the training. An analysis of the population structure using eigenvectors of the genomic relationship matrix revealed clustering of individuals beyond the given breed labels. Admixture between two breeds became apparent which had substantial impact on the prediction accuracies between those two breeds and also influenced the contrasts between other breeds. Conclusions: Genomic prediction of unordered categorical traits was successfully applied to subpopulation assignment of German Warmblood horses. The applied methodology is a straightforward extension of existing binary threshold models for genomic prediction.
Publication Date: 2016-02-11
PubMed ID: 26867647
PubMed Central: PMC4751658
DOI: 10.1186/s12711-016-0192-2
  • Journal Article
  • Research Support, Non-U.S. Gov't

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

The study applies genomic prediction of unordered categorical traits to assign German Warmblood horses to subpopulations (breeds). The method is a straightforward extension of existing binary threshold models for genomic prediction.

Introduction and Methodology

  • The study addresses categorical traits whose classes have no ordinal representation, so threshold models do not apply.
  • Instead, the multinomial problem is broken into a sequence of independent binary contrasts, using a one-vs-all or one-vs-one scheme.
  • Class probabilities are then obtained by normalization or pair-wise coupling. The study assessed the predictive ability of whole-genome regression models and support vector machines for classifying horses into four German Warmblood breeds.
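
The one-vs-all and one-vs-one schemes described above can be illustrated with a small Python sketch. It is not the authors' implementation: the breed labels, the function names, and the example probabilities are hypothetical, and the simple sum-to-one rescaling shown here is only one way to read "normalization" (pair-wise coupling for one-vs-one outputs is not shown).

```python
from itertools import combinations

# Hypothetical class labels; the summary only says there are four breeds.
BREEDS = ["breed_1", "breed_2", "breed_3", "breed_4"]


def one_vs_one_pairs(classes):
    """List every binary contrast needed by a one-vs-one scheme."""
    return list(combinations(classes, 2))


def normalize_one_vs_all(binary_probs):
    """Rescale independent one-vs-all probabilities so they sum to 1.

    binary_probs maps each class to P(class vs. rest) from its own
    binary classifier; these values need not sum to 1 on their own.
    """
    total = sum(binary_probs.values())
    if total <= 0:
        raise ValueError("at least one binary probability must be positive")
    return {label: p / total for label, p in binary_probs.items()}


def assign_class(binary_probs):
    """Assign the class with the highest normalized probability."""
    probs = normalize_one_vs_all(binary_probs)
    return max(probs, key=probs.get)


# Made-up outputs of four one-vs-all classifiers for a single horse.
example = {"breed_1": 0.8, "breed_2": 0.3, "breed_3": 0.1, "breed_4": 0.4}
class_probs = normalize_one_vs_all(example)
predicted = assign_class(example)
```

With K classes, one-vs-all needs K binary classifiers while one-vs-one needs K(K-1)/2, which is six contrasts for four breeds.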

Results

  • Prediction accuracies from leave-one-out cross-validation were high, ranging from 0.75 to 0.97 depending on the binary classifier and on the breeds included in the training.
  • An analysis of population structure, using eigenvectors of the genomic relationship matrix, showed that individuals clustered beyond the given breed labels.
  • Admixture (genetic mixing) between two breeds became apparent. It had a substantial impact on the prediction accuracies between those two breeds and also influenced the contrasts between other breeds.
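
Leave-one-out cross-validation, the evaluation scheme named in the results, can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid rule stands in for the whole-genome regression models and support vector machines used in the study, and the genotype codes (0/1/2) and labels are made-up toy data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: pick the class whose mean genotype is closest to x."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Hold out each individual once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data: six individuals, three markers coded 0/1/2, two hypothetical breeds.
toy_genotypes = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [2, 2, 1], [2, 1, 2], [1, 2, 2]]
toy_breeds = ["a", "a", "a", "b", "b", "b"]
toy_accuracy = leave_one_out_accuracy(toy_genotypes, toy_breeds)
```

Any classifier with the same `predict(train_x, train_y, x)` shape could be passed in place of the stand-in, which is how the accuracy would come to depend on the chosen binary classifier.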

Conclusions

  • Genomic prediction of unordered categorical traits was successfully applied to the subpopulation assignment of German Warmblood horses.
  • The methodology is a straightforward extension of existing binary threshold models for genomic prediction: categorical traits without ordinal representation are handled through a series of independent binary contrasts.

Cite This Article

APA
Heuer C, Scheel C, Tetens J, Kühn C, Thaller G. (2016). Genomic prediction of unordered categorical traits: an application to subpopulation assignment in German Warmblood horses. Genet Sel Evol, 48, 13. https://doi.org/10.1186/s12711-016-0192-2

Publication

ISSN: 1297-9686
NlmUniqueID: 9114088
Country: France
Language: English
Volume: 48
Pages: 13
PII: 13

Researcher Affiliations

Heuer, Claas
  • Institute of Animal Breeding and Husbandry, University of Kiel, Hermann-Rodewald-Strasse 6, 24098, Kiel, Germany. cheuer@tierzucht.uni-kiel.de.
Scheel, Christoph
  • Institute of Animal Breeding and Husbandry, University of Kiel, Hermann-Rodewald-Strasse 6, 24098, Kiel, Germany. cscheel@tierzucht.uni-kiel.de.
Tetens, Jens
  • Institute of Animal Breeding and Husbandry, University of Kiel, Hermann-Rodewald-Strasse 6, 24098, Kiel, Germany. jtetens@tierzucht.uni-kiel.de.
Kühn, Christa
  • Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee 2, 18196, Dummerstorf, Germany. kuehn@fbn-dummerstorf.de.
  • Faculty of Agricultural and Environmental Sciences, University Rostock, Justus-von-Liebig-Weg 6, 18059, Rostock, Germany. kuehn@fbn-dummerstorf.de.
Thaller, Georg
  • Institute of Animal Breeding and Husbandry, University of Kiel, Hermann-Rodewald-Strasse 6, 24098, Kiel, Germany. gthaller@tierzucht.uni-kiel.de.

MeSH Terms

  • Animals
  • Bayes Theorem
  • Breeding
  • Genetics, Population
  • Genome
  • Genomics / methods
  • Genotype
  • Germany
  • Horses / genetics
  • Models, Genetic
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Quantitative Trait, Heritable
  • Regression Analysis
  • Support Vector Machine

References

This article includes 45 references

Citations

This article has been cited 6 times.
  1. Lindsay-McGee V, Sanchez-Molano E, Banos G, Clark EL, Piercy RJ, Psifidi A. Genetic characterisation of the Connemara pony and the Warmblood horse using a within-breed clustering approach. Genet Sel Evol 2023 Aug 17;55(1):60.
    doi: 10.1186/s12711-023-00827-w; pubmed: 37592264
  2. Reinoso-Peláez EL, Gianola D, González-Recio O. Genome-Enabled Prediction Methods Based on Machine Learning. Methods Mol Biol 2022;2467:189-218.
    doi: 10.1007/978-1-0716-2205-6_7; pubmed: 35451777
  3. Jiang Y, Weise S, Graner A, Reif JC. Using Genome-Wide Predictions to Assess the Phenotypic Variation of a Barley (Hordeum sp.) Gene Bank Collection for Important Agronomic Traits and Passport Information. Front Plant Sci 2020;11:604781.
    doi: 10.3389/fpls.2020.604781; pubmed: 33505414
  4. Nolte W, Thaller G, Kuehn C. Selection signatures in four German warmblood horse breeds: Tracing breeding history in the modern sport horse. PLoS One 2019;14(4):e0215913.
    doi: 10.1371/journal.pone.0215913; pubmed: 31022261
  5. Chen T, Brewster P, Tuttle KR, Dworkin LD, Henrich W, Greco BA, Steffes M, Tobe S, Jamerson K, Pencina K, Massaro JM, D'Agostino RB Sr, Cutlip DE, Murphy TP, Cooper CJ, Shapiro JI. Prediction of cardiovascular outcomes with machine learning techniques: application to the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study. Int J Nephrol Renovasc Dis 2019;12:49-58.
    doi: 10.2147/IJNRD.S194727; pubmed: 30962703
  6. Montesinos-López OA, Montesinos-López A, Luna-Vázquez FJ, Toledo FH, Pérez-Rodríguez P, Lillemo M, Crossa J. An R Package for Bayesian Analysis of Multi-environment and Multi-trait Multi-environment Data for Genome-Based Prediction. G3 (Bethesda) 2019 May 7;9(5):1355-1369.
    doi: 10.1534/g3.119.400126; pubmed: 30819822