Modelling bluetongue and African horse sickness vector (Culicoides spp.) distribution in the Western Cape in South Africa using random forest machine learning.
Abstract: Culicoides biting midges exhibit a global spatial distribution and are the main vectors of several viruses of veterinary importance, including bluetongue (BT) and African horse sickness (AHS). Many environmental and anthropological factors contribute to their ability to live in a variety of habitats, which have the potential to change over the years as the climate changes. Therefore, as new habitats emerge, the risk for new introductions of these diseases of interest to occur increases. The aim of this study was to model distributions for two primary vectors for BT and AHS (Culicoides imicola and Culicoides bolitinos) using random forest (RF) machine learning and explore the relative importance of environmental and anthropological factors in a region of South Africa with frequent AHS and BT outbreaks.
Methods: Culicoides capture data were collected between 1996 and 2022 across 171 different capture locations in the Western Cape. Predictor variables included climate-related variables (temperature, precipitation, humidity), environment-related variables (normalised difference vegetation index-NDVI, soil moisture) and farm-related variables (livestock densities). Random forest (RF) models were developed to explore the spatial distributions of C. imicola, C. bolitinos and a merged species map, where both competent vectors were combined. The maps were then compared to interpolation maps using the same capture data as well as historical locations of BT and AHS outbreaks.
Results: Overall, the RF models performed well with 75.02%, 61.6% and 74.01% variance explained for C. imicola, C. bolitinos and merged species models respectively. Cattle density was the most important predictor for C. imicola and water vapour pressure the most important for C. bolitinos. Compared to interpolation maps, the RF models had higher predictive power throughout most of the year when species were modelled individually; however, when merged, the interpolation maps performed better in all seasons except winter. Finally, midge densities did not show any conclusive correlation with BT or AHS outbreaks.
Conclusions: This study yielded novel insight into the spatial abundance and drivers of abundance of competent vectors of BT and AHS. It also provided valuable data to inform mathematical models exploring disease outbreaks so that Culicoides-transmitted diseases in South Africa can be further analysed.
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
Overview
This study used machine learning to model the geographic distribution of two main biting midge species that transmit bluetongue and African horse sickness viruses in South Africa’s Western Cape region.
The researchers identified key environmental and farm-related factors influencing the species’ presence and compared model predictions with historical outbreak data.
Introduction and Background
Culicoides biting midges are tiny flying insects that act as vectors of viral livestock diseases, including bluetongue (BT) and African horse sickness (AHS).
These midges have a worldwide distribution and live in diverse habitats influenced by environmental and human-related factors, which may shift due to climate change, potentially altering disease risk zones.
The study focuses on Culicoides imicola and Culicoides bolitinos, two primary vectors of the BT and AHS viruses.
Study Aim and Approach
Objective: To model the spatial distribution of C. imicola and C. bolitinos in the Western Cape region, an area frequently affected by BT and AHS outbreaks.
Method: Used random forest (RF) machine learning to analyse how climate, environmental and farm-related variables relate to the abundance and distribution of these vectors.
Comparison: Compared the RF maps with spatial interpolation maps built from the same capture data, and with the historical locations of BT and AHS outbreaks.
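To make the interpolation baseline concrete, the sketch below estimates midge abundance at an unsampled point from nearby trap sites. The summary does not say which interpolation method the authors used; inverse distance weighting (IDW) is assumed here purely for illustration, and the trap coordinates and counts are hypothetical.

```python
import math


def idw_predict(known, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return value  # target coincides with a sampled site
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical trap sites: (x, y, mean midge count per trap night).
traps = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0), (0.0, 2.0, 50.0)]
estimate = idw_predict(traps, (1.0, 0.0))
print(f"Interpolated count at (1, 0): {estimate:.2f}")
```

Unlike a random forest, this kind of interpolation uses only the capture values and their locations, so it cannot draw on predictors such as temperature or livestock density.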
Data and Variables
Data Collection: Midge capture data collected between 1996 and 2022 at 171 different capture locations in the Western Cape.
Predictor Variables:
Climate-related: temperature, precipitation, humidity, water vapour pressure.
Environment-related: Normalized Difference Vegetation Index (NDVI), which indicates vegetation density and health; soil moisture levels.
Farm-related: livestock densities, including cattle density, which reflect the availability of hosts.
Modelling and Analysis
Developed separate RF models for each midge species (C. imicola and C. bolitinos) and also a combined model merging both species’ data.
Assessed model performance using variance explained by the models:
C. imicola: 75.02% variance explained
C. bolitinos: 61.6% variance explained
Merged species model: 74.01% variance explained
Compared model outputs seasonally to spatial interpolation maps to evaluate predictive power throughout the year.
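The "variance explained" figures can be illustrated with a small, self-contained sketch. It grows a toy random forest (bootstrap samples, a random subset of features at each split) and scores it on out-of-bag predictions as 100 x (1 - MSE / variance of the observations). The summary does not state how the authors computed their figures, so this convention, the synthetic data and all function names are assumptions, not the study's implementation.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def build_tree(rows, targets, n_try, rng, depth=0, max_depth=4, min_leaf=3):
    """Grow a regression tree; each split considers n_try randomly chosen features."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(targets)
    best = None  # (error, feature index, threshold)
    for feature in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted({row[feature] for row in rows})[:-1]:
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, feature, threshold)
    if best is None:
        return mean(targets)
    _, feature, threshold = best
    go_left = [row[feature] <= threshold for row in rows]
    branches = []
    for side in (True, False):
        sub_rows = [r for r, g in zip(rows, go_left) if g == side]
        sub_targets = [t for t, g in zip(targets, go_left) if g == side]
        branches.append(
            build_tree(sub_rows, sub_targets, n_try, rng, depth + 1, max_depth, min_leaf)
        )
    return (feature, threshold, branches[0], branches[1])


def predict_tree(tree, row):
    """Walk from the root to a leaf; leaves are plain floats."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Fit each tree on a bootstrap sample and remember which rows it saw."""
    rng = random.Random(seed)
    n = len(rows)
    n_try = max(1, len(rows[0]) // 3)  # assumed setting: one third of the features
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([rows[i] for i in sample], [targets[i] for i in sample], n_try, rng)
        forest.append((tree, set(sample)))
    return forest


def oob_variance_explained(forest, rows, targets):
    """Percent variance explained, using only trees that never saw each row."""
    predicted, observed = [], []
    for i, row in enumerate(rows):
        votes = [predict_tree(tree, row) for tree, seen in forest if i not in seen]
        if votes:
            predicted.append(mean(votes))
            observed.append(targets[i])
    mse = mean([(o - p) ** 2 for o, p in zip(observed, predicted)])
    return 100.0 * (1.0 - mse / (sse(observed) / len(observed)))


# Synthetic stand-in for trap data: two hypothetical predictors and a noisy response.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(120)]
targets = [10.0 * x0 + 5.0 * x1 + data_rng.gauss(0.0, 0.5) for x0, x1 in rows]
forest = fit_forest(rows, targets)
score = oob_variance_explained(forest, rows, targets)
print(f"Out-of-bag variance explained: {score:.1f}%")
```

Scoring on out-of-bag rows matters because a forest scored on its own training rows would look better than it is; each row here is predicted only by trees whose bootstrap sample excluded it.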
Key Findings
Important predictors:
Cattle density was the most important predictor for C. imicola, plausibly because cattle serve as a blood-meal source.
Water vapour pressure was the most important predictor for C. bolitinos, suggesting sensitivity to atmospheric moisture.
Model comparison:
RF models had higher predictive power than interpolation maps throughout most of the year when species were modelled individually.
However, for the combined species model, interpolation maps outperformed RF models in all seasons except winter.
Disease outbreaks correlation: No clear, conclusive correlation was found between midge density predictions and the historical locations of BT and AHS outbreaks, suggesting other factors influence disease dynamics.
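One common way to rank predictors, as in the "most important predictor" findings above, is permutation importance: shuffle one predictor at a time and measure how much the model's error rises. The summary does not say which importance measure the authors used, so the sketch below is an assumption. The feature names, the linear stand-in model and the data are hypothetical and do not reproduce the study's results.

```python
import random


def mean_squared_error(predict, rows, targets):
    return sum((predict(row) - t) ** 2 for row, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for feature in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            shuffled = [
                row[:feature] + (value,) + row[feature + 1:]
                for row, value in zip(rows, column)
            ]
            increases.append(mean_squared_error(predict, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical predictors and a stand-in fitted model (not the study's model).
feature_names = ("cattle_density", "ndvi")


def fitted_model(row):
    return 4.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(100)]
targets = [fitted_model(row) + data_rng.gauss(0.0, 0.1) for row in rows]

importances = permutation_importance(fitted_model, rows, targets)
for name, value in zip(feature_names, importances):
    print(f"{name}: {value:.3f}")
```

A large rise in error after shuffling a predictor means the model relies on it heavily. It does not by itself show that the predictor causes higher midge abundance.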
Conclusions and Implications
The study provides novel spatial insights into where competent vector midges are abundant in the Western Cape and which environmental and anthropogenic factors drive their distribution.
The findings improve understanding of vector ecology that can feed into mathematical and epidemiological models aiming to predict and manage BT and AHS outbreaks more effectively.
The study highlights the value of machine learning approaches such as random forests for modelling individual vector species, although interpolation maps performed better for the merged-species map in all seasons except winter.
While vector density alone did not explain outbreaks, the models facilitate future research incorporating additional ecological and epidemiological variables for comprehensive disease risk assessment.
Cite This Article
APA
de Klerk J, Tildesley M, Labuschagne K, Gorsich E. (2024). Modelling bluetongue and African horse sickness vector (Culicoides spp.) distribution in the Western Cape in South Africa using random forest machine learning. Parasit Vectors, 17(1), 354. https://doi.org/10.1186/s13071-024-06446-8
The Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK. jo.de-klerk@warwick.ac.uk.
Tildesley, Michael
The Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK.
Labuschagne, Karien
Epidemiology, Parasites and Vectors, Agricultural Research Council, Onderstepoort Veterinary Research, Onderstepoort, 0110, South Africa.
Gorsich, Erin
The Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK.
MeSH Terms
Animals
Cattle
African Horse Sickness / epidemiology
African Horse Sickness / transmission
African Horse Sickness / virology
Bluetongue / epidemiology
Bluetongue / transmission
Bluetongue / virology
Bluetongue virus
Ceratopogonidae / virology
Climate
Disease Outbreaks
Ecosystem
Horses
Insect Vectors / virology
Machine Learning
Random Forest
South Africa / epidemiology
Sheep
Grant Funding
BB/M01116X/1 / Biotechnology and Biological Sciences Research Council
Conflict of Interest Statement
The authors declare no competing interests.