Abstract: Several induction quality scoring systems (IQSS) have been described to evaluate drugs and risk factors during anaesthetic induction in horses, but no attempt has been made to compare their reliability.
Objective: To elucidate the reliability of three IQSS: the visual analogue scale (VAS), a simple descriptive scale (SDS), and a composite grading scale (CGS) proposed by the authors.
Study design: Reliability study.
Methods: Eight randomly selected video-recorded anaesthetic inductions from horses that underwent general anaesthesia were evaluated twice by four blinded evaluators with experience in equine anaesthesia, with a 1-month interval between assessments, using the three aforementioned IQSS. A total of 64 evaluations per scale were generated. To assess reliability, intra- and inter-rater intraclass correlation coefficients (ICC) and their 95% confidence intervals (CI) were calculated based on a mean rating (k = 4), absolute agreement, 2-way random-effects model.
Results: Inter-rater reliability was classified as moderate to good for all the scales, with the highest ICC found for the VAS (0.74 ± 0.11), followed by the CGS and the SDS (0.65 ± 0.22 and 0.63 ± 0.21, respectively). Intra-rater agreement results demonstrated very good reliability for both VAS and SDS (0.82 ± 0.08 and 0.81 ± 0.18, respectively) and excellent reliability for the CGS (0.91 ± 0.08).
Main limitations: Video recordings were used instead of in situ evaluations, and the absence of audio may affect the assessment. Additionally, these findings are applicable only when free inductions are evaluated.
Conclusions: The VAS and the novel CGS are reliable IQSS in horses, as are the widely used SDS. As the SDS are inconsistent across the literature, the VAS would be advised if multiple evaluators assess induction quality for research purposes, whereas the CGS would be selected for studies involving a single observer.
We suggest routine inclusion of the VAS in the evaluation of the anaesthetic induction in horses.
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
Overview
This study evaluates and compares the reliability of three different scoring systems used to assess the quality of anesthetic induction in horses.
The three scoring systems examined are the Visual Analogue Scale (VAS), Simple Descriptive Scale (SDS), and a newly proposed Composite Grading Scale (CGS).
Background and Purpose
Induction quality scoring systems (IQSS) are used to evaluate how drugs and risk factors affect the induction period of anesthesia in horses.
Although several IQSS have been described, no prior study had compared their reliability, a comparison needed to standardize evaluations in research and clinical practice.
The study aims to fill this gap by comparing intra-rater (same evaluator over time) and inter-rater (different evaluators) reliability of the three scales.
Methods
Eight video-recorded anesthetic inductions of horses undergoing general anesthesia were randomly selected for assessment.
Four blinded evaluators with experience in equine anesthesia scored these videos twice, with a 1-month interval between sessions.
Each evaluator applied all three IQSS (VAS, SDS, CGS) to every video, generating 64 evaluations per scale (8 videos × 4 evaluators × 2 sessions).
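The count above can be checked by enumerating the design. This is a minimal sketch; the evaluator labels and variable names are hypothetical and do not come from the study.

```python
from itertools import product

# Study design as described in the summary (labels are illustrative only)
videos = range(1, 9)                 # 8 video-recorded inductions
evaluators = ["A", "B", "C", "D"]    # 4 evaluators (hypothetical labels)
sessions = (1, 2)                    # 2 scoring sessions, 1 month apart
scales = ("VAS", "SDS", "CGS")       # the three IQSS compared

# Every (video, evaluator, session) combination is one evaluation per scale
evaluations = list(product(videos, evaluators, sessions))
per_scale = len(evaluations)
total = per_scale * len(scales)

print(per_scale, total)  # 64 192
```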
Reliability was quantified using Intraclass Correlation Coefficient (ICC), which measures consistency/agreement across ratings:
Inter-rater ICC assessed agreement between different evaluators.
Intra-rater ICC assessed consistency of each evaluator’s ratings over time.
95% confidence intervals were calculated to show precision of the ICC estimates.
A two-way random-effects model with absolute agreement, based on the mean of k = 4 ratings, was used for the ICC calculation. This form is appropriate when raters are treated as a random sample and agreement on the actual scores, not just their ranking, matters.
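As a rough illustration of the model described above, the following Python sketch computes a two-way random-effects, absolute-agreement, mean-rating ICC (often written ICC(2,k)) from a videos × raters matrix using the standard ANOVA mean squares. The function name `icc_2k` and the example scores are hypothetical. The study's software and raw data are not given here, so this should not be read as the authors' implementation.

```python
import numpy as np

def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    `ratings` is an (n subjects) x (k raters) table with no missing values.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual

    msr = ss_rows / (n - 1)              # mean square, subjects (rows)
    msc = ss_cols / (k - 1)              # mean square, raters (columns)
    mse = ss_err / ((n - 1) * (k - 1))   # mean square, residual

    # Absolute agreement: rater variance (msc) stays in the denominator
    return (msr - mse) / (msr + (msc - mse) / n)

# Made-up scores for 8 videos (rows) by 4 evaluators (columns);
# these are NOT data from the study.
example = [
    [72, 65, 70, 68],
    [40, 35, 50, 42],
    [88, 80, 85, 90],
    [55, 60, 52, 58],
    [30, 25, 38, 33],
    [95, 90, 92, 97],
    [63, 58, 66, 60],
    [47, 52, 45, 50],
]
print(round(icc_2k(example), 3))
```

For intra-rater reliability, the same function could in principle be applied to one evaluator's scores with the two sessions as columns; whether the authors computed it exactly this way is an assumption.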
Results
Inter-rater reliability (agreement between different evaluators) was classified as moderate to good for all three scales:
VAS had the highest ICC at 0.74 ± 0.11.
CGS followed with ICC = 0.65 ± 0.22.
SDS was comparable to CGS with ICC = 0.63 ± 0.21.
Intra-rater reliability (consistency of same evaluator over time):
VAS and SDS showed very good intra-rater reliability (ICC = 0.82 ± 0.08 and 0.81 ± 0.18, respectively).
The CGS demonstrated excellent intra-rater reliability with ICC = 0.91 ± 0.08, indicating highly consistent ratings by individual evaluators over time.
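The reported point estimates can be tabulated to see which scale ranks highest for each type of reliability. This is only a sketch: the dictionary layout and the helper name `best_scale` are hypothetical, and the ± terms are left out because the summary does not say how they should be interpreted.

```python
# ICC point estimates as reported in the abstract
reported_icc = {
    "VAS": {"inter": 0.74, "intra": 0.82},
    "CGS": {"inter": 0.65, "intra": 0.91},
    "SDS": {"inter": 0.63, "intra": 0.81},
}

def best_scale(kind):
    """Return the scale with the highest reported ICC for 'inter' or 'intra'."""
    return max(reported_icc, key=lambda scale: reported_icc[scale][kind])

print(best_scale("inter"))  # VAS
print(best_scale("intra"))  # CGS
```

Ranking by point estimate alone ignores the uncertainty around each ICC, so small gaps such as VAS versus SDS for intra-rater reliability should not be over-interpreted.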
Conclusions and Implications
The study confirms that all three scoring systems have acceptable reliability, but their strengths vary according to application:
VAS: Most reliable across multiple evaluators; should be preferred in research where several observers assess induction quality.
CGS: Superior intra-rater consistency; recommended for studies or clinical settings with a single observer.
SDS: Widely used and also reliable, but SDS definitions are inconsistent across the literature. In this study it had the lowest inter-rater ICC (0.63) and an intra-rater ICC (0.81) close to that of the VAS (0.82).
Limitations include the use of video recordings instead of in situ evaluation, since the absence of audio may affect the assessment. The findings also apply only when free inductions are evaluated.
The authors advocate for routine inclusion of the VAS during evaluation of anesthetic induction in horses to enhance consistency and comparability in both clinical practice and research.
Summary
This study systematically compares three methods for assessing equine anesthetic induction quality, a key period influencing anesthesia safety.
Findings suggest VAS is most suitable for multi-rater studies and CGS for single-observer assessments, while SDS, although common and reliable, are defined inconsistently across the literature.
By recommending specific tools based on context, the study helps improve standardization and reliability in veterinary anesthetic assessment.
Cite This Article
Villalba-Díez M, Benavente-Sánchez L, Bustamante R, Santiago-Llorente I, Villalba-Orero M. (2025). Reliability of three scoring systems for assessing quality of anaesthetic induction in horses. Equine Vet J. https://doi.org/10.1111/evj.70103