Abstract: Recent developments in animal motion tracking and pose recognition have revolutionized the study of animal behavior. More recent efforts extend beyond tracking towards affect recognition using facial and body language analysis, with far-reaching applications in animal welfare and health. Deep learning models are the most commonly used approach in this context. However, their "black box" nature poses a significant challenge to explainability, which is vital for building trust and encouraging adoption among researchers. Despite its importance, the field of explainability and its quantification remains under-explored. Saliency maps are among the most widely used methods for explainability, where each pixel is assigned a significance level indicating its relevance to the neural network's decision. Although these maps are frequently used in research, they are predominantly applied qualitatively, with limited methods for quantitatively analyzing them or identifying the most suitable method for a specific task. In this paper, we propose a framework aimed at enhancing explainability in the field of animal affective computing. Assuming the availability of a classifier for a specific affective state and the ability to generate saliency maps, our approach focuses on evaluating and comparing visual explanations by emphasizing the importance of meaningful semantic parts captured as segments, which are thought to be closely linked to behavioral indicators of affective states. Furthermore, our approach introduces a quantitative scoring mechanism to assess how well the saliency maps generated by a given classifier align with predefined semantic regions. This scoring system allows for systematic, measurable comparisons of different pipelines in terms of their visual explanations within animal affective computing.
Such a metric can serve as a quality indicator when developing classifiers for known biologically relevant segments or help researchers assess whether a classifier is using expected meaningful regions when exploring new potential indicators. We evaluated the framework using three datasets focused on cat and horse pain and dog emotions. Across all datasets, the generated explanations consistently revealed that the eye area is the most significant feature for the classifiers. These results highlight the potential of explainability frameworks such as the one proposed here to uncover new insights into how machines 'see' animal affective states.
Overview
This research paper proposes a new framework to improve the explainability of deep learning models used for recognizing affective states in animals through facial and body language analysis.
The framework introduces a quantitative method to evaluate how well visual explanations (saliency maps) correspond to biologically meaningful body parts linked to animal affective states, enhancing trust and understanding of AI decisions.
Background and Motivation
Animal behavior studies have advanced with motion tracking and pose recognition technologies, enabling detailed analysis of movement and expressions.
Recent research focuses on identifying affective states (emotions, pain) in animals using facial and body language analysis powered by deep learning models.
Deep learning models, while powerful, are often “black boxes” whose decision-making process is unclear, posing challenges for trust and scientific adoption.
Explainability—the ability to understand and interpret how models make decisions—is critical, yet it remains underdeveloped in animal affective computing.
Explainability Techniques and Challenges
Saliency maps are a common explainability tool that highlight important pixels influencing a model’s prediction.
Currently, saliency maps are used mostly qualitatively, making it difficult to systematically compare or validate explanations.
There is a lack of quantitative metrics to assess how well saliency maps align with relevant biological features or semantic regions on the animal’s body.
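To make the idea of a saliency map concrete, the sketch below uses occlusion, one simple model-agnostic technique: patches of the image are masked one at a time and the drop in classifier confidence is recorded per patch. The classifier here (`toy_score`) is a hypothetical stand-in, not the paper's model, and the paper itself may use gradient-based methods such as Grad-CAM instead.

```python
import numpy as np

def occlusion_saliency(image, score_fn, patch=2):
    """Occlusion-based saliency: mask one patch at a time and record
    how much the classifier's confidence drops at that location."""
    h, w = image.shape
    base = score_fn(image)
    sal = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0  # mask one patch
            sal[y:y + patch, x:x + patch] = base - score_fn(occluded)
    return sal

# Toy "classifier": confidence is the mean intensity in a fixed "eye" region.
def toy_score(img):
    return img[2:6, 2:6].mean()

img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0  # bright "eye" region
sal = occlusion_saliency(img, toy_score, patch=2)
# Patches overlapping the "eye" region cause a confidence drop of 0.25;
# patches elsewhere cause no drop, so the saliency map highlights the eye.
```

Real pipelines would replace `toy_score` with a trained network's class probability, but the interpretation of the resulting map is the same: high values mark regions the prediction depends on.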
Proposed Framework
The authors propose a segment-based framework focusing on semantic body regions (segments) that are meaningful indicators of animal affective states (e.g., eyes, ears, facial areas).
The framework:
Assumes the existence of a classifier for an animal affective state and the ability to produce saliency maps for its predictions.
Divides the animal’s body into predefined semantic segments that correspond to biologically relevant behavioral indicators.
Introduces a quantitative scoring mechanism to evaluate how well saliency maps highlight these meaningful segments.
This scoring system allows comparison between different classifiers or saliency methods in a reproducible, objective way.
The metric serves as both a quality check when designing classifiers for known indicators and as a tool to explore novel affective features.
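The scoring idea above can be sketched as follows; this is a minimal illustrative variant (the paper's exact formula may differ), assuming binary masks for each semantic segment and assigning each segment the fraction of total saliency mass falling inside its mask. The function name `segment_scores` and the example masks are hypothetical.

```python
import numpy as np

def segment_scores(saliency, masks):
    """Score each semantic segment by the fraction of total saliency
    mass that falls inside its binary mask (illustrative variant)."""
    total = saliency.sum()
    return {name: float(saliency[mask].sum() / total)
            for name, mask in masks.items()}

# A 3x3 saliency map concentrated on the center pixel.
sal = np.array([[0.0, 0.1, 0.1],
                [0.1, 0.6, 0.1],
                [0.0, 0.0, 0.0]])

# Hypothetical semantic segments as binary masks.
masks = {
    "eye": np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=bool),
    "ear": np.array([[1, 1, 0],
                     [0, 0, 0],
                     [0, 0, 0]], dtype=bool),
}

scores = segment_scores(sal, masks)
# The "eye" segment captures 0.6 of the total saliency mass, the "ear" 0.1,
# so this classifier's explanation aligns best with the eye region.
```

Because the score is a single number per segment, it supports exactly the reproducible comparisons described above: two classifiers (or two saliency methods) can be ranked by how much saliency mass they place on a biologically meaningful region.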
Evaluation and Results
The framework was tested on three datasets involving:
Cat and horse pain detection
Dog emotion recognition
Across all datasets:
The analysis consistently identified the eye area as the most significant feature influencing classifier decisions.
This aligns with biological understanding that eyes and face are critical affective indicators in animals.
The framework successfully quantified the explainability of classifiers and highlighted relevant semantic regions.
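A dataset-level conclusion like "the eye area is the most significant feature" can be reached by aggregating per-image segment scores. The sketch below is illustrative only: the per-image score values are invented, and the aggregation (a simple mean followed by ranking) is one plausible choice rather than the paper's exact procedure.

```python
import numpy as np

# Hypothetical per-image segment scores, e.g. the fraction of saliency
# mass each segment captured in that image (values invented).
per_image = [
    {"eye": 0.55, "ear": 0.20, "mouth": 0.10},
    {"eye": 0.48, "ear": 0.25, "mouth": 0.12},
    {"eye": 0.60, "ear": 0.15, "mouth": 0.08},
]

segments = per_image[0].keys()
mean_scores = {s: float(np.mean([img[s] for img in per_image]))
               for s in segments}
top = max(mean_scores, key=mean_scores.get)
# 'top' is the segment with the highest average score across the dataset;
# here that is the eye region, mirroring the pattern reported in the paper.
```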
Significance and Implications
This framework advances explainability in animal affective computing by introducing systematic, quantitative analysis of visual explanations.
It helps bridge the gap between black-box deep learning models and biological understanding of animal emotions.
The approach can increase trust and transparency, encouraging adoption by researchers and practitioners interested in animal welfare and health.
Additionally, it offers a foundation for discovering new behavioral features linked to affective states by highlighting unexpected important regions.
Summary
The paper presents a novel segment-based explainability framework tailored for animal affect recognition models.
By quantitatively linking saliency map highlights to biologically meaningful body parts, it provides objective tools to interpret and validate AI decisions.
Validated across multiple datasets, the framework consistently found that key semantic regions, such as the eyes, are critical to model predictions.
This work contributes to making AI in animal affective computing more transparent, trustworthy, and insightful.
Cite This Article
APA
Boneh-Shitrit T, Finka L, Mills DS, Luna SP, Dalla Costa E, Zamansky A, Bremhorst A.
(2025).
A segment-based framework for explainability in animal affective computing.
Sci Rep, 15(1), 13670.
https://doi.org/10.1038/s41598-025-96634-y
Author Affiliations
Boneh-Shitrit, Tali
Information Systems Department, University of Haifa, Haifa, Israel.
Finka, Lauren
Cats Protection, National Cat Centre, Chelwood Gate, Sussex, UK.
Mills, Daniel S
School of Life & Environmental Sciences, Joseph Banks Laboratories, University of Lincoln, Lincoln, UK.
Luna, Stelio P
School of Veterinary Medicine and Animal Science, São Paulo State University (Unesp), São Paulo, Brazil.
Dalla Costa, Emanuela
Department of Veterinary Medicine and Animal Sciences, University of Milan, Milan, Italy.
Zamansky, Anna
Information Systems Department, University of Haifa, Haifa, Israel. annazam@is.haifa.ac.il.
Bremhorst, Annika
Dogs and Science, Bern, Switzerland.
Department for Clinical Veterinary Science, Vetsuisse Faculty, University of Bern, Bern, Switzerland.
MeSH Terms
Animals
Behavior, Animal / physiology
Deep Learning
Neural Networks, Computer
Affect