Scientific Reports 2025; 15(1): 13670; doi: 10.1038/s41598-025-96634-y

A segment-based framework for explainability in animal affective computing.

Abstract: Recent developments in animal motion tracking and pose recognition have revolutionized the study of animal behavior. More recent efforts extend beyond tracking towards affect recognition using facial and body language analysis, with far-reaching applications in animal welfare and health. Deep learning models are the most commonly used in this context. However, their "black box" nature poses a significant challenge to explainability, which is vital for building trust and encouraging adoption among researchers. Despite its importance, the field of explainability and its quantification remains under-explored. Saliency maps are among the most widely used methods for explainability, where each pixel is assigned a significance level indicating its relevance to the neural network's decision. Although these maps are frequently used in research, they are predominantly applied qualitatively, with limited methods for quantitatively analyzing them or identifying the most suitable method for a specific task. In this paper, we propose a framework aimed at enhancing explainability in the field of animal affective computing. Assuming the availability of a classifier for a specific affective state and the ability to generate saliency maps, our approach focuses on evaluating and comparing visual explanations by emphasizing the importance of meaningful semantic parts captured as segments, which are thought to be closely linked to behavioral indicators of affective states. Furthermore, our approach introduces a quantitative scoring mechanism to assess how well the saliency maps generated by a given classifier align with predefined semantic regions. This scoring system allows for systematic, measurable comparisons of different pipelines in terms of their visual explanations within animal affective computing. Such a metric can serve as a quality indicator when developing classifiers for known biologically relevant segments or help researchers assess whether a classifier is using expected meaningful regions when exploring new potential indicators. We evaluated the framework using three datasets focused on cat pain, horse pain, and dog emotions. Across all datasets, the generated explanations consistently revealed that the eye area is the most significant feature for the classifiers. These results highlight the potential of explainability frameworks such as the one proposed here to uncover new insights into how machines 'see' animal affective states.
Publication Date: 2025-04-21
PubMed ID: 40258884
PubMed Central: PMC12012102
DOI: 10.1038/s41598-025-96634-y
The Equine Research Bank provides access to a large database of publicly available scientific literature. Inclusion in the Research Bank does not imply endorsement of study methods or findings by Mad Barn.
  • Journal Article

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

Overview

  • This research paper proposes a new framework to improve the explainability of deep learning models used for recognizing emotional states in animals through motion and facial analysis.
  • The framework introduces a quantitative method to evaluate how well visual explanations (saliency maps) correspond to biologically meaningful body parts linked to animal affective states, enhancing trust and understanding of AI decisions.

Background and Motivation

  • Animal behavior studies have advanced with motion tracking and pose recognition technologies, enabling detailed analysis of movement and expressions.
  • Recent research focuses on identifying affective states (emotions, pain) in animals using facial and body language analysis powered by deep learning models.
  • Deep learning models, while powerful, are often “black boxes” whose decision-making process is unclear, posing challenges for trust and scientific adoption.
  • Explainability—the ability to understand and interpret how models make decisions—is critical, yet it remains underdeveloped in animal affective computing.

Explainability Techniques and Challenges

  • Saliency maps are a common explainability tool that highlight the pixels most strongly influencing a model's prediction (a minimal generation sketch follows this list).
  • Currently, saliency maps are used mostly qualitatively, making it difficult to systematically compare or validate explanations.
  • There is a lack of quantitative metrics to assess how well saliency maps align with relevant biological features or semantic regions on the animal’s body.
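To make the saliency-map discussion concrete, here is a minimal Grad-CAM-style sketch in PyTorch. It is illustrative only: the ResNet-18 backbone, the hooked layer, and the random input are assumptions made for this example, not the models or data used in the study.

```python
# Minimal Grad-CAM sketch (assumptions: PyTorch + torchvision installed;
# ResNet-18 and its last conv block stand in for the paper's classifiers).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
target_layer = model.layer4  # last convolutional block

activations, gradients = {}, {}
target_layer.register_forward_hook(
    lambda m, i, o: activations.update(value=o))
target_layer.register_full_backward_hook(
    lambda m, gi, go: gradients.update(value=go[0]))

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an [H, W] saliency map in [0, 1] for an image of shape [1, 3, H, W]."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()                          # gradient of one class score
    acts, grads = activations["value"], gradients["value"]   # both [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)           # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))  # weighted activation sum
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False).squeeze()
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

saliency = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)
```

Quantitative comparison then reduces to asking how this per-pixel map relates to known body regions, which is where the segment-based framework comes in.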

Proposed Framework

  • The authors propose a segment-based framework focusing on semantic body regions (segments) that are meaningful indicators of animal affective states (e.g., eyes, ears, facial areas).
  • The framework:
    • Assumes the existence of a classifier for an animal affective state and the ability to produce saliency maps for its predictions.
    • Divides the animal’s body into predefined semantic segments that correspond to biologically relevant behavioral indicators.
    • Introduces a quantitative scoring mechanism to evaluate how well saliency maps highlight these meaningful segments (see the scoring sketch after this list).
  • This scoring system allows comparison between different classifiers or saliency methods in a reproducible, objective way.
  • The metric serves as both a quality check when designing classifiers for known indicators and as a tool to explore novel affective features.
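As a rough illustration of such a scoring mechanism, the sketch below scores a saliency map against a segment as the fraction of total saliency mass falling inside that segment's binary mask. This particular ratio is a stand-in chosen for clarity, not the authors' definition, and the saliency map and masks are synthetic.

```python
# Hedged sketch of a segment-alignment score (the specific formula is an
# illustrative assumption, not the metric defined in the paper).
import numpy as np

def segment_score(saliency: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of total saliency mass inside a boolean segment mask."""
    total = saliency.sum()
    return float(saliency[mask].sum() / total) if total > 0 else 0.0

# Toy data: in practice the saliency map comes from the classifier and the
# masks from ground-truth annotation or a segmentation model.
rng = np.random.default_rng(0)
saliency = rng.random((224, 224))
masks = {
    "eyes": np.zeros((224, 224), dtype=bool),
    "ears": np.zeros((224, 224), dtype=bool),
}
masks["eyes"][60:90, 70:150] = True
masks["ears"][10:40, 40:180] = True

scores = {name: segment_score(saliency, m) for name, m in masks.items()}
print(scores)  # higher score = more saliency concentrated in that segment
```

Because the score is a single number per segment, different classifiers or saliency methods can be compared on the same images simply by comparing score distributions.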

Evaluation and Results

  • The framework was tested on three datasets involving:
    • Cat and horse pain detection
    • Dog emotion recognition
  • Across all datasets:
    • The analysis consistently identified the eye area as the most significant feature influencing classifier decisions.
    • This aligns with biological understanding that eyes and face are critical affective indicators in animals.
    • The framework successfully quantified the explainability of classifiers and highlighted relevant semantic regions (a toy aggregation sketch follows this list).
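A result such as "the eye area is the most significant feature" follows mechanically once per-image segment scores exist: average each segment's score across the dataset and rank. The sketch below shows that aggregation step; the numeric scores are hypothetical.

```python
# Toy aggregation of per-image segment scores into a dataset-level ranking
# (the score values here are made up for illustration).
from collections import defaultdict
from statistics import mean

def rank_segments(per_image_scores: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Average each segment's score over all images and sort descending."""
    pooled = defaultdict(list)
    for scores in per_image_scores:
        for name, s in scores.items():
            pooled[name].append(s)
    return sorted(((n, mean(v)) for n, v in pooled.items()),
                  key=lambda kv: kv[1], reverse=True)

ranking = rank_segments([
    {"eyes": 0.41, "ears": 0.22, "muzzle": 0.15},
    {"eyes": 0.38, "ears": 0.19, "muzzle": 0.21},
])
print(ranking)  # [('eyes', 0.395), ('ears', 0.205), ('muzzle', 0.18)]
```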

Significance and Implications

  • This framework advances explainability in animal affective computing by introducing systematic, quantitative analysis of visual explanations.
  • It helps bridge the gap between black-box deep learning models and biological understanding of animal emotions.
  • The approach can increase trust and transparency, encouraging adoption by researchers and practitioners interested in animal welfare and health.
  • Additionally, it offers a foundation for discovering new behavioral features linked to affective states by highlighting unexpected important regions.

Summary

  • The paper presents a novel segment-based explainability framework tailored for animal affect recognition models.
  • By quantitatively linking saliency map highlights to biologically meaningful body parts, it provides objective tools to interpret and validate AI decisions.
  • Validated across multiple datasets, the framework consistently found that key semantic regions, such as the eyes, are critical to model predictions.
  • This work contributes to making AI in animal affective computing more transparent, trustworthy, and insightful.

Cite This Article

APA
Boneh-Shitrit, T., Finka, L., Mills, D. S., Luna, S. P., Dalla Costa, E., Zamansky, A., & Bremhorst, A. (2025). A segment-based framework for explainability in animal affective computing. Scientific Reports, 15(1), 13670. https://doi.org/10.1038/s41598-025-96634-y

Publication

ISSN: 2045-2322
NlmUniqueID: 101563288
Country: England
Language: English
Volume: 15
Issue: 1
Pages: 13670
PII: 13670

Researcher Affiliations

Boneh-Shitrit, Tali
  • Information Systems Department, University of Haifa, Haifa, Israel.
Finka, Lauren
  • Cats Protection, National Cat Centre, Chelwood Gate, Sussex, UK.
Mills, Daniel S
  • School of Life & Environmental Sciences, Joseph Banks Laboratories, University of Lincoln, Lincoln, UK.
Luna, Stelio P
  • School of Veterinary Medicine and Animal Science, São Paulo State University (Unesp), São Paulo, Brazil.
Dalla Costa, Emanuela
  • Department of Veterinary Medicine and Animal Sciences, University of Milan, Milan, Italy.
Zamansky, Anna
  • Information Systems Department, University of Haifa, Haifa, Israel. annazam@is.haifa.ac.il.
Bremhorst, Annika
  • Dogs and Science, Bern, Switzerland.
  • Department for Clinical Veterinary Science, Vetsuisse Faculty, University of Bern, Bern, Switzerland.

MeSH Terms

  • Animals
  • Behavior, Animal / physiology
  • Deep Learning
  • Neural Networks, Computer
  • Affect

Citations

This article has been cited 1 time.
  1. Zhou S, Li W, Zhou M, Dilger RN, Condotta ICFS, Wu Z, Tang X, Wu Y, Wang T, Li J. Foundations of Livestock Behavioral Recognition: Ethogram Analysis of Behavioral Definitions and Its Practices in Multimodal Large Language Models. Animals (Basel). 2025;15(20). doi: 10.3390/ani15203030. PMID: 41153957.