Premature mortality analysis of 52,000 deceased cats and dogs exposes socioeconomic disparities

Abstract

Monitoring mortality rates offers crucial insights into public health by uncovering the hidden impacts of diseases, identifying emerging trends, optimising resource allocation, and informing effective policy decisions. Here, we present a novel approach to analysing premature mortality in companion animals, using data from 28,159 deceased dogs and 24,006 deceased cats across the United Kingdom. By employing PetBERT-ICD, an automated large language model (LLM)-based syndromic classifier built on the International Classification of Diseases 11th Revision (ICD-11), we reveal critical insights into the causes and patterns of premature death. Our findings highlight the significant impact of behavioural conditions on premature euthanasia in dogs, particularly between the ages of one and six years. We also identify a 19% increased risk of premature mortality in brachycephalic dog breeds, raising important animal welfare concerns. Our research establishes a strong association between socioeconomic status and premature mortality in cats and dogs: areas with the lowest Index of Multiple Deprivation (IMD) scores show a nearly 50% reduction in the risk of premature mortality across both species, demonstrating the powerful impact that socioeconomic factors can have on pet health and longevity. This research underscores the need to examine the socioeconomic disparities affecting animal health outcomes; by addressing these inequities, we can better safeguard the well-being of our companion animals.

Dataset

Electronic health records have been collected since March 2014 by SAVSNET, the Small Animal Veterinary Surveillance Network, from a sentinel network of 253 volunteer veterinary practices across the United Kingdom. A full description of SAVSNET has been presented elsewhere [9]. In summary, veterinary practices whose practice management software is compatible with the SAVSNET data exchange are recruited on a convenience basis. Within these participating practices, data is collected from each booked consultation (one for which an appointment has been made to see a veterinary practitioner or nurse). All owners attending a participating practice are informed of the data collection process, including that their data could be used for research purposes, and are given the opportunity to opt out during their consultation. If an owner opts out, data from that consultation is not sent to the SAVSNET data exchange and is therefore not used in any research. Data is collected on a consultation-by-consultation basis and includes species, breed, sex, neuter status, age, the owner’s postcode, insurance and microchipping status and, crucially for this study, a free-text clinical narrative describing the events that occurred within that consultation. To support the current study, high-level syndromic labels corresponding to the chapters of the International Classification of Diseases 11th Revision (ICD-11) were appended to each consultation, building on established methodologies. These labels and their associated conditions follow the broad categorisation framework set out by the World Health Organisation (WHO) [27]. The labelling used PetBERT, a large language model further pre-trained on the SAVSNET corpus and subsequently fine-tuned as a multi-label classifier covering the high-level ICD-11 categories. The original study [26] provides a comprehensive explanation of this approach.
Sensitive information, including personal identifiers, was removed from the dataset prior to analysis. All data discussed in this paper is reported at population level, so no individual animal or owner is identified. SAVSNET has ethical approval from the University of Liverpool Research Ethics Committee (RETH000964). We confirm that all experiments were conducted in strict compliance with the requirements of this Research Ethics Committee.

Data extraction

Narratives mentioning death or euthanasia were identified using a generalised Python regular expression (regex) that screened for terms such as “euthanasia”, “put to sleep (PTS)” and “died”. The final regex is given below. A random sample of 250 suspected death/euthanasia cases was then drawn from the regex-matched records and read manually to verify whether each met the case definition of a “declaration of death”. Common false positives included discussions of future euthanasia and euthanasia mentioned as advice by the attending practitioner; unless the euthanasia occurred in the same consultation, such records were not annotated as cases. Adapting the work of Yalniz et al. [49], we employed a semi-supervised teacher-student approach, in which a small subset of manually annotated records was used to train a small binary sequence classification model. The resulting model was then applied to the entire dataset to extract animals identified as cases, and a random sample of 200 records was passed to a practising clinician to verify that the extraction method performed well enough for the study to proceed.

euth|dead|died|pts|put to sleep|pento|doa|crem|burial|bury|qol|quality|ashes|scatter|casket
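The regex screening and sampling steps described above can be sketched in Python as follows. The pattern is the one given above; case-insensitive matching and the helper names `screen_narratives` and `sample_for_annotation` are illustrative assumptions, not the authors' code.

```python
import random
import re

# Screening pattern as reported. Matching is substring-based (no word
# boundaries), so it deliberately over-captures, e.g. "quality" in unrelated
# contexts; false positives are left for manual review and the classifier.
DEATH_PATTERN = re.compile(
    r"euth|dead|died|pts|put to sleep|pento|doa|crem|burial|bury|qol|quality|ashes|scatter|casket",
    re.IGNORECASE,  # assumption: case-insensitive matching
)


def screen_narratives(narratives):
    """Return the narratives containing at least one death-related term."""
    return [text for text in narratives if DEATH_PATTERN.search(text)]


def sample_for_annotation(candidates, k=250, seed=0):
    """Draw a simple random sample (without replacement) for manual reading."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))
```

For example, a narrative such as "Discussed QoL with owner" is matched even though no death occurred, which is exactly the kind of false positive the manual reading was intended to catch.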

Classification report for the death/euthanasia extraction model:

              precision    recall  f1-score   support

    No Death     0.9940    1.0000    0.9970       166
       Death     1.0000    0.9846    0.9922        65

    accuracy                         0.9957       231
   macro avg     0.9970    0.9923    0.9946       231
weighted avg     0.9957    0.9957    0.9957       231
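The figures in this report are consistent with a confusion matrix of 166 true negatives, 0 false positives, 1 false negative and 64 true positives, treating "Death" as the positive class. These counts are inferred from the reported support and recall values; the source does not state them explicitly. The sketch below recomputes the report from those inferred counts; the function names are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class, guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def classification_report(tp, fp, fn, tn):
    """Per-class, macro- and support-weighted metrics for a binary task."""
    per_class = {
        # For "No Death", true negatives play the role of true positives,
        # and false negatives/false positives swap roles.
        "No Death": (precision_recall_f1(tn, fn, fp), tn + fp),
        "Death": (precision_recall_f1(tp, fp, fn), tp + fn),
    }
    total = tp + fp + fn + tn
    n_classes = len(per_class)
    macro = tuple(sum(m[i] for m, _ in per_class.values()) / n_classes for i in range(3))
    weighted = tuple(sum(m[i] * s for m, s in per_class.values()) / total for i in range(3))
    return {
        "per_class": per_class,
        "accuracy": (tp + tn) / total,
        "macro": macro,
        "weighted": weighted,
    }


# Counts inferred from the report above (not stated explicitly in the source).
report = classification_report(tp=64, fp=0, fn=1, tn=166)
```

The single misclassified record is a true death labelled "No Death", which is why "Death" recall (64/65) is the only per-class metric below 1.0 apart from "No Death" precision (166/167).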

The cephalic index category of each breed (e.g. brachycephalic) was appended where available, using labels derived from several sources [17, 50, 51]. Animals identified as crossbreeds were labelled as ‘other’ because of insufficient cephalic data.

Defining premature mortality

To define a premature longevity threshold for each breed in our dataset, we used bootstrapping to estimate the median age at death for each breed. Specifically, we generated 10,000 new datasets by random sampling with replacement from the original dataset, calculated the median age at death for each breed in every resampled dataset, and derived a 95% confidence interval for each breed’s median from the resulting bootstrap distribution. The breed-specific premature longevity threshold was set at 85% of the lower bound of this confidence interval, and an animal whose age at death fell below its breed’s threshold was classified as having died prematurely. Because the threshold is derived from a breed-specific confidence interval, it accounts for the variation in life expectancy between breeds. To ensure robust findings, we included only breeds with at least ten observations in both the premature and expected death categories.
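A minimal sketch of the threshold procedure follows. It assumes a percentile bootstrap (the 2.5th and 97.5th percentiles of the bootstrap medians) and resamples within each breed; the source does not specify how the interval was computed, so both choices, along with the names `bootstrap_median_ci` and `premature_thresholds`, are illustrative assumptions.

```python
import random
import statistics
from collections import defaultdict


def bootstrap_median_ci(ages, n_boot=10_000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for the median age at death (assumed method)."""
    rng = rng or random.Random(0)
    medians = sorted(
        statistics.median(rng.choices(ages, k=len(ages))) for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return medians[lo_idx], medians[hi_idx]


def premature_thresholds(records, factor=0.85, min_per_group=10, n_boot=10_000, seed=0):
    """records: iterable of (breed, age_at_death_in_years).

    Returns {breed: threshold}, where threshold = factor * lower CI bound,
    keeping only breeds with at least `min_per_group` deaths on each side.
    """
    rng = random.Random(seed)
    ages_by_breed = defaultdict(list)
    for breed, age in records:
        ages_by_breed[breed].append(age)

    thresholds = {}
    for breed, ages in ages_by_breed.items():
        lower, _ = bootstrap_median_ci(ages, n_boot=n_boot, rng=rng)
        threshold = factor * lower
        n_premature = sum(age < threshold for age in ages)
        n_expected = len(ages) - n_premature
        if n_premature >= min_per_group and n_expected >= min_per_group:
            thresholds[breed] = threshold
    return thresholds
```

An animal is then classified as a premature death when its age at death is below `thresholds[breed]`; breeds missing from the dictionary are excluded from the analysis.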

Premature mortality and years of lost life (YLL)

To analyse the causes of death, we used ICD-11 chapter labels assigned automatically by PetBERT [26]. In some cases, the final ‘death’ record did not give precise details of the cause of death. To address this, we extended the analysis to include consultations up to six months before the recorded death, and treated any ICD-11 syndromic label present during this period as a potential indicator of the cause of death. We calculated the YLL for each age group (ages 1 to 14) using the following equation:

$$\mathrm{YLL} = \sum_{i=1}^{n} (L_i - A_i)$$

where $L_i$ is the breed-specific premature longevity threshold estimated above for animal $i$ in the given age group, $A_i$ is the age at death of animal $i$, and $n$ is the number of premature deaths in that age group. This computation was performed for each ICD-11 chapter to determine the YLL attributable to different causes of death. Because records may be annotated with multiple syndromes, each syndrome is counted independently. To quantify the proportion of lost life, we summed the years lost in each age group to deaths linked to each ICD-11 chapter and divided this by the total years of lost life in the same age group. This approach allowed us to examine the distribution and impact of the various causes of death across age groups and disease classifications.
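The six-month lookback and the per-chapter YLL calculation can be sketched together as follows. Several details are assumptions not stated in the source: "six months" is approximated as 183 days, only deaths with $L_i > A_i$ contribute, and the denominator counts each animal's years lost once (so chapter shares within an age group can sum to more than 1 for multi-labelled animals). Function names and chapter strings are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
LOOKBACK = timedelta(days=183)


def cause_of_death_chapters(consultations, death_date, lookback=LOOKBACK):
    """Union of ICD-11 chapter labels from consultations in the lookback window.

    consultations: iterable of (consult_date, chapters) for a single animal.
    """
    window_start = death_date - lookback
    chapters = set()
    for consult_date, labels in consultations:
        if window_start <= consult_date <= death_date:
            chapters.update(labels)
    return chapters


def yll_by_chapter(deaths):
    """YLL per (age_group, chapter) and its share of the age group's total YLL.

    deaths: iterable of dicts with keys "threshold" (L_i), "age" (A_i),
    "age_group" and "chapters" (labels linked to that death).
    """
    yll = defaultdict(float)    # (age_group, chapter) -> summed L_i - A_i
    total = defaultdict(float)  # age_group -> summed L_i - A_i, each animal once
    for d in deaths:
        lost = d["threshold"] - d["age"]
        if lost <= 0:  # not a premature death, so no years lost
            continue
        total[d["age_group"]] += lost
        for chapter in d["chapters"]:  # multi-label: each chapter counted independently
            yll[(d["age_group"], chapter)] += lost
    shares = {key: years / total[key[0]] for key, years in yll.items()}
    return dict(yll), shares


if __name__ == "__main__":
    visits = [(date(2019, 12, 1), {"Neoplasms"}), (date(2020, 6, 30), {"Behavioural"})]
    print(cause_of_death_chapters(visits, death_date=date(2020, 6, 30)))
```

Here the December 2019 consultation falls outside the lookback window, so only the label from the final consultation is linked to the death.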

Premature mortality risk factor analysis

We used the breed-specific premature longevity thresholds to investigate risk factors associated with premature death. Animals that died before their breed’s threshold age were classified as premature deaths, while those that lived beyond it were considered to have died ‘as expected’. To mitigate survivorship bias and ensure a fair comparison between animals, we truncated the data: regardless of actual lifespan, only clinical events and data points occurring before the breed-specific premature longevity threshold were considered for every animal in the dataset. This creates a standardised observation window for all animals in the study.

Using this prepared dataset, we first fitted univariable mixed-effects logistic regression models with case-control status as the binary dependent variable to identify potential risk factors that may increase the odds of an animal dying before its breed-specific longevity threshold. Animals that died prematurely were categorised as cases, and animals whose deaths occurred beyond the breed-specific threshold age as controls. Each explanatory variable was analysed individually, with practice included as a random effect to account for potential clustering, and its fit was compared with a null model using the likelihood ratio test (LRT, chi-squared).

An initial multivariable logistic regression model was then constructed including only the explanatory variables with an LRT p-value < 0.2 relative to the null model. This preliminary model was refined by backward selection to achieve the best model fit, defined as the lowest Akaike information criterion (AIC). The final multivariable model was checked for multicollinearity by calculating variance inflation factors (VIFs), which indicated that multicollinearity was absent.
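The variable screening and backward selection steps can be sketched as below. This is a sketch under stated assumptions: it takes log-likelihoods and AIC values as given, as if produced by mixed-effects logistic models fitted elsewhere (the source does not name the software), and it does not fit models or compute VIFs. The names `chi2_sf`, `lrt_screen`, `backward_select` and the `fit_aic` callable are hypothetical. The chi-squared tail probability uses the standard closed forms for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus correction terms x^((2j-1)/2) / (1*3*...*(2j-1)).
    term, acc = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


def lrt_screen(null_loglik, candidates, p_threshold=0.2):
    """candidates: {variable: (loglik_with_variable, extra_parameters)}.

    Returns {variable: p_value} for variables whose LRT p-value is below the threshold.
    """
    kept = {}
    for name, (loglik, extra_df) in candidates.items():
        statistic = 2.0 * (loglik - null_loglik)  # LR statistic ~ chi2(extra_df)
        p_value = chi2_sf(statistic, extra_df)
        if p_value < p_threshold:
            kept[name] = p_value
    return kept


def backward_select(variables, fit_aic):
    """Greedy backward elimination on AIC (AIC = 2k - 2 ln L).

    fit_aic(tuple_of_variables) -> AIC of the model containing those variables.
    Repeatedly drops the variable whose removal lowers AIC the most and stops
    when no single removal improves it.
    """
    current = list(variables)
    best_aic = fit_aic(tuple(current))
    while current:
        trials = [(fit_aic(tuple(v for v in current if v != drop)), drop) for drop in current]
        trial_aic, drop = min(trials)
        if trial_aic >= best_aic:
            break
        current.remove(drop)
        best_aic = trial_aic
    return current, best_aic
```

In use, `lrt_screen` would be run once per candidate risk factor against the null model (practice as the random effect in both), and the surviving variables passed to `backward_select` with a `fit_aic` function that refits the multivariable model for each candidate subset.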
