Russell and Burch Revisited
Michael FW Festing
Workshop on "The missing 'R': Reproducibility in a Changing Research Landscape", ILAR, Washington DC, June 2014

Terms of reference
"… has the earnest effort to addressing the '3R's' actually contributed to the issue of reproducibility in scientific studies? Has, for example, the goal to reduce the number of mice to minimum necessary to attain statistical significance actually left experiments with insufficient numbers per treatment group for reproducibility?"

Why do animal experiments have fewer subjects than clinical trials?
Clinical trials are large because the aim is to detect small but clinically important effects, and human patients are quite variable.
Animal experiments aim to detect only large effects, and laboratory animals are uniform in age and weight, diet, environment, genotype (particularly if inbred strains are used) and health (more reliable induced disease models).
Any well designed experiment should give repeatable results. Repeatability does not depend on sample size, but it is subject to specified levels of sampling variation (Type I and Type II errors).

Principles of Humane Experimental Technique (Russell and Burch 1959)
Commissioned by the Universities Federation for Animal Welfare (UFAW).
Replacement: in-vitro methods, less sentient animals.
Refinement: animals free of infectious disease; minimising pain and distress through anaesthesia and analgesia, and environmental enrichment.
Reduction: e.g. research strategy, experimental design and statistics.

"Reduction" means better experimental design and statistics
Obtaining the same amount of information from fewer animals, e.g. better control of variation using randomised block designs, or use of inbred strains.
Obtaining more information from the same number of animals, e.g. factorial designs.

Janine A. Clayton & Francis Collins, "Policy: NIH to balance sex in cell and animal studies", Nature, 14 May 2014:
"As part of its initiative to enhance rigour, the NIH plans to disseminate training on experimental design for NIH staff, trainees and grantees. Evaluation of sex differences will be included in these modules."

Incorporating both sexes into one experiment
[Figure: three designs, each with Treated and Control groups: an all-male design; males and females in two separate experiments; and a factorial design using half of each sex in a single experiment.]

What is the scope for "reduction"?
Experiments are often poorly designed and incorrectly analysed. The result is too many false positive and false negative results, and a waste of animals and scientific resources.
Festing MFW (1992). The scope for improving the design of laboratory animal experiments. Lab Animals 26:256-267.*
Festing MFW (1994). Reduction of animal use: experimental design and quality of experiments. Lab Animals 28:212-221.
* Awarded 1st prize by GV-SOLAS for the best published or unpublished manuscript on any aspect of laboratory animal science.

Survey of a random sample of 271 published papers using laboratory animals
Of the papers studied:
87% did not report random allocation of subjects to treatments
86% did not report "blinding" where it seemed to be appropriate
100% failed to justify the sample sizes used
5% did not clearly state the purpose of the study
6% did not indicate how many separate experiments were done
13% did not identify the experimental unit correctly
26% failed to state the sex of the animals
24% reported neither the age nor the weight of the animals
4% did not mention the number of animals used
35% of those reporting the numbers used gave numbers that differed between the materials and methods and the results sections, etc.
Kilkenny et al (2009), PLoS One Vol. 4, e7824

Experiments don't have to be large
Muriel claims that she can tell whether the milk was put in the cup before or after the tea. Eight cups of tea are prepared, four TM (tea first) and four MT (milk first).
She is told that the cups will be presented to her in random order and that she should identify which type each cup is.
Number of ways of choosing four cups out of eight: $\binom{8}{4} = \frac{8!}{4!\,4!} = \frac{8 \times 7 \times 6 \times 5}{4!} = \frac{1680}{24} = 70$.
Only one of these 70 choices is correct, so if she identifies every cup correctly, p = 1/70 ≈ 0.014.
Decision rule: if the p-value is less than 0.05, we reject the "null hypothesis" that she cannot distinguish TM from MT and accept the alternative that she can. The result is said to be "statistically significant".
(After RA Fisher)

Statistical errors in a well designed experiment
The chance of a false positive result (Type I error) depends on: 1) the significance level (usually set at α = 0.05).
The chance of a false negative result (Type II error) depends on: 1) sample size; 2) significance level; 3) effect size; 4) the alternative hypothesis; 5) variability of the experimental material.
The current crisis involves too many false positive results. In a well designed experiment, these do not depend on sample size.

False positive results in badly designed or analysed experiments
Selective publication of positive results
Incorrect randomisation (e.g. groups kept separate in different environments and terminated at different times)
Failure to blind where it is possible
Pseudo-replication and incorrect identification of the experimental unit
Failure in quality control of experimental material (e.g. animals and reagents)
Inadequate external validity (results cannot be generalised to other situations)
Inadequate description of methods (e.g. strain nomenclature)
Incorrect statistical analysis: no statistical analysis, multiple testing without adjustment, or the wrong statistical model
Incorrect treatment of outliers: cherry-picking the data

Clear evidence of conflicts of interest impacting results
Positive results in studies of endocrine disruption by bisphenol A: 94/104 (90%) of government-funded studies, but 0/11 (0%) of industry-funded studies.
Frederick S. vom Saal and Claude Hughes. Environ Health Perspect 113:926–933 (2005)
Thirteen of the studies (10 government funded, 3 industry funded) used SD rats from Charles River.
All were negative. This strain is resistant to DES.

Percent responders to a synthetic polypeptide in outbred CD rats
[Figure: percent responders (0–100%) plotted against sample number (1–25).]
Simonian et al 1968, J. Immunol. 101:730. Note that 7 colonies of inbred rats were either 100% responders or 100% non-responders. N ≈ 30.

Annual Statistics of Scientific Procedures on Living Animals, Great Britain 2012
About 4 million animals are used each year for 70 million people: roughly 4 animals per person over a 70-year lifespan.

Training needed
"A basic understanding of experimental design and statistics is necessary for all scientists. For investigators with no previous training in statistics, this level of expertise can probably be obtained from an introductory course. There are many texts on statistical methods, which can be used for both learning purposes and as reference books. Biomedical research workers should have more detailed training in biometrics and statistics so that they can act as consultants to other investigators in their own institutes."
Joanne Zurlo, Deborah Rudacille, and Alan M. Goldberg, The Three R's: The Way Forward. Article reprinted from Environmental Health Perspectives, August 1996, vol. 104, no. 8.

Conclusions
The 3Rs provide a strategy for every research project:
Replace animals with in-vitro methods wherever possible.
Refine experiments to minimise the pain and distress of animals that must be used.
Reduce: use the minimum number of animals consistent with achieving the objective.
In a well designed experiment, false positive results depend only on the significance level.
Current problems are due to excessive numbers of false positive results, caused by faulty experimental design.
Training is needed!
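The tea-tasting arithmetic can be checked in a few lines of Python. This sketch counts the equally likely ways of choosing the four MT cups and returns the one-sided exact p-value, which is the hypergeometric tail used in Fisher's exact test. It also shows that identifying only three of the four MT cups (six of the eight cups) would not reach p < 0.05. The function name `tea_tasting_p_value` and its parameters are illustrative and not from the talk; `math.comb` needs Python 3.8 or later.

```python
from math import comb


def tea_tasting_p_value(n_cups: int = 8, n_milk_first: int = 4, n_correct: int = 4) -> float:
    """One-sided exact p-value for the eight-cup tea-tasting design.

    She must pick which n_milk_first of the n_cups are MT. Under the null
    hypothesis every selection is equally likely, so the number of true MT
    cups among her picks follows a hypergeometric distribution.
    """
    n_tea_first = n_cups - n_milk_first
    total = comb(n_cups, n_milk_first)  # number of equally likely selections
    # Count selections containing at least n_correct true MT cups.
    favourable = sum(
        comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
        for k in range(n_correct, n_milk_first + 1)
    )
    return favourable / total


print(comb(8, 4))                                # 70
print(round(tea_tasting_p_value(), 3))           # 0.014
print(round(tea_tasting_p_value(n_correct=3), 3))  # 0.243
```

With p = 17/70 for three correct, only a perfect score is significant in this design.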
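The talk claims that, in a well designed experiment, the false positive rate depends only on the significance level, while the false negative rate depends on sample size and effect size. A simulation can illustrate this. The sketch rests on stated assumptions: normally distributed responses with a known standard deviation, analysed with a two-sided two-sample z-test rather than the t-test more often used in practice. The 1 SD effect size, the group sizes and the name `rejection_rate` are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(n_per_group, effect, alpha=0.05, sigma=1.0, n_sims=4000, seed=1):
    """Fraction of simulated two-group experiments that reach p < alpha.

    Assumes normal responses with known SD sigma and a two-sided two-sample
    z-test (a simplification of the usual t-test).
    """
    rng = random.Random(seed)
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_per_group)]
        z_stat = (fmean(treated) - fmean(control)) / se
        p = 2 * (1 - z.cdf(abs(z_stat)))
        if p < alpha:
            rejections += 1
    return rejections / n_sims


for n in (4, 8, 16):
    type1 = rejection_rate(n, effect=0.0)  # no true effect: false positive rate
    power = rejection_rate(n, effect=1.0)  # true 1 SD effect: 1 - Type II error rate
    print(f"n={n:2d}  false positive rate={type1:.3f}  power={power:.3f}")
```

The false positive rate should stay close to 0.05 at every group size. Power (1 minus the Type II error rate) should rise as the group size increases.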
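"Multiple testing without adjustment" appears above among the sources of false positive results. Assume k independent tests with every null hypothesis true. The chance of at least one false positive is then $1-(1-\alpha)^k$, which is about 0.64 for 20 tests at α = 0.05. The Bonferroni correction tests each hypothesis at α/k and is one standard, conservative way to hold that family-wise rate at or below α. The helper names below are hypothetical, and independence is an assumption.

```python
def familywise_error_rate(k_tests, alpha=0.05):
    """Chance of at least one false positive among k independent tests
    when every null hypothesis is true and no adjustment is made."""
    return 1 - (1 - alpha) ** k_tests


def bonferroni_alpha(k_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / k_tests


for k in (1, 5, 10, 20):
    adjusted = familywise_error_rate(k, bonferroni_alpha(k))
    print(f"{k:2d} tests: unadjusted FWER={familywise_error_rate(k):.3f}, "
          f"Bonferroni per-test alpha={bonferroni_alpha(k):.4f}, adjusted FWER={adjusted:.3f}")
```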
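The surveyed papers rarely reported random allocation, and the NIH policy asks for both sexes to be included. One way to do both is to randomise animals to treatments separately within each sex. This gives a balanced sex × treatment factorial layout, with sex acting as a block. The sketch uses hypothetical animal IDs (M1–M8, F1–F8), a hypothetical function `randomise_within_sex` and Python's `random` module. It illustrates the idea and is not a protocol from the talk.

```python
import random


def randomise_within_sex(animals, treatments=("Treated", "Control"), seed=None):
    """Randomly allocate animals to treatments separately within each sex.

    animals: list of (animal_id, sex) tuples. Within each sex the IDs are
    shuffled and dealt out to the treatments in turn, so each sex contributes
    equal (or near-equal) numbers to every treatment group.
    """
    rng = random.Random(seed)
    allocation = {}
    for sex in sorted({s for _, s in animals}):
        ids = [a for a, s in animals if s == sex]
        rng.shuffle(ids)  # random order within this sex (block)
        for i, animal_id in enumerate(ids):
            allocation[animal_id] = (sex, treatments[i % len(treatments)])
    return allocation


# Hypothetical example: 8 males and 8 females.
animals = [(f"M{i}", "male") for i in range(1, 9)] + [(f"F{i}", "female") for i in range(1, 9)]
plan = randomise_within_sex(animals, seed=2014)
for animal_id, (sex, group) in sorted(plan.items()):
    print(animal_id, sex, group)
```

Recording the seed makes the allocation reproducible and easy to report.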