Rigor and Reproducibility in Experimental Design: Glossary

Key Points

Introduction
  • Reproducibility means obtaining the same results as an original study by using the same data and methods.

  • Replicability means corroborating the results of an original study using similar methods and independently generated data.

  • Funding agencies and publishers increasingly require evidence of reproducibility in grant applications and manuscript submissions.

Why we need better design
  • Solid experimental designs reduce both the impact of variability and the need for larger sample sizes.

  • Good study design anticipates how the experiment will be analyzed before any data are collected.

Common flaws
  • When designing an experiment, use biological replicates.

  • Summarize technical replicates with a single representative value (the mean, median, or mode) rather than counting them as independent samples.

  • Poor study design can lead to waste and insignificant results.
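The replicate distinction matters in practice: technical replicates should be collapsed to one value per biological replicate before analysis, so the effective sample size is the number of biologically distinct subjects. A minimal Python sketch, using hypothetical measurement data:

```python
from statistics import mean

# Hypothetical data: each biological replicate (a distinct mouse) was
# measured three times (technical replicates of the same sample).
measurements = {
    "mouse_1": [10.1, 10.3, 10.2],
    "mouse_2": [12.0, 11.8, 12.1],
    "mouse_3": [9.7, 9.9, 9.8],
}

# Collapse technical replicates to one representative value per mouse.
# The analysis sample size is the number of mice (3), not the number
# of measurements (9).
summarized = {mouse: mean(vals) for mouse, vals in measurements.items()}
n = len(summarized)
```

Treating all nine measurements as independent samples would understate the biological variance and inflate apparent significance.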

Types of experiments
  • There are multiple kinds of biological experiments, which may involve generating hypotheses, testing the feasibility of research procedures, and testing hypotheses.

  • Pilot studies test the feasibility and efficiency of research procedures on a small scale, with few animals.

  • Experimental studies are characterized by randomization, replication, and control.

  • Exploratory studies are observational or correlational studies which identify patterns that inform hypotheses.

Experimental designs
  • Good experimental design minimizes error in a study.

  • Well-designed experiments are randomized, have adequate replicates, and feature local control of environmental variables.

  • There are three types of error in experiments: systematic, biological, and random.
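Randomization and local control work together in practice: animals can be assigned to treatment at random within each block. A sketch in Python, assuming hypothetical animal IDs and sex as the blocking factor:

```python
import random

# Hypothetical animals, blocked by sex (a known source of variability).
blocks = {
    "female": ["F1", "F2", "F3", "F4"],
    "male":   ["M1", "M2", "M3", "M4"],
}

def randomize_within_blocks(blocks, seed=42):
    """Assign half of each block to treatment and half to control at
    random, so groups stay balanced on the blocking factor."""
    rng = random.Random(seed)  # fixed seed keeps the allocation reproducible
    assignment = {}
    for block_name, animals in blocks.items():
        shuffled = animals[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for animal in shuffled[:half]:
            assignment[animal] = "treatment"
        for animal in shuffled[half:]:
            assignment[animal] = "control"
    return assignment

assignment = randomize_within_blocks(blocks)
```

Because assignment is randomized only within blocks, each group contains equal numbers of males and females, so a sex effect cannot masquerade as a treatment effect.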

Factorial designs
  • Factorial designs test two or more factors at once, allowing the effect of each factor, and of interactions between factors, to be estimated.
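A factorial design crosses every level of each factor with every level of the others. A minimal sketch of a hypothetical 2×2 layout (diet and sex as the assumed factors):

```python
from itertools import product

# Hypothetical factors for a 2x2 factorial design: every combination of
# factor levels becomes a treatment group, which also lets the analysis
# detect interactions (e.g., a diet effect that differs between sexes).
diet = ["chow", "high_fat"]
sex = ["female", "male"]

groups = [{"diet": d, "sex": s} for d, s in product(diet, sex)]
# 2 diets x 2 sexes = 4 groups
```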

Power calculation and sample size
  • A power calculation estimates the sample size needed to detect an effect of a given size at a chosen significance level and desired power.

  • Underpowered studies are unlikely to detect real effects and waste animals, time, and resources.
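For a two-group comparison of means, a common rule uses the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)² per group, where d is the standardized effect size. A sketch in Python (the values below are hypothetical; the exact t-based answer is slightly larger):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means via the normal approximation:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)          # e.g. 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Hypothetical medium effect (Cohen's d = 0.5), 80% power, alpha = 0.05:
n = n_per_group(0.5)  # roughly 63 per group; a t-test correction adds ~1
```

Note how strongly sample size depends on effect size: halving d roughly quadruples the required n, which is why pilot estimates of variability matter so much at the design stage.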

Planning an experiment
  • A good experimental question is one that is worth answering and that raises one or more testable hypotheses given constraints such as time and resources (Fry, p. 460).

  • Identify and categorize the experimental units and the other variables in the study (e.g., background, constant, primary, uncontrollable).

  • Good experimental design is strategic in its sampling methods, considering randomization, blocking or stratification methods, and sample size.

  • Good design planning also identifies the statistical tests and analysis while considering other study aspects (e.g., hypotheses, variables, design structures, and samples).

Glossary

biological replicate
Measurement(s) from biologically distinct samples (preferably taken at the same time) that convey the random biological variation that exists within a population. Biological replicates should not be confused with technical replicates.

blocking
Grouping samples of similar structure together from both treatment and control to reduce unexplained variability. For example, when a new drug is tested on both male and female subjects, sex is a blocking factor that accounts for variability in treatment response between males and females.
confounder
An unaccounted-for variable that exerts a small or large effect on a dependent (response) variable. Such variables increase variance and bias in the study.
control
An experimental subject that does not receive the treatment, and that is used as a baseline to evaluate the effect of the treatment on another group of subjects.
controlled experiment
An experiment done in parallel on a treatment group and a control group that differ in one way (the independent or explanatory variable). Investigators determine which subjects go in the treatment group and which in the control group. For contrast, see observational experiment.

deviation
The difference between an observed value and a reference value such as the sample mean; deviations are the building blocks of measures of spread such as variance.

effects
The changes in a response variable attributable to the experimental factors. See fixed effects and random effects.

experimental error
The variation among experimental units that are treated alike; it is the yardstick against which treatment effects are judged.

experimental unit
The smallest entity to which a treatment is independently applied (e.g., an animal, a cage, or a well on a plate).

factorial design
A design in which two or more factors (variables) are tested simultaneously, allowing the effect of each factor, and of interactions between factors, to be estimated.

fixed effects
Effects of factors whose levels are deliberately chosen and are of direct interest (e.g., treatment versus control); conclusions apply to those specific levels.

observational experiment
An experiment done in parallel on a treatment (or exposure) group and a control group that differ in one way (the independent or explanatory variable). The subjects, not the investigators, determine whether they are in the treatment group or the control group (e.g., smokers and non-smokers). For contrast, see controlled experiment.

random effects
Effects of factors whose levels are sampled from a larger population (e.g., litter, cage, or batch); they model sources of variation rather than effects of direct interest.

random error
Unpredictable, non-systematic variation that scatters measurements around the true value; it can be reduced by replication and averaging but not by calibration.

randomization
A method of assigning subjects to treatment and control groups by chance, reducing bias and balancing both known and unknown confounders across groups.
replicability
Other researchers obtain corroborating results using experimental methods similar to those of the original study, generating their own data independently.
reproducibility
Other researchers duplicate results and can draw the same conclusions as the original study did using the same materials and methods (i.e. specific measurement devices, original data, software, statistical method).

sensitivity
The ability of a test or assay to correctly detect true positives; the true-positive rate.

specificity
The ability of a test or assay to correctly identify true negatives; the true-negative rate.

systematic error
A consistent, directional error (bias) introduced by equipment, procedures, or design that shifts measurements away from the true value; unlike random error, it is not reduced by replication.

technical replicate
Repeated measurements of the same sample that represent independent measures of the random noise associated with protocols or equipment. For contrast, see biological replicate.

treatment
The condition or intervention deliberately applied to experimental units so that its effect on the response can be measured.

variability
The extent to which a distribution is spread out or concentrated; also known as dispersion or spread.
variance
A measure of statistical dispersion: the average of the squared deviations from the mean.
variation
The differences observed among individuals or among repeated measurements; quantified by measures such as variance and standard deviation.

External references

Data source

Gatti, DM, Simecek P, Somes L, Jeffery CT, Vincent MJ, Choi K, Chen X, Churchill GA and Svenson KL (2017). The Effects of Sex and Diet on Physiology and Liver Gene Expression in Diversity Outbred Mice. bioRxiv.

Reproducibility

Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014 Jan 30;505(7485):612-3.

Goodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci Transl Med. 2016 Jun 1;8(341):341ps12.

Hess KR. Statistical design considerations in animal studies published recently in cancer research. Cancer Res. 2011 Jan 15;71(2):625.

National Academies of Sciences, Engineering, and Medicine. 2015. Reproducibility Issues in Research with Animals and Animal Models: Workshop in Brief. Washington, DC: The National Academies Press. https://doi.org/10.17226/21835.

National Academies of Sciences, Engineering, and Medicine. 2019. Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303.

Chalmers I, et al. Avoidable waste in the production and reporting of research evidence. Lancet. 2009;374(9683):86-89.

Bollen, K, Cacioppo, JT, Kaplan, R, Krosnick, J, Olds, JL. 2015. Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science. Arlington, VA: National Science Foundation.

Nuzzo R. How scientists fool themselves – and how they can stop. Nature. 2015 Oct 8;526(7572):182-185.

Yong E. Replication studies: Bad copy. Nature. 2012 May 17;485:298-300.

Experimental design

Krzywinski M, Altman N. Designing comparative experiments. Nat Methods. 2014 June;11(6):597-598.

Blainey P, Krzywinski M, Altman N. Points of significance: replication. Nat Methods. 2014 Sep;11(9):879-80.

Voelkl, B., Würbel, H., Krzywinski, M. et al. The standardization fallacy. Nat Methods 18, 5–7 (2021). https://doi.org/10.1038/s41592-020-01036-9

ILAR Journal: Design & Statistical Analysis of Animal Experiments

Dickersin K, Chan SS, Chalmers TC, Sacks HS, Smith Jr H. Publication bias and clinical trials. Controlled Clinical Trials. 1987 Dec 1;8(4):343-53.

Error prone [Editorial]. Nature. 2012;487:406. https://doi.org/10.1038/487406a

Festing & Altman’s Guidelines for the Design and Statistical Analysis of Experiments Using Laboratory Animals

Fisher RA. The design of experiments.

Derek J. Fry, Teaching Experimental Design, ILAR Journal, Volume 55, Issue 3, 2014, Pages 457–471,

Johnson PD, Besselsen DG. Practical aspects of experimental design in animal research. ILAR journal. 2002 Jan 1;43(4):202-6.

Kilkenny C, Parsons N, Kadyszewski MF, Cuthill IC, Fry D, Hutton J, Altman DG. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PloS one. 2009;4(11).

Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS biology. 2010 Jun;8(6).

Gary Oehlert’s A First Course in Design and Analysis of Experiments

Statistics Done Wrong by Alex Reinhart

Repeated Measures Design by Mark Conaway

The Analysis Factor: Effect Size Statistics, Power, and Sample Size Calculations

The Analysis Factor: Series on Confusing Statistical Terms

Statistical Rules of Thumb

Nature Practical Guides

Krzywinski M, Altman N. Points of significance: Analysis of variance and blocking. Nat Methods. 2014 Jul;11(7):699-700.

Why animal research needs to improve [Editorial]. Nature. 2011 Sep 28;477:511. doi:10.1038/477511a

Smucker B, Krzywinski M, Altman N. Optimal experimental design. Nat Methods. 2018;15:559-560. https://doi.org/10.1038/s41592-018-0083-2 (Customize the experiment for the setting instead of adjusting the setting to fit a classical design.)

Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, Robson R, Thabane M, Giangregorio L, Goldsmith CH. A tutorial on pilot studies: the what, why and how. BMC medical research methodology. 2010 Dec;10(1):1.

Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine. 2008 Jan 17;358(3):252-60.

Statistics

MacArthur D. Face up to false positives. Nature. 2012;487:427. https://doi.org/10.1038/487427a

Nuzzo R. Scientific method: Statistical errors. Nature. 2014 Feb 13;506:150-152.

Vaux DL. Know when your numbers are significant. Nature. 2012;492:180. https://doi.org/10.1038/492180a