Research Designs: Choosing and Fine-tuning a Design for Your Study
Sportscience 12, 12-21, 2008
Other resources for design: an article, slideshow and spreadsheet for sample-size estimation; an article and slideshow for deciding which kind of controlled trial is best; and an article and spreadsheets for getting balanced assignment of subjects to groups in controlled trials.
Update 2 May 2017: The scatterplots exemplifying moderator and mediator analyses have been transferred to the slideshow on Linear Models and Effect Magnitudes.
Update 2011: Slides on the different kinds of controlled trial have been copied from the slideshow in the article on the Controlled-trial Decision Tree.
Update 6 Aug 2008: A slide on mechanisms in interventions has been added to clarify how to estimate the contribution of a mechanism variable.
My article on research design (Hopkins, 2000) is one of the most popular pages at this site, netting 3000-4000 unique visitors per month, possibly because it is the third or fourth hit in a Google search for “research design”. The article is sound but needed an update. The present article meets that need, in the form of a slideshow on research design. (Related resources at this site, especially for undergraduate or novice researchers: Finding Out What’s Known and Dimensions of Research.)
Some material in the slideshow is based on sections of the first draft of an article on statistical guidelines (Hopkins et al., 2009) that Steve Marshall, Alan Batterham and Juri Hanin co-authored with me. We subsequently deleted the sections on design from the article to make the length acceptable for the intended journal. The sections in question were themselves based mainly on my earlier article, but I acknowledge here the contribution of these colleagues. Some material comes from an article about the different kinds of controlled trial (Batterham and Hopkins, 2005). Estimates of sample size for each design come from my article and spreadsheet on sample size for magnitude-based inferences (Hopkins, 2006). I can point to no other published articles or books that I used to support the assertions in the first draft of this slideshow or in the earlier article. The assertions are either common knowledge amongst researchers and statisticians or are based on my own experience or introspections. I have sometimes checked that my use of jargon and understanding of concepts concur with what other apparent experts state at Wikipedia and other sites. The assertions are also now consistent with references that the reviewer brought to my attention (see below).
The diagrams I have used to explain confounding and mediation in observational studies are simple versions of the so-called directed acyclic graphs (DAGs) that have been used to facilitate understanding of confounding in epidemiology. What appears to be a definitive reference on this topic (Greenland et al., 1999) is probably too difficult for the average researcher (including me) to understand without an unreasonable investment of time. The simpler treatment I have presented here should provide researchers with sufficient understanding to be meticulous about design and analysis of their own observational studies and wary of the confounding that inevitably biases the effects in published observational studies. For a classic reference on such biases, see Taubes (1995).
I devised a similar set of DAG-like diagrams to explain bias in interventions. A figure with imaginary data explaining what happens when you adjust for a covariate in an intervention is similarly original and has certainly helped me to understand the issues.
The PDF reprint version of this article contains the images of the slides, preceded by this text. The slideshow in PowerPoint format is a better learning resource, because the slides build up point by point in full-screen view.
The reviewer (Ian Shrier) identified several minor problems and made comments that led to the following improvements in the slides on inferences about causation: customary use of the term moderator (and its synonym, modifier); a note that some kinds of covariate can create bias (Hernan et al., 2004; Shrier, 2007); and a note that unknown confounders can bias estimates of effects and their mechanisms (Cole and Hernan, 2002). He queried the term time series, which Batterham and I had used for the simplest type of intervention, so I now refer to such designs as pre-post single-group. He also noted that "you have made some over-simplifications for pedagogical purposes, and people should [be advised to] seek help if they are not familiar with the nuances of any particular design." I agree and have added such advice.
Batterham AM, Hopkins WG (2005). A decision tree for controlled trials. Sportscience 9, 33-39
Cole SR, Hernan MA (2002). Fallibility in estimating direct effects. International Journal of Epidemiology 31, 163-165
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10, 37-48
Hernan MA, Hernandez-Diaz S, Robins JM (2004). A structural approach to selection bias. Epidemiology 15, 615-625
Hopkins WG (2000). Quantitative research design. Sportscience
Hopkins WG (2006). Estimating sample size for magnitude-based inferences. Sportscience 10, 63-70
Hopkins WG, Marshall SW, Batterham AM, Hanin J (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise (in press)
Shrier I (2007). Understanding causal inference: the future direction in sports injury prevention. Clinical Journal of Sport Medicine 17, 220-224
Taubes G (1995). Epidemiology faces its limits. Science 269, 164-169
Published July 2008