Demystifying Statistics in Medicine: An Approach to Appraising Randomized Controlled Trials
Randomized controlled trials (RCTs) allow scientists to test whether an intervention leads to an outcome of interest. The first clinical trial dates back to 1747 and was conducted by James Lind. Although it lacked the rigor of today's trials, Lind selected 12 participants ravaged by scurvy and allocated them in pairs to 6 groups [1]. The first pair took a quart of cyder every day; the next two took twenty-five drops of elixir vitriol three times a day; two more took two spoonfuls of vinegar three times a day; two others were put on a course of sea-water; two more had two oranges and one lemon given them every day; and the two remaining patients took an electuary recommended by a hospital surgeon. The pair given oranges and lemons improved most rapidly, and thus, based on Lind’s experiment, we now know that routine use of lemon juice can prevent scurvy.
By design, RCTs minimize spurious associations and bias, and they are therefore often considered the gold standard for evidence in clinical medicine. Here, I outline a simple 8-step framework to effectively appraise RCTs in a clinical setting. Formal analysis of RCTs, however, requires careful consideration and an in-depth review.
Consideration # 1 – Is this the right population?
A key consideration in assessing the validity of results from an RCT is whether the question posed can be appropriately answered in the population studied. Let’s consider a hypothetical trial testing the efficacy of a drug in preventing strokes among high-risk individuals. To appropriately answer this question, investigators must define who is at a high risk of having a stroke. Let’s say in this hypothetical study, the investigators have narrowly defined ‘high risk’ as those with a previous TIA. In this scenario, the effect of the drug may be overestimated if the results are applied to a population of patients in whom ‘high risk’ is defined more broadly. That is, the results of the study are less applicable to individuals who may be at a high risk of stroke based on clinical risk factors such as family history, cardiovascular disease, diabetes, et cetera, but who haven’t experienced a TIA. Here, the investigators must be careful to apply their findings only to those patients with a previous TIA and should be cautious about extending the benefits of the drug to all those at risk of having a stroke.
Consideration # 2 – How are the participants selected?
Like other study designs, RCTs are subject to selection bias, which results when the groups being compared differ in baseline prognostic factors relevant to the outcome. Common types of selection bias include [2]:
- Volunteer bias – individuals who volunteer to participate in the study are different from non-volunteers
- Non-respondent bias – responders and non-responders differ
- Attrition bias – subjects who drop out of the study differ from those who remain
These should be considered in evaluating the validity of RCTs.
Consideration # 3 – And, what about the comparison?
A key methodological component of an RCT, and one that renders it superior to other research designs, is the use of a control condition that is compared to the experimental intervention. The choice of control comparison is largely dependent on clinical knowledge to date. A few major options exist for comparison groups:
- No comparator – In this case, participants are randomly assigned to receive either the treatment (e.g. drug) or nothing at all. During the trial, only the treatment group is actively followed; the comparator group is not assessed until the end, when outcomes in both groups are compared. This method allows the investigator to understand whether the new treatment produces benefit above and beyond the passage of time.
- Standard of care – In this scenario, the investigators randomly assign participants to either receive the drug or continue to receive the standard of care. Only this option can answer whether the drug is better than standard treatment.
- Placebo – The participants, as above, are randomly allocated to receive the experimental intervention or a disguised placebo. The key difference between this comparison and the ‘no comparator’ option is that, for all intents and purposes, both arms are treated equally. Like the drug group, the placebo group would also receive a pill and regular follow-up. This comparison can test whether the new treatment produces benefits beyond attention and regular follow-up from healthcare practitioners.
- Dose – Unlike the others, in this design, the intent is to investigate the optimal dose for treatment and as such the participants will be randomly assigned to different doses of the same treatment.
Consideration # 4 – Type of Randomized Controlled Trial design
Broadly, three types of RCTs exist: superiority, equivalence, and non-inferiority. While the specific conditions of each, the statistical considerations, and the sample size calculations can be found in detailed publications [3,4], it is important to understand how they differ and the implications for the interpretation of results.
- Non-inferiority – These trials are conducted to show that the new treatment is not worse than the standard treatment by more than a pre-specified margin
- Equivalence – The main purpose of this design is to ascertain that the new treatment and standard treatment are equally effective, that is, that they differ by no more than a pre-specified margin in either direction
- Superiority – The objective of this type of trial is to verify that a new treatment is more effective than a standard treatment. A non-significant result means the trial did not show the new treatment to be more efficacious than the control treatment by a statistically/clinically relevant amount; it does not establish that the treatments are equivalent or that the new treatment is non-inferior
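One way to see how the three designs differ is to look at where the confidence interval for the treatment difference falls relative to zero and to the margin. The sketch below is only illustrative: the function name `interpret_ci`, the convention that higher values favour the new treatment, and the example margin are all assumptions, and in a real trial the hypothesis and margin are fixed before the data are seen.

```python
def interpret_ci(ci_low, ci_high, margin):
    """Return the claims supported by a CI for (new minus standard).

    Assumes higher values favour the new treatment and that `margin` (> 0)
    is a hypothetical, pre-specified clinically acceptable difference.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    claims = []
    if ci_low > 0:
        # Whole interval lies above zero.
        claims.append("superior")
    if ci_low > -margin:
        # Lower bound rules out a loss larger than the margin.
        claims.append("non-inferior")
        if ci_high < margin:
            # Interval lies entirely within (-margin, +margin).
            claims.append("equivalent")
    return claims or ["inconclusive"]


print(interpret_ci(0.5, 3.0, margin=2.0))   # ['superior', 'non-inferior']
print(interpret_ci(-1.0, 1.5, margin=2.0))  # ['non-inferior', 'equivalent']
print(interpret_ci(-3.0, 1.0, margin=2.0))  # ['inconclusive']
```

The last case is the situation described for a non-significant superiority trial: the interval includes zero, but it is also too wide to support equivalence or non-inferiority.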
Consideration # 5 – Anatomy of a Randomized Controlled Trial
The methodological soundness of an RCT depends on the rigour with which it was conducted. Several key principles are explored in any critical appraisal, chief among them randomization. Randomization ensures that each participant has an equal probability of being allocated to the treatment or control group, thus balancing prognostic factors between the groups on average. There are several methods by which this can be accomplished.
- Simple randomization is akin to a coin flip. While the process is truly random, in small studies it can create groups of unequal size, so it is better suited to larger studies.
- Blocked randomization allows the investigator to randomize participants to the control (C) and treatment (T) groups in blocks of n. For example, if a block of 4 is used, participants can be allocated to T or C in any of the following orders: TTCC, TCTC, CTCT, TCCT, CTTC, or CCTT. This method ensures that the sample size in the two groups is equal after every completed block, but it may be possible to predict group assignment toward the end of a block if the block size is known.
- Stratified randomization can be used if there is an underlying imbalance in the population of interest, e.g. in ethnicity or age. The investigator can pre-select strata of interest and then randomize within each stratum to ensure representation in both groups.
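The blocked and stratified schemes above can be sketched in a few lines. This is a minimal illustration rather than a production allocation system: the function names, the strata labels, and the fixed seed are assumptions made for the example, and real trials use concealed, centrally managed allocation.

```python
import random
from itertools import permutations


def block_sequences(block_size=4):
    """All distinct orderings of a balanced block (half T, half C)."""
    if block_size < 2 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    half = block_size // 2
    return sorted(set(permutations("T" * half + "C" * half)))


def blocked_allocation(n_participants, block_size=4, rng=None):
    """Allocate participants by drawing one random balanced block at a time."""
    if rng is None:
        rng = random.Random()
    blocks = block_sequences(block_size)
    allocation = []
    while len(allocation) < n_participants:
        allocation.extend(rng.choice(blocks))
    return allocation[:n_participants]


def stratified_allocation(strata_sizes, block_size=4, rng=None):
    """Run blocked randomization separately within each stratum."""
    if rng is None:
        rng = random.Random()
    return {
        stratum: blocked_allocation(n, block_size, rng)
        for stratum, n in strata_sizes.items()
    }


rng = random.Random(2018)  # fixed seed only so the example is repeatable
print(["".join(block) for block in block_sequences(4)])
print(blocked_allocation(12, rng=rng))
print(stratified_allocation({"age<65": 8, "age>=65": 4}, rng=rng))
```

With a block size of 4 the first line lists the same six orderings given in the text, and every completed block contains exactly two T and two C assignments.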
Consideration # 6 – What’s in an outcome?
The crux of an RCT’s utility falls on the outcome assessed. Ideally, there should be one central outcome; there may be a few secondary outcomes, but the conclusions of the study should be focused on the central outcome. Outcomes can be clinical endpoints, intermediate (surrogate) measures, or composites. For example, if assessing the effect of a drug on coronary artery disease, the investigator can measure an endpoint outcome (number of acute coronary syndromes while on the drug), an intermediate or surrogate measure such as lipids, or a composite endpoint, such as number of MIs, strokes, or deaths while on the drug. Composite outcomes often lend statistical precision by increasing the number of events and hence power, but may be challenging to interpret [5]. The statistical significance generally applies to the results as a whole rather than to the individual components of the composite outcome, and as such conclusions about any one of the endpoints cannot be reliably made. Generally, composite outcomes should be avoided.
In addition to the types of outcomes used, outcome measures should be evaluated for validity, reliability, and diagnostic accuracy. Valid measures are those that measure what they are intended to measure. Let’s use the example of an investigator who is interested in assessing the effect of an intervention on depression. If the outcome, depression, is established with the use of a questionnaire, the tool must correlate with other measures of that construct, such as a structured psychiatric interview. Furthermore, the tool must be reliable. Specifically, it should demonstrate temporal stability (a person diagnosed as depressed at one point using the tool should be identified as such at other times using the same tool) and internal consistency (scores on different questions should be correlated and overall identify the same person as depressed) [6].
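Internal consistency is often summarized with Cronbach's alpha. The text does not name a specific statistic, so treating alpha as the measure here is an assumption, and the questionnaire scores below are invented purely for illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents (columns of the table).
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 3-item questionnaire answered by 4 people.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
inconsistent = [[1, 4, 2], [2, 1, 4], [3, 3, 1], [4, 2, 3]]

print(cronbach_alpha(consistent))    # items move together: alpha near 1
print(cronbach_alpha(inconsistent))  # items disagree: alpha is low (here negative)
```

When every item ranks respondents the same way, alpha approaches 1; when items disagree about who scores high, alpha falls, which signals that the questions are not measuring a single construct.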
Ultimately, outcomes in an RCT should be considered carefully in the critical appraisal; the efforts undertaken by the investigators are futile if the outcome is inadequately assessed.
Consideration # 7 – Statistical significance and beyond
Intention to treat (ITT) vs. Per Protocol Analyses – Once participants have been randomized into the two groups, this randomization must be maintained to draw conclusions of causality between intervention and outcome. An ITT analysis evaluates participants in the group to which they were assigned, regardless of whether they completed the intervention at the optimal level or at all. By contrast, per protocol analyses may include only those participants who completed the treatment and represent the best-case treatment results. However, participants who do not complete the treatment as intended are not guaranteed to be equally distributed between the intervention and control groups, so excluding them is likely to introduce bias to the results [7].
Sample size and power – Much has been written about the appropriate calculation of sample size and power. Overall, the required sample size will vary depending on the specific RCT design, threshold of significance, desired power, conservativeness of the effect size used, and number of a priori analyses. For details, the following resources should be perused [4,8,9].
Missing data – Data in RCTs are generally missing by one of three mechanisms. Data are missing completely at random (MCAR) when missingness is independent of the intervention group and of other measured covariates. Data are missing at random (MAR) when missingness can be accounted for by the measured information and does not depend on unobserved data after conditioning on the measured data. An example is a survey of level of activity among college students in which men were less likely than women to complete the questionnaire, but within each sex non-completion had nothing to do with intensity of activity (the outcome of interest). Lastly, data are missing not at random (MNAR) when missingness depends on unobserved or unmeasured data even after accounting for the data that are measured.
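A small, deterministic toy example can make the difference between MAR and MNAR concrete, loosely following the college activity survey described above. All numbers are invented, and the helper names `overall_mean` and `sex_weighted_mean` are assumptions for the sketch; real analyses would use the imputation or model-based methods discussed in this section rather than simple reweighting.

```python
from statistics import mean

# Hypothetical activity scores (the outcome) for everyone who was surveyed.
scores = {"men": [2, 4, 6, 8], "women": [4, 6, 8, 10]}


def overall_mean(groups):
    """Mean of all values actually present, ignoring group membership."""
    return mean(v for values in groups.values() for v in values)


def sex_weighted_mean(observed, full):
    """Mean of observed group means, weighted by each group's true share."""
    n_total = sum(len(values) for values in full.values())
    return sum(len(full[g]) / n_total * mean(observed[g]) for g in full)


true_mean = overall_mean(scores)

# MAR: half of the men do not respond, but which men is unrelated to score.
mar = {"men": [4, 6], "women": [4, 6, 8, 10]}

# MNAR: anyone with a score of 8 or more does not respond.
mnar = {g: [v for v in values if v < 8] for g, values in scores.items()}

print(true_mean)                        # truth from the full data
print(overall_mean(mar))                # complete-case estimate under MAR
print(sex_weighted_mean(mar, scores))   # adjusting for sex recovers the truth
print(sex_weighted_mean(mnar, scores))  # adjusting for sex does not help
```

Under MAR, conditioning on the measured covariate (sex) removes the bias of the complete-case estimate; under MNAR the missingness depends on the unobserved score itself, so the same adjustment still underestimates activity.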
Several techniques can be used to handle missing data; these include complete-case analysis, single and multiple imputation methods, and likelihood-based mixed models. The appropriateness of each method is contingent on the assumed mechanism of missingness. Furthermore, careful attention should be paid to the amount of missing data; the more missing data there is, the less reliable imputed results will be [10,11].
Consideration # 8 – Are the results relevant?
After establishing the methodological validity and robustness of the RCT, one should consider whether the results are applicable to one’s own patients. To this end, Akobeng has outlined four questions that can facilitate this assessment [12]:
- Are the participants in the study similar enough to my patients?
- Do the potential side effects of the drug outweigh the benefits?
- Does the treatment conflict with the patient’s values and expectations?
- Is the treatment available and is my hospital prepared to fund it?
References
1. Lind J, Dunn PM. Perinatal lessons from the past: James Lind (1716-94) of Edinburgh and the treatment of scurvy. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1720613/pdf/v076p00F64.pdf. Accessed March 17, 2018.
2. Shi Y, Sohani Z, Tang B, Teoderascu F. Essentials of Clinical Examination Handbook.
3. Lesaffre E. Superiority, equivalence, and non-inferiority trials. Bull NYU Hosp Jt Dis. 2008;66(2):150-154. http://www.ncbi.nlm.nih.gov/pubmed/18537788. Accessed May 17, 2018.
4. Zhong B. How to calculate sample size in randomized controlled trial? J Thorac Dis. 2009;1(1):51-54. http://www.ncbi.nlm.nih.gov/pubmed/22263004. Accessed May 17, 2018.
5. Freemantle N, Calvert M, Wood J, Eastaugh J, Griffin C. Composite Outcomes in Randomized Trials. JAMA. 2003;289(19):2554. doi:10.1001/jama.289.19.2554.
6. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 4th ed. Toronto: Oxford University Press; 2008. http://www.amazon.ca/Health-Measurement-Scales-practical-development/dp/0199231885. Accessed April 23, 2014.
7. West A, Spring B. Randomized Controlled Trials. Evidence Based Behaviour and Practice. doi:10.1002/14651858.CD004690.pub2.
8. Sedgwick P. Randomised controlled trials: the importance of sample size. BMJ. 2015;350:h1586. doi:10.1136/bmj.h1586.
9. Walsh M, Srinathan SK, McAuley DF, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. J Clin Epidemiol. 2014;67(6):622-628. doi:10.1016/j.jclinepi.2013.10.019.
10. Fiero MH, Huang S, Oren E, Bell ML. Statistical analysis and handling of missing data in cluster randomized trials: a systematic review. Trials. 2016;17:72. doi:10.1186/s13063-016-1201-z.
11. Little RJ, D’Agostino R, Cohen ML, et al. The Prevention and Treatment of Missing Data in Clinical Trials. N Engl J Med. 2012;367(14):1355-1360. doi:10.1056/NEJMsr1203730.
12. Akobeng AK. Understanding randomised controlled trials. Arch Dis Child. 2005;90(8):840-844. doi:10.1136/adc.2004.058222.