Research paper evaluation checklist

For example, a cohort study can examine intervention and comparator groups concurrently, with information about the intervention and comparator collected prospectively (PCS) or retrospectively (RCS), or it can examine one group retrospectively and the other group prospectively (HCS). These different kinds of cohort study are conventionally distinguished according to the time when intervention and comparator groups are formed, in relation to the conception of the study.

In our view, some studies are incorrectly termed PCS when data are collected prospectively (for example, for a clinical database) but the definitions of intervention and comparator required for the evaluation are applied retrospectively; such a study should be classified as an RCS. Some of the study designs described in parts 1 and 2 may seem similar, for example, DID and CBA, although they are labeled differently.

In our view, these labels obscure some of the detailed features of the study designs that affect the robustness of causal attribution. Therefore, we have extended the checklist of features to highlight these differences. Where researchers use the same label to describe studies with subtly different features, we do not intend to imply that one or other use is incorrect; we merely wish to point out that studies referred to by the same label may differ in ways that affect the robustness of an inference about the causal effect of the intervention of interest. The table also sets out our responses for the range of study designs as described in Box 1 and Box 2.

Cells in the table are completed with respect to the thumbnail sketches of the corresponding designs described in Box 1 and Box 2. Question 1 is new and addresses the issue of clustering, either by design or through the organizational structure responsible for delivering the intervention (Box 3). This question avoids the need for separate checklists for designs based on assigning individuals and clusters. Clustering is a potentially important consideration in both RCTs and nonrandomized studies. Clusters exist when observations are nested within higher level organizational units or structures for implementing an intervention or collecting data; typically, observations within clusters will be more similar with respect to outcomes of interest than observations between clusters.

Analyses of clustered data that do not take clustering into account will tend to overestimate the precision of effect estimates. Clustering can also arise implicitly, from naturally occurring hierarchies in the data set being analyzed, that reflect clusters that are intrinsically involved in the delivery of the intervention or comparator. Both explicit and implicit clustering can be present in a single study.
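The loss of precision from ignoring clustering can be illustrated with the standard design effect, 1 + (m - 1) x ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. This is a minimal sketch rather than a method taken from the checklist itself; the function names and the numbers in the usage example are hypothetical, and the formula assumes clusters of roughly equal size.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC.

    Assumes clusters of roughly equal size (an assumption of this sketch).
    """
    return 1.0 + (mean_cluster_size - 1.0) * icc


def adjusted_standard_error(naive_se: float, mean_cluster_size: float, icc: float) -> float:
    """Inflate a standard error that was computed as if observations were independent."""
    return naive_se * math.sqrt(design_effect(mean_cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical values: 20 observations per cluster, ICC of 0.05.
    deff = design_effect(20, 0.05)
    se = adjusted_standard_error(1.0, 20, 0.05)
    print(f"design effect = {deff:.2f}, adjusted SE = {se:.3f}")
```

Even a modest ICC inflates the variance substantially when clusters are large, which is why an analysis that treats clustered observations as independent reports confidence intervals that are too narrow.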


Question 1 in the checklist distinguishes individual allocation, cluster allocation (explicit clustering), and clustering due to the organizational hierarchy involved in the delivery of the interventions being compared (implicit clustering). Users should respond factually, that is, with respect to the presence of clustering, without making a judgment about the likely importance of clustering (degree of dependence between observations within clusters). These questions are designed to tease apart the nature of the research question and the basis for inferring causality.

Question 2 classifies studies according to the number of times outcome assessments were available. In each case, the response items distinguish whether or not the outcome is assessed in the same or different individuals at different times. Treatment effects can be estimated as changes over time or between groups. Question 3 aims to classify studies according to the parameter being estimated. Response items distinguish changes over time for the same or different individuals.

Question 4 asks about the principle through which the primary researchers aimed to control for confounding. Three response items distinguish the methods used. Questions 5 to 7 are essentially the same as in the original checklist [1], [2]. Question 5 asks about how groups of individuals or clusters were formed because treatment effects are most frequently estimated from between-group comparisons.

An additional response option, namely by a forcing variable, has been included to identify credible quasi-experimental studies that use an explicit rule for assignment based on a threshold for a variable measured on a continuous or ordinal scale or in relation to a spatial boundary. Other, nonexperimental, study designs should be classified by the method of assignment (same list of variables but without there being an explicit assignment rule). Question 6 asks about important features of a study in relation to the timing of their implementation.
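The idea of assignment by a forcing variable can be made concrete with a short sketch: allocation is fully determined by whether a measured score falls on one side of a known threshold. The function name, the threshold value, and the example scores are hypothetical and only illustrate what an explicit assignment rule looks like; they are not drawn from any study discussed here.

```python
def assign_by_forcing_variable(score: float, threshold: float,
                               treat_at_or_above: bool = True) -> str:
    """Allocate a unit using an explicit rule on a continuous or ordinal score.

    The rule is deterministic: the score and the threshold alone decide
    whether the unit receives the intervention or the comparator.
    """
    if treat_at_or_above:
        return "intervention" if score >= threshold else "comparator"
    return "intervention" if score < threshold else "comparator"


if __name__ == "__main__":
    # Hypothetical eligibility scores with a hypothetical cut-off of 50.
    scores = [42.0, 49.9, 50.0, 63.5]
    for s in scores:
        print(s, assign_by_forcing_variable(s, threshold=50.0))
```

Because the rule is explicit, units just either side of the threshold can be compared on the assumption that they differ little apart from their allocation, which is the basis for treating such designs as credible.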

Question 7 asks about the variables that were measured and available to control for confounding in the analysis. The two broad classes of variables that are important are potential confounder variables and baseline assessments of the outcome variable(s). The answers to this question will be less important if the researchers of the original study used a method to control for any confounding, that is, used a credible quasi-experimental design. The health care evaluation community has historically been much more difficult to win around to the potential value of nonrandomized studies to evaluate interventions.

We think that the checklist helps to explain why, that is, because designs used in health care evaluation do not often control for unobservables when the study features are examined carefully. To the extent that these features are immutable, the skepticism is justified. However, to the extent that studies may be possible with features that promote the credibility of causal inference, health care evaluation researchers may be missing an opportunity to provide high-quality evidence. Reflecting on the circumstances of nonrandomized evaluations of health care and health system interventions may provide some insights into why these different groups have disagreed about the credibility of effects estimated in quasi-experimental studies.

The risk of confounding in health care settings is inherently greater because participants' characteristics are fundamental to choices about interventions in usual care; mitigating this risk requires high-quality clinical data to characterize participants at baseline and, for pharmaco-epidemiological studies about safety, often over time. Important questions about health care for which quasi-experimental methods of evaluation are typically considered often concern the outcome of discrete episodes of care, usually binary, rather than long-term outcomes for a cohort of individuals; this can lead to a focus on the invariant nature of the organizations providing the care rather than the varying nature of the individuals receiving care.

These contrasts are apparent between, for example: DID studies using panel data to evaluate an intervention such as CCT among individuals with CBA studies of an intervention implemented at an organizational level studying multiple cross-sections of health care episodes; or credible and less credible interrupted time series.

There is a new article in the field of hospital epidemiology which also highlights various features of what it terms as quasi-experimental designs [56]. The list of features appears to be aimed at researchers designing a quasi-experimental study, acting more as a prompt e. There is some overlap with our checklist, but the list described also includes several study attributes intended to reduce the risk of bias, for example, blinding. By contrast, we consider that an assessment of the risk of bias in a study is essential and needs to be carried out as a separate task.

The primary intention of the checklist is to help review authors to set eligibility criteria for studies to include in a review that relate directly to the intrinsic strength of the studies in inferring causality. The checklist should also illuminate the debate between researchers in different fields about the strength of studies with different features—a debate which has to date been somewhat obscured by the use of different terminology by researchers working in different fields of investigation.

Furthermore, where disagreements persist, the checklist should allow researchers to inspect the basis for these differences, for example, the principle through which researchers aimed to control for confounding, and shift their attention to clarifying the basis for their respective responses for particular items.

Authors' contributions: All three authors collaborated to draw up the extended checklist. All three authors approved submission of the final manuscript.

Funding: B. R is supported in part by the U.

Journal of Clinical Epidemiology (J Clin Epidemiol). Barnaby C. Reeves, George A. Wells, and Hugh Waddington. Accepted Feb 6.

Abstract

Objectives: The aim of the study was to extend a previously published checklist of study design features to include study designs often used by health systems researchers and economists.

Study Design and Setting: Expert consensus meeting.

Results: The checklist comprises seven questions, each with a list of response items, addressing: clustering of an intervention as an aspect of allocation or due to the intrinsic nature of the delivery of the intervention; for whom, and when, outcome data are available; how the intervention effect was estimated; the principle underlying control for confounding; how groups were formed; the features of a study carried out after it was designed; and the variables measured before intervention.

Conclusion: The checklist clarifies the basis of credible quasi-experimental studies, reconciling different terminology used in different fields of investigation and facilitating communications across research communities.

Introduction

There are difficulties in drawing up a taxonomy of study designs to evaluate health care interventions or systems that do not use randomization [1].

Box 1. Thumbnail sketches of quasi-experimental studies used in program evaluations of CCT programs.

Randomized controlled trial (RCT): Individual participants, or clusters of participants, are randomly allocated to intervention or comparator.

Quasi-randomized controlled trial (Q-RCT): Individual participants, or clusters of participants, are allocated to intervention or comparator in a quasi-random manner.

For a credible study, the allocation mechanism should not be known to participants or any personnel responsible for data collection. In rare cases, the unit of analysis will be measured at the disaggregate level i.


Commonly, however, longitudinal data sets are clustered at aggregate levels of care e.

Difference study, including difference-in-differences study (DID): Analysis of a cohort over time, in which no individuals have the intervention at the start and some receive the intervention by the end of the period of study. The typical study is clustered, with some clusters implementing the intervention; data are often also aggregated by cluster, for example, primary care practice.

A key feature of this design is the availability of longitudinal data for the same individuals for the entire period of study; studies that evaluate cluster-aggregated data often ignore changes in the individuals belonging to a cluster over time.

Cross-sectional study (XS): The feature of this study design is that data required to classify individuals according to receipt of the intervention or comparator of interest and according to outcome are collected at the same time.

Common methods of analysis include statistical matching e. A key limitation of this design is the inability to account for unobservable confounding and, in some instances, reverse causality.

Box 2. Thumbnail sketches of quasi-experimental study designs used by health care evaluation researchers. Studies are cited which correspond to the way in which we conceive studies described with these labels.

This design is the same as the RCT design described in Box 1.

The allocation rule may be as good as random but, typically, gives rise to a less credible study (compared to health system studies, where the allocation rule is applied by a higher level decision maker); if allocation is not concealed, research personnel who know the rule can recruit selectively or allocate participants in a biased way.

This design is essentially the same as the Q-RCT design described in Box 1 but with different mechanisms for allocation.

Controlled before-and-after study (CBA): Study in which outcomes are assessed at two time periods for several clusters (usually geographic). Clusters are classified into intervention and comparator groups.

All clusters are studied without the intervention during period 1. Between periods 1 and 2, clusters in the intervention group implement the intervention of interest whereas clusters in the comparator group do not. The outcome for clusters receiving the intervention is compared to the outcome for comparator clusters during period 2, adjusted for the outcomes observed during period 1 when no clusters had had the intervention.
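One simple way to express the comparison described for a CBA study is a difference in differences of cluster-level mean outcomes: the change from period 1 to period 2 in intervention clusters minus the corresponding change in comparator clusters. The sketch below is illustrative only; the function name and the outcome values are hypothetical, and a real analysis would typically use a regression model that also accounts for clustering and covariates.

```python
from statistics import mean


def difference_in_differences(intervention_p1, intervention_p2,
                              comparator_p1, comparator_p2) -> float:
    """Period 2 minus period 1 change in intervention clusters,
    net of the same change in comparator clusters."""
    change_intervention = mean(intervention_p2) - mean(intervention_p1)
    change_comparator = mean(comparator_p2) - mean(comparator_p1)
    return change_intervention - change_comparator


if __name__ == "__main__":
    # Hypothetical cluster-level outcomes (one value per cluster per period).
    estimate = difference_in_differences(
        intervention_p1=[10, 12], intervention_p2=[15, 17],
        comparator_p1=[11, 13], comparator_p2=[12, 14],
    )
    print(f"difference-in-differences estimate = {estimate}")
```

Subtracting the comparator change removes any shift over time that is common to both groups, which is the sense in which the period 2 comparison is adjusted for the outcomes observed in period 1.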