Grading quality of evidence and strength of recommendations
GRADE Working Group
Clinical guidelines are only as good as the evidence and judgments they are based on. The GRADE approach aims to make it easier for users to assess the judgments behind recommendations
Users of clinical practice guidelines and other recommendations need to know how much confidence they can place in the recommendations. Systematic and explicit methods of making judgments can reduce errors and improve communication. We have developed a system for grading the quality of evidence and the strength of recommendations that can be applied across a wide range of interventions and contexts. In this article we present a summary of our approach from the perspective of a guideline user. Judgments about the strength of a recommendation require consideration of the balance between benefits and harms, the quality of the evidence, translation of the evidence into specific circumstances, and the certainty of the baseline risk. It is also important to consider costs (resource utilisation) before making a recommendation. Inconsistencies among systems for grading the quality of evidence and the strength of recommendations reduce their potential to facilitate critical appraisal and improve communication of these judgments. Our system for guiding these complex judgments balances the need for simplicity with the need for full and transparent consideration of all important issues.

Introduction

Judgments about evidence and recommendations are complex. Consider, for example, the choice between selective serotonin reuptake inhibitors and tricyclic antidepressants for the treatment of moderate depression. Clinicians must decide which outcomes to consider, which evidence to include for each outcome, how to assess the quality of that evidence, and how to determine if selective serotonin reuptake inhibitors do more good than harm compared with tricyclics. Because resources are always limited and money that is spent on selective serotonin reuptake inhibitors cannot be used elsewhere, they may also need to decide whether any incremental health benefits are worth the additional costs.

It is not practical for individual clinicians and patients to make these judgments unaided for each clinical decision. Clinicians and patients commonly use clinical practice guidelines as a source of support, that is, recommendations that have been systematically developed by panels of people with access to the available evidence, an understanding of the clinical problem and research methods, and sufficient time for reflection.

Users of systematically developed guidelines need to know how much confidence they can place in evidence and recommendations. We describe the factors on which our confidence should be based and a systematic approach for making the complex judgments that go into clinical practice guidelines or other healthcare recommendations, either implicitly or explicitly. To achieve simplicity in our presentation we do not discuss all the nuances or provide detailed guidance that guideline panels would need to apply our approach. This can be obtained from the authors (www.GradeWorking-

A systematic and explicit approach to making judgments about the quality of evidence and the strength of recommendations can help to prevent errors, facilitate critical appraisal of these judgments, and improve communication of this information. Since the 1970s a growing number of organisations have employed various systems to grade the quality (level) of evidence and the strength of recommendations.1–28 Unfortunately, different organisations use different systems to grade the quality of evidence and the strength of recommendations. The same evidence and recommendation could be graded as II-2, B; C+, 1; or strong evidence, strongly recommended depending on which system is used. This is confusing and impedes effective communication.

The GRADE Working Group began as an informal collaboration of people with an interest in tackling the shortcomings of present grading systems. Table 1 summarises these shortcomings and the ways in which we have overcome them. The GRADE system enables more consistent judgments, and communication of such judgments can support better informed choices in health care. Box 1 shows the steps in developing and implementing guidelines from prioritising problems through evaluating their implementation. We focus here on grading the quality of evidence and strength of recommendations.
Members of the GRADE Working Group are listed at the end of this article.
BMJ VOLUME 328 19 JUNE 2004

Table 1 Comparison of GRADE and other systems

Other systems: Implicit definitions of quality (level) of evidence and [...]
Advantages of GRADE system*: Makes clear what grades indicate and what should [...]

Other systems: Implicit judgments regarding which outcomes are important, quality of evidence for each important outcome, overall quality of evidence, balance between benefits and harms, and value of incremental benefits
Advantages of GRADE system*: Clarifies each of these judgments and reduces risks of introducing errors or bias that can arise when [...]

Other systems: Not considered for each important outcome. Judgments about quality of evidence are often [...]
GRADE: Systematic and explicit consideration of study design, study quality, consistency, and directness of evidence in judgments about quality of evidence
Advantages of GRADE system*: Ensures these factors are considered appropriately

GRADE: Explicit consideration of imprecise or sparse data, reporting bias, strength of association, evidence of a dose-response gradient, and plausible confounding

Other systems: Implicitly based on the quality of evidence for [...]
GRADE: Based on the lowest quality of evidence for any of the outcomes that are critical to making a decision
Advantages of GRADE system*: Reduces likelihood of mislabelling overall quality of evidence when evidence for a critical outcome is lacking

GRADE: Explicit judgments about which outcomes are critical, which ones are important but not critical, and which ones are unimportant and can be [...]
Advantages of GRADE system*: Ensures appropriate consideration of each outcome when grading overall quality of evidence and [...]

GRADE: Explicit consideration of trade-offs between important benefits and harms, the quality of evidence for these, translation of evidence into specific circumstances, and certainty of baseline risks
Advantages of GRADE system*: Clarifies and improves transparency of judgments [...]

GRADE: Explicit consideration after first considering whether [...]
Advantages of GRADE system*: Ensures that judgments about value of net health [...]

GRADE: Consistent GRADE evidence profiles, including quality assessment and summary of findings
Advantages of GRADE system*: Ensures that all panel members base their judgments on same information and that this information is available to others

Other systems: Seldom used by more than one organisation and [...]
GRADE: International collaboration across wide range of organisations in development and evaluation
Advantages of GRADE system*: Builds on previous experience to achieve a system that is more sensible, reliable, and widely applicable

*Most other approaches do not include any of these advantages, although some may incorporate some of these advantages.

Box 1: Sequential process for developing guidelines

First steps
1. Establishing the process: for example, prioritising problems, selecting a panel, declaring conflicts of interest, and agreeing on [...]

Preparatory steps
2. Systematic review: the first step is to identify and critically appraise or prepare systematic reviews of the best available evidence
3. Prepare evidence profile for important outcomes: profiles are needed for each subpopulation or risk group, based on the results of systematic reviews, and should include a quality assessment and a summary of findings

Grading quality of evidence and strength of recommendations
4. Quality of evidence for each outcome: judged on information summarised in the evidence profile and based on the criteria in table 2
5. Relative importance of outcomes: only important outcomes should be included in evidence profiles. The included outcomes should be classified as critical or important (but not critical) to a decision
6. Overall quality of evidence: the overall quality of evidence should be judged across outcomes based on the lowest quality of evidence for any of the critical outcomes.
7. Balance of benefits and harms: the balance of benefits and harms should be classified as net benefits, trade-offs, uncertain trade-offs, or no net benefits based on the important health benefits and harms
8. Balance of net benefits and costs: are incremental health benefits worth the costs? Because resources are always limited, it is important to consider costs (resource utilisation) when making a recommendation
9. Strength of recommendation: recommendations should be formulated to reflect their strength, that is, the extent to which one can be confident that adherence will do more good than harm

Subsequent steps
10. Implementation and evaluation: for example, using effective implementation strategies that address barriers to change, evaluation of implementation, and keeping up to date

Definitions

We have used the following definitions: the quality of evidence indicates the extent to which one can be confident that an estimate of effect is correct. The strength of a recommendation indicates the extent to which one can be confident that adherence to the recommendation will do more good than harm.

Judgments about the quality of evidence require assessments of the validity of the results of individual studies for important outcomes. Explicit criteria should be used in making these judgments.26 29–32 The steps in our approach, which follow these judgments, are to make sequential judgments about:
• The quality of evidence across studies for each important outcome
• Which outcomes are critical to a decision
• The overall quality of evidence across these critical outcomes
[...]

All of these judgments depend on having a clearly defined question and considering all of the outcomes that are likely to be important to those affected. The question should identify which options are being compared (for example, selective serotonin reuptake inhibitors and tricyclic antidepressants), for whom (moderately depressed adult patients), and in what setting.

Quality of evidence for each important outcome

A systematic review of available evidence should guide these judgments. Reviewers should consider four key elements: study design, study quality, consistency, and directness.

Study design

Study design refers to the basic study design, which we have broadly categorised as observational studies and randomised trials. Both logical arguments and empirical evidence support this.33–36 Although observational studies commonly have results that are similar to those of randomised trials, this is not always the case. One dramatic example of such a discrepancy is the different results of observational studies that suggested hormone replacement therapy decreased the risk of coronary heart disease and subsequent randomised trials that found no reduction in risk and even an increased risk.37 38 Unfortunately, it is not possible to know in advance whether observational studies accurately predict the findings of subsequent randomised trials. Once the results of high quality randomised trials are available, few people would argue for continuing to base recommendations on non-randomised studies with discrepant results.

On the other hand, randomised trials are not always feasible and, in some instances, observational studies may provide better evidence, as is generally the case for rare adverse effects. Moreover, the results of randomised trials may not always be applicable, for example if the participants are highly selected and motivated relative to the population of interest. It is therefore essential to consider study quality, the consistency of results across studies, and the directness of the evidence, as well as the appropriateness of the study design. So, for example, well designed case series may provide high quality evidence for complication rates from surgery or procedures, such as intraoperative deaths or perforations after colonoscopy, which is more directly relevant than evidence from randomised trials. Similarly, cohort studies can provide high quality evidence for rates of recall or procedures precipitated by false positive screening results, such as biopsy rates after mammography.

Study quality

Study quality refers to the detailed study methods and execution. Reviewers should use appropriate criteria to assess study quality for each important outcome.26 29–32 For randomised trials, for example, reviewers might use criteria such as the adequacy of allocation concealment, blinding, and follow up. Reviewers should make explicit their reasons for downgrading a quality rating. For example, they may state that failure to blind patients and physicians reduced the quality of evidence for an intervention’s impact on pain severity and that they considered this a serious limitation.

Consistency

Consistency refers to the similarity of estimates of effect across studies. If there is important unexplained inconsistency in the results, our confidence in the estimate of effect for that outcome decreases. Differences in the direction of effect, the size of the differences in effect, and the significance of the differences guide the (inevitably somewhat arbitrary) decision about whether important inconsistency exists. Separate estimates of magnitude of effect for different subgroups should follow when investigators identify a compelling explanation for inconsistency. For
instance, differences in the effect of carotid endarterectomy on high and lower grade stenoses should lead to separate estimates for those subgroups.

Directness

Directness refers to the extent to which the people, interventions, and outcome measures are similar to those of interest. For example, there may be uncertainty about the directness of the evidence if the people of interest are older, sicker, or have more comorbidity than those in the studies.39 To determine whether important uncertainty exists, we can ask whether there is a compelling reason to expect important differences in the size of the effect. Because many interventions have more or less the same relative effects across most patient groups, we should not apply overly stringent criteria in deciding whether evidence is direct. For some therapies (for example, behavioural interventions in which cultural differences are likely to be important) more stringent criteria may be appropriate.

Similarly, reviewers may identify uncertainty about the directness of evidence for drugs that differ from those in the studies but are within the same class. Similar issues arise for other types of interventions. For instance, can you generalise results to a less intense counselling intervention than that used in a study, or to an alternative surgical technique? These judgments can be difficult,40 and it is important for investigators to explain the rationale for the conclusions that they draw.

On the other hand, studies using surrogate outcomes generally provide less direct evidence than those using outcomes that are important to people. It is therefore prudent to use much more stringent criteria when considering the directness of evidence for surrogate outcomes. Examples of indirect evidence based on surrogate outcomes that subsequent trials showed to be misleading include suppression of cardiac arrhythmia in patients who have had a myocardial infarction as a surrogate for mortality,41 changes in lipoproteins as a surrogate for coronary heart disease,37 and bone density in postmenopausal women as a surrogate for fracture reduction.42

The accuracy of a diagnostic test is also a surrogate for important outcomes that might be affected by accurate diagnosis, including improved health outcomes from appropriate treatment and reduced harms from false positive results. Different criteria must be used when considering study design for studies of diagnostic accuracy. However, consideration of the directness of evidence is based on how confident we are of the relation between being classified correctly (as a true positive or negative) or incorrectly (as a false positive or negative) and important consequences of this. For example, there is consistent evidence from well designed studies that there are fewer false negative results with non-contrast helical computed tomography than with intravenous pyelography in the diagnosis of suspected acute urolithiasis.43 However, there is major uncertainty about whether this has important health consequences.44 Because of this, the quality of this evidence could be considered low for [...]

Another type of indirect evidence arises when there are no direct comparisons of interventions and investigators must make comparisons across studies. For example, this would be the case if there were randomised trials that compared selective serotonin reuptake inhibitors with placebo and tricyclics with placebo, but no trials that compared selective serotonin reuptake inhibitors with tricyclics. Indirect comparisons always leave greater uncertainty than direct comparisons because of all the other differences between studies that can affect the results.45
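One way to see why a comparison made across studies is less certain than a direct one is the adjusted indirect comparison through a common comparator. This is a standard technique that the article itself does not describe, so the sketch below is illustrative only. The function name `indirect_comparison` and all input values are hypothetical. The calculation captures only statistical uncertainty: the standard errors of the two placebo comparisons combine, so the indirect interval is wider than either. It does not capture the other differences between studies that the text highlights.

```python
import math


def indirect_comparison(log_rr_a, se_a, log_rr_b, se_b):
    """Indirect estimate of A versus B from A-vs-placebo and B-vs-placebo results.

    Inputs are log relative risks and their standard errors.
    Returns the relative risk, an approximate 95% confidence interval,
    and the standard error of the indirect log relative risk.
    """
    log_rr = log_rr_a - log_rr_b              # difference on the log scale
    se = math.sqrt(se_a ** 2 + se_b ** 2)     # variances add, so SE grows
    low = math.exp(log_rr - 1.96 * se)
    high = math.exp(log_rr + 1.96 * se)
    return math.exp(log_rr), (low, high), se


# Hypothetical inputs, not taken from any trial cited in the article.
rr, ci, se = indirect_comparison(math.log(0.80), 0.10, math.log(0.85), 0.12)
print(f"indirect RR {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, SE {se:.3f}")
```

The indirect standard error is always at least as large as the larger of the two input standard errors, which is the purely statistical part of the extra uncertainty.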
Combining the four components

The quality of evidence for each main outcome can be determined after considering each of the above elements: study design, study quality, consistency, and directness. Our approach initially categorises evidence based on study design into randomised trials and observational studies (cohort studies, case-control studies, interrupted time series analyses, and controlled before and after studies). We then suggest considering whether the studies have serious limitations, important inconsistencies in the results, or whether uncertainty about the directness of the evidence is warranted (box 2). We suggest the following definitions in grading the quality of the evidence:
High = Further research is very unlikely to change our confidence in the estimate of effect.
Moderate = Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low = Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low = Any estimate of effect is very uncertain.

Limitations in study quality, important inconsistency of results, or uncertainty about the directness of the evidence can lower the grade of evidence. For instance, if all available studies have serious limitations, the grade will drop by one level, and if all studies have very serious limitations the grade will drop by two levels. Fatally flawed studies may be excluded.

Additional considerations that can lower the quality of evidence include imprecise or sparse data (box 3) and high risk of reporting bias. Additional considerations that can raise the quality of evidence include:
• A very strong association (for example, a 50-fold risk of poisoning fatalities with tricyclic antidepressants compared with selective serotonin reuptake inhibitors, see table 2) or strong association (for example, a threefold increased risk of head injuries among cyclists who do not use helmets compared with those who do)
• Presence of all plausible residual confounding would have reduced the observed effect. (For example, plausible explanatory factors that were not adjusted for in studies comparing mortality rates of for-profit and not-for-profit hospitals would have reduced the observed effect.48 Thus, the evidence that for-profit hospitals have a higher risk of mortality is more convincing.)

These considerations act cumulatively. For example, if randomised trials have both serious limitations and there is uncertainty about the directness of the evidence, the grade of evidence would drop from high to low.

The same rules should be applied to judgments about the quality of evidence for harms and benefits. Important plausible harms can and should be included in evidence summaries by considering the indirect evidence that makes them plausible. For example, if there is concern about anxiety in relation to screening for melanoma and no direct evidence is found, it may be appropriate to consider evidence from studies of other types of screening.

Judgments about the quality of evidence for important outcomes across studies can and should be made in the context of systematic reviews, such as Cochrane reviews. Judgments about the overall quality of evidence, trade-offs, and recommendations typically require information beyond the results of a review.

Overall quality of evidence

Other systems have commonly based judgments of the overall quality of evidence on the quality of evidence for the benefits of interventions. When the risk of an adverse effect is critical for a judgment, and evidence regarding that risk is weaker than evidence of benefit, ignoring uncertainty about the risk of harm is problematic. We suggest that the lowest quality of evidence for any of the outcomes that are critical to making a decision should provide the basis for rating overall quality of evidence.

Outcomes that are important, but not critical, should be included in evidence profiles and should be considered when making judgments about the balance between health benefits and harms but should not be taken into consideration when grading the overall quality of evidence. Deciding whether an outcome is critical, important but not critical, or not important is a value judgment. So far as possible these judgments should take account of the values of those who will be affected by adherence to recommendations.

Box 3: Imprecise or sparse data

There is not an empirical basis for defining imprecise or sparse data [...]
• Data are sparse if the results include just a few events or [...]
• Data are imprecise if the confidence intervals are sufficiently wide that an estimate is consistent with either important harms or important benefits

These different definitions can result in different judgments. Although it may not be possible to reconcile these differences, we offer the following guidance when considering whether to downgrade the quality of evidence due to imprecise or sparse data:
• The threshold for considering data imprecise or sparse should be lower when there is only one study. A single study with a small sample size (or few events) yielding wide confidence intervals spanning both the potential for harm and benefit should be considered as imprecise or sparse data
• Confidence intervals that are sufficiently wide that, irrespective of other outcomes, the estimate is consistent with conflicting recommendations should be considered as imprecise or sparse data

Box 2: Criteria for assigning grade of evidence

Type of evidence
[...]

Decrease grade if:
• Serious (−1) or very serious (−2) limitation to study quality
• Some (−1) or major (−2) uncertainty about directness
• High probability of reporting bias (−1)

Increase grade if:
• Strong evidence of association: significant relative risk of > 2 (< 0.5) based on consistent evidence from two or more observational studies, with no plausible confounders (+1)46
• Very strong evidence of association: significant relative risk of > 5 (< 0.2) based on direct evidence with no major threats to [...]
• Evidence of a dose response gradient (+1)
• All plausible confounders would have reduced the effect (+1)
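The arithmetic implied by box 2, together with the suggestion that overall quality rest on the lowest quality among the critical outcomes, can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the GRADE system itself. The names `Outcome`, `grade`, and `overall_quality` are hypothetical. The starting grade of low for observational studies is an assumption, because the type of evidence rows of box 2 are not legible in this copy; high for randomised trials follows the worked example in the text. The one-level penalties for inconsistency and for imprecise or sparse data are also assumed. In practice each modifier is a judgment made by reviewers, not a computed value.

```python
from dataclasses import dataclass

GRADES = ["very low", "low", "moderate", "high"]

# Starting level by study design (index into GRADES).
# "high" for randomised trials follows the text's worked example;
# "low" for observational studies is an ASSUMPTION (box rows not legible).
START = {"randomised trial": 3, "observational study": 1}


@dataclass
class Outcome:
    name: str
    design: str              # key of START
    critical: bool           # critical to the decision (a value judgment)
    study_quality: int = 0   # 0, -1 (serious) or -2 (very serious limitation)
    directness: int = 0      # 0, -1 (some) or -2 (major uncertainty)
    inconsistency: int = 0   # ASSUMED -1 when important inconsistency exists
    imprecision: int = 0     # ASSUMED -1 for imprecise or sparse data (box 3)
    reporting_bias: int = 0  # -1 for high probability of reporting bias
    upgrades: int = 0        # sum of the "Increase grade if" modifiers


def grade(o: Outcome) -> str:
    """Grade one outcome: start from the design, then apply modifiers cumulatively."""
    level = (START[o.design] + o.study_quality + o.directness
             + o.inconsistency + o.imprecision + o.reporting_bias + o.upgrades)
    level = max(0, min(level, len(GRADES) - 1))  # keep within very low..high
    return GRADES[level]


def overall_quality(outcomes) -> str:
    """Lowest grade among the critical outcomes; non-critical ones are ignored."""
    critical = [grade(o) for o in outcomes if o.critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=GRADES.index)


# Case described in the text: serious limitations plus uncertain directness.
example = Outcome("pain severity", "randomised trial", True,
                  study_quality=-1, directness=-1)
print(grade(example))  # low

# Hypothetical profile: strong evidence of benefit, weaker evidence on a critical harm.
profile = [
    Outcome("benefit", "randomised trial", True),
    Outcome("rare harm", "observational study", True),
    Outcome("minor symptom", "randomised trial", False, study_quality=-2),
]
print(overall_quality(profile))  # low
```

The first example reproduces the case in the text, where randomised trials with serious limitations and uncertain directness drop from high to low. The sketch does not model the exception discussed in the text, in which a panel may still rate overall quality as high when all critical outcomes favour the same alternative.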
The decision regarding what is critical can be difficult. The plausibility of adverse outcomes may influence the decision regarding whether they are critical. Weak evidence about implausible putative harms should not lower the overall grade of evidence. Decisions about whether a putative harm is plausible may come from indirect evidence. For example, if there is important concern about serious adverse effects of a drug because of animal studies, the overall quality of evidence may receive a lower grade based on whatever human evidence is available for that particular adverse effect. Sometimes lack of evidence for plausible putative harms may make it impossible to assess the net benefit of an intervention. In these circumstances a guideline panel may elect to recommend additional research.

If the evidence for all of the critical outcomes favours the same alternative, and there is high quality evidence for some, but not all, of those outcomes, the overall quality of evidence might still be considered high. For example, there is high quality evidence that antiplatelet therapy reduces the risk of non-fatal stroke and non-fatal myocardial infarction in patients who have had a myocardial infarction. Although the evidence for all-cause mortality is of moderate quality, the overall quality of evidence might still be considered high, even if all-cause mortality was considered a critical outcome.

Recommendations

Does the intervention do more good than harm?

Recommendations involve a trade-off between benefits and harms. Making that trade-off inevitably involves placing, implicitly or explicitly, a relative value on each outcome. It is often difficult to judge how much weight to give to different outcomes, and different people will often have different values. People making judgments on behalf of others are on stronger ground if they have evidence of the values of those affected. For instance, people making recommendations about chemotherapy for women with early breast cancer will be in a stronger position if they have evidence about the relative importance those women place on reducing the risk of a recurrence of breast cancer relative to avoiding the side effects of chemotherapy.

Box 4: Values are not right or wrong

The following example shows how different people might make different recommendations because of differences in values, even [...]
Question: Should the general population be screened for melanoma?
Setting: Primary care in the United States
Baseline risk: General population (melanoma incidence in 1995 [...]
Reference: Helfand et al. Screening for skin cancer. Systematic [...] Rockville, MD: Agency for Healthcare Research and Quality. April 2001. (AHRQ Publication No 01-S002.)

There is very low quality evidence for the accuracy of screening and for the outcome of lethal melanoma. Potential harms from screening include the consequences of false positive tests, but evidence regarding these is lacking. Based on this it is possible to conclude that the overall quality of evidence is very low and that there are uncertain net benefits from screening. Based on a single case-control study, the odds ratio for lethal melanoma was estimated to be 0.37 for screened versus not screened people. The lifetime risk of dying of melanoma was estimated to be 0.36% for white men.

Based on this evidence, many people might make a recommendation of “don’t screen” because of placing a high value on avoiding the potential but unknown harms of screening healthy people relative to the uncertain benefits. However, some people might recommend “probably screen” because of placing a high value on the small but potentially important benefits of screening relative to the unknown potential harms. Under these circumstances, after taking into consideration costs, a panel developing guidelines might elect not to make a recommendation for clinical practice and to make a specific recommendation regarding the research that is needed to reduce uncertainty and clarify the trade-offs.

This example is typical of the value judgments that underlie recommendations about screening, but the same issues arise in making recommendations about treatment for both acute and chronic conditions, where it is always necessary to balance the expected benefits against the expected harms in light of the relative values attached to each important outcome, and [...]

We suggest making explicit judgments about the balance between the main health benefits and harms before considering costs. Does the intervention do more good than harm? Recommendations must apply to specific settings and particular groups of patients whenever the benefits and harms differ across settings or patient groups. For instance, consider whether we should recommend that patients with atrial fibrillation receive warfarin to reduce their risk of stroke, despite the increase in bleeding risk that will result. Recommendations, or their strength, are likely to differ in settings where regular monitoring of the intensity of anticoagulation is available and settings where it is not. Furthermore, recommendations (or their strength) are likely to differ in patients at very low risk of stroke (those under 65 without any comorbidity) and patients at higher risk (such as older patients with heart failure) because of differences in the absolute reduction in risk. Recommendations must therefore be specific to a patient group and a practice setting. It is particularly important to consider the circumstances of disadvantaged populations when making recommendations and, when appropriate, modify recommendations to take into consideration differences between advantaged and disadvantaged populations.

We suggest using the following definitions to categorise the balance of benefits and harms:
Net benefits = the intervention clearly does more good than harm
Trade-offs = there are important trade-offs between the benefits and harms
Uncertain trade-offs = it is not clear whether the intervention does more good than harm
No net benefits = the intervention clearly does not do more good than harm

Those making a recommendation should consider four main factors:
• The trade-offs, taking into account the estimated size of the effect for the main outcomes, the confidence limits around those estimates, and the relative value placed on each outcome
• The quality of the evidence
• Translation of the evidence into practice in a specific setting, taking into consideration important factors that could be expected to modify the size of the expected effects, such as proximity to a hospital or availability of necessary expertise
• Uncertainty about baseline risk for the population of interest.

If there is uncertainty about translating the evidence into practice in a specific setting, or uncertainty about baseline risk, this may lower our confidence in a recommendation. For example, if an intervention has serious adverse effects as well as
BMJ VOLUME 328 19 JUNE 2004 Table 2 Quality assessment of trials comparing selective serotonin reuptake inhibitors (SSRIs) with tricyclic antidepressants for treatment of moderate depression in primary care2 Quality assessment Summary of findings No of patients modifying Relative No of studies Consistency Directness factors* Tricyclics Absolute Importance Depression severity (measured with Hamilton Depression Rating Scale after 4 to 12 weeks) Transient side effects resulting in discontinuation of treatment Poisoning fatalities§
WMD = weighted mean difference, RRR = relative risk reduction. *Imprecise or sparse data, a strong or very strong association, high risk of reporting bias, evidence of a dose-response gradient, effect of plausible residual confounding. †There was uncertainty about the directness of the outcome measure because of the short duration of the trials. ‡It is possible that people at lower risk were more likely to have been given SSRIs and it is uncertain if changing antidepressant would have deterred suicide attempts. §There is uncertainty about the baseline risk for poisoning fatalities.
important benefits, a recommendation is likely to be much less certain when the baseline risk of the population of interest is uncertain.

We suggest using the following categories for recommendations:
“Do it” or “don’t do it”, indicating a judgment that most well informed people would make;
“Probably do it” or “probably don’t do it”, indicating a judgment that a majority of well informed people would make but a substantial minority would not.

A recommendation to use or withhold an intervention does not mean that all patients should be treated identically. Nor does it mean that clinicians should not involve patients in the decision, or explain the merits of the alternatives. However, because most well informed patients will make the same choice, the explanation of the relative merits of the alternatives may be relatively brief. A recommendation is intended to facilitate an appropriate decision for an individual patient or a population. It should therefore reflect what people would likely choose, based on the evidence and their own values or preferences in relation to the expected outcomes. A recommendation to “probably do something” indicates a need for clinicians to more fully and carefully consider patients’ values and preferences when offering them the intervention.

In some instances it may not be appropriate to make a recommendation because of unclear trade-offs or lack of agreement (as illustrated in box 4). When this is due to a lack of good quality evidence, specific research should be recommended that would provide the evidence that is needed to inform a recommendation.

Are the incremental health benefits worth the costs?
Because spending money on one intervention means less money to spend on another, recommendations rely, implicitly if not explicitly, on judgments about the value of the incremental health benefits in relation to the incremental costs. Costs (the monetary value of resources used) are important considerations in making recommendations, but they are context specific, change over time, and their magnitude may be difficult to estimate. While recognising the difficulty of making accurate estimates of costs, we suggest that the incremental costs of healthcare alternatives should be considered explicitly alongside the expected health benefits and harms. When relevant and available, disaggregated costs (differences in use of resources) should be presented in evidence profiles along with important outcomes. The quality of the evidence for differences in use of resources should be graded by using the criteria outlined above.

How it works in practice
Table 2 shows an example of the system applied to evidence from a systematic review comparing selective serotonin reuptake inhibitors with tricyclic antidepressants conducted in 1997.⁴⁹ After discussion we agreed that there was moderate quality evidence for the relative effects of selective serotonin reuptake inhibitors and tricyclic antidepressants on depression severity and poisoning fatalities and high quality evidence for transient side effects. We then reached agreement that the overall quality of evidence was moderate and that there were net benefits in favour of selective serotonin reuptake inhibitors (no difference in depression severity, fewer transient side effects, and fewer poisoning fatalities). Despite agreement that there seemed to be net benefits we concluded with a recommendation to “probably” use selective serotonin reuptake inhibitors, reflecting uncertainty because of the quality of the evidence. We did not have evidence of the costs of using selective serotonin reuptake inhibitors compared with tricyclics for this exercise. Had we considered costs this recommendation might have changed.
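
As a rough illustration of the worked example, the per-outcome judgments reported above can be written down and combined in a few lines of Python. This is a minimal sketch and not part of the grading system itself: the names `Quality`, `outcome_quality`, and `overall_quality` are hypothetical, and the rule that the overall quality equals the lowest quality among the outcomes considered is an assumption that happens to agree with the judgment reached in this example. In practice the overall grade was a matter of discussion and agreement, not of calculation.

```python
from enum import IntEnum


class Quality(IntEnum):
    """Quality of evidence levels, ordered so that min() picks the weakest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Per-outcome judgments as reported in the text for the
# SSRI versus tricyclic antidepressant example.
outcome_quality = {
    "depression severity": Quality.MODERATE,
    "transient side effects": Quality.HIGH,
    "poisoning fatalities": Quality.MODERATE,
}


def overall_quality(judgments):
    """Assumed rule (not stated in this section): the overall quality is
    the lowest quality among the outcomes considered."""
    return min(judgments.values())


print(overall_quality(outcome_quality).name)  # MODERATE
```

The sketch stops at the overall quality on purpose. Moving from "moderate quality, net benefits" to "probably do it" also depends on the balance of benefits and harms, applicability, baseline risk, and costs, which this section describes as judgments and does not reduce to a rule.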
Summary points
Organisations have used various systems to grade the quality of evidence and strength of recommendations
Differences and shortcomings in these grading systems can be confusing and impede effective communication
A systematic and explicit approach to making judgments about the quality of evidence and the strength of recommendations
The approach takes into account study design, study quality, consistency and directness in judging the quality of evidence
The balance between benefits and harms, quality of evidence, applicability, and the certainty of the baseline risk are all considered in judgments about the strength of recommendations

Conclusions
In any system that might be used to grade the quality of evidence and strength of recommendations there is a need to balance simplicity and clarity. Reducing the complexity of a system is also likely to reduce clarity, since judgments are more likely to be made implicitly rather than explicitly in simple systems. On the other hand, efforts to improve clarity and make judgments more transparent are likely to result in more complexity. In the system described here we have attempted to find a balance between simplicity and clarity. Regardless of how simple or complex a system is, judgments are always required. The approach that we have described provides a framework for structured reflection and can help to ensure that appropriate judgments are made, but it does not remove the need for judgment.

Members of the Grades of Recommendation Assessment, Development and Evaluation (GRADE) Working Group who have contributed to this article include David Atkins, Dana Best, Peter A Briss, Martin Eccles, Yngve Falck-Ytter, Signe Flottorp, Gordon H Guyatt, Robin T Harbour, Margaret C Haugh, David Henry, Suzanne Hill, Roman Jaeschke, Gillian Leng, Alessandro Liberati, Nicola Magrini, James Mason, Philippa Middleton, Jacek Mrukowicz, Dianne O’Connell, Andrew D Oxman, Bob Phillips, Holger J Schünemann, Tessa Tan-Torres Edejer, Helena Varonen, Gunn E Vist, John [...]

The National Institute for Clinical Excellence (NICE) for England and Wales and the Polish Institute for Evidence-Based Medicine (PIEBM) have provided support for meetings of the GRADE Working Group. The institutions with which members of the Working Group are affiliated have provided intramural support. Alessandro Liberati’s participation in GRADE activities was supported by a grant from the Ministero Università e Ricerca Scientifica (M.I.U.R., Progetto COFIN 2001).

Contributors: All of the members of the GRADE Working Group listed above have contributed to the preparation of this manuscript and the development of the ideas contained in it, participated in at least one meeting, and read and commented on drafts of this article. GHG and ADO led the process. GEV has had primary responsibility for preparing the evidence profiles used in the pilot study and coordinating the process.

Competing interests: Most of the members of the GRADE Working Group have a vested interest in another system of grading the quality of evidence and the strength of recommendations.

1 Canadian Task Force on the Periodic Health Examination. The periodic health examination. CMAJ 1979;121:1193-254.
2 Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1986;89(suppl 2):2-3S.
3 Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Arch Intern Med 1986;146:464-5.
4 Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1989;95:2-4S.
5 Cook DJ, Guyatt GH, Laupacis A, Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Antithrombotic therapy consensus conference. Chest 1992;102(suppl 4):305-11S.
6 US Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research. Acute pain management: operative or medical procedures and trauma. Rockville, MD: Agency for Health Care Policy and Research Publications, 1992.
7 Gyorkos TW, Tannenbaum TN, Abrahamowicz M, Oxman AD, Scott EA, Millson ME, et al. An approach to the development of practice guidelines for community health interventions. Can J Public Health 1994;85(suppl 1):S8-13.
8 Hadorn DC, Baker D. Development of the AHCPR-sponsored heart failure guideline: methodologic and procedural issues. Jt Comm J Qual Improv 1994;20:539-54.
9 Cook DJ, Guyatt GH, Laupacis A, Sackett DL, Goldberg RJ. Clinical recommendations using levels of evidence for antithrombotic agents. Chest 1995;108(suppl 4):227-30S.
10 Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ, et al. Users’ guide to the medical literature IX: a method for grading health care recommendations. JAMA [...]
11 Scottish Intercollegiate Guidelines Network (SIGN). Forming guideline recommendations. In: A guideline developers’ handbook. Edinburgh: SIGN, 2001. (Publication No 50.) www.sign.ac.uk/guidelines/fulltext/50/section6.html (accessed 8 Feb 2004).
12 US Preventive Services Task Force. Guide to clinical preventive services. 2nd ed. Baltimore: Williams and Wilkins, 1996:xxxix-lv.
13 Eccles M, Clapp Z, Grimshaw J, Adams PC, Higgins B, Purves I, et al. North of England evidence based guidelines development project: methods of guideline development. [...]
14 Centro per la Valutazione della Efficacia della Assistenza Sanitaria (CeVEAS). Schema [...] http://web1.satcom.it/interage/ceveas/html/doc/45/ [...]
15 Guyatt G, Schünemann H, Cook D, Jaeschke R, Pauker S, Bucher H. Grades of recommendation for antithrombotic agents. Chest 2001;119:3S-7S. www.chestjournal.org/content/vol119/1_suppl/ (accessed 8 Feb 2004).
16 Phillips B, Ball C, Sackett D, Badenoch D, Straus S, Haynes B, Dawes M. Levels of evidence and grades of recommendations. Oxford: Oxford Centre for Evidence-Based Medicine. www.cebm.net/levels_of_evidence.asp (accessed 8 Feb 2004).
17 National Health and Medical Research Council. How to use the evidence: assessment and application of scientific evidence. Canberra: AusInfo, 2000. www.health.gov.au/nhmrc/publications/pdf/cp69.pdf (accessed 8 Feb 2004).
18 Harbour R, Miller J. A new system for grading recommendations in evidence based guidelines. BMJ 2001;323:334-6.
19 Roman SH, Silberzweig SB, Siu AL. Grading the evidence for diabetes performance measures. Eff Clin Pract 2000;3:85-91.
20 Woloshin S. Arguing about grades. Eff Clin Pract 2000;3:94-5.
21 Guyatt GH, Schünemann H, Cook D, Pauker S, Sinclair J, Bucher H, et al. Grades of recommendation for antithrombotic agents. Chest 2001;119:3-7S.
22 Atkins D, Best D, Shapiro EN, eds. Third US Preventive Services Task Force: background, methods and first recommendations. Am J Prev Med 2001;20:3(suppl):1- [...]
23 Woolf SH, Atkins D. The evolving role of prevention in health care: contributions of the US Preventive Services Task Force. Am J Prev Med 2001;20:3(suppl):13-20.
24 Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, et al. Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev [...]
25 Briss PA, Zaza S, Pappaioanou M, Fielding J, Wright-De Aguero L, et al. Developing an evidence-based guide to community preventive services: methods. Am J Prev Med [...]
26 Zaza S, Wright-De A, Briss PA, Truman BI, Hopkins DP, Hennessy MH, et al. Data collection instrument and procedure for systematic reviews in the guide to community preventive services. Am J Prev Med 2000;18(suppl 1):44-74.
27 Greer N, Mosser G, Logan G, Halaas GW. A practical approach to evidence grading. Jt Comm J Qual Improv 2000;26:700-12.
28 West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, et al. Systems to rate the strength of scientific evidence. Rockville, MD: Agency for Healthcare Research and Quality, 2002:64-88. (AHRQ publication No 02-E016.)
29 Guyatt G, Drummond R, eds. Users’ guide to the medical literature. Chicago, IL: AMA [...]
30 Clarke M, Oxman AD, eds. Assessment of study quality. Cochrane reviewers’ handbook 4.1.5 section 6. In: Cochrane Library. Issue 4. Oxford: Update Software, 2002.
31 Jüni P, Altman DG, Egger M. Assessing the quality of randomised controlled trials. In: Egger M, Davey Smith G, Altman DG, eds. Systematic reviews in health care: meta-analysis in context. London: BMJ Books, 2001:87-121.
32 West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, et al. Systems to rate the strength of scientific evidence. Rockville, MD: Agency for Healthcare Research and Quality, 2002:51-63. (AHRQ publication No 02-E016.)
33 Kunz R, Vist G, Oxman AD. Randomisation to protect against selection bias in healthcare trials (Cochrane methodology review). In: Cochrane Library Issue 4. Oxford: [...]
34 Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. [...]
35 Kleijnen J, Gøtzsche P, Kunz RA, Oxman AD, Chalmers I. So what’s so special about randomisation? In: Chalmers I, Maynard A, eds. Non-random reflections on health care research: on the 25th anniversary of Archie Cochrane’s effectiveness and efficiency. London: [...]
36 Lacchetti C, Guyatt G. Surprising results of randomized controlled trials. In: Guyatt G, Drummond R, eds. Users’ guide to the medical literature. Chicago, IL: AMA Press, [...]
37 Hulley S, Grady D, Bush T, Furberg C, Herrington D, Riggs B, et al. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. JAMA 1998;280:605-13.
38 Writing Group for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women. Principal results from the women’s health initiative randomized controlled trial. JAMA 2002;288:321-33.
39 Dans A, McAlister F, Dans L, Richardson WS, Straus S, Guyatt G. Applying results in individual patients. In: Guyatt G, Drummond R, eds. Users’ guide to the medical literature. Chicago, IL: AMA Press, 2002:369-84.
40 McAlister F, Laupacis A, Wells G. Drug class effects. In: Guyatt G, Drummond R, eds. Users’ guide to the medical literature. Chicago, IL: AMA Press, 2002:415-31.
41 Echt DS, Liebson PR, Mitchell LB, Peters RW, Obias-Manno D, Barker AH, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The cardiac arrhythmia suppression trial. N Engl J Med 1991;324:781-8.
42 Riggs BL, Hodgson SF, O’Fallon WM, Chao EY, Wahner HW, Muhs JM, et al. Effect of fluoride treatment on the fracture rate in postmenopausal women with osteoporosis. N [...]
43 Worster A, Preyra I, Weaver B, Haines T. The accuracy of noncontrast helical computed tomography versus intravenous pyelography in the diagnosis of suspected acute urolithiasis: a meta-analysis. Ann Emerg Med 2002;40:280-6.
44 Worster A, Haines T. Does replacing intravenous pyelography with noncontrast helical computed tomography benefit patients with suspected acute urolithiasis? Can Assoc [...]
45 Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: evidence from published meta-analyses. BMJ [...]
46 Bross IDJ. Pertinency of an extraneous variable. J Chron Dis 1967;20:487-95.
47 Thompson DC, Rivara FP, Thompson R. Helmets for preventing head and facial injuries in bicyclists. Cochrane Database Syst Rev 2000;(2):CD001855.
48 Devereaux PJ, Choi PT, Lacchetti C, Weaver B, Schünemann HJ, Haines T, et al. A systematic review and meta-analysis of studies comparing mortality rates of private for-profit and private not-for-profit hospitals. CMAJ 2002;166:1399-406.
49 North of England Evidence Based Guideline Development Project. Evidence based clinical practice guideline: the choice of antidepressants for depression in primary care. Newcastle upon Tyne: Centre for Health Services Research, 1997.

Correspondence to: Andrew D Oxman, Informed Choice Research Department, Norwegian Health Services Research Centre, PO Box 7004, St Olavs Plass, 0130 [...]