Editorials
1 From the Department of Quantitative Health Sciences (N.A.O.) and Division of Radiology (N.A.O., M.T.M.), the Cleveland Clinic Foundation, 9500 Euclid Ave, Cleveland, OH 44195. Received February 1, 2005; revision requested February 14; revision received February 19; accepted March 7. Address correspondence to N.A.O. (e-mail: nobuchow{at}bio.ri.ccf.org).
Randomized clinical trials (RCTs) of screening modalities are large, expensive, and time-consuming, and they often have negative or inconclusive results (1,2). This is no surprise. The prevalence of preclinical disease in the general population is very low. Thus, the benefit of screening is conferred on only a few, while harm from screening can affect many through detection of pseudodisease and other false-positive or incidental findings.
It is our belief that potential benefit from screening can be unmasked if (a) the selection of subjects for screening is based on prior determination of risk factors, (b) the images from the screening test are interpreted through a process that yields high sensitivity, (c) positive findings at screening are handled with a level of surveillance appropriate for the risk (ie, appropriate for the posttest probability of disease), and (d) the screening test is repeated at optimal intervals. Henschke et al (3) addressed the need for repeat screening; here, we address the first three issues.
Benefit from screening comes through the early detection of preclinical conditions that will develop into serious clinical disease. The benefit of screening (ie, the absolute risk reduction attributable to screening) can be formulated, roughly, as follows: B = Prc · Sens · (FRc − FRp), where B is the benefit of screening, Prc is the probability that a screened person has a form of the disease that will eventually progress to serious clinical disease (ie, it excludes pseudodisease), Sens is the sensitivity of the screening test, FRc is the fatality rate when disease is detected clinically, and FRp is the fatality rate when disease is detected preclinically.
For screening to be efficacious, the benefit must exceed the harm that occurs because of false-positive findings of all types (including pseudodisease and incidental findings). We believe that many RCTs of screening tests mask the potential benefit of screening. We propose several steps for a redesign of RCTs to enable a better assessment of benefit.
First, we must increase the prevalence rate of preclinical disease (ie, the probability that preclinical disease is present) in the screened population. This can be accomplished through a two-stage screening strategy in which the population first is screened for the presence of risk factors (eg, smoking history, family history of cancer) and the higher-risk subpopulation then undergoes imaging. This two-stage process already exists for targeted screening of the lung in heavy smokers; however, it does not exist for untargeted screening (eg, total body screening). Even with this two-stage approach, the prevalence rate remains low, often at less than 2%.
Second, we must interpret the images that result from screening in such a way as to increase the sensitivity of the screening test. Figure 1 illustrates the effect of sensitivity on the absolute risk reduction attributable to screening and on the number of people who must undergo screening to save one life. We assume a disease prevalence rate of 1% and differences of 0.10, 0.40, and 0.70 for fatality rates when disease is detected clinically instead of preclinically. For the three scenarios considered, with a test sensitivity of 0.50, the numbers of people who would have to be screened to save one life are 2000, 500, and 286; with a test sensitivity of 0.90, however, the numbers are reduced to 1111, 278, and 159. Clearly, an increase in sensitivity improves the benefit of screening.
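The arithmetic behind these numbers can be checked with a short sketch. It assumes that the number of people who must be screened to save one life is the reciprocal of the absolute risk reduction B = Prc · Sens · (FRc − FRp); the function and variable names are illustrative only and do not come from the original analysis.

```python
def absolute_risk_reduction(prevalence, sensitivity, fatality_diff):
    """B = Prc * Sens * (FRc - FRp), the benefit formula given in the text."""
    return prevalence * sensitivity * fatality_diff


def number_needed_to_screen(prevalence, sensitivity, fatality_diff):
    """People screened per life saved, taken here as 1 / B (an assumption)."""
    return 1.0 / absolute_risk_reduction(prevalence, sensitivity, fatality_diff)


PREVALENCE = 0.01                     # 1% prevalence, as assumed in the text
FATALITY_DIFFS = (0.10, 0.40, 0.70)   # FRc - FRp for the three scenarios


def nns_table(sensitivities=(0.50, 0.90)):
    """Map each sensitivity to the rounded number needed to screen per scenario."""
    return {
        sens: [round(number_needed_to_screen(PREVALENCE, sens, diff))
               for diff in FATALITY_DIFFS]
        for sens in sensitivities
    }


if __name__ == "__main__":
    for sens, row in nns_table().items():
        print(f"sensitivity {sens:.2f}: {row}")
```

Under these assumptions the sketch returns 2000, 500, and 286 for a sensitivity of 0.50 and 1111, 278, and 159 for a sensitivity of 0.90, in agreement with the values quoted above.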
A reliable way to increase sensitivity is to have multiple independent readers working in series. Specifically, if the first reader detects a suspicious finding, then the case is considered positive. If the first reader does not detect a lesion, the case is presented to a second reader. If the second reader detects a suspicious finding, then the case is considered positive, and so on. The use of this series-style interpretation tends to increase both sensitivity and the false-positive rate.
Figure 2 illustrates the improvement in sensitivity as the number of independent readers of the image series increases. If the initial reading of the images has a sensitivity of 0.40 and a false-positive rate of 0.10, then interpretation by a second independent reader with similar accuracy increases the sensitivity to 0.64 and the false-positive rate to 0.19. Interpretation by a third reader in the series increases the sensitivity to 0.78, with an associated false-positive rate of 0.27. Of course, second and third interpretations do not have to be supplied by human observers; computer-aided diagnosis algorithms also can be used for this purpose.
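A minimal sketch of the series-reading calculation follows. It assumes that the readers are statistically independent and share the same sensitivity and false-positive rate, so that a case is called negative only if every reader in the series calls it negative; the function name is hypothetical.

```python
def series_reading(sensitivity, false_positive_rate, n_readers):
    """Combined accuracy when a case is positive if ANY of n readers flags it.

    Assumes independent readers with identical accuracy (a simplification).
    Returns (combined sensitivity, combined false-positive rate).
    """
    # A diseased case is missed only if every reader misses it.
    p_all_miss = (1 - sensitivity) ** n_readers
    # A disease-free case stays negative only if no reader raises a false alarm.
    p_all_clear = (1 - false_positive_rate) ** n_readers
    return 1 - p_all_miss, 1 - p_all_clear


if __name__ == "__main__":
    for n in (1, 2, 3):
        sens, fpr = series_reading(0.40, 0.10, n)
        print(f"{n} reader(s): sensitivity {sens:.2f}, false-positive rate {fpr:.2f}")
```

Readers in practice are unlikely to be fully independent, because difficult lesions tend to be missed by everyone, so the gain in sensitivity from this sketch is best read as an upper bound.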
We believe that, as the magnitude of the absolute risk reduction attributable to screening more closely approaches its full potential, investigators in RCTs will be better able to quantify the gains that are possible with screening. If this goal can be accomplished with smaller RCTs, then we can address the question of efficacy sooner and with less expense.
Efficacy involves trade-offs between the benefits and harms associated with screening. Thus, the third way to unmask the potential benefit of screening involves the appropriate handling of positive screening results. Unlike the relatively high disease prevalence at diagnostic testing in a symptomatic population, which may be as high as 50%, the prevalence in an asymptomatic population, even with a two-stage screening strategy, is usually no higher than 2%. For a test with sensitivity and specificity of 0.80, the posttest probability of disease (ie, the probability that disease is present in a patient with a positive test result) is 0.80 for diagnostic testing versus 0.08 for screening. Thus, while abnormal findings should trigger action in diagnostic settings, abnormal findings should trigger surveillance, not action, in screening settings. Diagnostic and screening findings are not analogous.
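The contrast between the two settings follows directly from Bayes' theorem. The sketch below is illustrative only: the function name is hypothetical, and the prevalence values of 50% and 2% are the ones mentioned in the text.

```python
def posttest_probability(prevalence, sensitivity, specificity):
    """Probability that disease is present given a positive test (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Same test (sensitivity = specificity = 0.80), two different populations.
    print(f"diagnostic (50% prevalence): {posttest_probability(0.50, 0.80, 0.80):.2f}")
    print(f"screening  (2% prevalence):  {posttest_probability(0.02, 0.80, 0.80):.2f}")
```

With identical test accuracy, the posttest probability is 0.80 at a prevalence of 50% but only about 0.08 at a prevalence of 2%, which is why a positive screening result warrants surveillance rather than immediate intervention.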
An example of this type of surveillance was used in the Early Lung Cancer Action Project study. Henschke et al (6) developed guidelines for dealing with various positive findings at lung cancer screening. These guidelines involved repeated imaging at short and then longer intervals to evaluate the stability of the lesion. The guidelines were effective in helping to greatly reduce the number of biopsies performed because of false-positive results and detection of pseudodisease.
A critical question, then, is How many positive screening results are acceptable? This question is particularly important if positive results at screening are more costly to manage than are positive results at diagnostic testing. To address this question, we made some crude calculations of the expected utility of screening. The expected utility (ie, the average preference for health states), which depends on the possible outcomes of screening (ie, true-positive, false-negative, true-negative, and false-positive) and the likelihood of each occurring, is calculated as follows: EU = Sens · Prc · UTP + (1 − Sens) · Prc · UFN + Spec · (1 − Prc) · UTN + (1 − Spec) · (1 − Prc) · UFP, where EU is the expected utility, Sens and Prc are as defined earlier, Spec is the specificity of the screening test, and UTP, UFN, UTN, and UFP are the utility of true-positive, false-negative, true-negative, and false-positive results, respectively. We use several simplifying assumptions similar to those used by other investigators (7,8). First, we assume that the utility of a true-negative finding is zero. We also assume that the utility of a false-negative finding is equal to the negative of one-half the utility of a true-positive finding (ie, UFN = −UTP/2). The negative sign denotes harm associated with this outcome, which we arbitrarily assign to be one-half of the screening benefit bestowed by a true-positive result; the arbitrary amount of one-half is derived from the notion that the preclinical disease may be detected with screening at a later date. Like Wagner et al (7), we define relative utility as the ratio obtained by dividing the magnitude of the utility of a true-positive finding by that of the utility of a false-positive finding (UTP/UFP). Finally, we set the utility of a false-positive finding (UFP) at a value of −1 (this has no effect on our results), and we consider a range of values for relative utility.
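The expected-utility calculation can be sketched as follows. The sketch assumes that harms are encoded as negative utilities, so that a false-positive result has utility −1 and a false-negative result has utility −UTP/2, with UTP equal to the relative utility times the magnitude of UFP; the function name and parameter names are hypothetical, and the sketch is not the authors' own implementation.

```python
def expected_utility(sens, spec, prevalence, relative_utility, u_fp=-1.0):
    """EU = Sens*Prc*UTP + (1-Sens)*Prc*UFN + Spec*(1-Prc)*UTN + (1-Spec)*(1-Prc)*UFP.

    Assumed sign convention (not confirmed by the source): benefits are
    positive and harms are negative.
    """
    u_tp = relative_utility * abs(u_fp)   # relative utility = |UTP / UFP|
    u_fn = -u_tp / 2                      # harm equal to half the true-positive benefit
    u_tn = 0.0                            # true-negative results carry no utility
    return (sens * prevalence * u_tp
            + (1 - sens) * prevalence * u_fn
            + spec * (1 - prevalence) * u_tn
            + (1 - spec) * (1 - prevalence) * u_fp)


if __name__ == "__main__":
    # Operating point used in the text: true-positive rate 0.4, false-positive rate 0.1.
    for rel_u in (10, 50, 500):
        eu = expected_utility(sens=0.40, spec=0.90, prevalence=0.02,
                              relative_utility=rel_u)
        print(f"relative utility {rel_u}: EU = {eu:+.3f}")
```

Because the editorial describes its calculations as crude and does not list every input, break-even values obtained from this sketch need not coincide exactly with the ranges quoted in the text.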
Let us consider a screening test with accuracy similar to that of CT colonography (true-positive rate, 0.4; false-positive rate, 0.1) (1). The expected utility of screening becomes positive (ie, screening becomes efficacious) when relative utility is between 20 and 200 (assuming prevalence rates of 0.5%–2.0%). For this discussion, we assume that screening is effective, although not tremendously effective (ie, we assume relative utility values of 50–500). Now, if we shift to a higher sensitivity and a higher false-positive rate (true-positive rate, 0.64; false-positive rate, 0.19; Fig 2), then, at the same relative utility value, the expected utility is increased by 400% or more. In other words, a shift to a higher sensitivity, although it is accompanied by a shift to a higher false-positive rate, increases the efficacy of screening by a substantial amount. Some of this increase in utility may be negated by the cost of follow-up of the positive findings, but it is unlikely that the entire gain will be lost. Similarly, a shift to a sensitivity of 0.78, with a false-positive rate of 0.27 (Fig 2), produces an additional increase in expected utility, although the increment (40%) is smaller.
We also consider a screening test with a higher accuracy (perhaps due to more experienced readers), at a true-positive rate of 0.80 and a false-positive rate of 0.20. The expected utility of screening becomes positive when the value of the relative utility is 15–75. At this higher accuracy level, if screening is marginally effective, there is no gain in expected utility when shifting to a higher sensitivity and a higher false-positive rate (true-positive rate, 0.96; false-positive rate, 0.36). When screening is more effective (relative utility, >200), however, a shift to a higher sensitivity and a higher false-positive rate results in an increase in the expected utility by 20%–30%. Thus, these calculations suggest that moderate to high sensitivity yields the highest expected utility, and very high sensitivity may or may not improve the expected utility.
In conclusion, we believe that researchers must design RCTs of screening modalities to better assess the benefit of screening. This could mean RCTs with smaller sample sizes and greater expected utility. A drawback is that new surveillance algorithms are needed (including patient education about the meaning of positive results at screening). If we do not take these steps, then we may underestimate the benefit of screening; then, after considerable expense, we run the risk of discarding a useful screening tool because we have not assessed its full potential.