Radiology
DOI: 10.1148/radiol.2372050171
(Radiology 2005;237:401-403.)
© RSNA, 2005


Editorials

Screening with Multisection CT: Unmasking the Benefit¹

Nancy A. Obuchowski, PhD and Michael T. Modic, MD

1 From the Department of Quantitative Health Sciences (N.A.O.) and Division of Radiology (N.A.O., M.T.M.), the Cleveland Clinic Foundation, 9500 Euclid Ave, Cleveland, OH 44195. Received February 1, 2005; revision requested February 14; revision received February 19; accepted March 7. Address correspondence to N.A.O. (e-mail: nobuchow{at}bio.ri.ccf.org).

Randomized clinical trials (RCTs) of screening modalities are large, expensive, and time-consuming, and they often have negative or inconclusive results (1,2). This is no surprise. The prevalence of preclinical disease in the general population is very low. Thus, the benefit of screening is conferred on only a few, while harm from screening can affect many through detection of pseudodisease and other false-positive or incidental findings.

It is our belief that potential benefit from screening can be unmasked if (a) the selection of subjects for screening is based on prior determination of risk factors, (b) the images from the screening test are interpreted through a process that yields high sensitivity, (c) positive findings at screening are handled with a level of surveillance appropriate for the risk (ie, appropriate for the posttest probability of disease), and (d) the screening test is repeated at optimal intervals. Henschke et al (3) addressed the need for repeat screening; here, we address the first three issues.

Benefit from screening comes through the early detection of preclinical conditions that will develop into serious clinical disease. The benefit of screening (ie, the absolute risk reduction attributable to screening) can be formulated, roughly, as follows: B = Prc · Sens · (FRc – FRp), where B is the benefit of screening, Prc is the probability that a screened person has a form of the disease that will eventually progress to serious clinical disease (ie, it excludes pseudodisease), Sens is the sensitivity of the screening test, FRc is the fatality rate when disease is detected clinically, and FRp is the fatality rate when disease is detected preclinically.

For screening to be efficacious, the benefit must exceed the harm that occurs because of false-positive findings of all types (including pseudodisease and incidental findings). We believe that many RCTs of screening tests mask the potential benefit of screening. We propose several steps for a redesign of RCTs to enable a better assessment of benefit.

First, we must increase the prevalence rate of preclinical disease (ie, the probability that preclinical disease is present) in the screened population. This can be accomplished through a two-stage screening strategy in which the population first is screened for the presence of risk factors (eg, smoking history, family history of cancer) and the higher-risk subpopulation then undergoes imaging. This two-stage process already exists for targeted screening of the lung in heavy smokers; however, it does not exist for untargeted screening (eg, total body screening). Even with this two-stage approach, the prevalence rate remains low, often at less than 2%.

Second, we must interpret the images that result from screening in such a way as to increase the sensitivity of the screening test. Figure 1 illustrates the effect of sensitivity on the absolute risk reduction attributable to screening and on the number of people who must undergo screening to save one life. We assume a disease prevalence rate of 1% and differences of 0.10, 0.40, and 0.70 for fatality rates when disease is detected clinically instead of preclinically. For the three scenarios considered, with a test sensitivity of 0.50, the numbers of people who would have to be screened to save one life are 2000, 500, and 286; with a test sensitivity of 0.90, however, the numbers are reduced to 1111, 278, and 159. Clearly, an increase in sensitivity improves the benefit of screening.
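The arithmetic behind these figures can be checked with a minimal sketch of the benefit formula B = Prc · Sens · (FRc – FRp). The function and parameter names below are illustrative choices, not taken from the editorial, and the number needed to screen is taken as the reciprocal of B.

```python
def absolute_risk_reduction(prevalence, sensitivity, fatality_diff):
    """B = Prc * Sens * (FRc - FRp), with fatality_diff = FRc - FRp."""
    return prevalence * sensitivity * fatality_diff


def number_needed_to_screen(prevalence, sensitivity, fatality_diff):
    """People who must be screened to save one life, taken as 1 / B."""
    return 1.0 / absolute_risk_reduction(prevalence, sensitivity, fatality_diff)


# Prevalence of 1% and the three fatality-rate differences used in the text.
for sens in (0.50, 0.90):
    for diff in (0.10, 0.40, 0.70):
        nns = number_needed_to_screen(0.01, sens, diff)
        print(f"sensitivity={sens:.2f}, difference={diff:.2f}: screen {round(nns)}")
```

With these inputs the rounded results are 2000, 500, and 286 at a sensitivity of 0.50 and 1111, 278, and 159 at a sensitivity of 0.90, matching the values quoted above.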



Figure 1. Graph shows the effect of the sensitivity level on the absolute risk reduction due to screening (left vertical axis) and on the number of people who must be screened to save one life (right vertical axis), with consideration of three values for change in the fatality rate (0.10, 0.40, and 0.70) when disease is detected clinically versus preclinically. As sensitivity increases, the absolute risk reduction due to screening also increases, and the number who must be screened to save one life decreases.

 
Conventional thinking holds that, to maximize the benefit of screening, one must reduce the usually large number of false-positive findings that arise when a low-prevalence population is screened. Consensus and majority decision making with regard to image interpretation are commonly used in RCTs of screening tests because these are effective methods for lowering the false-positive rate. According to these methods, unless two or more readers detect disease, the test result is considered negative. Consensus and majority decision making can also lower sensitivity, however; this occurs when the individual readers have low sensitivity. For example, if each reader on a three-member panel has a sensitivity of 0.40 and a false-positive rate of 0.10 (comparable to the reported values for accuracy of CT colonography) (4), then the sensitivity of the panel is only 0.35.
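The panel figure can be reproduced with a short binomial calculation. This is a sketch that assumes the three readers interpret independently given the true disease status and that a case is positive when a majority of readers call it positive; the function name is illustrative.

```python
from math import comb


def majority_panel_rate(per_reader_rate, n_readers=3):
    """Probability that a majority of independent readers call a case positive.

    Pass a reader's sensitivity to get the panel sensitivity, or a reader's
    false-positive rate to get the panel false-positive rate.
    """
    needed = n_readers // 2 + 1  # smallest number of readers forming a majority
    return sum(
        comb(n_readers, k)
        * per_reader_rate ** k
        * (1.0 - per_reader_rate) ** (n_readers - k)
        for k in range(needed, n_readers + 1)
    )


print(f"panel sensitivity:         {majority_panel_rate(0.40):.3f}")
print(f"panel false-positive rate: {majority_panel_rate(0.10):.3f}")
```

Under these assumptions the panel sensitivity is 0.352 and the panel false-positive rate is 0.028, so the majority rule removes most false-positive findings at the cost of some sensitivity.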

A reliable way to increase sensitivity is to have multiple independent readers working in series. Specifically, if the first reader detects a suspicious finding, then the case is considered positive. If the first reader does not detect a lesion, the case is presented to a second reader. If the second reader detects a suspicious finding, then the case is considered positive, and so on. The use of this series-style interpretation tends to increase both sensitivity and the false-positive rate.

Figure 2 illustrates the improvement in sensitivity as the number of independent readers of the image series increases. If the initial reading of the images has a sensitivity of 0.40 and a false-positive rate of 0.10, then interpretation by a second independent reader with similar accuracy increases the sensitivity to 0.64 and the false-positive rate to 0.19. Interpretation by a third reader in the series increases the sensitivity to 0.78, with an associated false-positive rate of 0.27. Of course, second and third interpretations do not have to be supplied by human observers; computer-aided diagnosis algorithms also can be used for this purpose.
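The series-reading numbers follow from a simple complement rule: a case is missed only if every reader in the series misses it. The sketch below assumes independent readers with identical accuracy, as in the example; the function name is illustrative.

```python
def series_reading_rate(per_reader_rate, n_readers):
    """Probability that at least one of n independent readers calls a case positive.

    Pass a reader's sensitivity to get the series sensitivity, or a reader's
    false-positive rate to get the series false-positive rate.
    """
    return 1.0 - (1.0 - per_reader_rate) ** n_readers


for n in (1, 2, 3):
    sens = series_reading_rate(0.40, n)
    fpr = series_reading_rate(0.10, n)
    print(f"{n} reader(s): sensitivity={sens:.2f}, false-positive rate={fpr:.2f}")
```

For two and three readers this gives sensitivities of 0.64 and 0.78 with false-positive rates of 0.19 and 0.27, the operating points described above.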



Figure 2. Graph shows changes in sensitivity (vertical axis) and false-positive rate (horizontal axis) according to the number of independent readings of an image series: one (1), two (2), or three (3). Image interpretation by a series of readers (in which images interpreted as negative for disease at the initial reading are passed to another reader) increases sensitivity and the false-positive rate.

 
An increase in the prevalence rate and/or in the sensitivity of the test has a direct effect on the sample size in RCTs (5). A doubling of the prevalence rate (eg, an increase from 1% to 2%) in the screened population reduces the overall required sample size by 75%. An improvement in sensitivity by just 0.10 reduces the overall required sample size by 21%–35%, with a greater reduction occurring at lower sensitivity levels. An improvement in sensitivity by 0.20 enables a reduction of 39%–55% in the required sample size.
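A rough way to see where reductions of this size come from is to assume that the required sample size scales with the inverse square of the absolute risk reduction, with everything else held fixed. That scaling is an assumption of this sketch, not a formula stated in the editorial, so the results only approximate the quoted ranges.

```python
def sample_size_reduction(before, after):
    """Fractional reduction in sample size when the benefit B rises from
    `before` to `after`, ASSUMING sample size is proportional to 1 / B**2.

    Because B is proportional to both prevalence and sensitivity, either
    quantity can be passed on its own while the other is held constant.
    """
    return 1.0 - (before / after) ** 2


# Doubling the prevalence rate from 1% to 2%.
print(f"prevalence 0.01 -> 0.02: {sample_size_reduction(0.01, 0.02):.0%}")

# Raising sensitivity by 0.10 from several starting points.
for sens in (0.50, 0.60, 0.70, 0.80):
    reduction = sample_size_reduction(sens, sens + 0.10)
    print(f"sensitivity {sens:.2f} -> {sens + 0.10:.2f}: {reduction:.0%}")
```

Under this assumption, doubling the prevalence cuts the required sample size by 75%, and a fixed gain in sensitivity yields a larger reduction when the starting sensitivity is lower.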

We believe that, as the magnitude of the absolute risk reduction attributable to screening more closely approaches its full potential, investigators in RCTs will be better able to quantify the gains that are possible with screening. If this goal can be accomplished with smaller RCTs, then we can address the question of efficacy sooner and with less expense.

Efficacy involves trade-offs between the benefits and harms associated with screening. Thus, the third way to unmask the potential benefit of screening involves the appropriate handling of positive screening results. Unlike the relatively high disease prevalence at diagnostic testing in a symptomatic population, which may be as high as 50%, the prevalence in an asymptomatic population, even with a two-stage screening strategy, is usually no higher than 2%. For a test with sensitivity and specificity of 0.80, the posttest probability of disease (ie, the probability that disease is present in a patient with a positive test result) is 0.80 for diagnostic testing versus 0.08 for screening. Thus, while abnormal findings should trigger action in diagnostic settings, abnormal findings should trigger surveillance, not action, in screening settings. Diagnostic and screening findings are not analogous.
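The two posttest probabilities follow from Bayes' theorem. The sketch below uses the prevalence, sensitivity, and specificity values from the paragraph above; the function name is illustrative.

```python
def posttest_probability(prevalence, sensitivity, specificity):
    """Probability that disease is present given a positive test result."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


diagnostic = posttest_probability(prevalence=0.50, sensitivity=0.80, specificity=0.80)
screening = posttest_probability(prevalence=0.02, sensitivity=0.80, specificity=0.80)
print(f"diagnostic setting (prevalence 50%): {diagnostic:.2f}")
print(f"screening setting (prevalence 2%):   {screening:.2f}")
```

The same test therefore gives a posttest probability of 0.80 in the diagnostic setting and about 0.08 in the screening setting, which is why a positive screening result warrants surveillance rather than immediate action.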

An example of this type of surveillance was used in the Early Lung Cancer Action Project study. Henschke et al (6) developed guidelines for dealing with various positive findings at lung cancer screening. These guidelines involved repeated imaging at short and then longer intervals to evaluate the stability of the lesion. The guidelines were effective in helping to greatly reduce the number of biopsies performed because of false-positive results and detection of pseudodisease.

A critical question, then, is How many positive screening results are acceptable? This question is particularly important if positive results at screening are more costly to manage than are positive results at diagnostic testing. To address this question, we made some crude calculations of the expected utility of screening. The expected utility (ie, the average preference for health states), which depends on the possible outcomes of screening (ie, true-positive, false-negative, true-negative, and false-positive) and the likelihood of each occurring, is calculated as follows: EU = Sens · Prc · UTP + (1–Sens) · Prc · UFN + Spec · (1–Prc) · UTN + (1–Spec) · (1–Prc) · UFP, where EU is the expected utility, Sens and Prc are as defined earlier, Spec is the specificity of the screening test, and UTP, UFN, UTN, and UFP are the utility of true-positive, false-negative, true-negative, and false-positive results, respectively. We use several simplifying assumptions similar to those used by other investigators (7,8). First, we assume that the utility of a true-negative finding is zero. We also assume that the utility of a false-negative finding is equal to the negative of one-half the utility of a true-positive finding (ie, UFN = –UTP/2). The negative sign denotes harm associated with this outcome, which we arbitrarily assign to be one-half of the screening benefit bestowed by a true-positive result; the arbitrary amount of one-half is derived from the notion that the preclinical disease may be detected with screening at a later date. Like Wagner et al (7), we define relative utility as the ratio obtained by dividing the value of the utility of a true-positive finding by that of the utility of a false-positive finding (UTP/UFP). Finally, we set the utility of a false-positive finding (UFP) at a value of –1 (this has no effect on our results), and we consider a range of values for relative utility.
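The expected-utility formula and its simplifying assumptions can be written out as a small function. This is a sketch rather than the authors' own calculation: it treats relative utility as the magnitude of UTP relative to UFP (so UTP is positive while UFP is –1), and the prevalence of 1% and relative utility of 500 in the usage lines are illustrative inputs.

```python
def expected_utility(sens, false_positive_rate, prevalence, relative_utility):
    """EU = Sens*Prc*UTP + (1-Sens)*Prc*UFN + Spec*(1-Prc)*UTN + (1-Spec)*(1-Prc)*UFP."""
    u_fp = -1.0                           # utility of a false-positive result
    u_tp = relative_utility * abs(u_fp)   # assumed reading: relative utility as a magnitude
    u_fn = -u_tp / 2.0                    # harm equal to half the true-positive benefit
    u_tn = 0.0                            # true-negative results carry no utility
    spec = 1.0 - false_positive_rate
    return (
        sens * prevalence * u_tp
        + (1.0 - sens) * prevalence * u_fn
        + spec * (1.0 - prevalence) * u_tn
        + (1.0 - spec) * (1.0 - prevalence) * u_fp
    )


# Compare one reader with two readers in series (illustrative inputs).
one_reader = expected_utility(0.40, 0.10, prevalence=0.01, relative_utility=500)
two_readers = expected_utility(0.64, 0.19, prevalence=0.01, relative_utility=500)
print(f"one reader:  {one_reader:.3f}")
print(f"two readers: {two_readers:.3f}")
```

Varying the prevalence and relative utility arguments shows how sensitive the comparison between operating points is to those two inputs.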

Let us consider a screening test with accuracy similar to that of CT colonography (true-positive rate, 0.4; false-positive rate, 0.1) (4). The expected utility of screening becomes positive (ie, screening becomes efficacious) when relative utility is between 20 and 200 (assuming prevalence rates of 0.5%–2.0%). For this discussion, we assume that screening is effective, although not tremendously effective (ie, we assume relative utility values of 50–500). Now, if we shift to a higher sensitivity and a higher false-positive rate (true-positive rate, 0.64; false-positive rate, 0.19; Fig 2), then, at the same relative utility value, the expected utility is increased by 400% or more. In other words, a shift to a higher sensitivity, although accompanied by a higher false-positive rate, substantially increases the efficacy of screening. Some of this increase in utility may be negated by the cost of follow-up of the positive findings, but it is unlikely that the entire gain will be lost. Similarly, a shift to a sensitivity of 0.78, with a false-positive rate of 0.27 (Fig 2), produces an additional increase in expected utility, although the increment (40%) is smaller.

We also consider a screening test with a higher accuracy (perhaps due to more experienced readers), at a true-positive rate of 0.80 and a false-positive rate of 0.20. The expected utility of screening becomes positive when the value of the relative utility is 15–75. At this higher accuracy level, if screening is marginally effective, there is no gain in expected utility when shifting to a higher sensitivity and a higher false-positive rate (true-positive rate, 0.96; false-positive rate, 0.36). When screening is more effective (relative utility, >200), however, a shift to a higher sensitivity and a higher false-positive rate results in an increase in the expected utility by 20%–30%. Thus, these calculations suggest that moderate to high sensitivity yields the highest expected utility, and very high sensitivity may or may not improve the expected utility.

In conclusion, we believe that researchers must design RCTs of screening modalities to better assess the benefit of screening. This could mean RCTs with smaller sample sizes and greater expected utility. A drawback is that new surveillance algorithms are needed (including patient education about the meaning of positive results at screening). If we do not take these steps, we may underestimate the benefit of screening and, after considerable expense, run the risk of discarding a useful screening tool because we have not assessed its full potential.


References

  1. Fontana RS, Sanderson DR, Woolner LB, et al. Screening for lung cancer: a critique of the Mayo Lung Project. Cancer 1991;67(4 suppl):1155–1164.
  2. Miettinen OS, Henschke CI. CT screening for lung cancer: coping with nihilistic recommendations. Radiology 2001;221:592–596.
  3. Henschke CI, Yankelevitz DF, Kostis WJ. CT screening for lung cancer. Semin Ultrasound CT MR 2003;24(1):23–32.
  4. Cotton PB, Durkalski VL, Pineau BC, et al. Computed tomographic colonography (virtual colonoscopy): a multicenter comparison with standard colonoscopy for detection of colorectal neoplasia. JAMA 2004;291:1713–1719.
  5. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials 1981;2:93–113.
  6. Henschke CI, McCauley DI, Yankelevitz DF, et al. Early lung cancer action project: overall design and findings from baseline screening. Lancet 1999;354:99–105.
  7. Wagner RF, Beam CA, Beiden SV. Reader variability in mammography and its implications for expected utility over the population of readers and cases. Med Decis Making 2004;24:561–572.
  8. Patton DD, Woolfenden JM. A utility-based model for comparing the cost-effectiveness of diagnostic studies. Invest Radiol 1989;24:263–271.




