Radiology
DOI: 10.1148/radiol.2302031277
(Radiology 2004;230:309-314.)
© RSNA, 2004


Statistical Concepts Series

Technology Assessment for Radiologists1

Jonathan H. Sunshine, PhD and Kimberly E. Applegate, MD, MS

1 From the Department of Research, American College of Radiology, 1891 Preston White Dr, Reston, VA 20191 (J.H.S.); Riley Hospital for Children, Indiana University Medical Center, Indianapolis (K.E.A.); and Department of Diagnostic Radiology, Yale University, New Haven, Conn (J.H.S.). Received August 10, 2003; revision requested August 19; revision received and accepted August 21. Address correspondence to J.H.S. (e-mail: jonathans@acr.org).


    ABSTRACT
Health technology assessment is the systematic and quantitative evaluation of the safety, efficacy, and cost of health care interventions. This article outlines aspects of technology assessment of diagnostic imaging. First, it presents a conceptual framework of a hierarchy of levels of efficacy that should guide thinking about imaging test evaluation. In particular, the framework shows how the question answered by most evaluations of imaging tests, "How well does this test distinguish disease from the nondiseased state?" relates to the fundamental questions for all health technology assessment, "How much does this intervention improve the health of people?" and "What is the cost of that improvement?" Second, it describes decision analysis and cost-effectiveness analysis, which are quantitative modeling techniques usually used to answer the two core questions for imaging. Third, it outlines design and operational considerations that are vital if researchers who are conducting an experimental study are to make a quality contribution to technology assessment, either directly through their findings or as an input into decision analyses. Finally, it includes a separate discussion of screening—that is, the application of diagnostic tests to nonsymptomatic populations—because the requirements for good screening tests are different from those for diagnostic tests of symptomatic patients and because the appropriate evaluation methods also differ.

Index terms: Cancer screening • Efficacy study • Radiology and radiologists, outcomes studies • Technology assessment


    INTRODUCTION
Technologic innovation and diffusion of technology into daily practice in radiology have been nothing short of remarkable in the past several decades. Health technology assessment is the careful evaluation of a medical technology for evidence of its safety, efficacy, cost, cost-effectiveness, and ethical and legal implications (1). Interest and research in health technology assessment are growing in response to the wider application of new technology and the increasing costs of health care today (2).

The goal of this article is to describe some of the rationale and the methods of technology assessment as applied to radiology. For any health care intervention, including diagnostic imaging tests, the ultimate questions are, "How much does this do to improve the health of people?" and "How much does it cost for that gain in health?" We need such an understanding of the radiology services we provide to advocate for our patients and to use our resources efficiently and effectively.


    OUTCOMES
Measures of diagnostic accuracy, which are the metrics most commonly used for evaluation of diagnostic tests, answer the question, "How well does this test distinguish disease from the nondiseased state?" The answer to that question often does not provide an answer to the questions about improvement of health and the cost of that improvement, which are the core outcome questions about health care interventions (3,4).

The most productive way to think about this gap between diagnostic accuracy on the one hand and outcomes on the other hand and to think about the inclusion of relevant outcomes in the evaluation of diagnostic tests is to use the conceptual scheme of a six-level "hierarchy of efficacy" developed by Fryback and Thornbury (5,6) (Table). They point out that efficacy at any level in their hierarchy is necessary for efficacy at the level with the next highest number but is not sufficient. In their scheme, diagnostic accuracy is at level 2, and patient and societal outcomes are at levels 5 and 6, respectively. Thus, there may be "many a slip between cup and lip"—that is, between diagnostic accuracy of an imaging test on the one hand and improved health and adequate cost-effectiveness on the other.


Hierarchy of Efficacy for Diagnostic Tests

 
Let us trace partway through the schema, starting at the lowest level, to understand the principle that efficacy at one level is necessary but not sufficient for efficacy at the next level. Technical efficacy (level 1), such as a certain minimum spatial resolution, is necessary for diagnostic accuracy (level 2), but it does not guarantee it. Similarly, diagnostic accuracy is necessary if a test is to affect the clinician’s diagnosis (level 3), but it is not sufficient. Rather, other sources of information, such as patient history, may dominate, so that even a highly accurate test may have little or no effect on the diagnosis. In such an instance, fairly obviously, the test does not contribute to the level 5 goal of improving patient health.

As the Table shows, there are multiple measures that can be used to quantify the efficacy of a diagnostic imaging test at any of the six levels. Hence, evaluations of imaging tests can involve a variety of measures. Thinking in terms of the hierarchy is also helpful for identification of the level(s) at which information should be obtained in an evaluation of a diagnostic imaging test. Experience, as well as reflection, has taught some lessons. The most important of these include:

1. Because higher-level efficacy is possible only if lower-level efficacy exists, it is often useful to measure efficacy at relatively low-numbered levels.

2. In particular, in the development of a test, it is helpful to measure aspects of technical efficacy (level 1), such as sharpness, noise level, and ability to visualize the anatomic structures of interest. An important aspect of test development consists of finding the technical parameters (voltage, section thickness, etc) that give the best diagnostic accuracy; these measures of technical efficacy are often key results in that process.

3. Diagnostic accuracy (level 2) is the highest level of efficacy that is characteristic of the test alone. For example, the sensitivity and specificity of a test are not dependent on what other diagnostic information is available, unlike level 3 (diagnosis). Also, the methodology and statistics used in measurement of diagnostic accuracy are relatively fully developed. Therefore, measurement of diagnostic accuracy is usually worthwhile.

4. Above diagnostic accuracy, effect on treatment (level 4), an "intermediate outcome," is relatively attractive to measure. It can be measured fairly easily and reliably in a prospective study, and it is closer in the hierarchy to the ultimate criteria, effect on patient health (level 5) and cost-effectiveness (level 6).

5. Effect on patient health (level 5) is usually observable only after a substantial delay, especially for chronic illnesses, such as cardiovascular disease and cancer, which are currently the predominant causes of mortality in the United States. Also, it is the end result of a multistep process of health care. Because diagnostic tests occur near the beginning of the process, and some random variation enters into the results at every step, the effect of a diagnostic test on final outcomes is usually difficult to observe without an inordinate number of patients. For example, the current principal randomized controlled trial of computed tomographic (CT) screening for lung cancer requires some 50,000 patients and is expected to take 8 years and cost $200 million (7). Thus, effects on patient health (level 5) and cost-effectiveness (level 6) are uncommon as end points in experimental studies on the evaluation of diagnostic tests.


    CLINICAL DECISION ANALYSIS AND COST-EFFECTIVENESS ANALYSIS
Because effects at levels 5 and 6 of the efficacy hierarchy are so hard to observe directly, assessments of imaging technologies at these levels are generally conducted by using decision analysis rather than direct experimental studies. Decision analysis (8–11) is an objective and systematic technique for combining the results of experimental studies that cover different health care steps to estimate effects of care processes more extensive than those directly studied in any single experimental research project. Cost-effectiveness analysis is a form of decision analysis that involves evaluation of the costs of health care, as well as the outcomes (12,13). What follows is a brief explanation of clinical decision analysis and cost-effectiveness analysis and the role they may play in technology assessment in radiology. Although we concentrate on cost-effectiveness analysis, the same methods and applications apply to decision analysis.

Cost-effectiveness analysis recognizes that the results of care are rarely 0% and 100% outcomes but rather are probabilistic (14). It involves the creation of algorithms, usually displayed as decision trees, as shown in Figure 1, which incorporate probabilities of events and, often, the valuations (usually called "utilities") of possible outcomes of these events. Individual or population-based preferences for certain outcomes and treatments are factored into these utilities.



Figure 1. Example of a typical imaging decision analysis tree. In this example, an imaging test is compared with clinical examination for the correct diagnosis of acute appendicitis.

 
Cost-effectiveness analysis can be divided into three basic steps: defining the problem, building the decision model, and analyzing the model.

Defining the Problem
For any cost-effectiveness analysis, one of the most difficult tasks is defining the appropriate research question. The issues to address in defining the problem are the population reference case, strategies, time horizon, perspective, and efficacy (outcome) measures. The reference case is a description of the patient population the cost-effectiveness analysis is intended to cover. For example, the reference case for the cost-effectiveness analysis in Figure 1 consists of persons with acute abdominal pain seen in the emergency department.

The issue of strategies is, what are the care strategies that we should compare? Too many strategies may be confusing to compare. Too few may leave an analysis open to the suspicion that it omits possibly superior strategies. The decision tree in Figure 1 compares costs and outcomes of a clinical examination versus an imaging test for the diagnosis of acute appendicitis; in a fuller model, ultrasonography (US) and CT might be considered separate imaging strategies. In general, cost-effectiveness analysis and decision analysis address whether a new diagnostic test or treatment strategy should replace the current standard of care, in which case the current standard and the proposed new approach are the strategies to include. Alternatively, often the issue is which of a series of tests or treatments is best, and these then become the strategies to include.

The time horizon for which the cost-effectiveness analysis model is used to evaluate costs, benefits, and risks of each strategy must be stated and explained. Sometimes, the time horizon may be limited because of incomplete data, but this creates a bias against strategies with long-term benefits.

Finally, cost-effectiveness analysis allows costs to be counted from different perspectives. The perspective might be that of a third-party payer, in which case only insurance payments count as costs, or that of society, in which case all monetary costs, including those paid by the patient, count, and so—at least in some analyses—do nonmonetary costs, such as travel and waiting time involved in obtaining care.

Building the Cost-Effectiveness Analysis Model
Cost-effectiveness analysis is usually based on a decision tree, a visual representation of the research question (Fig 1). These decision trees are created and analyzed with readily available computer software, such as DATA (TreeAge Software, Williamstown, Mass). The tree incorporates the choices, probabilities of events occurring, outcomes, and utilities for each strategy being considered. Each branch of the tree must have a probability assigned to it, and each path in the tree must have a cost and outcome assigned. Data typically come from direct studies of varying quality, from expert opinion (which is usually unavoidable because some needed data values cannot be obtained in any other way), and from some less directly relevant literature. For example, in Figure 1, the probability of a positive test result may be selected from published literature and added to the decision tree under the branch labeled "Positive Test/Surgery." Costs are frequently not ascertained directly, but rather are estimated by using proxies such as Medicare reimbursement rates or the charge and/or cost data of a hospital. Building the decision tree requires experience and judgment.

The complexity of cost-effectiveness analysis sometimes makes it difficult to understand and therefore undervalued (14,15). One way to improve understanding and allow readers to judge for themselves the value of a cost-effectiveness analysis model is to be explicit about the assumptions of the model. Many assumptions are needed simply because of limited data available to answer the research question.

Analyzing the Cost-Effectiveness Analysis Model
Once the model has been created, analysis should include a baseline analysis of cost and effectiveness and a sensitivity analysis. The average cost and effectiveness for each strategy, considering all the outcomes to which it might lead, are computed simultaneously. We calculate these averages by weighting the cost and utility at the end of each branch by the probability of reaching it and summing for each strategy, moving from right to left in the tree. In cost-effectiveness analysis decision trees such as that in Figure 1, the costs and utilities for each outcome would be placed in the decision tree at the right end of each branch.
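The right-to-left averaging described above can be sketched as a recursive "rollback" of a small tree. This is a minimal illustration in Python, not the DATA software mentioned earlier. The tree shape only loosely follows Figure 1, the names Node and rollback are hypothetical, and every probability, cost, and utility is a made-up placeholder rather than a value from the literature.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A terminal outcome (no branches) or a chance node (with branches)."""
    name: str
    cost: float = 0.0      # cost assigned at a terminal node
    utility: float = 0.0   # utility assigned at a terminal node
    branches: List[Tuple[float, "Node"]] = field(default_factory=list)


def rollback(node: Node) -> Tuple[float, float]:
    """Return (expected cost, expected utility), averaging from right to left."""
    if not node.branches:
        return node.cost, node.utility
    total_p = sum(p for p, _ in node.branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities at '{node.name}' sum to {total_p}")
    exp_cost = 0.0
    exp_utility = 0.0
    for p, child in node.branches:
        child_cost, child_utility = rollback(child)
        exp_cost += p * child_cost
        exp_utility += p * child_utility
    return exp_cost, exp_utility


# Hypothetical imaging strategy; all numbers are illustrative placeholders.
imaging = Node("imaging test", branches=[
    (0.30, Node("positive test / surgery", branches=[
        (0.90, Node("appendicitis treated", cost=9000, utility=0.98)),
        (0.10, Node("unnecessary surgery", cost=9000, utility=0.95)),
    ])),
    (0.70, Node("negative test / observe", branches=[
        (0.97, Node("no appendicitis", cost=500, utility=1.00)),
        (0.03, Node("missed appendicitis", cost=15000, utility=0.85)),
    ])),
])

cost, utility = rollback(imaging)
print(f"expected cost = {cost:.2f}, expected utility = {utility:.4f}")
```

Running the same function on a second tree for the clinical examination strategy would give the pair of averages that the comparison in the next paragraph needs.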

Possible results when comparing two strategies include the following: One strategy is less expensive and more effective than another, one strategy is more expensive and less effective, one strategy is less expensive but less effective, and one strategy is more expensive but more effective. The choice in the first two situations is clear, and the better strategy is called "dominant." The final two situations involve trade-offs in cost versus effectiveness, however. In these situations, one compares strategies by using the incremental cost-effectiveness ratio, which allows evaluation of the ratio of increase in cost to increase in effectiveness. What maximal incremental cost-effectiveness ratio is acceptable is open to debate, but for the United States, $50,000–$100,000 per year of life in perfect health (usually called a "quality-adjusted life-year") is commonly recommended as a maximum.
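The four situations above, and the incremental cost-effectiveness ratio used in the trade-off cases, can be sketched as follows. The function name compare_strategies and the example costs and quality-adjusted life-years are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from typing import Optional, Tuple


def compare_strategies(cost_new: float, eff_new: float,
                       cost_old: float, eff_old: float) -> Tuple[str, Optional[float]]:
    """Classify a new strategy against the current one.

    Effectiveness is expressed in quality-adjusted life-years (QALYs).
    Returns a label and, when there is a trade-off, the incremental
    cost-effectiveness ratio (difference in cost / difference in QALYs).
    """
    d_cost = cost_new - cost_old
    d_eff = eff_new - eff_old
    if d_cost == 0 and d_eff == 0:
        return "equivalent", None
    if d_cost <= 0 and d_eff >= 0:
        return "new strategy dominant", None
    if d_cost >= 0 and d_eff <= 0:
        return "old strategy dominant", None
    # Trade-off: both differences share a sign, so the ratio is positive.
    # If the new strategy is cheaper but less effective, the ratio reads
    # as savings per QALY given up.
    return "trade-off", d_cost / d_eff


# Hypothetical numbers: the new strategy costs 4,000 more and adds 0.10 QALY.
label, icer = compare_strategies(12_000, 10.10, 8_000, 10.00)
print(label, f"{icer:,.0f} per QALY")

# Hypothetical numbers: the new strategy is cheaper and more effective.
print(compare_strategies(7_000, 10.20, 8_000, 10.00)[0])
```

In the first call the ratio falls below the $50,000–$100,000 range quoted above, so under that convention the more expensive strategy would be judged acceptable.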

Almost all payers in the United States state that they consider only effectiveness, not cost. Implicitly, then, they accept an indefinitely high incremental cost-effectiveness ratio—it does not matter how much more expensive a strategy is, as long as it is the least bit more effective or the public demands it intensely.

The final task in cost-effectiveness analysis is sensitivity analysis. Sensitivity analysis consists of changing "parameter values" (numerical values, such as probabilities, costs, and valuation of outcomes) in the model to find out what effect they have on the conclusions. A model should be tested in this way for "robustness," or strength of its conclusions with regard to changes in its assumptions and uncertainty in the parameters taken from the literature or expert opinion. If a small change in the value of a parameter leads to a change in the preferred strategy of the model, then the conclusion is said to be sensitive to that parameter, and the conclusion is weak. Sensitivity analysis may persuade doubtful readers of the soundness of the conclusions of the model by showing that the researchers were thorough and unbiased and the conclusions are not sensitive to the assumptions or parameters the readers question. Often, however, sensitivity analysis will show that conclusions are not robust. Alternatively, another cost-effectiveness analysis, conducted by different researchers by using different assumptions and parameters (which is really a form of sensitivity analysis), will reach different conclusions. While discouraging, a similar situation is not uncommon with experimental studies (such as clinical research), with one study having findings different from another. Also, identification of the parameters and assumptions to which the results are sensitive can be very helpful, because it tells researchers what needs to be investigated further through experimental studies to reach reliable conclusions.
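A one-way sensitivity analysis of the kind described above can be sketched by varying a single parameter and recording which strategy the model prefers. The sketch below assumes a deliberately simple model that compares expected utility only; the utilities, the prevalence, and the sensitivity and specificity of both strategies are hypothetical values picked for illustration.

```python
def expected_utility(prev, sens, spec,
                     u_treated=0.98, u_missed=0.80,
                     u_healthy=1.00, u_needless_surgery=0.95):
    """Expected utility of 'test, then operate if positive' (hypothetical utilities)."""
    diseased = sens * u_treated + (1 - sens) * u_missed
    healthy = spec * u_healthy + (1 - spec) * u_needless_surgery
    return prev * diseased + (1 - prev) * healthy


def preferred(prev, imaging_sens):
    """Name the strategy with the higher expected utility."""
    imaging = expected_utility(prev, imaging_sens, spec=0.95)
    clinical = expected_utility(prev, sens=0.85, spec=0.80)
    return "imaging" if imaging > clinical else "clinical exam"


def one_way_sensitivity(prev, values):
    """Vary imaging sensitivity alone and record the preferred strategy."""
    return [(s, preferred(prev, s)) for s in values]


for sens, choice in one_way_sensitivity(prev=0.30, values=[0.60, 0.70, 0.80, 0.90]):
    print(f"imaging sensitivity {sens:.2f}: prefer {choice}")
```

With these placeholder values the preferred strategy switches somewhere between a sensitivity of 0.70 and 0.80, so the conclusion of this toy model would be called sensitive to that parameter within that range.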


    CHARACTERISTICS OF HIGH-QUALITY EXPERIMENTAL STUDIES
Whether an experimental study is intended to provide direct findings (principally, as we have seen, at efficacy levels 1 through 4) or to provide findings to be used as input into decision analysis and/or cost-effectiveness analysis (which are then used to assess level 5 and 6 efficacy), several design and operational considerations are important for the study to be of high quality and substantial value (2,16–19). Regrettably, the quality of studies on the evaluation of diagnostic imaging is very often poor (20–23). Therefore, radiologists should be aware of these considerations so that they may read the literature critically and also improve the quality of the technology assessment studies they conduct.

The most important considerations follow. We focus on studies of diagnostic accuracy, since these are most common and constitute the principal focus of radiologists, but most of what is said applies to experimental studies of other levels of the hierarchy of efficacy.

Patient Characteristics
Patients in a study should be like those in whom a test will be applied in practice. Often, in initial studies, a test is applied predominantly to very sick patients or completely healthy individuals. This "spectrum bias" exaggerates the real-world ability of the test to distinguish disease from health because intermediate cases that are less than totally clear cut are eliminated. As a result, initial reports on a new test are often overly optimistic. On the other hand, such spectrum bias can be useful in initial studies to ascertain if a test has any possible promise and to help establish the operating parameters at which the test works best.

Number of Cases
The number of cases included in studies should be adequate. Almost always, the smaller the number of cases, the larger the minimum difference that can reliably be observed. Before a study is begun, a statistician should be asked to perform a power calculation to ascertain the number of cases required to detect, with desired reliability, the minimum difference regarded as clinically important. Often, the number of cases included in actual studies is inadequate (22). Such studies are referred to as "underpowered" and can lead to errors.
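As a rough sketch of what such a power calculation involves, the function below applies the standard normal-approximation formula for comparing two independent proportions (for example, the sensitivities of two tests in separate patient groups). It is an illustration only: the function name is hypothetical, the 0.80 versus 0.90 sensitivities are invented, and a paired design in which both tests are applied to every patient calls for a different formula, which is one reason to involve a statistician.

```python
import math
from statistics import NormalDist


def cases_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate cases needed per group to detect p1 vs p2 (two-sided test).

    Uses the usual normal approximation for two independent proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical sensitivities: halving the difference to be detected
# requires several times as many cases.
print(cases_per_group(0.80, 0.90))
print(cases_per_group(0.80, 0.85))
```

The comparison of the two calls shows the point made in the text: the smaller the difference regarded as clinically important, the more cases the study needs.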

Design Considerations
Prospective studies are almost always preferable to retrospective studies. "Well begun is half done" carries a corollary that "poorly begun is hard to salvage." In a retrospective study, one has to work from someone else’s design and data collection, and these are typically far from optimal for one’s own purposes.

The temptation to include in the research everything that might be studied should be resisted, lest the study collapse from its own complexity.

Often, the purpose of a study is to compare two diagnostic tests—for example, to compare a proposed new test with an established one. In this situation, unless data on patient health outcomes and cost must be directly obtained, an optimal design consists of applying both tests to all study patients, with interpretation of each test performed while blinded to the results of the other. In contrast, the common practice of using "historical controls" to represent the performance of the established test is usually a poor choice. The patient population in the historical control may be different, and the execution of the historical series may not meet standards of current best practice.

Reference Standard
The reference standard (sometimes less formally called the "gold standard") needs to be chosen carefully. While a perfect reference standard—one with 100% accuracy—often cannot be attained, it is important to do as well as possible. Methodologists routinely warn (4,22,24) that a reference standard that is dependent, even in part, on the test(s) being evaluated involves circular reasoning, and they say it is therefore seriously deficient, but they note that such standards are nonetheless not infrequently used.

Timing
Timing is important because diagnostic imaging is a field that is changing relatively rapidly. There is little point in undertaking a large-scale study when a new technique is in the initial developmental stage and is changing particularly rapidly; results will be obsolete before they are published. On the other hand, it is not wise to wait until a technique is fully mature because, by then, it will often be widely disseminated, making the study too late for its results to readily influence general clinical practice. Use of techniques that lead to rapid completion of a study, such as gathering data from multiple sites, is highly desirable because imaging evolves relatively rapidly.

Efficacy and Effectiveness
Most evaluations of diagnostic tests—and of any other medical care—are studies of efficacy, which is defined as results obtained under ideal conditions, such as those of a careful research project. Initially, efficacy is important to ascertain, but ultimately, one would want to know effectiveness, which is defined as results obtained in ordinary practice. Effectiveness is usually poorer than efficacy. For example, studies in individual academic institutions—that is, efficacy studies—showed that abdominal CT for patients suspected of having appendicitis significantly reduced the perforation rate and unnecessary surgery rate (25,26), but a study of essentially all hospital discharges in Washington state—that is, an effectiveness study—showed no improvement in either rate between 1987 and 1998, a period when laparoscopy and cross-sectional imaging techniques, including CT, became widely available (27). The systematization necessary for an organized study tends to preclude observation of effectiveness—the study protocol ensures uniform application of the test with its parameters set at optimal levels, and people are generally more careful and consistent and do better when they know their activity is being observed (this is called the Hawthorne effect).

Figure 2 lists some additional important considerations for high-quality studies. Sunshine and McNeil (16) discuss the above considerations and those in Figure 2 in more detail.



Figure 2. Additional procedures for enhancement of study quality and rapidity, with particular reference to a study of substantial scale.

 

    SCREENING
Screening (28,29) is the performance of a diagnostic test in an asymptomatic population with the aim of reducing morbidity and/or mortality from disease. The requirements of efficacious screening are somewhat different from those of "conventional" diagnostic testing—that is, testing applied to symptomatic patients. These differences apply to the diagnostic test, available treatment, and evaluation of the test.

The Test
Because the prevalence of disease in a screening population is very low—for example, approximately one-half percent in screening mammography—a screening test must be highly specific. Otherwise, false-positive findings will greatly outnumber true-positive findings (even at the relatively high 90%–95% specificity rate for mammography—ie, 5%–10% recall rate—false-positive findings outnumber true-positive findings by 10–20 to 1), and the cost and morbidity of working up patients with false-positive findings will outweigh the gains from early detection in those with true-positive findings. Similarly, the cost and morbidity of the screening test itself (which apply to every patient screened) must be relatively low; otherwise, they will outweigh the gains of screening, which can occur only for the very small percentage of patients with true-positive findings.

In contrast, sensitivity can be modest. For example, screening mammography has an approximate 75% sensitivity, yet it allows us to identify three of every four possible breast cancers that could be detected if the test were perfectly (100%) sensitive. These requirements for a screening test can be somewhat eased if a high-risk population is identified, because the proportion of true-positive findings will increase. Note that while a screening test optimally has high specificity and may only need modest sensitivity, an optimal diagnostic test for symptomatic patients should have a high sensitivity, but the specificity may be modest.
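The effect of low prevalence on the balance of false-positive and true-positive findings can be checked with simple counts. The sketch below uses the round figures quoted above (0.5% prevalence, 75% sensitivity, 90% to 95% specificity); the cohort size of 100,000 and the function name are arbitrary choices for illustration.

```python
def screening_counts(n, prevalence, sensitivity, specificity):
    """Return (true positives, false positives) expected among n people screened."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity
    false_pos = healthy * (1 - specificity)
    return true_pos, false_pos


for spec in (0.90, 0.95):
    tp, fp = screening_counts(100_000, 0.005, 0.75, spec)
    ppv = tp / (tp + fp)  # share of positive results that are true positives
    print(f"specificity {spec:.0%}: TP={tp:.0f}, FP={fp:.0f}, "
          f"FP per TP={fp / tp:.1f}, PPV={ppv:.1%}")
```

Assuming perfect sensitivity instead reproduces a ratio of roughly 10 to 20 false-positive findings per true-positive finding; with 75% sensitivity the ratio is somewhat higher. Raising the prevalence, as in a high-risk population, lowers the ratio.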

Treatment
Oddly, the available treatment must be intermediate in efficacy. If treatment is fully efficacious—more specifically, if treatment of symptomatic patients is as efficacious as and no more costly than the presymptomatic treatment made possible by screening—then nothing is to be gained by identifying disease before it becomes symptomatic. Conversely, if treatment is completely inefficacious—that is, there is no useful treatment for even presymptomatic disease—there is also no possible gain from screening. Screening can only be beneficial if treatment of presymptomatic disease is more efficacious than treatment of symptomatic disease (29–31). (However, some hold that screening for untreatable genetic diseases and other untreatable diseases can be reasonable because parents can alter reproductive behavior and patients can gain more time to prepare for the consequences of disease.) Given these requirements regarding treatment effectiveness for screening to be sensible, new developments in treatment—for example, the introduction of pharmaceuticals such as donepezil hydrochloride (Aricept; Eisai America, New York, NY) that slow the previously unalterable rate of progression of Alzheimer disease—can completely alter the relevance of screening.

Evaluation of Screening
In general, the efficacy of treatment of presymptomatic disease relative to that of symptomatic disease is not known, although this is a critical issue for screening, as indicated in the previous paragraph. The reason for the lack of knowledge is as follows: if screening has not been done previously, relative efficacy simply is not known because presymptomatic cases have not been identified and treated. On the other hand, if the issue is introduction of a more sensitive screening test, one does not know the efficacy of treating the additional, presumably less advanced cases the new test detects. Partly for this reason, evaluation of screening generally has to consist of a randomized controlled trial in which (a) the intervention consists of the test and the treatment in combination and (b) the end point studied is the death rate, morbidity, or other adverse outcome(s) from the disease being screened for in the intervention population compared with the rates in the control population.

Biases
Three well-known biases (30,32,33) also generally necessitate this randomized controlled trial study design for evaluation of screening tests and generally preclude the use of other end points, such as 5-year survival from time of diagnosis. These three biases should be understood by all radiologists.

"Lead-time bias" refers to the fact that screening will allow detection of disease earlier in its natural history than will waiting for symptoms, so any measurement from time of diagnosis will be biased in favor of screening, regardless of the effectiveness of treatment. Consider an oversimplified example: For lung cancer, 5-year survival from diagnosis is currently 10%–20%. Assume that CT screening advances diagnosis by 5 1/2 years, but treatment has absolutely no value. Then 5-year survival would nonetheless increase to essentially 100% with screening. In short, survival time in a screened group will incorrectly appear to be better than that in a nonscreened group.

"Overdiagnosis bias" or "pseudodisease" (29,31) refers to the fact that applying a diagnostic test to asymptomatic individuals will identify "positive cases" that will never become clinically manifest in a person’s lifetime. Prostate cancer provides a striking example. It is the most common nonskin malignancy in men in the United States, affecting 10% of them, but careful histopathologic examination at autopsy shows microscopic prostate cancers in nearly 50% of men over the age of 75 years (34). If an imaging test as sensitive as histologic examination at autopsy were developed, but early detection had absolutely no effect on outcomes, the percentage of "cases" showing adverse outcomes would nonetheless decrease by four-fifths—but only because four-fifths of the "cases" never would have shown any effects of the disease in the absence of screening and treatment. The general point is that, because of overdiagnosis bias, any study of the outcome of cases identified with a screening test will be biased toward screening, for many of the cases identified with screening would never have had any adverse outcomes, even in the absence of treatment. Incidentally, the morbidity and cost of treating such cases is one of the negative consequences of screening.

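The four-fifths figure in the prostate cancer example follows directly from the two prevalences quoted. The sketch below is illustrative only and rests on stated assumptions: a hypothetical population of 1,000 men, the 10% and 50% figures applied to that same population (the article's own simplification), every clinically manifest case counted as having an adverse outcome, and early detection having no effect. All variable and function names are made up for this example.

```python
def adverse_outcome_rate(cases_with_adverse_outcome, cases_identified):
    """Share of identified 'cases' that go on to show an adverse outcome."""
    return cases_with_adverse_outcome / cases_identified


men = 1000                                   # hypothetical population
clinically_manifest = men * 10 // 100        # 10% ever become clinically manifest
found_by_sensitive_test = men * 50 // 100    # 50% harbor microscopic cancer

# Early detection is assumed to change nothing, so the same 100 men
# have adverse outcomes whether or not the sensitive test is used.
rate_without_test = adverse_outcome_rate(clinically_manifest, clinically_manifest)
rate_with_test = adverse_outcome_rate(clinically_manifest, found_by_sensitive_test)

relative_decrease = 1 - rate_with_test / rate_without_test
overdiagnosed_share = (found_by_sensitive_test - clinically_manifest) / found_by_sensitive_test

print(f"Adverse-outcome rate among cases, no test:   {rate_without_test:.0%}")
print(f"Adverse-outcome rate among cases, with test: {rate_with_test:.0%}")
print(f"Apparent decrease: {relative_decrease:.0%}")
print(f"Share of detected cases that are pseudodisease: {overdiagnosed_share:.0%}")
```

The apparent decrease and the share of pseudodisease are the same number, which is the point of the example: the improvement comes entirely from enlarging the denominator.
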
"Length bias" can be thought of as an attenuated form of pseudodisease. It arises because cases of a disease vary in aggressiveness, with the faster-progressing cases typically also having a natural history with greater morbidity and mortality. Cases detected with screening are typically disproportionately indolent. This is because slow-progressing cases remain longer in the presymptomatic phase in which they are detectable only with screening and do not manifest symptoms. Thus, a test that helps identify asymptomatic cases disproportionately uncovers indolent cases, as Figure 3 shows. Hence, cases detected with screening disproportionately have a relatively favorable prognosis, regardless of the effectiveness of treatment. Thus, any study of outcomes in cases detected with screening (vs those detected when symptoms occur) will be biased toward screening.



Figure 3. Example of length bias. Half of the cases are the more indolent form (longer preclinical phase, longer symptomatic phase, and less severe adverse events, as shown by a smaller x). At any point in time (t1 and t2 are randomly chosen points in time), however, two-thirds of the cases detectable only with screening are indolent.
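The two-thirds figure in the caption can be reproduced with a short calculation. At any moment, the number of cases in the presymptomatic, screen-detectable pool is proportional to how often each type of case arises multiplied by how long it stays presymptomatic. The following Python sketch assumes that the indolent form has a preclinical phase twice as long as the aggressive form; that 2:1 ratio is an assumption chosen because it is consistent with the caption, not a value stated in the article, and the function name is hypothetical.

```python
from fractions import Fraction


def indolent_share_of_detectable_pool(indolent_share_of_new_cases,
                                      indolent_preclinical_years,
                                      aggressive_preclinical_years):
    """Share of screen-detectable (presymptomatic) cases that are indolent.

    Each type contributes to the pool in proportion to
    (share of new cases) x (length of its preclinical phase).
    """
    indolent = indolent_share_of_new_cases * indolent_preclinical_years
    aggressive = (1 - indolent_share_of_new_cases) * aggressive_preclinical_years
    return indolent / (indolent + aggressive)


# Half of all new cases are indolent; the 2:1 duration ratio is assumed.
share = indolent_share_of_detectable_pool(Fraction(1, 2), 2, 1)
print(share)  # 2/3
```

With equal preclinical durations the same function returns one-half, so the excess of indolent cases among those detected with screening comes only from the difference in duration.
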

 
Other Considerations
While change in morbidity or mortality from the disease being screened for is the prime measure of the effect of screening, changes in other morbidity and mortality possibly caused by screening and/or treatment should also be considered. Concerns of this type include surgical complications, chemotherapy toxicity, radiation treatment–induced secondary cancers, radiation dose from screening, patient anxiety, and changes in patient satisfaction.

The percentage reduction in the risk of an adverse effect from the disease being screened for, called "relative risk reduction," is a common measure of the benefit of screening, but this measure needs to be set in context (35). For example, if screening reduces an individual’s risk of dying of a particular disease over the next decade from 1.0% to 0.4%, that is a 60% decrease in relative risk but an increase of only 0.6 percentage point in the probability of surviving the decade.
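The contrast between the relative and the absolute change can be made explicit with the numbers from the example. This is a minimal Python sketch; the function name `risk_reductions` is hypothetical, and the two risks are the 1.0% and 0.4% given in the text.

```python
def risk_reductions(risk_without_screening, risk_with_screening):
    """Return (relative risk reduction, absolute risk reduction).

    Both inputs are probabilities between 0 and 1. The absolute reduction
    is the difference in risk; the relative reduction divides that
    difference by the risk without screening.
    """
    absolute = risk_without_screening - risk_with_screening
    relative = absolute / risk_without_screening
    return relative, absolute


rrr, arr = risk_reductions(0.010, 0.004)

print(f"Relative risk reduction: {rrr:.0%}")                           # 60%
print(f"Absolute risk reduction: {arr * 100:.1f} percentage point")    # 0.6
print(f"Probability of surviving the decade: "
      f"{1 - 0.010:.1%} -> {1 - 0.004:.1%}")                           # 99.0% -> 99.6%
```

The same 60% relative reduction would correspond to a much larger absolute gain if the starting risk were higher, which is why the relative measure alone can mislead.
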

In conclusion, for any health care intervention, including diagnostic imaging tests, the ultimate questions are, "How much does this do to improve the health of people?" and "How much does it cost for that gain in health?" By using the methods described in this article, we have the ability to answer these questions as we assess the remarkable imaging technologies available today.


    REFERENCES

  1. Perry S, Thamer M. Medical innovation and the critical role of health technology assessment. JAMA 1999; 282:1869-1872.
  2. Eisenberg J. Ten lessons for evidence-based technology assessment. JAMA 1999; 282:1865-1869.
  3. Clancy CM, Eisenberg JM. Outcomes research: measuring the end results of health care. Science 1998; 282:245-246.
  4. Hillman BJ. Outcomes research and cost-effectiveness analysis for diagnostic imaging. Radiology 1994; 193:307-310.
  5. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991; 11:88-94.
  6. Thornbury JR. Clinical efficacy of diagnostic imaging: love it or leave it. AJR Am J Roentgenol 1994; 162:1-8.
  7. American College of Radiology Imaging Network. Contemporary screening for the detection of lung cancer. Available at: www.acrin-nlst.org/6654factsheet.html. Accessed August 20, 2003.
  8. Weinstein MC, Fineberg HV. Clinical decision analysis. Philadelphia, Pa: Saunders, 1980.
  9. Hunink MG, Glasziou PP, Siegel J, et al. Decision making in health and medicine: integrating evidence and values. Cambridge, England: Cambridge University Press, 2001.
  10. Chapman GB, Sonnenberg FA. Decision making in health care: theory, psychology, and applications. Cambridge, England: Cambridge University Press, 2000.
  11. Janne D’Othee B, Black WC, Pirard S, Zhuang Z, Bettman MA. Decision analysis in radiology. J Radiol 2001; 82:1693-1698.
  12. Singer ME, Applegate KE. Cost-effectiveness analysis in radiology. Radiology 2001; 219:611-620.
  13. Soto J. Health economic evaluations using decision analytic modeling: principles and practices—utilization of a checklist to their development and appraisal. Int J Technol Assess Health Care 2002; 18:94-111.
  14. Kleinmuntz B. Clinical and actuarial judgment. Science 1990; 247:146-147.
  15. Tsevat J. SMDM presidential address: hearsay or heresy—are health decision scientists too left-brained? Med Decis Making 2003; 23:83-87.
  16. Sunshine JH, McNeil BJ. Rapid method for rigorous assessment of radiologic imaging technologies. Radiology 1997; 202:549-557.
  17. Baum RA, Rutter CM, Sunshine JH, et al. Multi-center trial to evaluate vascular magnetic resonance angiography of the lower extremity. JAMA 1995; 274:875-880.
  18. Lilford RJ, Pauker SG, Braunholtz DA, Chard J. Decision analysis and the implementation of research findings. BMJ 1998; 317:405-409.
  19. Blackmore CC. The challenge of clinical radiology research. AJR Am J Roentgenol 2001; 176:327-331.
  20. Kent DL, Haynor DR, Longstreth WT, Larson EB. The clinical efficacy of magnetic resonance imaging in neuroimaging. Ann Intern Med 1994; 120:856-871.
  21. Holman BL. The research that radiologists do: perspective based on a survey of the literature. Radiology 1990; 176:329-332.
  22. Blackmore CC, Black WC, Jarvik JG, Langlotz CP. A critical synopsis of the diagnostic and screening radiology outcomes literature. Acad Radiol 1999; 6:S8-S18.
  23. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Radiology 2003; 226:24-28.
  24. Sox H, Stern S, Owens D, Abrams HL. Monograph of the council on health care technology: assessment of diagnostic technology in health care-rationale, methods, problems, and directions. Washington, DC: National Academy Press, 1989.
  25. Rao PM, Rhea JT, Novelline RA, Mostafavi AA, Lawrason JN, McCabe CJ. Helical CT combined with contrast material administered only through the colon for imaging of suspected appendicitis. AJR Am J Roentgenol 1997; 169:1275-1280.
  26. Sivit CJ, Siegel MJ, Applegate KE, Newman KD. When appendicitis is suspected in children. RadioGraphics 2001; 21:247-262.
  27. Flum DR, Morris A, Koepsell T, Dellinger EP. Has misdiagnosis of appendicitis decreased over time? a population based analysis. JAMA 2001; 286:1748-1753.
  28. Herman CR, Gill HK, Eng J, Fajardo LL. Screening for preclinical disease: test and disease characteristics. AJR Am J Roentgenol 2002; 179:825-831.
  29. Black WC, Welch HG. Screening for disease. AJR Am J Roentgenol 1997; 168:3-11.
  30. Black WC, Ling A. Is earlier diagnosis really better? the misleading effects of lead time and length biases. AJR Am J Roentgenol 1990; 155:625-630.
  31. Morrison AS. Screening in chronic disease. New York, NY: Oxford University Press, 1992; 125-127.
  32. Black WC, Welch HG. Advances in diagnostic imaging and overestimation of disease prevalence and the benefits of therapy. N Engl J Med 1993; 328:1237-1243.
  33. Morrison AS. The effects of early treatment, lead time and length bias on the mortality experienced by cases detected by screening. Int J Epidemiol 1982; 11:261-267.
  34. Brant WE, Helms CA, eds. Fundamentals of diagnostic radiology. 2nd ed. Philadelphia, Pa: Lippincott Williams & Wilkins, 1999; 825.
  35. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 1988; 318:1728-1733.




