Radiology
DOI: 10.1148/radiol.2281020709
(Radiology 2003;228:10-14.)
© RSNA, 2003


Special Reports

Prevalence Effect in a Laboratory Environment1

David Gur, ScD, Howard E. Rockette, PhD, Derek R. Armfield, MD, Arye Blachar, MD, Jennifer K. Bogan, MD, Giuseppe Brancatelli, MD, Cynthia A. Britton, MD, Manuel L. Brown, MD, Peter L. Davis, MD, James V. Ferris, MD, Carl R. Fuhrman, MD, Sara K. Golla, MD, Sanj Katyal, MD, Joan M. Lacomis, MD, Barry M. McCook, MD, F. Leland Thaete, MD and Thomas E. Warfel, MD, PhD

1 From Department of Radiology, Imaging Research, Suite 4200, University of Pittsburgh, 300 Halket St, Pittsburgh, PA 15213-3180. Supported by grant CA84507 from the National Cancer Institute, National Institutes of Health. Received June 14, 2002; revision requested August 8; revision received September 6; accepted October 21. Address correspondence to (e-mail: gurd@msx.upmc.edu).


    ABSTRACT
PURPOSE: To measure observer performance at various levels of prevalence.

MATERIALS AND METHODS: A multiobserver multiabnormality receiver operating characteristic (ROC) study to assess the effect of prevalence on observer performance was conducted. Fourteen observers, including eight faculty members, two fellows, and four residents, interpreted 1,632 posteroanterior chest images with five prevalence levels by using a nested study design. Performance comparisons were accomplished by using a multireader multicase approach to assess the effect of prevalence from 28% (69 of 249) to 2% (31 of 1,577) on diagnostic accuracy. The mean times required to review and report a case were analyzed and compared for different levels of prevalence and readers’ experience.

RESULTS: Area under the ROC curve demonstrated that, with the study experimental conditions, no significant effect could be measured as a function of prevalence (P > .05) for any abnormality, group of cases, or readers. There were no significant differences (P > .05) in the mean times required to review and report cases at different prevalence levels and with different groups of readers.

CONCLUSION: The consistency in the results and the size of this study suggest that with laboratory conditions, if a prevalence effect exists, it is quite small in magnitude; hence, it will not likely alter conclusions derived from such studies.


Index terms: Diagnostic radiology, observer performance • Receiver operating characteristic (ROC) curve • Statistical analysis


    INTRODUCTION
Several approaches for evaluations and comparisons of diagnostic systems have been proposed and analyzed over the years. All have advantages and limitations. When assessments include the detection and/or interpretation of abnormal findings, observer performance studies are incorporated into the evaluation process. It is extremely rare that such studies are conducted in a prospective blind manner, namely, one in which the observer is not aware that he or she is participating in a study. The commonly used alternative is to conduct system evaluation studies in a laboratory setting. It is, therefore, necessary for the investigator to determine how far the results of each experiment can be generalized to actual clinical practice. Receiver operating characteristic (ROC)–type studies have become the generally accepted standard method for this purpose. The use of ROC curve analysis in the medical literature has been steadily increasing. The area under the ROC curve (Az) is frequently used as an indication of the adequacy of algorithms to predict clinical outcomes or as a measure of the inherent accuracy of a diagnostic test. During the past 30 years, this approach, which was originally designed for nonimaging applications (1,2), has been adopted, refined, and extensively used in diagnostic imaging and other related areas for both detection and characterization tasks (3–7). The ROC method and its derivatives have become the most frequently used approaches for multiobserver, multicase, and often multitarget (abnormality) studies.

The actual prevalence of a given abnormality in the clinical setting may vary considerably depending on the particular practice, the demographics of the population served, and the type of procedures being reviewed, such as screening versus diagnostic examinations. For practical reasons of study efficiency, in most laboratory experiments highly selected and enriched sets of difficult positive and negative cases are used; for example, the fraction of actually positive subtle cases and difficult negative cases is substantially higher than that seen in the clinical environment.

The effect of prevalence on observer performance (generally referred to as the "prevalence effect") has not been well studied. Although it is often cited as a potential bias and limitation to generalizability in many studies, there is little evidence as to its existence or magnitude with respect to either detection or classification tasks (2,8–10). In theory, barring observer behavioral effects, the results of ROC analysis should be independent of disease prevalence. However, both limited experimental data from studies in which this issue is explored and data from retrospective reviews of detection rates in screening environments suggest that there may be a measurable and potentially substantial effect that must be taken into account (11–15).
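The theoretical independence of ROC results from prevalence can be illustrated with a small sketch. It uses the empirical (Mann-Whitney) area under the curve as a stand-in for the fitted Az used in this study, and the ratings are hypothetical values on a 0–100 scale, not study data. If the observer's rating behavior does not change, adding more negative cases with the same rating distribution leaves the area unchanged.

```python
def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the ROC area: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical ratings (0 = definitely absent, 100 = definitely present).
positives = [80, 65, 90, 40]
negatives = [10, 30, 40, 55, 20]

# Prevalence 4/9 with the original negatives.
auc_enriched = empirical_auc(positives, negatives)

# "Dilute" prevalence to 4/54 by repeating the same negative ratings tenfold.
auc_diluted = empirical_auc(positives, negatives * 10)

print(auc_enriched, auc_diluted)
```

The sketch deliberately holds the ratings fixed; a behavioral prevalence effect, if present, would show up as a shift in the ratings themselves, which is what the experiment described here was designed to detect.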

Despite its fundamental nature and the possible existence of a prevalence effect, experimental data regarding the same group of experienced readers who interpret large sets of cases with a wide range of levels of prevalence do not exist. Hence, the purpose of our study was to measure observer performance at various levels of prevalence.


    MATERIALS AND METHODS
General Study Design
This study was designed as a nested, multiobserver, multiabnormality ROC experiment in which observer performance was measured at five levels of prevalence ranging from 28% (69 of 249) to 2% (31 of 1,577). Eight board-certified radiologists, two fellows, and four 3rd-year residents participated in the study. Across the five reading modes, each observer performed a total of 3,208 interpretations of posteroanterior chest images, which were collected with an institutional review board–approved exempt protocol; informed consent was not required. The abnormalities of interest were nodule, pneumothorax, interstitial disease, alveolar disease, and rib fracture, which were selected because they span a wide range of imaging characteristics of primary interest in chest imaging. Readers rated each image as to the likelihood that the specific abnormality in question was present by using an ordinal sliding scale of 0, which indicated that the abnormality was absolutely not present, to 100, which indicated that the abnormality was definitely present. We established a core set of 194 cases, which were included in all reading modes. In the mode with the highest prevalence, we supplemented the core set with 55 additional positive cases. In all other reading modes, we added an increasing number (50, 200, 550, and 1,383) of actually negative cases (Table 1). As a result, the mode with the lowest prevalence required a total of 1,577 interpretations. Management of the reading sessions was completely computerized. Software automatically determined both the order in which the modes were presented to each observer and the order in which the images were presented within each mode and each session, with modes counterbalanced among readers.
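The case counts in this design can be checked with a short calculation. The sketch below uses only numbers stated in the text (core set, supplementary positives, added negatives, and the per-abnormality counts given later in this section); the variable and function names are illustrative.

```python
CORE_CASES = 194
EXTRA_POSITIVES_MODE_1 = 55
ADDED_NEGATIVES = {2: 50, 3: 200, 4: 550, 5: 1383}

# Positive findings per abnormality: (core set, supplementary cases in mode 1).
FINDINGS = {
    "nodule": (38, 15),
    "pneumothorax": (37, 15),
    "interstitial disease": (41, 28),
    "alveolar disease": (32, 13),
    "rib fracture": (31, 15),
}


def mode_sizes():
    """Number of cases read in each of the five modes."""
    sizes = {1: CORE_CASES + EXTRA_POSITIVES_MODE_1}
    for mode, added in ADDED_NEGATIVES.items():
        sizes[mode] = CORE_CASES + added
    return sizes


def prevalence(abnormality, mode, sizes):
    """Fraction of cases in a mode that depict the given abnormality."""
    core, extra = FINDINGS[abnormality]
    positives = core + extra if mode == 1 else core
    return positives / sizes[mode]


sizes = mode_sizes()
interpretations_per_reader = sum(sizes.values())
unique_images = CORE_CASES + EXTRA_POSITIVES_MODE_1 + ADDED_NEGATIVES[5]

highest = prevalence("interstitial disease", 1, sizes)  # 69 of 249
lowest = prevalence("rib fracture", 5, sizes)           # 31 of 1,577
print(sizes, interpretations_per_reader, unique_images)
print(f"{highest:.0%} to {lowest:.0%}")
```

Under these figures, the 28% and 2% endpoints correspond to interstitial disease in mode 1 and rib fractures in mode 5, respectively.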


TABLE 1. Reading Modes Used in the Study

 
Selection of Cases and Controls
In selecting actually positive and actually negative images for this study, we followed procedures that have been used successfully and reported in our prior studies (6,7,16). In brief, abnormal findings were verified independently by means of surgical (eg, biopsy of nodules) reports or other source documents when available; otherwise, we used reports from other imaging studies, such as computed tomography (CT) or subsequent radiography, to verify the abnormality in question. Negative findings on radiographs were verified with negative findings on subsequent radiographs obtained after the image was used in the study (eg, from at least 1 year later). In total, 1,632 high-quality posteroanterior chest radiographs were selected and verified as to the presence or absence of each of the abnormalities of interest.

The total numbers of nodules, cases of pneumothorax, cases of interstitial disease, cases of alveolar disease, and rib fractures depicted on the core (or nested) group of images were 38, 37, 41, 32, and 31, respectively (Table 2). In 28 cases, more than one abnormality was depicted; in 21 cases, two were depicted; and in seven cases, three were depicted. Fifty of the images in the core group were negative for all five abnormalities. The inclusion of this case mix was designed to increase the type of possible reported abnormalities and to increase the difficulty in estimation of the frequency distribution of each abnormality for the observers.
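As a consistency check on these counts, the number of findings can be tallied two ways: by abnormality type and by the number of abnormalities per case. This is a minimal sketch that uses only the figures quoted above; the names are illustrative.

```python
CORE_CASES = 194
ALL_NEGATIVE_CASES = 50
CASES_WITH_TWO = 21
CASES_WITH_THREE = 7

FINDINGS_BY_TYPE = {
    "nodule": 38,
    "pneumothorax": 37,
    "interstitial disease": 41,
    "alveolar disease": 32,
    "rib fracture": 31,
}

# Core cases that depict at least one abnormality.
positive_cases = CORE_CASES - ALL_NEGATIVE_CASES

# Cases with exactly one abnormality are whatever remains.
cases_with_one = positive_cases - CASES_WITH_TWO - CASES_WITH_THREE

findings_from_cases = cases_with_one + 2 * CASES_WITH_TWO + 3 * CASES_WITH_THREE
findings_from_types = sum(FINDINGS_BY_TYPE.values())

print(positive_cases, cases_with_one, findings_from_cases, findings_from_types)
```

Both tallies give the same total, so the per-type counts and the multiple-abnormality counts are mutually consistent.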


TABLE 2. Distribution of Abnormalities Depicted on Core (Nested) Group of Images Listed by Type and Detection Difficulty

 
Two experienced observer investigators viewed each radiograph to ensure that all were of acceptable to outstanding quality. Only radiographs that were determined to be acceptable by both were used. Investigators who participated in any aspect of the study preparation were excluded from participation as observers in the ROC study. Table 2 includes a summary of the verified positive cases in the core group according to abnormality and degree of diagnostic difficulty (ie, subtle or typical). The total number of negative cases was 1,433. In the 55 actually positive cases that supplemented the core group in the enriched mode, the following were depicted: 15 nodules, 15 cases of pneumothorax, 28 cases of interstitial disease, 13 cases of alveolar disease, and 15 rib fractures (21 of the images depicted two abnormalities, and five depicted three abnormalities).

Selection of Observers and Prestudy Training
Fourteen observers were selected for the study. Four were 3rd-year radiology residents at the beginning of the reading sessions, two were radiology fellows, and eight were board-certified faculty radiologists with varying experience that ranged from 2 to 25 years in reading posteroanterior chest radiographs. All continue to read chest radiographs during periodic rotation in the emergency department in addition to their regular duties. We selected this group of observers to assess the effect, if any, that may be associated with observers’ training level and daily experience with reading posteroanterior chest radiographs. Observers were not made aware of the aims of the study or the prevalence levels to expect in any reading session. All observers received a detailed "Instruction to Observers" document to review. The document included a clear definition of the abnormalities in question, and a set of subtle and typical cases was used to demonstrate the types of cases to be included and to familiarize observers with the use of the computerized scoring form. The document also described in detail a step-by-step process for reviewing and rating cases during a session.

Performance of Study
The study was a five-mode comparison with varying levels of prevalence for each of the abnormalities. Fourteen readers viewed and rated each case five times. The reading sessions lasted for 18 months. Each reading session included approximately 50 randomized cases from only one mode. The study design allowed for case randomization within a mode and a session for each observer. Mode counterbalancing was implemented to decrease any reading-order effects. Observers completed readings of all cases in one mode (ie, one level of prevalence) before they were permitted to continue the study after a predetermined minimum period of 2 weeks between modes. Given the large number of cases and the complexity of the reading tasks, our experience indicated that this was sufficient time to ensure that individual cases were generally not remembered. Readers were allowed to spend as much time as desired on each image. During a reading session, observers were presented with a stack of envelopes, each containing one original conventional chest radiograph. These envelopes were arranged in the order of the designated interpretations for that session. Observers reported the results for each case on our computerized scoring form by using a computer mouse (7). Five sliding scales, one for each abnormality, were presented. The radiologists slid an indicator along the scale from 0 to 100 to indicate the likelihood (ie, probability) of the presence or absence of the abnormality in question. The study management software recorded the time required to review and report each case.

Of the 38 nodules depicted on the core images (Table 2), 26 were malignant and 12 were benign. In all modes but the enriched mode (ie, mode 1), actually negative cases were added to the core group to decrease (ie, "dilute") prevalence. The nested design was implemented for modes 2–5 as well; namely, all 244 cases in mode 2 were included in mode 3, all 394 cases in mode 3 were included in mode 4, and all 744 cases in mode 4 were included in mode 5.

Data Analyses
In our primary analyses, the Az values for the five modes and 14 readers for the detection of each of the five disease categories were compared. We performed this analysis by using the method of Dorfman et al (17), which is a multireader multicase method that takes into consideration the correlation between readers who read the same set of cases. The first analysis was performed only with the 194 core cases that were read in all modes. Negative cases were defined as all cases in which the specific abnormality in question was not depicted, even if other abnormalities were present. The analysis was then repeated for each subgroup that was common to more than one mode (ie, 244 cases for modes 2–5, 394 cases for modes 3–5, and 744 cases for modes 4–5). In addition, the analysis was repeated by using the analysis-of-variance method described by Obuchowski (18) and Obuchowski and Rockette (19). We also included a test for linear trend by incorporating the method proposed by Abelson and Tukey (20) into the procedure described by Obuchowski (18) and Obuchowski and Rockette (19). The data were also analyzed with only "pure" negative cases (namely, only those with negative findings for all five abnormalities in question). In an attempt to identify potential biases, we tested the data for reading-order effect and mean time required to read cases. The data were tested for trends in the Az when the cases were segmented according to the order in which each case was read (eg, first time, second time, etc), regardless of the specific mode. This was performed to determine possible case retention (ie, memorization or learning effects). The time (in seconds) to review and rate cases was averaged for each reader and each mode after exclusion of all measurements of 300 seconds or greater (ie, less than 2% [236 of 13,580] of all cases). The exclusion was instituted on the basis of the assumption, which was verified experimentally as well, that these excessively long times were the result of interruptions such as phone calls during the session. The test of Page (21) was used to determine whether there was a trend in the mean time used to read cases for the different reading modes. Both the mean time to review cases and the Az for faculty versus fellows and residents were compared by using the method of generalized estimating equations (22). The statistical power of selected alternative hypotheses was estimated by using the method described by Obuchowski (18).
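The reading-time analysis can be sketched as follows. The 300-second cutoff is the one stated above; the Page L statistic is computed from its textbook definition with a large-sample normal approximation, which is an assumption of this sketch and not necessarily how the authors evaluated significance. Function names and the example times are hypothetical.

```python
from math import sqrt
from statistics import mean

CUTOFF_SECONDS = 300  # measurements at or above this are treated as interruptions


def mean_reading_time(times):
    """Mean time per case after dropping measurements of 300 s or greater."""
    return mean(t for t in times if t < CUTOFF_SECONDS)


def rank_row(values):
    """Ranks 1..k within one reader's row, with average ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    return ranks


def page_trend(table):
    """Page's L and its normal-approximation z score.

    Rows are readers; columns are modes in the hypothesized increasing order.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    L = sum((j + 1) * s for j, s in enumerate(rank_sums))
    expected = n * k * (k + 1) ** 2 / 4
    variance = n * k ** 2 * (k + 1) * (k ** 2 - 1) / 144
    return L, (L - expected) / sqrt(variance)


# Hypothetical per-reader mean times (seconds) for modes 1-5.
example = [[50, 52, 55, 57, 60], [30, 31, 33, 36, 40], [70, 72, 75, 80, 90]]
L, z = page_trend(example)
print(mean_reading_time([40, 60, 300, 900]), L, round(z, 3))
```

With few readers, exact critical values for L would be preferable to the normal approximation used here.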


    RESULTS
Tables 3–7 summarize the mean Az and the corresponding standard error for the five modes defined by the prevalence of cases in the five disease categories. We present here the results for the multireader-multicase approach when it was applied to the core group of 194 cases. Tables 3–7 provide the results for the group of faculty (eight readers), fellows and residents (six readers combined), and the whole group (14 readers). It is clear that no overall trend is apparent over a wide range of prevalence levels. At the α = .05 level, there were no statistically significant differences in performance among the five prevalence levels (P = .94, .20, .06, .88, and .15 for nodules, pneumothorax, interstitial disease, alveolar disease, and rib fractures, respectively).


TABLE 3. Mean Az and Standard Error for Each Group of Readers and Each Mode for Nodules

TABLE 4. Mean Az and Standard Error for Each Group of Readers and Each Mode for Cases of Pneumothorax

TABLE 5. Mean Az and Standard Error for Each Group of Readers and Each Mode for Cases of Interstitial Disease

TABLE 6. Mean Az and Standard Error for Each Group of Readers and Each Mode for Cases of Alveolar Disease

TABLE 7. Mean Az and Standard Error for Each Group of Readers and Each Mode for Rib Fractures

In the only borderline case, that of interstitial disease (P = .06), a substantial part of the difference in performance was attributable to one reader who had an exceptionally low Az of 0.35 in one mode, which caused the large decrease in mean Az for the mode with the highest prevalence. For no other disease or prevalence level was there even a borderline indication of statistical significance for the comparison of the mean Az across modes. When this reader was excluded from the analysis for interstitial disease, the effect of prevalence on performance was not significant (P = .19). After exclusion of the results for this observer, the mean Az and the corresponding standard error for interstitial disease were 0.70 ± 0.04, 0.74 ± 0.05, 0.72 ± 0.04, 0.69 ± 0.03, and 0.72 ± 0.05 for modes 1–5, respectively.

As seen in Tables 3–7, the mean observer performance levels of faculty tended to be higher than those of fellows and residents. Although the number of readers was small and interreader variability was quite large, the results were statistically significant for nodules (P = .03) and showed borderline significance for cases of alveolar disease (P = .06) and rib fractures (P = .07). No effect was observed (P > .17) when the cases were analyzed on the basis of the order of reading rather than on the basis of the prevalence level. Hence, no reading-order effect or learning effect could be identified. None of the analyses of the other nested groups yielded significant differences between modes (P > .12 for all 15 comparisons of three groups and five abnormalities).

When the data were analyzed by using the approach of Obuchowski (18) and of Obuchowski and Rockette (19) and with only the 50 cases in which findings were negative for all five abnormalities, the results were not substantially affected. The probability of detecting a difference varies for the different hypotheses and is a function of the alternative hypotheses, the number of cases with disease and the number of cases without disease, the covariance structure, and the type I error. Only the type I error remains the same for differing hypotheses. However, we conservatively estimated that with only the core cases, the probability of detection of a consistent linear trend of a 0.02 change in the Az between consecutive prevalence levels, or a total of 0.08 change between the lowest and highest Az levels, ranges from 83% to 95% for the five disease categories.

The mean time spent in viewing and rating the images of a case varied substantially for different readers and ranged from 27 seconds ± 4 to 96 seconds ± 23. This large variability was observed for both positive and negative cases. However, the mean time over all readers and cases was not significantly affected (P > .05) by prevalence. Mean times in seconds for all readers for the core cases were 55 ± 17, 57 ± 21, 55 ± 23, 56 ± 27, and 55 ± 16 for modes 1–5, respectively. When we assigned the readers to two groups, that is, faculty and all others (ie, fellows and residents), the mean reading time for all readers and modes was 51 seconds ± 17 for the faculty and 62 seconds ± 25 for the fellows and residents. The difference was not significant (P = .27) by using a generalized estimating equation approach, where the mean time over all cases was compared for the group of faculty and all others.


    DISCUSSION
The reading task of this project lasted for 18 months. All observers who agreed to participate in the study completed the task. To our knowledge, this is one of the largest studies of this type performed to date with a 100% completion rate.

The possible effect of prevalence on observer performance studies has been mentioned in many studies as a potential bias that may impose a serious impediment to generalizability of laboratory results to the clinical environment (8–10). Laboratory observer performance studies generally require some form of a checklist-type scoring (ie, rating) to enable an ROC-type analysis. Hence, the prevalence effect or lack thereof in this setting may be different from that which may be exhibited in the clinical environment. The effect has been assumed to be a possibility in both types of environments but was never carefully studied. Some indirect supporting information was used in the past to infer the potential effect (15). However, because of the cost and complexity associated with the performance of this type of study, available experimental data are sparse and limited (12).

In this study, we attempted to determine the magnitude of the prevalence effect, if any, for a specific set of experimental conditions. By default, our results may not be generalizable to the general clinical environment or to any reading conditions that do not require a formatted checklist-type response. The case mix used in this study differs from (ie, is generally more subtle than) that which is typically seen in the clinical environment. Nonetheless, our study findings clearly demonstrated that, under laboratory conditions and across a wide range of cases, abnormalities, and observer experience, a prevalence effect could not be identified. This finding is in full concordance with the theoretical underpinning of the ROC approach to performance assessment. The consistency of our results and the relatively large number of readers and cases indicate that, if a prevalence effect exists, it is likely to be small in magnitude; hence, it will not likely alter conclusions derived from such studies. To the extent measured here, our study findings demonstrated that the observers’ level of training and experience affects detection performance, albeit not in regard to any measurable prevalence effect in the laboratory experiment. The range of Az values for individual readers and abnormalities clearly indicates that the cases were not very easy to diagnose. Validation of these results with a different set of cases, abnormalities, and observers may be important if we are to largely ignore this effect in future studies.

Despite its shortcomings, namely that it was not performed in a double-blind manner and that it required a checklist-type response, this experiment provides a data point in support of a fundamental assumption and thereby lends some validation to the results of numerous laboratory experiments performed over many decades.


    ACKNOWLEDGMENTS
 
The authors thank Jill King, Amy Klym, Xiao Hui Wang, Rose Gennari, Colleen Plevyak, and Joseph Henretty. Their continued support ensured the success of this project.


    FOOTNOTES
 
Abbreviations: Az = area under the ROC curve, ROC = receiver operating characteristic

Author contributions: Guarantor of integrity of entire study, D.G.; study concepts, D.G., H.E.R.; study design, D.G., H.E.R., C.R.F.; literature research, D.G., D.R.A.; clinical studies, D.R.A., A.B., J.K.B., G.B., C.A.B., M.L.B., P.L.D., J.V.F., S.K.G., S.K., J.M.L., B.M.M., F.L.T., T.E.W.; data acquisition and analysis/interpretation, D.G., H.E.R., C.R.F.; statistical analysis, D.G., H.E.R.; manuscript preparation and definition of intellectual content, D.G., H.E.R.; manuscript editing, revision/review, and final version approval, all authors


    REFERENCES
  1. Peterson WW, Birdsall TG, Fox WC. The theory of signal detectability. Trans Inst Radio Eng Professional Group Information Theory 1954; 4:171-212.
  2. Green DM, Swets JA. Signal detection theory and psychophysics. New York, NY: Wiley, 1966.
  3. Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory. New York, NY: Academic, 1982.
  4. Foley WD, Wilson CR, Keyes GS, et al. The effect of varying spatial resolution on the detectability of diffuse pulmonary nodules: assessment with digitized conventional radiographs. Radiology 1981; 141:25-31.
  5. MacMahon H, Vyborny CJ, Metz CE, et al. Digital radiography of subtle pulmonary abnormalities: an ROC study of the effect of pixel size on observer performance. Radiology 1986; 158:21-26.
  6. Slasky BS, Gur D, Good WF, et al. Receiver operating characteristic analysis of chest image interpretation with conventional, laser-printed, and high-resolution workstation images. Radiology 1990; 174:775-780.
  7. Herron JM, Bender T, Campbell WL, Sumkin JH, Rockette HE, Gur D. Effects of luminance and resolution on observer performance with chest radiographs. Radiology 2000; 215:169-174.
  8. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.
  9. Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24:234-245.
  10. Brogdon BG, Kelsey CA, Moseley RD, Jr. Factors affecting perception of pulmonary lesions. Radiol Clin North Am 1983; 21:633-654.
  11. Ethell SC, Manning D. Effects of prevalence on visual search and decision making in fracture detection. Proc SPIE 2001; 4324:249-257.
  12. Egglin TKP, Feinstein AR. Context bias: a problem in diagnostic radiology. JAMA 1996; 276:1752-1755.
  13. Kundel HL. Disease prevalence and radiological decision making. Invest Radiol 1982; 17:107-109.
  14. Swensson RG, Hessel SJ, Herman PG. The value of searching films without specific preconceptions. Invest Radiol 1985; 20:100-114.
  15. Kundel HL. Disease prevalence and the index of detectability: a survey of studies of lung cancer detection by chest radiography. Proc SPIE 2000; 3981:135-144.
  16. Thaete FL, Fuhrman CR, Oliver JH, et al. Digital radiography and conventional imaging of the chest: a comparison of observer performance. AJR Am J Roentgenol 1994; 162:575-581.
  17. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992; 27:723-731.
  18. Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Acad Radiol 1995; 2:522-529.
  19. Obuchowski NA, Rockette HE. Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Commun Statistics 1995; 24:285-308.
  20. Abelson RP, Tukey JW. Efficient utilization of non-numerical information in quantitative analysis: general theory and the case of sample order. Ann Math Stat 1963; 34:1347-1369.
  21. Page EB. Ordered hypotheses for multiple treatments: a significance test for linear ranks. J Am Stat Assoc 1963; 58:216-230.
  22. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73:13-22.


