Breast Imaging
1 From the Department of Radiology, University of Chicago, 5841 S Maryland Ave, MC2026, Chicago, IL 60637. From the 1999 RSNA scientific assembly. Received March 29, 2001; revision requested May 21; final revision received January 14, 2002; accepted February 6. Supported in part by U.S. Army Medical Research and Materiel Command grant DAMD 17-96-1-6058. Address correspondence to M.L.G. (e-mail: m-giger@uchicago.edu).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Six mammographers and six community radiologists participated in an observer study. These 12 radiologists interpreted, with and without the computer aid, 110 cases that were unknown to both the 12 radiologist observers and the trained computer classification scheme. The radiologists' performances in differentiating between benign and malignant masses without and with the computer aid were evaluated with receiver operating characteristic (ROC) analysis. Two-tailed P values were calculated for the Student t test to indicate the statistical significance of the differences in performances with and without the computer aid.
RESULTS: When the computer aid was used, the average performance of the 12 radiologists improved, as indicated by an increase in the area under the ROC curve (Az) from 0.93 to 0.96 (P < .001), by an increase in partial area under the ROC curve (0.90A'z) from 0.56 to 0.72 (P < .001), and by an increase in sensitivity from 94% to 98% (P = .022). No statistically significant difference in specificity was found between readings with and those without the computer aid (difference in specificity, -0.014; P = .46; 95% CI: -0.054, 0.026). When we analyzed results from the mammographers and community radiologists as separate groups, a larger improvement was demonstrated for the community radiologists.
CONCLUSION: Computer-aided diagnosis can potentially help radiologists improve their diagnostic accuracy in the task of differentiating between benign and malignant masses seen on mammograms.
© RSNA, 2002
Index terms: Breast neoplasms, 00.31, 00.32 Breast neoplasms, radiography, 00.111, 00.119 Breast radiography, 00.111, 00.119 Computers, diagnostic aid Receiver operating characteristic curve (ROC)
| INTRODUCTION |
|---|
Recently, Jiang et al (5) and Chan et al (6) developed automated classification schemes based on features extracted by computers and performed observer studies to evaluate the effect of their classification schemes as an aid to radiologists in differentiating between benign and malignant breast lesions. Jiang et al used an artificial neural network to merge eight features of clustered microcalcifications to distinguish between benign and malignant disease. The performance of radiologists, in terms of the area under the receiver operating characteristic (ROC) curve (Az), was significantly (P < .001) improved in differentiating between benign and malignant clusters of microcalcifications when they used the information generated by the neural network. In addition, findings in an observer study showed that the use of the computer aid increased the number of malignant clusters of microcalcifications noted for biopsy and decreased the number of benign clusters noted for biopsy. Chan et al used a linear discriminant classifier to analyze 41 computer-extracted texture and morphologic features to classify benign and malignant mass lesions. The performance of radiologists in terms of Az also improved significantly (P = .007) for the task of differentiating between benign and malignant mass lesions when this computer aid was used.
We have developed a computerized scheme for the classification of mass lesions detected on mammograms (7,8). The scheme automatically extracts four characteristics of masses: spiculation, margin sharpness, density, and texture. An artificial neural network then merges the four features to generate an estimated likelihood of malignancy. The performance of the classification scheme is relatively unaffected by variation in the mammogram digitization or in case mix (9).
The purpose of our study was to evaluate the effectiveness of our automated classification scheme as an aid for radiologists reviewing clinical mammograms for which the diagnoses were unknown to both the radiologist and the computer. Unlike previous observer studies with computer-aided diagnosis, this experiment more closely simulates the likely eventual clinical application of computer-aided diagnosis in which it can be expected that the radiologist and computer will each be "seeing" a given case for the first time. In all prior studies, to our knowledge, a single database has been used for both training and testing, with use of a "leave-one-out" method to yield output for their observer studies.
| MATERIALS AND METHODS |
|---|
Computerized Classification Scheme
Figure 1 illustrates schematically our automatic computerized classification scheme. The algorithm has three main components: (a) automated segmentation of a mass lesion from its surrounding parenchyma, (b) automated extraction of four features (ie, the spiculation, margin sharpness, density, and texture measures), and (c) automated classification (ie, estimation of the likelihood of malignancy for each case). Lesion segmentation begins with a region of interest of 512 x 512 pixels centered about the mass lesion in question. Since this study pertained to the classification of masses and not to the initial detection of masses, we manually extracted the 512 x 512 regions of interest from the digitized mammograms. The regions of interest serve as input to the classification scheme. Note that the regions of interest input to our classification scheme can be identified by a computerized detection method (10) or by a radiologist. Given a region of interest, the computerized classification method outputs a number (from 0 to 100) related to the likelihood of malignancy.
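The final classification step can be pictured with a minimal sketch. It assumes the four features have already been extracted and scaled to the range 0 to 1, and it uses a tiny feed-forward network with one hidden layer. The weights, biases, layer size, and the function name `likelihood_of_malignancy` are all hypothetical, chosen only so the example runs; they are not the trained network described in the text.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters for illustration only. The trained network's
# architecture and weights are not given in the text.
HIDDEN_WEIGHTS = [
    [2.0, -1.0, 0.5, 0.5],
    [1.5, -0.5, 1.0, 1.0],
]
HIDDEN_BIASES = [-1.0, -1.5]
OUTPUT_WEIGHTS = [2.0, 2.0]
OUTPUT_BIAS = -2.0


def likelihood_of_malignancy(spiculation, margin_sharpness, density, texture):
    """Merge four features (assumed scaled to [0, 1]) into a 0-100 score."""
    features = [spiculation, margin_sharpness, density, texture]
    hidden = [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    output = sigmoid(
        sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    )
    return 100.0 * output


# Two invented cases that differ only in the spiculation measure.
low_spiculation = likelihood_of_malignancy(0.1, 0.5, 0.5, 0.5)
high_spiculation = likelihood_of_malignancy(0.9, 0.5, 0.5, 0.5)
```

With these made-up weights the score rises with the spiculation measure, which is only a property of the illustration and not a statement about the actual scheme.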
Observer Study
Twelve radiologists who are qualified to read mammograms according to the rules of the U.S. Food and Drug Administration (pursuant to the Mammography Quality Standards Act) participated in the observer study. The observers included six mammographers with a mean of 12 years of experience in interpreting mammograms (hereafter, mammographers) and six community hospital-based radiologists with a mean of 9 years of experience in interpreting mammograms (hereafter, community radiologists). All observers signed an institutional review board-approved consent form before participating in the observer study.
One reading session was held for each observer. During the session, two reading conditions were evaluated: (a) radiologist performance without the computer aid and (b) radiologist performance with the computer aid. For each clinical case, radiologists were shown the four standard mammographic views on a 17 x 17-inch (43 x 43-cm) video monitor, with arrows indicating the locations of the mass lesion. The monitor also displayed regions of interest containing the mass lesion in question from the craniocaudal, mediolateral oblique, or special views. The layout of the interface on the monitor is shown in Figure 3. The radiologists were able to enlarge the regions of interest to full resolution (0.1 mm/pixel) and were able to interactively change the contrast and brightness of the displayed images according to their preference.
In order for the observers to use the computer output comfortably and effectively, a brief training session was held before the observer study. To establish the observers' confidence in using computer results, we briefly described to the observers our computer classification scheme and its performance in the task of differentiating between benign and malignant masses. The observers were told that in the classification scheme, features were used that are similar to those used by radiologists when interpreting mass cases. They were also informed about the performance of the computer in terms of sensitivity and positive predictive value at a given threshold on the computer-estimated likelihood of malignancy; at a threshold of 45%, the computer yields a sensitivity of 96% and a positive predictive value of 67%. The observers learned to adjust their confidence levels to that of the computer while they went through the training cases. Each observer viewed as many as 20 training cases (10 malignant and 10 benign cases) first without and then with the computer-estimated likelihood of malignancy. The true diagnosis was given immediately after each training case. The order of the training cases was the same for all the observers. However, the order of the 110 independent test cases was randomized differently for each observer. No time limit was imposed on observers, and each observer decided when the computer output would be shown.
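The sensitivity and positive predictive value quoted at the 45% threshold follow from a simple count of cases above and below the threshold. The sketch below shows that calculation on invented scores; the function name, the toy data, and the choice to treat a score equal to the threshold as positive are assumptions, not details taken from the study.

```python
def sensitivity_and_ppv(scores, is_malignant, threshold=45.0):
    """Return (sensitivity, PPV), calling a case positive when its
    score is at or above the threshold (an assumed convention)."""
    true_pos = false_pos = false_neg = 0
    for score, malignant in zip(scores, is_malignant):
        positive = score >= threshold
        if positive and malignant:
            true_pos += 1
        elif positive:
            false_pos += 1
        elif malignant:
            false_neg += 1
    n_malignant = true_pos + false_neg
    n_positive = true_pos + false_pos
    sensitivity = true_pos / n_malignant if n_malignant else float("nan")
    ppv = true_pos / n_positive if n_positive else float("nan")
    return sensitivity, ppv


# Toy data invented for illustration (not from the study).
toy_scores = [90, 70, 50, 30, 60, 20, 10, 40]
toy_labels = [True, True, True, True, False, False, False, False]
toy_sensitivity, toy_ppv = sensitivity_and_ppv(toy_scores, toy_labels)
```

Raising the threshold generally trades sensitivity for positive predictive value, which is why the two figures are only meaningful when reported together with the threshold.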
Data Analysis
The confidence ratings (that a lesion is malignant) from each observer were analyzed by using ROC analysis (15,16). The Az and a partial area index (0.90A'z) were calculated to summarize the overall performance and the performance above 90% sensitivity, respectively, of each observer in the task of differentiating between benign and malignant mass lesions. In addition, sensitivity and specificity were calculated for each observer on the basis of their biopsy recommendations. These performance measures, obtained under the two test conditions (without and with the computer aid), indicated the effect of the computer aid on the radiologists' performance in differentiating between benign and malignant mass lesions and in recommending patient management. Since radiologists may operate at different thresholds in recommending biopsy, the second question was important to evaluate the ultimate effect of the computer aid on patient management (ie, sensitivity and specificity). The Student t test for paired data (17) was used to assess the significance of differences between the performances with and without computer aid for a group of radiologists.
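As a rough illustration of these per-observer measures, the sketch below computes an area under the ROC curve from confidence ratings and a sensitivity and specificity from biopsy recommendations. It swaps in the nonparametric (rank-based) area estimate rather than the ROC curve-fitting methods cited in the text, and it does not attempt the partial area index. The function names and the toy ratings are hypothetical.

```python
def empirical_auc(malignant_ratings, benign_ratings):
    """Fraction of malignant-benign pairs in which the malignant case
    receives the higher rating; tied pairs count one half."""
    wins = 0.0
    for m in malignant_ratings:
        for b in benign_ratings:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_ratings) * len(benign_ratings))


def sensitivity_specificity(biopsy_recommended, is_malignant):
    """Sensitivity and specificity of yes/no biopsy recommendations."""
    pairs = list(zip(biopsy_recommended, is_malignant))
    true_pos = sum(1 for rec, mal in pairs if rec and mal)
    true_neg = sum(1 for rec, mal in pairs if not rec and not mal)
    n_malignant = sum(1 for _, mal in pairs if mal)
    n_benign = len(pairs) - n_malignant
    return true_pos / n_malignant, true_neg / n_benign


# Toy ratings and recommendations for one observer (invented).
toy_auc = empirical_auc([80, 60, 90], [20, 60, 10])
toy_sens, toy_spec = sensitivity_specificity(
    [True, True, False, True, False, False],
    [True, True, True, False, False, False],
)
```

The two kinds of measure answer different questions: the area summarizes how well ratings separate the classes at every threshold, whereas sensitivity and specificity describe the single operating point each radiologist actually chose.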
We analyzed these results separately for the six mammographers, the six community radiologists, and all 12 radiologists to assess the effect of the computer aid on the performance of radiologists who have different levels of experience in mammography. Note that the Student t test for paired data does not attempt to account for case-sample variation and, therefore, does not need to account for the correlation arising from the fact that all the observers read the same images. However, because the cases used in the study were consecutive lesions sampled at biopsy and because they were considered to be clinically representative, we do not expect the variation in case selection to have a strong effect on the results obtained in this study. In addition, the Student t test was used to evaluate the significance of the difference between the mean performances of the two groups of radiologists (ie, six mammographers and six community radiologists).
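The paired comparison can be sketched as follows. Each observer contributes one difference (with aid minus without aid), and the test asks whether the mean difference departs from zero. The function name and the 12 toy values are invented; the critical value 2.201 is the two-sided 95% point of the t distribution with 11 degrees of freedom, supplied by hand so that the sketch needs no statistics library.

```python
import math


def paired_t(without_aid, with_aid, t_critical):
    """Return (mean difference, t statistic, confidence interval) for
    paired observations; t_critical sets the interval's coverage."""
    diffs = [w - wo for wo, w in zip(without_aid, with_aid)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    std_error = math.sqrt(variance / n)
    t_stat = mean_diff / std_error
    interval = (mean_diff - t_critical * std_error,
                mean_diff + t_critical * std_error)
    return mean_diff, t_stat, interval


# Toy Az values for 12 observers (invented, not the study data).
toy_without = [0.90, 0.92, 0.94, 0.91, 0.93, 0.95,
               0.89, 0.92, 0.94, 0.93, 0.91, 0.96]
toy_gain = [0.03, 0.02, 0.04, 0.03, 0.01, 0.02,
            0.05, 0.03, 0.02, 0.03, 0.04, 0.01]
toy_with = [a + g for a, g in zip(toy_without, toy_gain)]

T_CRITICAL_DF11 = 2.201  # two-sided 95% critical value, 11 degrees of freedom
toy_mean, toy_t, toy_ci = paired_t(toy_without, toy_with, T_CRITICAL_DF11)
```

The same arithmetic offers a consistency check on the specificity result reported in the abstract: an interval of -0.054 to 0.026 around -0.014 has a half-width of 0.040, which with 11 degrees of freedom implies a standard error of about 0.018 and a t statistic of about -0.77.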
| RESULTS |
|---|
As shown in Table 1, the specificity calculated from individuals' biopsy recommendations varied substantially among the radiologists under both reading conditions. The mean specificity decreased slightly for the 12 radiologists (from 0.64 ± 0.05 to 0.63 ± 0.04). However, findings with the Student t test for paired data failed to show a statistically significant decrease in specificity (difference in specificity, -0.014; P = .46; 95% CI: -0.054, 0.026). Thus, use of the computer aid in this study did not appear to affect the number of benign cases sent for biopsy.
It is interesting to note that the cases recommended for biopsy under the two reading conditions varied across observers, as shown in Table 2. The patient management decision (whether or not to perform biopsy) was changed for 16 of the 50 cases (32%) when radiologists took the computer output into account. For malignant lesions, all of these changes for the 16 cases resulted in a change from follow-up to biopsy, as is illustrated in Figure 7a and 7b for the six mammographers and the six community radiologists, respectively. On average, 2.2 malignant cases were changed from follow-up to biopsy. It should be noted that for these malignant cases, only four patient management changes in four cases occurred among the six mammographers, whereas 22 changes in 15 cases from follow-up to biopsy occurred among the six community radiologists.
= 0.0167; P = .196; 95% CI: -0.010, 0.043). In addition, these differences in sensitivity, before and after using the computer aid, between the six mammographers and six community radiologists were evaluated by using the Student t test under a different assumption, namely, that the two populations have unequal variances. Under this assumption, both differences failed to achieve a statistically significant level at a critical P value of .05: The difference in sensitivity between the two groups yielded a P value of .071 (difference, 0.076; 95% CI: -0.011, 0.162) when the computer aid was not used, whereas the difference in sensitivity between the two groups yielded a P value of .26 (difference, 0.017; 95% CI: -0.013, 0.045) when the computer aid was used. We tested the differences with the Student t test with these two different assumptions concerning the variances of the two populations because the population variances are unknown. According to Hays (17), for samples of equal or nearly equal size, a relatively large difference in the population variances does not seem to have a strong effect on the conclusion drawn from a t test on the basis of the equal variance assumption. With the equal but small sample size (six observers from each group) in our study, a statistical test of the homogeneity of variance may not be reliable. Thus, the conclusion derived from the t tests assuming equal variances, according to Hays (17), may be more applicable to our case. In fact, the P values of .071 and .26 from the t tests under the assumption of unequal variances do not differ substantially from the P values (.045 and .196) obtained from the t tests under the assumption of equal variances for the differences in sensitivity between the two groups of radiologists without and with the computer aid, respectively.
On the basis of the evidence from the t tests under the two assumptions, we conclude that the performance (in terms of sensitivity) of the mammographers was better than that of the community radiologists when the computer aid was not used, and this difference was marginally significant (P = .045 or .071, depending on the test used). When the computer aid was used, however, the performances of the two groups with respect to making correct biopsy recommendations for malignant cases clearly failed to show a statistically significant difference.
| DISCUSSION |
|---|
As a second result, we have shown that use of computer aid has the potential to reduce the difference in performance between community radiologists and mammographers in the task of distinguishing between malignant and benign lesions (ie, in making correct biopsy recommendations for malignant cases). This may become the eventual motivation for the application of computer-based classification algorithms in community practice. It is of interest to note that the computer-extracted features in our classification algorithm correspond to the major features used by radiologists in differentiating between benign and malignant mass lesions (18). Previously, we showed that our classification scheme achieved a performance similar to that of an experienced mammographer with the training database (8) and yielded a robust performance with the independent database. The robustness was evaluated in terms of the differences in performance, as indicated by both Az and 0.90A'z values, obtained with the training database and with the independent database. We found that the differences in Az and 0.90A'z failed to reach a statistically significant level (9).
In a clinical setting, neither radiologists nor a trained computer system would know the outcomes of cases presented for interpretation. To our knowledge, our study is the first of its kind in which the cases are distinct (independent) from those used for the training of the classification algorithm. That is, both the radiologist observers and the computer system were "viewing" the images for the first time.
We did not use the leave-one-out method for the cases in our observer study. The leave-one-out testing method has been widely used to prevent classifier overtraining and has served as a method to evaluate the ability of classifiers to generalize to new cases. The leave-one-out method may solve the problem of overtraining of the classifier. However, the results from a leave-one-out method may still be biased toward the training cases, because the overall training of a computer classification scheme includes not only the training of the classifier but also the training of other aspects of the computer method (eg, segmentation, feature extraction, and selection of features for input to the classifier).
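For readers unfamiliar with the leave-one-out method, a minimal sketch follows. Each case is scored by a classifier trained on all of the other cases. The one-feature midpoint classifier, the function names, and the toy data are hypothetical stand-ins and are unrelated to the classification scheme in this study.

```python
def leave_one_out_predictions(features, labels, train, predict):
    """Predict each case with a model trained on every other case."""
    predictions = []
    for i in range(len(features)):
        train_features = features[:i] + features[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train(train_features, train_labels)
        predictions.append(predict(model, features[i]))
    return predictions


def train_midpoint(values, labels):
    """Hypothetical classifier: threshold midway between class means."""
    malignant = [v for v, lab in zip(values, labels) if lab]
    benign = [v for v, lab in zip(values, labels) if not lab]
    return (sum(malignant) / len(malignant) + sum(benign) / len(benign)) / 2.0


def predict_midpoint(threshold, value):
    """Call a case malignant when its feature exceeds the threshold."""
    return value > threshold


# Toy single-feature data (invented for illustration).
toy_features = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [True, True, True, False, False, False]
toy_predictions = leave_one_out_predictions(
    toy_features, toy_labels, train_midpoint, predict_midpoint
)
```

Only the step inside the loop is held out. Any choice made beforehand with all cases in view, such as which feature to feed the classifier or how to tune segmentation, has already seen the test case, which is the source of the residual bias described above.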
We chose to use a video monitor for the observer study because it allowed the use of consecutive cases without problems related to mammogram availability. We believe that the image resolution required for the diagnosis of mass lesions seen at mammography is not as crucial as that required for the diagnosis of microcalcifications and that the resolution of 100 µm is therefore sufficient. The sensitivity of the mammographers in the task of identifying malignant lesions in this study (ie, 97% without aid) suggests that this is the case. In addition, high-resolution monitors are used for reading digital mammograms in many clinical practices. A commercial full-field digital mammography system with soft-copy display on high-resolution monitors has been approved recently by the U.S. Food and Drug Administration for use in the diagnosis of breast disease. It should be noted that the image resolutions used in our computerized analysis and in our image display are both at 100 µm, which is the same as the image resolution of the commercial full-field digital mammography system. Use of the video monitor for reading digitized mammographic images is a possible limitation of our study because the image quality degradation due to use of the video monitor and digitized images may theoretically affect radiologists' performance in the task of differentiating between benign and malignant masses. However, findings in studies by others have shown that the diagnostic accuracies obtained by using conventional film and digitized mammographic images (soft-copy display on 1,024 x 1,024-resolution monitors) were at a similar level for the classification of mass lesions and microcalcifications (19-21). In addition, results in ROC studies have shown that the reduction of pixel size from 100 to 35-50 µm on both digital and digitized mammographic images (printed as hard copies) did not yield a measurable improvement in the characterization of microcalcification clusters (22,23).
To our knowledge, the observer study presented herein is the first in which the cases for interpretation were unknown to both the radiologist observers and the computer. We have shown that the use of a diagnostic computer aid improved the abilities of both mammographers and community radiologists to differentiate between benign and malignant masses on mammograms, as indicated by the statistically significant improvement in values for areas under the ROC curve, Az and 0.90A'z. The sensitivity of their biopsy recommendations when they used the computer aid was also improved at a statistically significant level. However, the computer aid had no statistically significant effect on their performance regarding the number of benign cases sent for biopsy. In addition, our results show that when the computer aid was used, improvements for the community radiologists were larger than those observed for the mammographers.
| FOOTNOTES |
|---|
Z.H., M.L.G., C.J.V., and C.E.M. are shareholders in R2 Technology, Los Altos, Calif. It is the policy of the University of Chicago that investigators disclose publicly actual or potential significant financial interests that may appear to be affected by the research activities.
Abbreviations: Az = area under the ROC curve, 0.90A'z = partial area under the ROC curve, ROC = receiver operating characteristic
Author contributions: Guarantors of integrity of entire study, Z.H., M.L.G.; study concepts and design, all authors; literature research, Z.H.; experimental studies, Z.H.; data acquisition, Z.H., C.J.V.; data analysis/interpretation, Z.H., M.L.G., C.E.M.; statistical analysis, Z.H., C.E.M.; manuscript preparation, Z.H., M.L.G.; manuscript definition of intellectual content, editing, revision/review, and final version approval, all authors.