Radiology
Published online before print June 28, 2002, 10.1148/radiol.2242010703
(Radiology 2002;224:560-568.)
© RSNA, 2002


Breast Imaging

Breast Cancer: Effectiveness of Computer-aided Diagnosis—Observer Study with Independent Database of Mammograms¹

Zhimin Huo, PhD², Maryellen L. Giger, PhD, Carl J. Vyborny, MD, PhD and Charles E. Metz, PhD

1 From the Department of Radiology, University of Chicago, 5841 S Maryland Ave, MC2026, Chicago, IL 60637. From the 1999 RSNA scientific assembly. Received March 29, 2001; revision requested May 21; final revision received January 14, 2002; accepted February 6. Supported in part by U.S. Army Medical Research and Materiel Command grant DAMD 17-96-1-6058. Address correspondence to M.L.G. (e-mail: m-giger@uchicago.edu).


    ABSTRACT
PURPOSE: To evaluate the effectiveness of a computerized classification method as an aid to radiologists reviewing clinical mammograms for which the diagnoses were unknown to both the radiologists and the computer.

MATERIALS AND METHODS: Six mammographers and six community radiologists participated in an observer study. These 12 radiologists interpreted, with and without the computer aid, 110 cases that were unknown to both the 12 radiologist observers and the trained computer classification scheme. The radiologists’ performances in differentiating between benign and malignant masses without and with the computer aid were evaluated with receiver operating characteristic (ROC) analysis. Two-tailed P values were calculated for the Student t test to indicate the statistical significance of the differences in performances with and without the computer aid.

RESULTS: When the computer aid was used, the average performance of the 12 radiologists improved, as indicated by an increase in the area under the ROC curve (Az) from 0.93 to 0.96 (P < .001), by an increase in partial area under the ROC curve (0.90A'z) from 0.56 to 0.72 (P < .001), and by an increase in sensitivity from 94% to 98% (P = .022). No statistically significant difference in specificity was found between readings with and those without computer aid (Δ = -0.014; P = .46; 95% CI: -0.054, 0.026), where Δ is the difference in specificity. When we analyzed results from the mammographers and community radiologists as separate groups, a larger improvement was demonstrated for the community radiologists.

CONCLUSION: Computer-aided diagnosis can potentially help radiologists improve their diagnostic accuracy in the task of differentiating between benign and malignant masses seen on mammograms.


Index terms: Breast neoplasms, 00.31, 00.32 • Breast neoplasms, radiography, 00.111, 00.119 • Breast radiography, 00.111, 00.119 • Computers, diagnostic aid • Receiver operating characteristic curve (ROC)


    INTRODUCTION
Breast cancer is a leading cause of death among women in the United States (1). Early detection of breast cancer by means of screening mammography is the most effective way to reduce the mortality rate resulting from breast cancer (2,3). However, radiologists are unable to correctly classify all lesions detected at mammography as being benign or malignant. Computer-aided diagnosis has been investigated as a means to provide radiologists with objective information, such as estimates of the likelihood of malignancy, to aid in the classification of abnormalities detected at screening (4).

Recently, Jiang et al (5) and Chan et al (6) developed automated classification schemes based on features extracted by computers and performed observer studies to evaluate the effect of their classification schemes as an aid to radiologists in differentiating between benign and malignant breast lesions. Jiang et al used an artificial neural network to merge eight features of clustered microcalcifications to distinguish between benign and malignant disease. The performance of radiologists, in terms of the area under the receiver operating characteristic (ROC) curve (Az), was significantly (P < .001) improved in differentiating between benign and malignant clusters of microcalcifications when they used the information generated by the neural network. In addition, findings in an observer study showed that the use of the computer aid increased the number of malignant clusters of microcalcifications noted for biopsy and decreased the number of benign clusters noted for biopsy. Chan et al used a linear discriminant classifier to analyze 41 computer-extracted texture and morphologic features to classify benign and malignant mass lesions. The performance of radiologists in terms of Az also improved significantly (P = .007) for the task of differentiating between benign and malignant mass lesions when this computer aid was used.

We have developed a computerized scheme for the classification of mass lesions detected on mammograms (7,8). The scheme automatically extracts four characteristics of masses: spiculation, margin sharpness, density, and texture. An artificial neural network then merges the four features to generate an estimated likelihood of malignancy. The performance of the classification scheme is relatively unaffected by variation in the mammogram digitization or in case mix (9).

The purpose of our study was to evaluate the effectiveness of our automated classification scheme as an aid for radiologists reviewing clinical mammograms for which the diagnoses were unknown to both the radiologist and the computer. Unlike previous observer studies with computer-aided diagnosis, this experiment more closely simulates the likely eventual clinical application of computer-aided diagnosis in which it can be expected that the radiologist and computer will each be "seeing" a given case for the first time. In all prior studies, to our knowledge, a single database has been used for both training and testing, with use of a "leave-one-out" method to yield output for their observer studies.


    MATERIALS AND METHODS
Database
The database used in the observer study consisted of 50 biopsy-proven malignant masses, 50 biopsy-proven benign masses, and 10 cysts proved with fine-needle aspiration. The 110 cases were collected (Z.H.) from a list of consecutive lesions sampled at biopsy in our institution for which the screening and diagnostic mammograms were available. An institutional review board approved the protocol for retrospective use of the mammograms and clinical data. For each case, all four standard views and one special view (eg, a spot-compression view or a spot-compression magnification view selected by an experienced mammographer [C.J.V.]) were collected. In eight of the 110 cases, the mass appeared in only one mammographic projection. All the mammograms were digitized by using a laser scanner (Lumiscan 100; Lumisys, Sunnyvale, Calif) at a 0.1-mm pixel size and 12-bit quantization (subsequently scaled to 10-bit quantization).

Computerized Classification Scheme
Figure 1 illustrates schematically our automatic computerized classification scheme. The algorithm has three main components: (a) automated segmentation of a mass lesion from its surrounding parenchyma, (b) automated extraction of four features (ie, the spiculation, margin sharpness, density, and texture measures), and (c) automated classification (ie, estimation of the likelihood of malignancy for each case). Lesion segmentation begins with a region of interest of 512 x 512 pixels centered about the mass lesion in question. Since this study pertained to the classification of masses and not to the initial detection of masses, we manually extracted the 512 x 512 regions of interest from the digitized mammograms. The regions of interest serve as input to the classification scheme. Note that the regions of interest input to our classification scheme can be identified by a computerized detection method (10) or by a radiologist. Given a region of interest, the computerized classification method outputs a number (from 0 to 100) related to the likelihood of malignancy.



Figure 1. Schematic of the overall classification scheme. ROI = region of interest.

 
The segmentation algorithm is based on a gray-level region-growing technique that extracts sufficiently detailed information about the margins of masses to allow subsequent accurate classification of mammographic lesions as benign or malignant (11). The computer then calculates four features related to spiculation of the lesion, margin sharpness, density (mean gray level), and texture (8). These features were previously selected from more than 54 features on the basis of their performance levels and robustness. A conventional four-input artificial neural network with two hidden units (Fig 1) is used to merge these four features into an estimated likelihood of malignancy, by using internal parameters determined from prior training with cases different from those in this experiment. The performance of our classification scheme with a training database of 95 images in the task of differentiating between benign and malignant lesions yielded a mean Az of 0.90 ± 0.04 (standard error) and a partial area index 0.90A'z of 0.40 ± 0.17 from round-robin analysis. The index 0.90A'z is used to indicate the performance above 90% sensitivity (12). The corresponding ROC curve is shown in Figure 2.
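The four-input, two-hidden-unit network described above can be sketched as a single forward pass. This is only an illustration: the sigmoid activations, the 0-to-1 feature scaling, the scaling of the output to 0-100, and every weight and bias value below are assumptions made for the example, not the trained parameters or the exact architecture details of the scheme.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def likelihood_of_malignancy(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 4-input, 2-hidden-unit, 1-output network.

    Returns a score on a 0-100 scale.
    """
    if len(features) != 4:
        raise ValueError("expected four features")
    hidden = [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return 100.0 * output


# Hypothetical, untrained parameters chosen only so the example runs.
W_HIDDEN = [[2.0, -1.5, 0.5, 1.0], [-1.0, 2.0, -0.5, 0.5]]
B_HIDDEN = [-0.5, 0.2]
W_OUT = [3.0, -2.0]
B_OUT = -0.3

# Hypothetical feature values, assumed to be normalized to the range 0-1:
# spiculation, margin sharpness, density, texture.
features = [0.8, 0.3, 0.6, 0.4]
score = likelihood_of_malignancy(features, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(f"estimated likelihood of malignancy: {score:.1f}")
```

In practice the weights would come from training on cases separate from those being classified, as was done here.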



Figure 2. ROC curves obtained from the round-robin evaluation of the training database and from the independent evaluation of the new database.

 
The previously trained computer algorithm was used to classify the database used in the observer study (110 cases), taking as input the regions of interest containing the mass lesions from the craniocaudal, mediolateral oblique, or special views of the affected breast. The output was an estimated likelihood of malignancy for each view on the basis of the computer analysis. The mean value of the computer outputs from the craniocaudal, mediolateral oblique, and special views was then taken as the overall likelihood of malignancy for the case. We have investigated methods with which to combine information from multiple views of a mass lesion in the task of differentiating between benign and malignant masses, and we have compared the performances of the mean, maximum, and minimum of the computer outputs for all three views (13). Among the three combination methods, the mean of the computer outputs from the three views yielded the best performance in terms of Az and 0.90A'z. The mean from the three views performed at a level significantly (P < .05) better than that of the maximum from the three views in terms of Az but not in terms of 0.90A'z, the performance at a high sensitivity level. Although the performance of the maximum of the computer outputs from the three views is inferior to that of the minimum, the performance at high sensitivity (ie, 0.90A'z) for the maximum is higher than that for the minimum. The performance of the classification scheme in differentiating between benign and malignant masses for this study database yielded a mean Az of 0.90 ± 0.04 and a mean 0.90A'z of 0.65 ± 0.09. This ROC curve is shown in Figure 2.
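The three ways of combining per-view outputs compared above (mean, maximum, minimum) can be sketched as follows. The function name, the view labels, and the example scores are hypothetical, and the rule of skipping a view in which the mass is not visible is an assumption of this sketch rather than something stated in the text.

```python
from statistics import mean


def combine_views(view_scores, method="mean"):
    """Combine per-view likelihood estimates (0-100) into one case-level score.

    view_scores maps a view name to its score, or to None when the mass is
    not visible in that view (assumed handling: such views are skipped).
    """
    combiners = {"mean": mean, "max": max, "min": min}
    if method not in combiners:
        raise ValueError(f"unknown method: {method}")
    scores = [s for s in view_scores.values() if s is not None]
    if not scores:
        raise ValueError("no view has a score")
    return combiners[method](scores)


# Hypothetical per-view outputs for one case.
case = {"CC": 62.0, "MLO": 48.0, "special": 70.0}
for name in ("mean", "max", "min"):
    print(name, combine_views(case, name))
```

The mean is the combination used for the case-level output in this study; the other two are included only because they were compared against it.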

Observer Study
Twelve radiologists who are qualified to read mammograms according to the rules of the U.S. Food and Drug Administration (pursuant to the Mammography Quality Standards Act) participated in the observer study. The observers included six mammographers with a mean of 12 years of experience in interpreting mammograms (hereafter, mammographers) and six community hospital–based radiologists with a mean of 9 years of experience in interpreting mammograms (hereafter, community radiologists). All observers signed an institutional review board-approved consent form before participating in the observer study.

One reading session was held for each observer. During the session, two reading conditions were evaluated: (a) radiologist performance without the computer aid and (b) radiologist performance with the computer aid. For each clinical case, radiologists were shown the four standard mammographic views on a 17 x 17-inch (43 x 43-cm) video monitor, with arrows indicating the locations of the mass lesion. The monitor also displayed regions of interest containing the mass lesion in question from the craniocaudal, mediolateral oblique, or special views. The layout of the interface on the monitor is shown in Figure 3. The radiologists were able to enlarge the regions of interest to full resolution (0.1 mm/pixel) and were able to interactively change the contrast and brightness of the displayed images according to their preference.



Figure 3. Computer interface used in this observer study (a) without and (b) with the computer output (top left). Bottom left: Arrows indicate the location of the mass lesion. Right: Top blue bar is an analog scale from 0% to 100%, where the radiologist observer indicates his or her confidence level that the lesion is malignant. Second blue bar lists two recommendations for patient treatment: no biopsy or biopsy (surgical or core). CC = craniocaudal, Mag = magnification, MLO = mediolateral oblique.

 
Each radiologist was asked two questions under each of the two conditions. First, "What is your confidence that a lesion is malignant, on a continuous scale ranging from 0% to 100%," and second, "What is your recommendation for patient management, that is, long-term or short-term follow-up or core or surgical biopsy." Each observer first interpreted the case without the computer aid and then was asked to consider the computer-estimated likelihood of malignancy (indicated on the upper left corner of the screen, as shown in Fig 3) and modify his or her confidence rating and biopsy recommendation whenever needed. It should be noted that this sequential reading order is appropriate for our specific evaluation since such reading order mimics that which would be used clinically (14). In our situation, the stand-alone imaging modality (ie, the mammograms themselves) is being compared with a combination of that modality and a supplemental one (ie, the computer output). In addition, the stand-alone modality (ie, the mammograms themselves) is expected to be read first before the combination in clinical practice.

In order for the observers to use the computer output comfortably and effectively, a brief training session was held before the observer study. To establish the observers’ confidence in using computer results, we briefly described to the observers our computer classification scheme and its performance in the task of differentiating between benign and malignant masses. The observers were told that in the classification scheme, features were used that are similar to those used by radiologists when interpreting mass cases. They were also informed about the performance of the computer in terms of sensitivity and positive predictive value at a given threshold on the computer-estimated likelihood of malignancy; at a threshold of 45%, the computer yields a sensitivity of 96% and a positive predictive value of 67%. The observers learned to adjust their confidence levels to that of the computer while they went through the training cases. Each observer viewed as many as 20 training cases (10 malignant and 10 benign cases) first without and then with the computer-estimated likelihood of malignancy. The true diagnosis was given immediately after each training case. The order of the training cases was the same for all the observers. However, the order of the 110 independent test cases was randomized differently for each observer. No time limit was imposed on observers, and each observer decided when the computer output would be shown.

Data Analysis
The confidence ratings (that a lesion is malignant) from each observer were analyzed by using ROC analysis (15,16). The Az and a partial area index (0.90A'z) were calculated to summarize the overall performance and the performance above 90% sensitivity, respectively, of each observer in the task of differentiating between benign and malignant mass lesions. In addition, sensitivity and specificity were calculated for each observer on the basis of their biopsy recommendations. These performance measures, obtained under the two test conditions (without and with the computer aid), indicated the effect of the computer aid on the radiologists’ performance in differentiating between benign and malignant mass lesions and in recommending patient management. Since radiologists may operate at different thresholds in recommending biopsy, the second question was important to evaluate the ultimate effect of the computer aid on patient management (ie, sensitivity and specificity). The Student t test for paired data (17) was used to assess the significance of differences between the performances with or without computer aid for a group of radiologists.
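The performance indices named above can be illustrated with a short sketch. It assumes the conventional binormal ROC model, TPF = Φ(a + b·Φ⁻¹(FPF)), for which Az = Φ(a / √(1 + b²)), and it assumes that the partial area index is the area lying between the ROC curve and the line TPF = 0.90, divided by (1 - 0.90). The function names and all parameter values are hypothetical, and the numerical integration stands in for whatever fitting software was actually used.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def sensitivity_specificity(recommend_biopsy, is_malignant):
    """Sensitivity and specificity of one reader's biopsy recommendations.

    Both arguments are equal-length sequences of booleans; the truth list
    must contain at least one malignant and one benign case.
    """
    pairs = list(zip(recommend_biopsy, is_malignant))
    tp = sum(1 for rec, mal in pairs if rec and mal)
    fn = sum(1 for rec, mal in pairs if not rec and mal)
    tn = sum(1 for rec, mal in pairs if not rec and not mal)
    fp = sum(1 for rec, mal in pairs if rec and not mal)
    return tp / (tp + fn), tn / (tn + fp)


def binormal_az(a, b):
    """Area under a binormal ROC curve with intercept a and slope b."""
    return _STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)


def partial_area_index(a, b, tpf0=0.90, steps=10000):
    """Normalized area above TPF = tpf0 (assumed definition, see lead-in).

    Midpoint-rule integration of (1 - FPF) over TPF in (tpf0, 1).
    """
    width = (1.0 - tpf0) / steps
    area = 0.0
    for i in range(steps):
        tpf = tpf0 + (i + 0.5) * width
        fpf = _STD_NORMAL.cdf((_STD_NORMAL.inv_cdf(tpf) - a) / b)
        area += (1.0 - fpf) * width
    return area / (1.0 - tpf0)


# Hypothetical binormal parameters for one reader.
a, b = 2.0, 1.0
print(f"Az = {binormal_az(a, b):.3f}")
print(f"partial area index above 90% sensitivity = {partial_area_index(a, b):.3f}")
```

Under this definition a chance-level curve (a = 0, b = 1) has Az = 0.5 but a partial area index of only 0.05, which is why the partial index is much lower than Az for the same reader.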

We analyzed these results separately for the six mammographers, the six community radiologists, and all 12 radiologists to assess the effect of the computer aid on the performance of radiologists who have different levels of experience in mammography. Note that the Student t test for paired data does not attempt to account for case-sample variation and, therefore, does not need to account for the correlation arising from the fact that all the observers read the same images. However, because the cases used in the study were consecutive lesions sampled at biopsy and because they were considered to be clinically representative, we do not expect the variation in case selection to have a strong effect on the results obtained in this study. In addition, the Student t test was used to evaluate the significance of the difference between the mean performances of the two groups of radiologists (ie, six mammographers and six community radiologists).
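A paired Student t test of the kind described above can be sketched in a few lines. The per-reader Az values below are invented for illustration and are not the study data; the two-tailed P value is obtained by numerically integrating the t density (Simpson's rule), which is an implementation choice of this sketch.

```python
import math
from statistics import mean, stdev


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed P value for Student's t, by Simpson integration of the density."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1.0 + x * x / df) ** (-(df + 1) / 2)

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3.0  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


def paired_t_test(without_aid, with_aid):
    """Paired t test on per-reader differences (with minus without)."""
    diffs = [w - wo for wo, w in zip(without_aid, with_aid)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    if se == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / se
    return mean(diffs), t, n - 1, t_two_tailed_p(t, n - 1)


# Hypothetical Az values for six readers (not the values in Table 1).
az_without = [0.90, 0.92, 0.94, 0.91, 0.95, 0.93]
az_with = [0.94, 0.95, 0.96, 0.95, 0.96, 0.96]
delta, t_stat, df, p_value = paired_t_test(az_without, az_with)
print(f"mean difference = {delta:.3f}, t = {t_stat:.2f}, df = {df}, P = {p_value:.4f}")
```

Because each reader serves as his or her own control, only the spread of the within-reader differences enters the standard error.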


    RESULTS
Table 1 lists individual and mean performances of the radiologists in terms of Az, 0.90A'z, sensitivity, and specificity obtained under the two reading conditions. The mean performances were calculated for the six mammographers, the six community radiologists, and all 12 radiologists. The corresponding ROC curves for the three groups shown in Figures 4-6 were generated by averaging the slope and intercept parameters of individual ROC curves for the six mammographers, the six community radiologists, and all 12 radiologists, respectively.



 
TABLE 1. Performance of Individual Radiologists without or with the Computer Aid

 


Figure 4. ROC curves of the six mammographers with and without the computer aid.

 


Figure 5. ROC curves of the six community radiologists with and without the computer aid.

 


Figure 6. ROC curves of all 12 radiologist observers with and without the computer aid.

 
The performances of radiologists in terms of Az, 0.90A'z, and sensitivity improved at statistically significant levels (P < .05) on the basis of the t test analyses for the groups of the six mammographers, the six community radiologists, and all 12 radiologists when the computer aid was used. Larger improvements in all three performance indices (Az, 0.90A'z, and sensitivity) appear to have been obtained for the community radiologists than for the mammographers, particularly in terms of 0.90A'z (from 0.45 to 0.66 vs from 0.67 to 0.78, respectively) and sensitivity (from 0.90 to 0.97 vs from 0.97 to 0.99, respectively). However, we did not attempt to demonstrate the statistical significance of this trend.

As shown in Table 1, the specificity calculated from individuals’ biopsy recommendations varied substantially among the radiologists under both reading conditions. The mean specificity decreased slightly for the 12 radiologists (from 0.64 ± 0.05 to 0.63 ± 0.04). However, findings with the Student t test for paired data failed to show a statistically significant decrease in specificity (Δ = -0.014; P = .46; 95% CI: -0.054, 0.026), where Δ is the difference in specificity. Thus, use of the computer aid in this study did not appear to affect the number of benign cases sent for biopsy.

It is interesting to note that the cases recommended for biopsy under the two reading conditions varied across observers, as shown in Table 2. The patient management decision (whether or not to perform biopsy) was changed by at least one radiologist for 16 of the 50 malignant cases (32%) when radiologists took the computer output into account. All of these changes for the 16 malignant cases were from follow-up to biopsy, as is illustrated in Figure 7a and 7b for the six mammographers and the six community radiologists, respectively. On average, 2.2 malignant cases per radiologist were changed from follow-up to biopsy. It should be noted that for these malignant cases, only four patient management changes in four cases occurred among the six mammographers, whereas 22 changes in 15 cases from follow-up to biopsy occurred among the six community radiologists.



 
TABLE 2. Change in Biopsy Recommendation after Use of the Computer Aid

 


Figure 7. Graphs show the number of changes in biopsy recommendation (whether or not to perform biopsy) for malignant cases made by (a) the six mammographers and (b) the six community radiologists after use of the computer aid for each of the 16 malignant cases that had at least one change in biopsy recommendation after use of the computer aid.

 
Patient management decisions were changed in 44 of the 60 benign cases (73%) by at least one radiologist when the computer output was used. This is demonstrated in Table 2 and in Figure 8a and 8b for the six mammographers and the six community radiologists, respectively. On average, 4.3 benign cases were changed from follow-up to biopsy and 3.4 benign cases were changed from biopsy to follow-up.



Figure 8. Graphs show the number of changes in biopsy recommendation (whether or not to perform biopsy) for the benign cases made by (a) the six mammographers and (b) the six community radiologists after use of the computer aid for each of the 44 benign cases that had at least one change in biopsy recommendation after use of the computer aid.

 
We investigated the difference in diagnostic performance between the six mammographers and the six community radiologists. Without the computer aid, the performance (in terms of sensitivity) of the mammographers was significantly greater than that of the community radiologists (P = .045). However, when the computer aid was used, findings with the Student t test for a difference between means (assuming that the two populations have the same variance) failed to show a statistically significant difference in sensitivity between the mammographers and the community radiologists (Δ = 0.0167; P = .196; 95% CI: -0.010, 0.043). In addition, these differences in sensitivity, before and after using the computer aid, between the six mammographers and six community radiologists were evaluated by using the Student t test under a different assumption, namely, that the two populations have unequal variances. Under this assumption, both differences failed to reach statistical significance at a significance level (α) of .05: The difference in sensitivity between the two groups yielded a P value of .071 (Δ = 0.076; 95% CI: -0.011, 0.162) when the computer aid was not used, whereas the difference in sensitivity between the two groups yielded a P value of .26 (Δ = 0.017; 95% CI: -0.013, 0.045) when the computer aid was used.

We tested the differences with the Student t test with these two different assumptions concerning the variances of the two populations because the population variances are unknown. According to Hays (17), for samples with equal or nearly equal size, relatively large difference in the population variances does not seem to have strong effect on the conclusion drawn from a t test on the basis of the equal variance assumption. With the equal but small sample size (six observers from each group) in our study, a statistical test of the homogeneity of variance may not be reliable. Thus, the conclusion derived from the t tests assuming equal variances, according to Hays (17), may be more applicable to our case. In fact, the P values of .071 and .26 from the t tests under the assumption of unequal variances do not differ substantially from the P values (.045 and .196) obtained from the t tests under the assumption of equal variances for the differences in sensitivity between the two groups of radiologists without and with the computer aid, respectively.
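The relationship between the two tests can be made concrete with a small sketch. With equal group sizes, the pooled (equal-variance) and unequal-variance standard errors are algebraically identical, so the two tests give the same t statistic and differ only in their degrees of freedom; this is consistent with the modest difference between the two sets of P values reported above. The sensitivities below are hypothetical, and the Welch-Satterthwaite approximation is assumed for the unequal-variance degrees of freedom.

```python
import math
from statistics import mean, variance


def two_sample_t(x, y):
    """t statistic and degrees of freedom under both variance assumptions."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)

    # Equal-variance (pooled) test.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

    # Unequal-variance test with Welch-Satterthwaite degrees of freedom.
    se_unequal = math.sqrt(v1 / n1 + v2 / n2)
    df_unequal = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return {
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_unequal": diff / se_unequal,
        "df_unequal": df_unequal,
    }


# Hypothetical per-reader sensitivities (not the values in Table 1).
mammographers = [0.98, 0.96, 1.00, 0.96, 0.98, 0.94]
community = [0.80, 0.96, 0.90, 0.98, 0.84, 0.92]
result = two_sample_t(mammographers, community)
for key, value in result.items():
    print(f"{key} = {value:.3f}")
```

When the two sample variances differ, the unequal-variance test has fewer degrees of freedom than the pooled test, so the same t statistic yields a somewhat larger P value.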

On the basis of the evidence from the t tests under the two assumptions, we conclude that the performance (in terms of sensitivity) of the mammographers was better than that of the community radiologists when the computer aid was not used, and this difference was marginally significant (P = .045 or .071, depending on the test used). When the computer aid was used, however, the performances of the two groups with respect to making correct biopsy recommendations for malignant cases clearly failed to show a statistically significant difference.


    DISCUSSION
We have shown that information generated with our automatic computerized classification method improved the performances of radiologists in the task of differentiating between benign and malignant disease. The improvement was seen as an increase in Az values, in 0.90A'z, and in sensitivity. The P values from the Student t test for paired data indicate the benefit of the computer aid to radiologists in differentiating between benign and malignant masses and in making correct biopsy recommendations for malignant cases. However, probably because the radiologists varied greatly in their interpretation of benign masses and in their biopsy recommendation for such cases, our results failed to show a statistically significant difference in specificity with computer aid.

As a second result, we have shown that use of computer aid has the potential to reduce the difference in performance between community radiologists and mammographers in the task of distinguishing between malignant and benign lesions (ie, in making correct biopsy recommendations for malignant cases). Such may become the eventual motivation for the application of computer-based classification algorithms in community practice. It is of interest to note that the computer-extracted features in our classification algorithm correspond to the major features used by radiologists in differentiating between benign and malignant mass lesions (18). Previously, we showed that our classification scheme achieved a performance similar to that of an experienced mammographer with the training database (8) and yielded a robust performance with the independent database. The robustness was evaluated in terms of the differences in performance, as indicated by both Az and 0.90A'z values, obtained with the training database and with the independent database. We found that the differences in Az and 0.90A'z failed to reach a statistically significant level (9).

In a clinical setting, neither radiologists nor a trained computer system would know the outcomes of cases presented for interpretation. To our knowledge, our study is the first of its kind in which the cases are distinct (independent) from those used for the training of the classification algorithm. That is, both the radiologist observers and the computer system were "viewing" the images for the first time.

We did not use the leave-one-out method for the cases in our observer study. The leave-one-out testing method has been widely used to limit classifier overtraining and to evaluate the ability of classifiers to generalize to new cases. Although it may address overtraining of the classifier itself, its results may still be biased toward the training cases, because the overall training of a computer classification scheme includes not only the training of the classifier but also the training of other aspects of the computer method (eg, segmentation, feature extraction, and selection of features for input to the classifier).
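The source of this bias can be sketched in Python. The example below is not the classification scheme used in this study: it is a toy single-feature threshold classifier, and every name in it (select_feature, train_threshold, loo_accuracy) is hypothetical. It shows only where feature selection sits relative to the leave-one-out loop. When selection is performed once on all cases, the held-out case has already influenced the method before it is "tested."

```python
def class_means(rows, labels, j):
    """Mean of feature j for the positive and negative classes."""
    pos = [r[j] for r, y in zip(rows, labels) if y == 1]
    neg = [r[j] for r, y in zip(rows, labels) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def select_feature(rows, labels):
    """Toy selection rule: pick the feature whose class means differ most."""
    gaps = []
    for j in range(len(rows[0])):
        pos_mean, neg_mean = class_means(rows, labels, j)
        gaps.append(abs(pos_mean - neg_mean))
    return gaps.index(max(gaps))


def train_threshold(rows, labels, j):
    """Toy classifier: threshold halfway between the two class means."""
    pos_mean, neg_mean = class_means(rows, labels, j)
    return (pos_mean + neg_mean) / 2, pos_mean > neg_mean


def loo_accuracy(rows, labels, select_inside_fold=True):
    """Leave-one-out accuracy; assumes every fold keeps both classes."""
    # Selecting on all cases lets each held-out case influence the method.
    fixed_j = None if select_inside_fold else select_feature(rows, labels)
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if select_inside_fold:
            j = select_feature(train_rows, train_labels)
        else:
            j = fixed_j
        threshold, pos_is_high = train_threshold(train_rows, train_labels, j)
        predicted = int((rows[i][j] > threshold) == pos_is_high)
        if predicted == labels[i]:
            correct += 1
    return correct / len(rows)


# Toy data (illustrative only): two features, three cases per class.
rows = [[1.0, 5.0], [2.0, 4.0], [1.5, 6.0], [8.0, 5.5], [9.0, 4.5], [8.5, 5.0]]
labels = [0, 0, 0, 1, 1, 1]
print(loo_accuracy(rows, labels, select_inside_fold=True))
```

Even with selection moved inside the loop, design decisions made while developing a method on a database (for example, tuning a segmentation step) are not captured by leave-one-out testing, which is why testing on an independent database is the stronger check.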

We chose to use a video monitor for the observer study because it allowed the use of consecutive cases without problems related to mammogram availability. We believe that the image resolution required for the diagnosis of mass lesions seen at mammography is not as crucial as that required for the diagnosis of microcalcifications and that the resolution of 100 µm is therefore sufficient. The sensitivity of the mammographers in the task of identifying malignant lesions in this study (ie, 97% without aid) suggests that this is the case. In addition, high-resolution monitors are used for reading digital mammograms in many clinical practices. A commercial full-field digital mammography system with soft-copy display on high-resolution monitors has been approved recently by the U.S. Food and Drug Administration for use in the diagnosis of breast disease. It should be noted that the image resolutions used in our computerized analysis and in our image display are both 100 µm, which is the same as the image resolution of the commercial full-field digital mammography system. Use of the video monitor for reading digitized mammographic images is a possible limitation of our study because the image quality degradation due to use of the video monitor and digitized images may theoretically affect radiologists’ performance in the task of differentiating between benign and malignant masses. However, findings in studies by others have shown that the diagnostic accuracies obtained by using conventional film and digitized mammographic images (soft-copy display on 1,024 × 1,024-resolution monitors) were at a similar level for the classification of mass lesions and microcalcifications (19–21). In addition, results in ROC studies have shown that the reduction of pixel size from 100 to 35–50 µm on both digital and digitized mammographic images (printed as hard copies) did not yield a measurable improvement in the characterization of microcalcification clusters (22,23).

To our knowledge, the observer study presented herein is the first in which the cases for interpretation were unknown to both the radiologist observers and the computer. We have shown that the use of a diagnostic computer aid improved the abilities of both mammographers and community radiologists to differentiate between benign and malignant masses on mammograms, as indicated by the statistically significant improvement in values for areas under the ROC curve, Az and 0.90A'z. The sensitivities of their biopsy recommendations were also improved at a statistically significant level when they used the computer aid. However, the computer aid did not produce a statistically significant change in the number of benign cases sent for biopsy. In addition, our results show that when the computer aid was used, improvements for the community radiologists were larger than those observed for the mammographers.


    ACKNOWLEDGMENTS
 
The authors are grateful to the 12 radiologists who participated in the observer study. The authors thank Roger Engelmann, MS, for developing the computer software package for the display interface and Robert M. Nishikawa, PhD, and Yulei Jiang, PhD, for their useful discussion in the computer interface design.


    FOOTNOTES
 
2 Current address: Health Imaging Research Lab, Eastman Kodak, Rochester, NY.

Z.H., M.L.G., C.J.V., and C.E.M. are shareholders in R2 Technology, Los Altos, Calif. It is the policy of the University of Chicago that investigators disclose publicly actual or potential significant financial interests that may appear to be affected by the research activities.

Abbreviations: Az = area under the ROC curve, 0.90A'z = partial area under the ROC curve, Δ = difference in specificity, ROC = receiver operating characteristic

Author contributions: Guarantors of integrity of entire study, Z.H., M.L.G.; study concepts and design, all authors; literature research, Z.H.; experimental studies, Z.H.; data acquisition, Z.H., C.J.V.; data analysis/interpretation, Z.H., M.L.G., C.E.M.; statistical analysis, Z.H., C.E.M.; manuscript preparation, Z.H., M.L.G.; manuscript definition of intellectual content, editing, revision/review, and final version approval, all authors.


    REFERENCES

  1. Parker SL, Tong T, Bolden S, Wingo PA. Cancer Statistics. CA Cancer J Clin 1997; 47:5-27.
  2. Smith RA. Screening women aged 40–49: where are we today? J Natl Cancer Inst 1995; 87:1198-1199.
  3. Tabar L, Fagerberg G, Chen RH. Efficacy of breast screening by age: new results from the Swedish two county trial. Cancer 1995; 75:1412-1419.
  4. Giger M, Huo Z, Kupinski M, Vyborny C. Computer-aided diagnosis in mammography. In: Sonka M, Fitzpatrick M, eds. Handbook of medical imaging. Washington, DC: SPIE, 2000; 2:915-986.
  5. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999; 6:22-33.
  6. Chan HP, Sahiner B, Helvie MA, et al. Improvement of radiologists’ characterization of mammographic masses by using computer-aided diagnosis: an ROC study. Radiology 1999; 212:817-827.
  7. Huo Z, Giger ML, Vyborny CJ, et al. Analysis of spiculation in the computerized classification of mammographic masses. Med Phys 1995; 22:1569-1579.
  8. Huo Z, Giger ML, Vyborny CJ, Wolverton DE, Schmidt RA, Doi K. Automated computerized classification of malignant and benign masses on digitized mammograms. Acad Radiol 1998; 5:155-168.
  9. Huo Z, Giger ML, Vyborny CJ, Wolverton DE, Metz CE. Computerized classification of benign and malignant masses on digitized mammograms: a study of robustness. Acad Radiol 2000; 7:1077-1084.
  10. Yin FF, Giger ML, Doi K, Metz C, Vyborny CJ, Schmidt RA. Computerized detection and analysis of masses in digital mammograms: analysis of bilateral-subtraction images. Med Phys 1991; 18:955-963.
  11. Huo Z, Giger ML. Evaluation of a computer segmentation method based on performances of an automated classification method. Proc SPIE 2000; 3981:16-21.
  12. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristics partial area index for highly sensitive diagnostic tests. Radiology 1996; 201:745-750.
  13. Huo Z, Giger ML, Vyborny CJ. Computerized analysis of multiple-mammographic views: potential usefulness of special view mammograms in computer-aided diagnosis. IEEE Trans Med Imaging 2001; 20:1285-1292.
  14. Metz C. Fundamental ROC analysis. In: Beutel J, Metter R, Kundel H, eds. Handbook of medical imaging. Vol 1. Washington, DC: SPIE, 2000; 751-764.
  15. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.
  16. Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24:234-245.
  17. Hays WL. Statistics. Philadelphia, Pa: Harcourt Brace College, 1994.
  18. D’Orsi CJ, Kopans DB. Mammographic feature analysis. Semin Roentgenol 1993; 28:204-230.
  19. Nab HW, Karssemeijer N, Erning LJV, Hendriks JH. Comparison of digital and conventional mammography: a ROC study of 270 mammograms. Med Inform 1991; 17:125-132.
  20. Karssemeijer N, Frieling JTM, Hendricks JHCL. Spatial-resolution in digital mammography. Invest Radiol 1993; 28:413-419.
  21. Powell KA, Obuchowski NA, Chilcote WA, Barry MM, Gannobcik SN, Cardenosa G. Film-screen versus digitized mammography: assessment of clinical equivalence. AJR Am J Roentgenol 1999; 173:889-894.
  22. Levy LD, Muller SL, Priday K, Rick A. Impact of pixel size for the differentiation of benign and malignant microcalcifications (abstr). Radiology 2000; 217(P):105.
  23. Chan HP, Helvie MA, Petrick N, et al. Observer performance study of the effects of pixel size on the characterization of malignant and benign microcalcifications. Acad Radiol 2001; 8:454-466.



