Breast Imaging
1 From the Departments of Biomedical Engineering and Radiology, Digital Imaging Research Division, Duke University Medical Center, DUMC 3302, Durham, NC 27710. Received July 23, 2001; revision requested September 4; revision received October 12; accepted December 10. Supported in part by U.S. Public Health Service grants R29-CA75547, R21-CA092573, and R21-CA81309 awarded by the National Cancer Institute; Whitaker Foundation grants RG-97-0322 and SO-97-0035; U.S. Army Medical Research and Materiel Command grant DAMD17-99-1-9174 awarded by the U.S. Army; and Susan G. Komen Breast Cancer Foundation grants 9803 and BCTR2000730A. Address correspondence to M.K.M. (e-mail: markey@duke.edu).
| ABSTRACT |
|---|
MATERIALS AND METHODS: A feed-forward, back-propagation artificial neural network (BP-ANN) was trained in a round-robin (leave-one-out) manner to predict biopsy outcome from mammographic findings (according to the Breast Imaging Reporting and Data System) and patient age. The BP-ANN was trained by using a large (>1,000 cases) heterogeneous data set containing masses and microcalcifications. The performances of the BP-ANN on masses and microcalcifications were compared with use of receiver operating characteristic analysis and a z test for uncorrelated samples.
RESULTS: The BP-ANN performed significantly better on masses than microcalcifications in terms of both the area under the receiver operating characteristic curve and the partial receiver operating characteristic area index. A similar difference in performance was observed with a second model (linear discriminant analysis) and also with a second data set from a similar institution.
CONCLUSION: Masses and calcifications should be considered separately when evaluating CAD systems for breast cancer diagnosis.
© RSNA, 2002
Index terms: Breast neoplasms, 00.31, 00.32; Breast neoplasms, calcification, 00.81; Breast neoplasms, diagnosis, 00.129; Computers, diagnostic aid; Computers, neural network
| INTRODUCTION |
|---|
Computer-aided diagnosis (CAD) of breast cancer is the application of computational techniques to the problem of interpreting breast images, usually mammograms (7–9). There are two major topics in breast cancer CAD: detection of mammographic lesions and diagnosis of cancer from identified lesions. In the detection task, the goal is to assist a radiologist in the identification, and often the localization, of lesion-containing regions of mammograms. In the diagnosis task, the goal is to assist a radiologist in determining whether an identified breast lesion is an indication of cancer. This study focused on the diagnosis of breast lesions that had already been identified by radiologists as suspicious enough to warrant biopsy. In other words, these cases are generally considered indeterminate and more challenging, and any reduction in the number of benign biopsies represents an improvement over the status quo, provided high sensitivity is maintained.
Most breast biopsies are performed on lesions that manifest mammographically as either a mass or a cluster of microcalcifications (10). CAD systems for detection generally perform better on calcifications than on masses, as shown in two review articles (8,11) and a recent study from a commercial CAD vendor (12). CAD systems for diagnosis that are based on features automatically extracted from the images are typically designed for either masses or calcifications alone. We are unaware of any previous attempts to compare the performance on masses and calcifications within a single study. Given the differences in databases and techniques among CAD systems for diagnosis, direct comparison of the published performances on masses and calcifications is not possible. However, the authors of classification studies on masses (13,14) report performances that are better than those reported in studies on calcifications (15,16). CAD systems for diagnosis that are based on findings extracted by radiologists are often trained and evaluated over heterogeneous data sets including both masses and calcifications, and the performances on masses and calcifications are not reported separately (17–20). The purpose of our study was to compare the performance on masses with the performance on calcifications for a CAD system that diagnoses already detected lesions from radiologist-extracted findings.
| MATERIALS AND METHODS |
|---|
We collected data on 1,530 nonpalpable mammographically suspicious breast lesions on which biopsy (core or excisional) was performed from 1990 to 2000 at Duke University Medical Center. The data were collected over several discontinuous time periods, but were collected consecutively within each time period. Of the 1,530 cases, 61 were removed because it was not certain that they were nonpalpable. In addition, 16 cases were removed because the radiologist's assessment of the likelihood of malignancy was unavailable. Thus, the primary data consisted of 1,453 approximately consecutive, nonpalpable, mammographically suspicious breast lesions. Experienced mammographers summarized each case according to the Breast Imaging Reporting and Data System (BI-RADS) lexicon (21). Each of the cases was read by one of seven readers. The 475 cases collected from 1990 to 1996 were read retrospectively, and the 978 cases collected from 1996 to 2000 were read prospectively.
Of the 1,453 cases, 508 (35%) were found to be malignant at biopsy. For the purposes of this study, a case was considered a "mass case" if mass features were present and no values were missing for any of the mass or calcification features. Likewise, a case was considered a "calcification case" if calcification features were present, but no mass features were present, and no values were missing for any of the mass or calcification features. There were 615 cases with masses, including 65 cases with calcifications in addition to a mass. There were 622 cases with calcifications that did not have masses as well. The positive predictive values (PPVs) of biopsy for the mass cases (223/615 = 36%) and the calcification cases (209/622 = 34%) were similar (P = .65, χ² test for independence; 95% CI for difference in malignancy fraction = -0.027, 0.080). The remaining 216 cases consisted of cases with neither a mass nor calcifications (n = 132) and cases with incomplete descriptions of the mass or calcifications that were present (n = 84). A mass was considered incompletely described if there were missing values for some of the mass or calcification features. Likewise, a calcification was considered incompletely described if there were missing values for some of the calcification features. The cases without a mass or calcifications were described by other findings, such as architectural distortion. When the value was missing for a feature, it was encoded in the same manner as if the finding were not present. All 1,453 cases, including the 216 cases with neither a mass nor calcifications, were used in building the CAD models for diagnosis.
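As a worked illustration of the interval quoted above, the following minimal sketch computes a 95% confidence interval for the difference between the two malignancy fractions. The Wald (normal approximation) form and the function name `wald_ci_difference` are assumptions made for illustration; the text does not state which interval method was used, although this form gives the same rounded bounds.

```python
import math

def wald_ci_difference(x1, n1, x2, n2, z=1.96):
    """Normal-approximation CI for the difference of two proportions.

    Assumption: the Wald interval is used here for illustration only;
    the article does not name its interval method.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of the difference, using unpooled variances.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Mass cases: 223 malignant of 615; calcification cases: 209 of 622.
low, high = wald_ci_difference(223, 615, 209, 622)
print(round(low, 3), round(high, 3))  # prints: -0.027 0.08
```

Because the interval contains zero, it is consistent with the statement that the two PPVs were similar.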
A second data set consisted of 1,000 consecutive mammographically suspicious breast lesions on which excisional biopsy was performed from 1990 to 1997 at the University of Pennsylvania Medical Center. Experienced mammographers summarized each case according to the BI-RADS lexicon (21). Each of the cases was read retrospectively by one of 11 readers. Of the 1,000 cases, 396 (40%) were found to be malignant at biopsy. There were 481 cases with masses, including 10 cases with calcifications in addition to a mass. There were 449 cases with calcifications that did not also have masses. The PPV observed for the masses (191/481 = 40%) was the same as that for the calcifications (178/449 = 40%). There were 70 other cases, most (n = 68) of which were cases with incompletely described masses or calcifications. All 1,000 cases, including the incompletely described ones, were used in training the CAD models for diagnosis.
Specifically, the BI-RADS features collected were mass margin, mass shape, mass density, mass size, calcification morphology, calcification distribution, and associated and special findings. Although not a part of the BI-RADS specification, the number of calcifications is routinely collected at both institutions and was also included. The number of calcifications was indicated as no calcifications present, fewer than five, five to 10, or more than 10 calcifications present. The location of the lesion was also included and was encoded as posterior, central, axillary tail, subareolar, lower inner quadrant, lower outer quadrant, upper inner quadrant, or upper outer quadrant.
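A minimal sketch of how one categorical finding might be turned into a numeric network input is shown here. The integer codes and the names `CALC_NUMBER_CODES` and `encode_calc_number` are hypothetical; the text gives the four categories for the number of calcifications but not their numeric encoding. The only rule taken from the text is that a missing value is encoded in the same manner as an absent finding.

```python
# Hypothetical ordinal codes; the article lists the categories only.
CALC_NUMBER_CODES = {
    "none": 0,           # no calcifications present
    "fewer than 5": 1,
    "5 to 10": 2,
    "more than 10": 3,
}

def encode_calc_number(value):
    """Map a reported category to its (hypothetical) numeric code.

    A missing value (None) is encoded as if the finding were not
    present, following the rule stated in the text.
    """
    if value is None:
        return CALC_NUMBER_CODES["none"]
    return CALC_NUMBER_CODES[value]

print(encode_calc_number("5 to 10"), encode_calc_number(None))  # prints: 2 0
```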
In addition to the BI-RADS findings, patient age was collected. For the cases from Duke University Medical Center, the mean age was 56 years, with a range of 23–87 years. For the cases from the University of Pennsylvania Medical Center, the mean age was 55 years, with a range of 17–92 years. Age is known to be an important risk factor for breast cancer. Increasing age is associated with increasing risk of breast cancer; a 60-year-old white American woman has a 14-fold increase in her chances of developing breast cancer relative to a 30-year-old white American woman (5). In agreement with the epidemiologic data, some evidence exists that age is a particularly valuable input in our predictive models (22).
For the cases from Duke University Medical Center, the mammographers indicated on a scale of 1–5 their assessment of the likelihood of malignancy. These assessment data were not available for the cases collected at the University of Pennsylvania Medical Center. An assessment of 1 indicated benign findings; 2, likely benign findings; 3, indeterminate findings; 4, likely malignant findings; and 5, malignant findings. The mammographer's assessment of malignancy was collected at the same time as the BI-RADS descriptors. As mentioned, some of the cases were read retrospectively and some were read prospectively, and although several mammographers participated in the study, each case was read by a single mammographer. Notice that this assessment is not the same as the BI-RADS clinical assessment. Moreover, this assessment does not directly correspond to the clinical task of deciding whether a patient should be referred to biopsy or follow-up. Since all the cases in the data set were subjected to biopsy, the mammographers were by definition performing with 100% relative sensitivity and 0% relative specificity on this data set (PPV, 508/1,453 = 35%). (Notice that these relative measures are not indicative of the radiologists' performances over a general screening or diagnostic mammography patient population in which most actually benign cases are correctly referred to follow-up.) Nevertheless, their assessment of the likelihood of malignancy is useful as an approximation to an internal intermediate state in the decision process.
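The relative measures quoted above follow directly from the fact that every case in the data set was referred to biopsy. The short sketch below makes the arithmetic explicit; the function name `relative_measures` is introduced here for illustration only.

```python
def relative_measures(n_malignant, n_benign):
    """Sensitivity, specificity, and PPV when every case is biopsied."""
    # Every malignant case was referred to biopsy: no false negatives.
    tp, fn = n_malignant, 0
    # Every benign case was also referred to biopsy: no true negatives.
    fp, tn = n_benign, 0
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return sensitivity, specificity, ppv

# 508 malignant cases among the 1,453 biopsied lesions.
sens, spec, ppv = relative_measures(508, 1453 - 508)
print(f"{sens:.0%} {spec:.0%} {ppv:.0%}")  # prints: 100% 0% 35%
```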
Artificial Neural Network
A feed-forward back-propagation artificial neural network (BP-ANN) can learn a function mapping inputs to outputs by being trained with cases of input-output pairs (23–25). The network inputs were the BI-RADS features and patient age. The network had a single hidden layer and one output node indicating malignancy. Each neuron in the network used a logistic activation function, $y = 1/(1 + e^{-x})$. The BP-ANN was trained to minimize the sum-of-squares error by using the back-propagation algorithm (23–25). A binary variable indicating benign or malignant was used as the network target. The target values were clipped to 0.1 and 0.9 to ensure that the network weights remained finite (sigmoid units cannot produce 0 or 1). The network weights were updated after the presentation of each case (stochastic gradient descent), which can help alleviate the problem of local minima. A momentum term was used, which can also help the network escape local minima. The training cases were presented to the network in a round-robin (leave-one-out) manner. To avoid overtraining, network training ended when the average testing error on the left-out cases began to increase (early stopping). The network parameters (learning rate, momentum, and number of hidden nodes in the single hidden layer) were empirically optimized. The custom neural network software used was written by members of our laboratory and has been used in several previous publications (22).
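The training scheme described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the laboratory's custom software: the class name `TinyBPANN`, the learning rate, the momentum, the number of hidden nodes, and the one-feature toy data are all hypothetical. It shows logistic units, targets clipped to 0.1 and 0.9, a squared-error loss, per-case (stochastic) weight updates, and a momentum term. Round-robin sampling and early stopping are omitted for brevity.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyBPANN:
    """One hidden layer, one output node, logistic units throughout."""

    def __init__(self, n_in, n_hidden, lr=0.5, momentum=0.5, seed=0):
        rng = random.Random(seed)
        # Each weight vector carries a trailing bias weight.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous updates, kept for the momentum term.
        self.v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v_o = [0.0] * (n_hidden + 1)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [logistic(sum(w * v for w, v in zip(row, xb))) for row in self.w_h]
        hb = h + [1.0]
        y = logistic(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, y

    def predict(self, x):
        return self.forward(x)[2]

    def train_case(self, x, label):
        """One stochastic update for a single case (label 1 = malignant)."""
        target = 0.9 if label else 0.1          # clipped targets
        xb, hb, y = self.forward(x)
        # Gradient of 0.5 * (y - target)^2 through the output sigmoid.
        delta_o = (y - target) * y * (1.0 - y)
        # Back-propagate using the output weights before they change.
        delta_h = [delta_o * self.w_o[j] * hb[j] * (1.0 - hb[j])
                   for j in range(len(self.w_h))]
        for j in range(len(self.w_o)):
            self.v_o[j] = self.momentum * self.v_o[j] - self.lr * delta_o * hb[j]
            self.w_o[j] += self.v_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                self.v_h[j][i] = (self.momentum * self.v_h[j][i]
                                  - self.lr * delta_h[j] * xb[i])
                row[i] += self.v_h[j][i]

def sum_squared_error(net, cases):
    return sum((net.predict(x) - (0.9 if label else 0.1)) ** 2
               for x, label in cases)

# Hypothetical toy data: one scaled feature, label 1 = malignant.
cases = [([0.0], 0), ([0.2], 0), ([0.8], 1), ([1.0], 1)]
net = TinyBPANN(n_in=1, n_hidden=2)
before = sum_squared_error(net, cases)
for _ in range(2000):
    for x, label in cases:
        net.train_case(x, label)
after = sum_squared_error(net, cases)
```

On this toy set the sum-of-squares error falls from `before` to `after`, and the output for the largest feature value ends above the output for the smallest.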
Linear Discriminant Analysis
Linear discriminant analysis (LDA) was performed on the data collected at Duke University Medical Center. LDA is a common statistical technique for linear classification. The same input findings were used, and the cases were used in a round-robin fashion as with the BP-ANN. The LDA was computed by using the implementation in SAS software (SAS Institute, Cary, NC).
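Round-robin (leave-one-out) evaluation can be sketched independently of the model being evaluated. In the sketch below, `leave_one_out_scores` and `fit_mean_difference` are hypothetical names, the data are made up, and the one-feature scorer is a simple stand-in for a linear classifier rather than the SAS implementation of LDA used in the study.

```python
def leave_one_out_scores(cases, fit):
    """Score each case with a model fitted on all the other cases.

    cases: list of (features, label) pairs.
    fit:   function taking a training list and returning a scorer.
    """
    scores = []
    for i, (x, _) in enumerate(cases):
        train = cases[:i] + cases[i + 1:]   # hold out case i
        model = fit(train)
        scores.append(model(x))
    return scores

def fit_mean_difference(train):
    """Stand-in linear scorer on one feature (not full LDA).

    The score is the signed distance from the midpoint between the
    benign and malignant class means.
    """
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    midpoint = 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))
    return lambda x: x[0] - midpoint

# Hypothetical data: one feature, label 1 = malignant.
cases = [([1.0], 0), ([2.0], 0), ([3.0], 0),
         ([6.0], 1), ([7.0], 1), ([8.0], 1)]
scores = leave_one_out_scores(cases, fit_mean_difference)
print(scores)  # prints: [-3.75, -2.5, -1.25, 1.25, 2.5, 3.75]
```

Each score is produced by a model that never saw the corresponding case, so the pooled scores can be passed to ROC analysis without optimistic bias from testing on training data.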
Receiver Operating Characteristic
The models were evaluated in terms of their receiver operating characteristic (ROC) curves. ROC curves enable the user to evaluate a model in terms of the trade-offs between sensitivity and specificity (26,27). The performance of classification methods can be evaluated by directly comparing their ROC curves or by comparing indices calculated from their curves. The most commonly used index is the area under the ROC curve ($A_z$). Notice that the values for $A_z$ range from 0.5 for chance to 1.0 for a perfect classifier.
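For intuition about the area index, the sketch below computes a nonparametric (empirical) area under the ROC curve as the fraction of malignant-benign pairs that the model ranks correctly. This is a swapped-in technique for illustration: the study used a semiparametric maximum likelihood fit, not this estimator, and the function name `empirical_auc` and the example scores are hypothetical.

```python
def empirical_auc(benign_scores, malignant_scores):
    """Fraction of (malignant, benign) pairs ranked correctly.

    Ties count as half. Equivalent to the Mann-Whitney statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(benign_scores) * len(malignant_scores))

# Hypothetical model outputs for two benign and two malignant cases.
print(empirical_auc([0.1, 0.4], [0.35, 0.8]))  # prints: 0.75
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation, matching the range given for the area index.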
In breast cancer diagnosis, the decision task is whether to refer a suspicious case to biopsy or recommend follow-up imaging. A true-positive finding would be an actual cancer that was correctly referred to biopsy. A true-negative finding would be an actual benign lesion that was correctly recommended for follow-up imaging. The cost of missing a cancer (false-negative finding) far outweighs that of an unnecessary benign biopsy (false-positive finding). As a result, we were most concerned about the high sensitivity region of the curve, so we also used the partial area index (${}_{0.90}A_z'$) calculated on that portion of the curve (true-positive fraction, 0.9–1.0) (28,29). The partial area index is the partial area normalized such that it ranges from 0.05 for chance to 1.0 for a perfect classifier. ROC analysis was performed by using software modified and provided by Charles Metz at the University of Chicago. The modified LABROC4 software (maximum likelihood, semiparametric fit) was used to calculate the ROC curves and the curve indices, $A_z$ and ${}_{0.90}A_z'$. Statistical comparisons were made with use of a standard z test since there was no correlation between the mass and calcification cases. A P value of less than .01 was considered to indicate a statistically significant difference.
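The two quantities described above can be illustrated with a short sketch. It assumes a conventional binormal ROC curve, TPF = Φ(a + b·Φ⁻¹(FPF)), integrates the area to the right of the curve for true-positive fractions of 0.9 and above, and divides by 0.1 to normalize. It also shows a z test for two independent area estimates. The function names, the parameters `a` and `b`, and the numbers passed to the z test are hypothetical; this is not the LABROC4 software and does not reproduce any result from the study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def partial_area_index(a, b, tpf0=0.9, steps=10000):
    """Normalized partial area above TPF = tpf0 for a binormal curve.

    Integrates (1 - FPF) over TPF in [tpf0, 1] with the midpoint rule,
    then divides by (1 - tpf0).
    """
    width = (1.0 - tpf0) / steps
    area = 0.0
    for k in range(steps):
        tpf = tpf0 + (k + 0.5) * width
        # Invert the binormal curve to get FPF at this TPF.
        fpf = _STD_NORMAL.cdf((_STD_NORMAL.inv_cdf(tpf) - a) / b)
        area += (1.0 - fpf) * width
    return area / (1.0 - tpf0)

def z_test_uncorrelated(area1, se1, area2, se2):
    """Two-sided z test for two independent area estimates."""
    z = (area1 - area2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

chance = partial_area_index(a=0.0, b=1.0)         # diagonal (chance) curve
near_perfect = partial_area_index(a=10.0, b=1.0)  # almost perfect separation
# Hypothetical areas and standard errors, for illustration only.
z, p = z_test_uncorrelated(0.90, 0.03, 0.80, 0.04)
```

With `a = 0` and `b = 1` the curve is the chance diagonal and the index evaluates to about 0.05, which is the chance value stated in the text; as the separation `a` grows, the index approaches 1.0.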
| RESULTS |
|---|
χ² test for independence; 95% CI for difference in malignancy fraction = -0.027, 0.080). Notice as well that since each case was read by a single mammographer and the study included seven readers, the assessment was pooled across mammographers.
| DISCUSSION |
|---|
A BP-ANN trained in a round-robin fashion on a heterogeneous set of biopsy-proved breast lesions was found to perform significantly better on masses than calcifications in terms of the ROC area and the partial area index. This difference was seen with use of two data sets collected at different institutions, which argues that this phenomenon is not a function of a particular data set. A similar difference in performance on masses and calcifications was seen when another predictive model, LDA, was used. Moreover, in a separate study conducted at Duke University Medical Center, a similar difference in performance was observed with a constraint satisfaction neural network (30). This indicates that the observed performance differential is not specific to BP-ANN models. However, it is possible that if some other classification technique were used, such differences would not be observed between masses and calcifications. Finally, when the mammographers' assessment of the likelihood of malignancy was used as a decision variable, it was found that they too seemed to be able to more accurately assess the masses than the calcifications. Notice, however, that there is no corresponding difference in their clinical recommendations, based on the PPV of biopsy for those two subsets of cases. Taken together, these findings suggest that masses and calcifications should be considered separately when evaluating CAD systems for breast cancer diagnosis. It should be recalled that the "masses" in this study included both calcified and noncalcified masses and that the presence of calcifications in addition to a primary mass lesion may affect the classification of that mass by either a computational technique or a mammographer.
Recent work by Huo et al (14,31) describes a CAD system for diagnosis of breast masses that handles spiculated and nonspiculated masses separately and is superior to a CAD system that was developed on a mixture of spiculated and nonspiculated masses. The work described herein can be interpreted as further evidence of the effect of distinct subsets on the performance of the breast cancer CAD models for diagnosis. As larger databases become available for developing CAD models for diagnosis, it may be beneficial to develop modular systems with submodels that are specialized for subsets of the data. Alternatively, when a single CAD model for diagnosis is developed over a heterogeneous data set, such as one containing both mass and calcification cases, these results suggest that it would be appropriate to evaluate the performance of the overall model over the subsets of interest.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Author contributions: Guarantor of integrity of entire study, M.K.M.; study concepts and design, M.K.M., J.Y.L., C.E.F.; literature research, M.K.M., J.Y.L.; experimental studies, M.K.M., J.Y.L., C.E.F.; data acquisition and analysis/interpretation, M.K.M., J.Y.L., C.E.F.; statistical analysis, M.K.M., J.Y.L., C.E.F.; manuscript preparation, definition of intellectual content, editing, revision/review, and final version approval, M.K.M., J.Y.L., C.E.F.
| REFERENCES |
|---|