Published online before print January 22, 2004, 10.1148/radiol.2303030089
(Radiology 2004;230:820-823.)
© RSNA, 2004
Computer-aided Classification of BI-RADS Category 3 Breast Lesions¹
Shalom S. Buchbinder, MD,
Isaac S. Leichter, PhD,
Richard B. Lederman, MD,
Boris Novak, PhD,
Philippe N. Bamberger, PhD,
Miryam Sklair-Levy, MD,
Gail Yarmish, MD and
Scott I. Fields, MD
1 From the Department of Radiology, Albert Einstein College of Medicine, Montefiore Medical Center, Bronx, NY (S.S.B., G.Y.); School of Engineering, Jerusalem College of Technology, Israel (I.S.L., B.N., P.N.B.); and Department of Radiology, Hadassah University Hospital, Jerusalem, Israel (R.B.L., M.S.L., S.I.F.). From the 2002 RSNA scientific assembly. Received January 16, 2003; revision requested March 25; revision received July 29; accepted September 5. Address correspondence to S.S.B., Staten Island University Hospital, 475 Seaview Ave, Staten Island, NY 10305-3498 (e-mail: sbuchbinder@siuh.edu).
ABSTRACT
PURPOSE: To evaluate a system for computer-aided classification (CAC) of lesions assigned to Breast Imaging Reporting and Data System (BI-RADS) category 3 at conventional mammographic interpretation.
MATERIALS AND METHODS: A CAC system was used to analyze 106 cases of lesions (42 malignant) that at blinded retrospective interpretation were assigned to BI-RADS category 3 by at least two of four radiologists. The CAC system automatically extracted from the digitized mammograms quantitative features that characterized the lesions. The system then used a classification scheme to score the lesions by the likelihood of their malignancy on the basis of these features. The classification scheme was trained with 646 pathologically proved cases (323 malignant), and the results were tested with receiver operating characteristic (ROC) analysis by using the jackknife method. Sensitivity, specificity, positive predictive value, and accuracy were calculated. Category 3 lesions were stratified among BI-RADS categories 2-5 according to CAC-assigned lesion score, and this classification was compared with the results of pathologic analysis.
RESULTS: Jackknife analysis of CAC results in the training data set yielded a sensitivity of 94%, specificity of 78%, positive predictive value of 81%, and area under the ROC curve of 0.90. Of the 42 malignant lesions that had been classified at conventional interpretation as probably benign, nine were assigned by the CAC system to BI-RADS category 4, and 29 were assigned to category 5. The CAC system correctly upgraded the BI-RADS classification of these 38 lesions (sensitivity, 90%) and incorrectly upgraded the classification of only 20 benign lesions (specificity, 69%).
CONCLUSION: The CAC system scored 38 of the 42 malignant lesions initially assigned to BI-RADS category 3 as BI-RADS category 4 or 5, and thus correctly upgraded the category in 90% of these lesions.
Index terms: Breast neoplasms, 00.30; Breast neoplasms, diagnosis, 00.11; Computers, diagnostic aid
INTRODUCTION
"Probably benign finding - short interval follow-up suggested" is the third of five numbered categories for summarizing and classifying mammographic interpretation that are outlined in the Breast Imaging Reporting and Data System (BI-RADS) (1) of the American College of Radiology. "Probably benign" also was approved by the federal government as a category of final assessment and is incorporated in the Mammography Quality Standards Act (2). This assessment category is commonly applied to lesions detected mammographically that are considered to have a very low probability of malignancy. However, no empirically validated data are available to support the use of this assessment category in particular cases (1). The percentage of mammograms that are assigned to category 3 varies among radiologists from 1.4% to 14.0% (3,4). The extent of variation indicates that this category is not used in a similar manner by all radiologists. Although category 3 is used relatively infrequently, most radiologists think this category is needed for classifying lesions that are not clearly benign but do not require biopsy. Because of the inherent ambiguity of such a classification, however, controversy has arisen about how best to manage lesions assigned to this category (5-8). As indicated by the results of numerous studies, in particular those from the University of California at San Francisco and the Hospital Pereira Rossell in Montevideo, Uruguay, the estimated probability of malignancy in lesions of this category is lower than 2% (4,9).
Although the use of this assessment category appears widespread among radiologists, some prefer not to recommend follow-up examinations at 6-month intervals because of fear of provoking unnecessary anxiety in the patient, increasing costs, and "potentially fanning the flames of screening skeptics" (5). The generally accepted practice nevertheless appears to be monitoring of category 3 lesions with follow-up examinations every 6 months for 1-2 years, depending on the case (6). If the lesion appearance on follow-up mammograms is unchanged, a return to routine annual screening is thought reasonable. This approach is useful for avoiding unnecessary biopsies in a large number of cases, but it also requires the cooperation of the patient. The patient incurs higher radiation exposures and may undergo a protracted period of increased anxiety. In addition, diagnosis of the small number of lesions that eventually are found malignant will have been delayed. It is therefore desirable to find ways of minimizing the use of this category while avoiding an increase in the number of unnecessary biopsies.
It has been demonstrated that malignant lesions initially considered to be probably benign on the basis of rigorous diagnostic criteria are typically at an early stage at the time of definitive diagnosis and that a postponement of biopsy in favor of monitoring has a minimal effect on the prognosis (6). This approach has many inherent advantages over biopsy at the time of initial evaluation: It allows for overall decreases both in the number of unnecessary biopsies (presumably those with low positive predictive value) and in biopsy-associated morbidity and substantial associated costs (10,11). We believe, however, that it is possible to further refine the use of this assessment category by using computer-aided classification (CAC). Although earlier studies showed an improvement in mammographic interpretation with use of CAC (12-14), to our knowledge none have evaluated the performance of a CAC system specifically in classification of BI-RADS category 3 lesions. The use of CAC for this purpose, we expected, might result in negative screening results or definitive diagnoses of benign lesion in many patients, whereas other patients might be identified as requiring immediate biopsy. The purpose of our study, therefore, was to evaluate the performance of a CAC system in identifying malignancies among lesions assigned to BI-RADS category 3 at retrospective conventional mammographic interpretation.
MATERIALS AND METHODS
Case Selection
A total of 752 cases of mammographically detected and pathologically proved mass lesions (365 malignant, 387 benign) were retrospectively culled from the archives of three university-affiliated medical institutions after an initial search of pathology department records to identify patients who had undergone breast biopsy. Consecutive cases that included a finding of malignancy at pathologic analysis and a finding of lesion at mammography performed within 2 months before biopsy were selected for the study. Cases that included a finding of benign lesion at pathologic analysis were selected for the study if they also included a finding of a lesion at mammography performed within 1 year prior to biopsy. The mean age of the patients was 54.2 years (range, 27-85 years), and each patient had a mammographically identified lesion in which biopsy had been performed. The institutional review boards approved the use of these cases and did not require patient informed consent because the study was retrospective and patient anonymity was strictly preserved in all aspects.
Of these 752 cases, 547 underwent conventional retrospective interpretation. The study protocol specified that each case should be interpreted retrospectively four times, once each by four different radiologists in the study. In actuality, 529 cases were interpreted according to this protocol; 18 cases, however, were interpreted by only three radiologists. Each radiologist was blinded to the results of pathologic analysis, and each radiologist independently classified the lesions by using the BI-RADS categories. For this purpose, the radiologists were provided with the four standard mammographic views on which the initially reported findings that indicated the need for biopsy were clearly demarcated. Additional mammographic views available from the same examination and prior examinations also were provided at the request of the radiologist. In 106 cases (42 malignant lesions), the lesion was classified as BI-RADS category 3 by at least two radiologists, who suggested short-interval follow-up to monitor the lesion.
CAC Analysis
After the retrospective conventional interpretation, all of the mammograms were digitized at high resolution (600 dpi, 12 bits) by using a prototype CAC system described elsewhere (15), and the digital images were displayed on the computer monitor. An ellipse that encompassed the lesion but that did not necessarily correspond to its border was interactively defined on the digital image by one of the radiologists (R.B.L.), who was familiar with the CAC system and was also blinded to the results of pathologic analysis. Then the CAC system extracted quantitative features that characterized the lesion. Neither the findings at pathologic analysis nor the results of conventional mammographic interpretation were available to the radiologist during use of the CAC system for automated extraction of lesion features.
During this stage of classification, the CAC system extracted 50 features that characterized the findings according to spiculation, lesion shape, and definition of the mass margins. Spiculation was considered to be indicated by lines radiating from a centroid instead of by a saw-toothed border with a distinct margin. This analysis therefore also could be applied to areas of architectural distortion, to focal asymmetries, to masses that appeared smoothly marginated, and to masses in which the margins were partly obscured. For each mass, all extracted features and all findings at pathologic analysis were used as inputs for a stepwise discriminant analysis in which the power of each feature to discriminate benign lesions from malignant lesions was assessed. Features that contributed substantially to discrimination were selected by the stepwise discriminant analysis procedure for incorporation into a pattern recognition scheme that was developed to classify each lesion according to a score based on the extracted lesion features.
The pattern recognition scheme, which was based on the discriminant analysis method (16), classified each lesion by means of a single score derived from a combination of the features extracted by the CAC system and selected by the stepwise discriminant analysis procedure. To construct the classification scheme, the CAC system used a training procedure on a database of cases for which the extracted features were provided, along with the pathologic result, for each lesion. The training database was composed of the 646 cases (323 malignant and 323 benign lesions) remaining after the selection of 106 test cases from among the 752 cases culled from archives. After training was completed, the classification scheme, which was embedded in the software, assigned each lesion in the 106 test cases a single classifier or score on a continuous scale such that the higher the score, the higher the probability of malignancy.
After the lesions were scored according to the likelihood of their malignancy, limit values for score were calculated to stratify the lesions into score groups. The first limit was the score value below which no malignant lesions in the training set were found, the second limit was the score value below which 5% of the lesions were malignant (any small proportion of malignant lesions, eg, 2%, might have been chosen for this criterion), and the third limit was the score value above which 90% of the malignant lesions in the training set were found. Using these three limit values, the CAC system automatically stratified the lesions according to score into four groups that corresponded to BI-RADS categories 2-5.
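The stratification step can be illustrated with a minimal sketch. It assumes that the three limit values are in ascending order and that a score equal to a limit falls in the higher group; neither detail is stated in the text. The function name `stratify` and the numeric limits are hypothetical and are not taken from the CAC software.

```python
from bisect import bisect_right


def stratify(score, limits):
    """Map a continuous lesion score to one of four groups labeled 2-5.

    `limits` holds three ascending cut points. Scores below the first
    limit map to 2; scores at or above the third limit map to 5.
    """
    if len(limits) != 3 or sorted(limits) != list(limits):
        raise ValueError("expected three ascending limit values")
    # bisect_right counts how many limits are <= score (0, 1, 2, or 3).
    return 2 + bisect_right(limits, score)


# Hypothetical limit values, for illustration only (not from the study).
example_limits = (0.10, 0.30, 0.60)
categories = [stratify(s, example_limits) for s in (0.05, 0.20, 0.45, 0.90)]
print(categories)
```

In the study, the limits were derived from the distribution of malignant lesions in the training set rather than fixed in advance as they are here.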
Statistical Analysis
The jackknife technique (17) was applied to evaluate the performance of the CAC system in classifying the training set of 646 lesions (323 malignant), which did not include the 106 lesions that were classified as category 3 lesions at conventional retrospective interpretation. The jackknife or "leave-one-out" technique consisted of 646 rounds. In each round, the features extracted by the CAC system, as well as the findings at pathologic analysis, were provided for 645 cases, while the findings at pathologic analysis were withheld for the case being analyzed. The classification scheme assigned the lesion in that case a single score based exclusively on a combination of extracted features. Using the three limit values described earlier (see CAC Analysis), the CAC system then automatically stratified the cases by score into four groups signifying BI-RADS categories 2-5. In the statistical analysis, categories 2 and 3 were considered to indicate negative findings, and categories 4 and 5 were considered to indicate positive findings.
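The leave-one-out rounds can be sketched as follows. The scorer here is a deliberately simple stand-in (signed distance of one feature from the midpoint between the class means) and is not the stepwise discriminant analysis used in the study. The function names and the toy data are hypothetical.

```python
from statistics import mean


def train_scorer(cases):
    """Fit a toy one-feature scorer from (feature, is_malignant) pairs.

    Stand-in for discriminant analysis: the score is the signed distance
    from the midpoint between the benign and malignant class means.
    """
    benign_mean = mean(x for x, malignant in cases if not malignant)
    malignant_mean = mean(x for x, malignant in cases if malignant)
    midpoint = (benign_mean + malignant_mean) / 2
    direction = 1.0 if malignant_mean >= benign_mean else -1.0
    return lambda x: direction * (x - midpoint)


def leave_one_out_scores(cases):
    """Score each case with a scorer trained on all of the other cases."""
    scores = []
    for i, (feature, _label) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]  # the held-out label is unused
        scorer = train_scorer(training)
        scores.append(scorer(feature))
    return scores


# Hypothetical toy data: (feature value, pathologically malignant?)
toy_cases = [(0.2, False), (0.3, False), (0.4, False),
             (0.7, True), (0.8, True), (0.9, True)]
loo_scores = leave_one_out_scores(toy_cases)
print(loo_scores)
```

With n cases the scorer is refit n times (646 times in the study), so each score comes from a classifier that never saw the pathologic result for the lesion it is scoring.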
The results were evaluated with receiver operating characteristic (ROC) analysis (18). To calculate the specificity, positive predictive value, and accuracy of classification by the CAC system, a specific cut point value was defined that allowed discrimination between benign lesions and malignant lesions with an acceptable level of sensitivity. Because mammography is primarily a screening examination, a high level of sensitivity is required, even at the expense of specificity.
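As an illustration of these quantities, the sketch below computes an empirical area under the ROC curve and the sensitivity and specificity at a chosen cut point. The empirical (pairwise) estimator is an assumption made for simplicity; the ROC methodology cited in the text may instead fit a smooth curve. All names and scores are hypothetical.

```python
def roc_auc(benign_scores, malignant_scores):
    """Empirical area under the ROC curve.

    Equals the fraction of (malignant, benign) pairs in which the
    malignant lesion has the higher score, counting ties as one half.
    """
    if not benign_scores or not malignant_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


def sensitivity_specificity(benign_scores, malignant_scores, cut_point):
    """Treat scores at or above `cut_point` as positive findings."""
    true_pos = sum(1 for m in malignant_scores if m >= cut_point)
    true_neg = sum(1 for b in benign_scores if b < cut_point)
    return true_pos / len(malignant_scores), true_neg / len(benign_scores)


# Hypothetical scores, for illustration only.
benign = [0.1, 0.2, 0.4]
malignant = [0.3, 0.8, 0.9]
auc = roc_auc(benign, malignant)
sens, spec = sensitivity_specificity(benign, malignant, cut_point=0.25)
print(auc, sens, spec)
```

Lowering the cut point raises sensitivity at the cost of specificity, which is the trade-off described above for a screening examination.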
RESULTS
The performance of the CAC system in classifying the training set of 646 lesions, using the jackknife method, yielded an area under the ROC curve of 0.90 (Fig 1). The cut point value that corresponded to 94% sensitivity in discrimination of benign from malignant lesions resulted in a specificity of 78%, a positive predictive value of 81%, and accuracy of 86% (Table).

Figure 1. ROC curve for CAC system performance assessed with the jackknife method. The three points on either side of the curve represent the lesion score-based limits. Az = area under the curve.
After the CAC system had been trained with the database of 646 lesions, it was used to classify the 106 masses that had been categorized at conventional interpretation as probably benign (BI-RADS category 3) by at least two radiologists. Figure 2 shows the percentages of benign lesions and malignant lesions assigned by the CAC system to each BI-RADS category. Among malignant lesions, the CAC system assigned one lesion to BI-RADS category 2, three lesions to category 3, nine lesions to category 4, and 29 lesions to category 5. The CAC system thus correctly upgraded the BI-RADS category in 38 of 42 cases, yielding a sensitivity of 90% for classification of the subgroup of malignant masses that had been underrated by at least two radiologists during retrospective conventional interpretation. Among benign lesions, the CAC system assigned 12 lesions to category 2, 32 lesions to category 3, 11 lesions to category 4, and nine lesions to category 5. The CAC system thus incorrectly upgraded the BI-RADS category in 20 of 64 cases, yielding a specificity of 69% for classification of the subgroup of benign masses.
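The arithmetic behind these two percentages can be reproduced from the reported counts. This is a minimal sketch; the variable and function names are illustrative only.

```python
# Counts reported above for the 106 test lesions, keyed by the
# BI-RADS category (2, 3, 4, 5) assigned by the CAC system.
malignant_counts = {2: 1, 3: 3, 4: 9, 5: 29}
benign_counts = {2: 12, 3: 32, 4: 11, 5: 9}


def upgraded(counts):
    """Number of lesions the CAC system placed in category 4 or 5."""
    return counts[4] + counts[5]


n_malignant = sum(malignant_counts.values())
n_benign = sum(benign_counts.values())

# Categories 4 and 5 count as positive findings; 2 and 3 as negative.
sensitivity = upgraded(malignant_counts) / n_malignant
specificity = (n_benign - upgraded(benign_counts)) / n_benign

print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

The fractions are 38/42 and 44/64, which round to the 90% and 69% given in the text.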

Figure 2. Bar graph shows the distribution of pathologically proved benign and malignant lesions among BI-RADS categories 2-5 after classification by the CAC system. Note that the likelihood of malignancy increases consistently from the lower-number categories to the higher-number categories.
DISCUSSION
The CAC system correctly upgraded the classification of 38 (90%) of 42 malignant lesions that had been classified as BI-RADS category 3 lesions at conventional mammographic interpretation. In addition, evaluation of CAC system performance with the separate training database of 323 malignant and 323 benign cases demonstrated a sensitivity of 94%, a specificity of 78%, and a positive predictive value of 81%. These values represent a considerable improvement over those reported elsewhere for conventional interpretation (19,20). The CAC system did not simply upgrade all category 3 lesions, but, instead, it successfully identified almost all of the malignant lesions and thereby eliminated the delay in their diagnosis.
Because the study design was retrospective, the radiologists knew that each lesion had been pathologically proved. This knowledge may have influenced their assessments and introduced a bias that would not have been present in a prospective clinical evaluation. In addition, the number of actual category 3 lesions was small: because this category is used infrequently, a large number of cases had to be accumulated to obtain the database. Although the results of this study show that cases currently assigned to category 3 could correctly be assigned to higher- or lower-numbered categories, they do not justify the elimination of this BI-RADS category.
Our study results, however, do indicate that a documented assessment of lesion benignity based on CAC methods may help mammographers to appropriately limit their use of the "probably benign" category. Although not all cases assigned to category 3 would be definitively classified either as negative or benign (BI-RADS categories 1 and 2, respectively) or as suspicious for or highly suggestive of malignancy (categories 4 and 5, respectively), a substantial shift away from the ambiguous category 3 to other, definitive assessment categories would streamline screening mammography and reduce the number of close interval follow-up examinations. This may further increase the use and acceptance of screening mammography and ultimately improve overall patient care.
STATISTICAL CONSULTANT COMMENTARY
The authors have used the jackknife technique to evaluate the performance of their computer-aided classification system for mass lesions. The jackknife method is a general approach for testing hypotheses and calculating confidence intervals in scenarios in which better methods are not easily used (Manly BFJ, Randomization, bootstrap and Monte Carlo methods in biology, 2nd ed, London, England: Chapman & Hall, 1997; Efron B, Tibshirani RJ, An introduction to the bootstrap, London, England: Chapman & Hall, 1993). The jackknife can be thought of as a method for converting the problem of estimating any population parameter into the problem of estimating a population mean.
Suppose that we have a sample of n values given by X1, X2, ... , Xn and that the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is used to estimate the population mean. Next, suppose the sample mean is calculated with the jth observation Xj left out:
$$\bar{X}_{-j} = \frac{1}{n-1}\sum_{i \neq j} X_i$$
By combining the formulas for $\bar{X}$ and $\bar{X}_{-j}$, we can show that the missing data value Xj can be expressed as
$$X_j = n\bar{X} - (n-1)\bar{X}_{-j}$$
Thus, the sample value Xj can be determined from the overall mean and the mean with Xj removed. This construct can be extended to other general parameters like those discussed in the preceding article. It is important that as the sample size n grows large, the jackknife estimators become unbiased.
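A short numerical check makes the relationship concrete: multiplying the overall mean by n and subtracting (n - 1) times the mean computed without Xj returns Xj. The sketch below uses hypothetical data and illustrative function names. Applying the same combination to a statistic other than the mean gives the jackknife pseudo-values for that statistic.

```python
from statistics import mean


def leave_one_out_means(values):
    """Return the sample mean with each observation removed in turn."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)
    return [(total - x) / (n - 1) for x in values]


def pseudo_values(values):
    """Jackknife pseudo-values for the mean: n*mean - (n-1)*mean_without_j."""
    n = len(values)
    overall = mean(values)
    return [n * overall - (n - 1) * m for m in leave_one_out_means(values)]


# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 9.0]
recovered = pseudo_values(sample)
print(recovered)
```

For the mean, the pseudo-values coincide with the original observations, which is why the jackknife can be described as recasting an estimation problem in terms of a mean.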
FOOTNOTES
The proprietary software for classification of mammographic lesions utilized in the study was developed by CadVision Medical Technologies. S.S.B. holds a small stock position (less than 2%) in CadVision Medical Technologies. I.S.L., R.B.L., B.N., and P.N.B. are part-time employees of CadVision Medical Technologies.
Abbreviations: BI-RADS = Breast Imaging Reporting and Data System,
CAC = computer-aided classification,
ROC = receiver operating characteristic
Author contributions: Guarantors of integrity of entire study, all authors; study concepts and design, S.S.B., I.S.L., R.B.L.; literature research, R.B.L., G.Y., I.S.L.; clinical studies, M.S.L., R.B.L., S.I.F., S.S.B.; data acquisition, S.S.B., G.Y., S.I.F., M.S.L.; data analysis/interpretation, B.N., R.B.L., I.S.L.; statistical analysis, B.N., I.S.L., R.B.L.; manuscript definition of intellectual content, P.N.B., S.S.B., I.S.L., B.N.; manuscript editing, S.S.B., I.S.L., R.B.L.; manuscript preparation, revision/review, and final version approval, all authors
REFERENCES
1. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS). 3rd ed. Reston, Va: American College of Radiology, 1998.
2. American College of Radiology. Mammography quality control manual. Reston, Va: ACR Committee on Quality Assurance in Mammography, 1999.
3. Caplan LS, Blackman D, Nadel M, Monticciolo DL. Coding mammograms using the classification "probably benign finding - short interval follow-up suggested." AJR Am J Roentgenol 1999; 172:339-342.
4. Sickles EA. Periodic mammographic follow-up of probably benign lesions: results in 3,184 consecutive cases. Radiology 1991; 179:463-468.
5. Rubin E. Six-month follow-up: an alternative view. Radiology 1999; 213:15-18.
6. Sickles EA. Probably benign breast lesions: when should follow-up be recommended and what is the optimal follow-up protocol? Radiology 1999; 213:11-14.
7. Sickles EA. Commentary on Dr Rubin's viewpoint. Radiology 1999; 213:19-20.
8. Rubin E. Commentary on Dr Sickles's viewpoint. Radiology 1999; 213:21.
9. Varas X, Leborgne F, Leborgne JH. Nonpalpable probably benign lesions: role of follow-up mammography. Radiology 1992; 184:409-414.
10. Lindfors KK, O'Connor J, Acredolo CR, Liston SE. Short interval follow-up versus immediate core biopsy of benign breast lesions: assessment of patient stress. AJR Am J Roentgenol 1998; 171:55-58.
11. Lindfors KK, Rosenquist CJ. Needle core biopsy guided with mammography: a study of cost-effectiveness. Radiology 1994; 190:217-222.
12. Huo Z, Giger ML, Vyborny CJ, et al. Analysis of spiculation in the computerized classification of mammographic masses. Med Phys 1995; 22:1569.
13. Huo Z, Giger ML, Vyborny CJ, Wolverton DE, Schmidt RA, Doi K. Automated computerized classification of malignant and benign masses on digitized mammograms. Acad Radiol 1998; 5:155-168.
14. Rangayyan RM, El-Faramawy NM, Desautels JE, Alim OA. Measures of acutance and shape for classification of breast tumors. IEEE Trans Med Imaging 1997; 16:799-810.
15. Fields S, Leichter I, Bamberger P, et al. Clinical evaluation of computerized enhancement and analysis of mammographic findings. In: Doi K, Giger ML, Nishikawa RM, Schmidt RA, eds. Digital mammography '96. Amsterdam, the Netherlands: Elsevier, 1996; 81-86.
16. Kendall MG, Stuart A. The advanced theory of statistics. Vol 3. Design and analysis, and time-series. London, England: Griffin, 1968; 314-341.
17. Efron B. The jackknife, the bootstrap and other resampling plans. Philadelphia, Pa: Society for Industrial and Applied Mathematics, 1982.
18. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.
19. Sickles EA. Mammographic features of early breast cancer. AJR Am J Roentgenol 1984; 143:461-464.
20. de Lafontan B, Daures JP, Salicru B, et al. Isolated clustered microcalcifications: diagnostic value of mammography: series of 400 cases with surgical verification. Radiology 1994; 190:479-483.