Radiology
DOI: 10.1148/radiol.2241011062
(Radiology 2002;224:217-224.)
© RSNA, 2002


Breast Imaging

Breast Cancer Detection: Evaluation of a Mass-Detection Algorithm for Computer-aided Diagnosis—Experience in 263 Patients1

Nicholas Petrick, PhD, Berkman Sahiner, PhD, Heang-Ping Chan, PhD, Mark A. Helvie, MD, Sophie Paquerault, PhD and Lubomir M. Hadjiiski, PhD

1 From the Department of Radiology, University of Michigan Medical Center, CGC B2102, Box 0904, 1500 E Medical Center Dr, Ann Arbor, MI 48109-0904. From the 2001 RSNA scientific assembly. Received June 18, 2001; revision requested August 8; revision received November 7; accepted January 7, 2002. Supported by USPHS grant CA 48129 and research grant DAMD 17-96-1-6254 from the U.S. Army Medical Research and Materiel Command. N.P. supported by the Whitaker Foundation and USPHS grant CA 79943. B.S. supported by Career Development Award DAMD 17-96-1-6012. L.M.H. supported by USAMRMC grant DAMD 17-98-1-8211. Address correspondence to N.P. (e-mail: petrick@umich.edu).


    ABSTRACT
PURPOSE: To evaluate the performance of a computer-aided diagnosis (CAD) mass-detection algorithm in marking preoperative masses.

MATERIALS AND METHODS: Digitized mammograms were processed with an adaptive enhancement filter followed by a local border refinement stage. Features were then extracted from each detected structure and used to identify potential masses. The performance of the algorithm was evaluated in independent cases obtained from 263 patients from two institutions. Each case contained one or more pathologically proved breast masses. Contralateral mammograms obtained in the same patients that did not contain a visible lesion were used to estimate the CAD marker rate for the algorithm. The tradeoff between detection sensitivity and the number of CAD marks was analyzed in this study.

RESULTS: Malignant masses were detected with the computer in 87% (135 of 156), 83% (130 of 156), and 77% (120 of 156) of the malignant cases at CAD marker rates of 1.5, 1.0, and 0.5 marks per mammogram, respectively. The difference between malignant mass-detection performance in subsets of cases collected at each institution was found to be less than 1%. The detection accuracy for benign masses was lower than that for malignant masses.

CONCLUSION: This mass-detection algorithm had a high sensitivity for detection of malignant masses. It may be useful as a second opinion in mammographic interpretation.


Index terms: Breast neoplasms, diagnosis, 00.31, 00.32 • Breast neoplasms, radiography, 00.112 • Computers, diagnostic aid


    INTRODUCTION
Breast cancer is one of the leading causes of death among American women between 40 and 55 years of age (1). Women who undergo regular mammographic screening have a statistically significant reduction in breast cancer mortality compared with women who do not undergo screening (2). In addition, independent double reading of mammograms by two radiologists increases the sensitivity of mammographic screening (3). Results of studies indicate that a 4%–15% increase in the number of detected cancers is possible with double reading (3–5). However, because it entails higher cost and increased workload, double reading by two radiologists may be impractical in a general screening situation. Computer-aided diagnosis (CAD) may be a cost-effective alternative to double reading.

Efforts to evaluate the usefulness of CAD in reducing the rate of missed cancers are ongoing. A prospective study of 12,860 patients in a community breast cancer center that used a commercial CAD system (ImageChecker V2.0; R2 Technologies, Los Altos, Calif) reported a cancer detection rate of 81.6% (40 of 49), with eight of the cancers initially detected only with the CAD system (6). This corresponds to a 20% increase (from 41 to 49) in the number of cancers detected. These results demonstrate that use of a CAD system can reduce the rate of missed cancers when CAD results are used as a second opinion, even if not all cancers can be detected with the CAD system.

These results do not distinguish between cancers that appear on mammograms as masses alone, microcalcification clusters alone, or as a combination of mass and cluster. We define a "preoperative mass" as a palpable or nonpalpable mass that is identified during clinical or mammographic evaluation and either is selected for biopsy based on the results of the examination or is followed up and proves to be benign. Castellino et al (7) reported that the latest version of the R2 ImageChecker achieved a sensitivity for mass detection of 85.7% at a marker rate of 0.5 mark per image for 677 preoperative masses; this represents an improvement over the sensitivity of 74.7% at a marker rate of 1.0 mark per image achieved in a previous release (V2.0). Researchers who evaluated the Second Look system (CADx Medical Systems, Laval, Quebec, Canada) reported a mass-detection sensitivity of 84% at a marker rate of 1.1 marks per image with mammograms obtained from a database of 149 preoperative masses (8).

The purpose of this study was to evaluate the performance of our CAD mass-detection algorithm in marking preoperative masses.


    MATERIALS AND METHODS
Data Sets
This study involved the collection of mammograms and biopsy records for evaluation with a CAD mass-detection algorithm. These data were collected with institutional review board approval; the institutional review board of the University of Michigan Medical Center determined that, with our protocol of maintaining patient confidentiality, no patient informed consent was needed for data collection.

Training Cases
The clinical mammograms used for training the algorithm parameters, referred to as the training cases, were selected from the files of patients who had undergone mammographic evaluation and biopsy at our institution. A multiple-reading paradigm, in which a resident or fellow previewed each case and an official interpretation was then rendered by an attending radiologist, was typically used during the initial evaluation of each case.

The mammograms were acquired with MinR/MinR or MinR/MRE screen-film systems (Eastman Kodak, Rochester, NY) with dedicated processing. Series of consecutive malignant and consecutive benign masses from several years were collected with a computerized biopsy registry. The selection criterion used by the radiologists was that a biopsy-proved mass smaller than 2.5 cm appeared on the mammogram. Cases with microcalcifications or architectural distortions without a visible mass were excluded, as were cases with masses larger than 2.5 cm. The data set consisted of 253 mammograms in 102 patients who were examined between 1981 and 1989. The training set included 128 malignant and 125 benign masses. Sixty-three of the malignant masses and eight of the benign masses were judged to be spiculated by a Mammography Quality Standards Act (MQSA)–approved radiologist.

The mammograms were digitized with a DIS-1000 laser film scanner (Lumisys, Sunnyvale, Calif) with a pixel size of 100 µm and 12-bit gray-level resolution. The gray levels were linearly proportional to optical density (OD) in the 0.1–2.8 OD range and gradually fell off in the 2.8–3.5 OD range.

Independent Test Cases
We analyzed the performance of the trained mass-detection algorithm with independent mammographic cases. These cases were collected from two different institutions (the University of Michigan Medical Center and the University of South Florida) and were not used in the training process. Series of consecutive malignant and consecutive benign masses were collected with a biopsy registry from each institution, in a similar manner to the process used in the collection of the training cases. Refer to the previous discussion on training-case selection for more details.

The first group of mammograms of preoperative masses, referred to as group 1, was selected from the files of 127 patients who underwent mammographic evaluation and biopsy at our institution (institution 1) between 1990 and 1999. The group 1 cases came from the same institution as the training cases and had at least one proven breast mass visible at mammography. Again, during the initial evaluation of these cases, a resident or fellow typically previewed each case; an official interpretation was then rendered by an attending radiologist (prior to MQSA in 1994) or an MQSA-approved radiologist.

Each case consisted of a single craniocaudal view and either a mediolateral oblique view or a lateral view of the breast containing the mass. For simplicity, we will refer to all views other than the craniocaudal view as the mediolateral oblique view in the following discussions, with the understanding that this also includes some lateral views. If both breasts of a patient had a mass, each breast was considered to be a separate case for data analysis. With this breast-based definition, a total of 138 cases (276 mammograms) were available.

The mammograms were acquired with MinR/MRE screen-film systems with dedicated processing in the years before 1997 (154 mammograms) and a Kodak 2000 screen-film system (Eastman Kodak) during and after 1997 (122 mammograms). Each case contained one or more preoperative masses that were identified prospectively during initial clinical evaluation or mammographic interpretation. The independent group 1 mammograms were digitized with a LS 85 laser film scanner (Lumisys) at 50 µm and 12-bit gray-level resolution. The gray levels were calibrated to be linearly proportional to OD in the 0.1–4.0 OD range. The images were reduced to a 100-µm pixel size by averaging 2 x 2 pixel neighborhoods before mass detection was performed.
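The reduction from a 50-µm to a 100-µm pixel size by averaging 2 x 2 pixel neighborhoods can be sketched as follows. This is a minimal illustration only: the function name, the list-of-lists image representation, and the sample gray levels are assumptions made for the example and are not taken from the software used in the study.

```python
def downsample_2x2(image):
    """Average non-overlapping 2 x 2 neighborhoods of a 2-D list of gray levels.

    A trailing odd row or column, if present, is dropped (an assumption;
    the text does not say how image edges were handled).
    """
    rows, cols = len(image), len(image[0])
    reduced = []
    for r in range(0, rows - rows % 2, 2):
        reduced_row = []
        for c in range(0, cols - cols % 2, 2):
            total = (image[r][c] + image[r][c + 1]
                     + image[r + 1][c] + image[r + 1][c + 1])
            reduced_row.append(total / 4.0)
        reduced.append(reduced_row)
    return reduced


# Hypothetical 4 x 4 patch of 12-bit gray levels sampled at 50 um.
image_50um = [
    [100, 104, 200, 204],
    [108, 112, 208, 212],
    [0, 4, 4000, 4004],
    [8, 12, 4008, 4012],
]
image_100um = downsample_2x2(image_50um)  # 2 x 2 patch at 100 um
```

Each output pixel covers four input pixels, so the image dimensions are halved in both directions while the 12-bit gray-level range is unchanged.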

Clinical cases from a public database available from the University of South Florida (USF) (institution 2) were also analyzed (9). We evaluated 142 craniocaudal and mediolateral oblique mammogram pairs obtained at USF in 136 patients between 1992 and 1998. These 142 USF cases will be referred to as the group 2 cases in the following discussions. Each group 2 case contained at least one proven breast mass visible at mammography. Additional information on the USF database can be found in the literature (9). For compatibility with the group 1 database, we selected only those USF mammograms digitized with the Lumisys 200 laser film scanner. Again, this scanner digitized the images at 50 µm and 12-bit gray-level resolution, but the gray levels were calibrated to be linearly proportional to OD in the 0.1–3.6 OD range. The group 2 cases came from a different institution than the training cases.

We used lesion-free mammograms of the breast contralateral to those breasts that contained an abnormality to estimate the CAD marker rate for the algorithm. These mammograms are referred to as normal cases in this study. In our analysis, "normal" implies only that a mammogram did not contain a visible mass at the time of the mammographic examination and at the time of a second review by an MQSA-approved radiologist during data collection. A total of 251 mammograms from the 127 group 1 patients and 252 mammograms from the 136 group 2 patients were included as normal mammograms. There were fewer normal than abnormal mammograms because seven of the 263 combined group 1 and group 2 patients had visible lesions in both the right and left breasts and because not all contralateral mammograms were digitized.

Table 1 summarizes the group 1 and 2 test cases used to evaluate the mass-detection algorithm. It includes the numbers of malignant and benign masses separated by whether they were visible in both views or in only a single view. Figure 1 shows the distributions of lesion subtlety (1 = subtle, 5 = obvious) on the mammograms obtained from the group 1 and 2 databases, as ranked by a radiologist (M.A.H. for the group 1 mammograms) who evaluated each individual mass.



 
TABLE 1. Summary of Mammograms, Patients, and Masses in Group 1 and Group 2 Databases

 


 
Figure 1. Histogram summarizes the subtlety of the lesions observed on the 138 mammographic pairs and the 142 mammographic pairs obtained from the group 1 and group 2 databases, respectively, as ranked by the radiologist reviewing the cases. Each mass on each mammogram was rated independently by the radiologist. For comparison purposes, the plot is of the percentage of masses falling within each category. The total number of masses in each group can be found in Table 1.

 
The subtlety rankings used in the collection of the group 1 cases were defined as follows: (a) great call—expected only from an experienced breast radiologist; (b) very good call—expected only from a radiology fellow or an experienced breast radiologist; (c) good call—expected from a resident, fellow, or experienced breast radiologist; (d) evident masses visible without close inspection—expected from a good medical student; and (e) obvious mass—evident to basically anyone regardless of experience.

The subtlety ratings for all group 2 masses were retrieved from the USF database and were also based on a five-point rating system. However, the USF ratings for the group 2 cases did not use the same subtlety definitions as those described in the preceding paragraph for the group 1 cases. Instead, the ratings were defined as follows: 1, subtle; 2, twice as subtle as rating 1; 3, three times as subtle as rating 1; and so forth.

The mammographic lesion size of each group 1 mass was measured by the radiologist during initial case evaluation. The malignant group 1 masses had a mean size, SD, and median size of 15.4 mm, 12.0 mm, and 12.0 mm, respectively. The benign group 1 masses had a mean size, SD, and median size of 13.4 mm, 11.8 mm, and 10.0 mm, respectively. Radiologist-measured mass sizes were not used for the group 2 cases because we found that the boundaries of the masses, which were hand-drawn by the reviewing radiologists, were much larger than the actual mammographic lesion size. Therefore, mass size information is not reported for the group 2 cases.

The institutional review board of our institution did not require the collection of racial or ethnic information from these patients, so no statistics on racial or ethnic composition are available for the group 1 cases. However, because the cases were randomly sampled from the records of patients undergoing mammography at our hospital, the racial and ethnic composition of the group of patients in this study is expected to be similar to that of our patient population. The ethnicity statistics for our mammography screening patient population in 1998 and 1999 are given in Table 2. Table 2 also includes the patient ethnicity statistics for the group 2 cases, which were provided in the USF public database.



 
TABLE 2. Summary of Ethnic Composition of Group 1 and Group 2 Patient Populations

 
Mass-Detection Algorithm
Algorithm description.—Our mass-detection scheme uses adaptive enhancement, object-based border refinement, and feature classification to identify potential breast masses. The block diagram for the scheme is shown in Figure 2. The first step is the digitization of a mammogram. The digitized mammogram is then processed with an initial segmentation step, in which a density-weighted contrast enhancement (DWCE) filter is used for preprocessing. The DWCE filter was developed to accentuate mammographic structures before edge detection by adaptively enhancing local contrast. After DWCE filtering, edge detection is used to define the borders of the enhanced structures. This results in a set of detected structures. Each of these structures is then processed by a local refinement stage. First, the algorithm identifies seed locations by locating all local maxima within each object with an ultimate erosion technique (10) and then selecting all connected pixels with gray values in the range Mi ± 0.01 · Mi, where Mi is the gray level of the ith local maximum. K-means clustering is then applied to a 25 x 25-mm background-corrected region of interest (11) centered on each seed object to refine the initial object border (12).
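The seed-selection rule described above (all connected pixels with gray values within Mi ± 0.01 · Mi of a local maximum) can be sketched as a simple region-growing step. This is a hedged illustration, not the authors' implementation: the function name, the use of 8-connectivity, and the sample patch are assumptions, and the ultimate erosion step that locates the local maxima is taken as already done.

```python
from collections import deque


def grow_seed(image, max_rc, tolerance=0.01):
    """Return the set of pixels connected to a local maximum whose gray
    level lies within M_i +/- tolerance * M_i.

    image:  2-D list of gray levels
    max_rc: (row, col) of the i-th local maximum
    8-connectivity is an assumption; the text says only "connected pixels".
    """
    rows, cols = len(image), len(image[0])
    m_i = image[max_rc[0]][max_rc[1]]
    low, high = m_i * (1 - tolerance), m_i * (1 + tolerance)
    seed = {max_rc}
    queue = deque([max_rc])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seed
                        and low <= image[nr][nc] <= high):
                    seed.add((nr, nc))
                    queue.append((nr, nc))
    return seed


# Hypothetical 3 x 3 patch with a local maximum of 2000 at the center,
# so the accepted gray-level range is 1980 to 2020.
patch = [
    [1000, 1995, 1500],
    [1990, 2000, 1985],
    [1200, 1970, 1982],
]
seed_pixels = grow_seed(patch, (1, 1))
```

In this patch the pixel with gray level 1970 falls just outside the 1% band and is excluded, whereas the pixel at 1982 is included because it touches an accepted neighbor diagonally.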



 
Figure 2. Block diagram of the mass-detection system evaluated in this study. DWCE = density-weighted contrast enhancement.

 
The purpose of the local refinement stage is to improve the accuracy of object borders found by the DWCE segmentation, because DWCE segmentation tends to underestimate the size of breast structures. The local refinement stage was also found to be effective in splitting large connected regions into smaller breast structures. The final stage is to classify each detected object as a breast mass or as a normal structure based on extracted morphologic and textural features. To overcome the problems associated with the large number of initial structures, we perform the feature classification in two stages. Eleven morphologic features are initially used with a threshold and a linear classifier to remove detected normal structures that are substantially different from breast masses. Texture-based classification then follows this morphologic reduction stage. Fifteen global and local multiresolution textural features based on spatial gray-level dependence matrices are used as inputs to a linear discriminant classifier, which merges the input features into a single discriminant score for each detected object. Decision thresholds based on this score and on the maximum number of marks allowed per image are then used to identify potential breast masses. Further details on the mass-detection algorithm can be found in the literature (13–16).
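The final decision step, in which a linear discriminant score is thresholded and the number of marks per image is capped, might look like the following sketch. The weights, the two-element feature vectors, the threshold value, and the convention that a higher score means "more mass-like" are all assumptions chosen for illustration; the actual classifier uses 15 textural features with trained coefficients that are not given in the text.

```python
def discriminant_score(features, weights, bias=0.0):
    """Merge a feature vector into one score with a linear discriminant."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def select_marks(objects, weights, score_threshold, max_marks):
    """Keep objects whose score reaches the threshold, then retain at most
    max_marks of them per image, highest score first.

    objects: list of (object_id, feature_vector) for one image
    """
    scored = [(obj_id, discriminant_score(f, weights)) for obj_id, f in objects]
    kept = [item for item in scored if item[1] >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_marks]


# Hypothetical detected objects on one mammogram, two features each.
hypothetical_weights = [0.5, -0.25]
objects_on_image = [
    ("a", [4.0, 2.0]),  # score 1.5
    ("b", [1.0, 4.0]),  # score -0.5
    ("c", [3.0, 2.0]),  # score 1.0
    ("d", [2.0, 1.0]),  # score 0.75
]
marks = select_marks(objects_on_image, hypothetical_weights,
                     score_threshold=0.75, max_marks=2)
```

Object "d" passes the score threshold but is dropped by the per-image cap, which shows how the two decision thresholds act together.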

Algorithm training.—The computer program was trained with the entire training data set of 253 mammograms. The training process included adjusting the filters, clustering, selected features, and classification thresholds. Once training was completed, the parameters and all thresholds were fixed for testing. The training data set was then resubstituted into the algorithm and was found to have a mammogram-based (ie, when each mass on each mammogram was considered as an independent sample) training sensitivity of 81% (205 of 253) overall and 85% (109 of 128) for malignant masses. The mass-detection algorithm produced 2.9 marks per mammogram on average at this sensitivity level in the training cases. It is important to note that the detection classifiers considered only classification between breast masses and normal tissue, not classification between malignant and benign masses. Therefore, no distinction was made between malignant and benign masses in the training process.

Definition of True-Positive and False-Positive Markers
For the group 1 cases, the smallest bounding box containing the entire mass identified by a radiologist was used as the truth. For the group 2 cases, we used a bounding box around the radiologist-outlined mass region provided with each image. Our definition of a true-positive finding was based on the percentage of overlap between the bounding box of an identified structure and the bounding box of the true mass. On the basis of findings in the training set, we chose an overlap threshold of 25%. This value corresponds to the minimum overlap between the bounding box of a detected object and the bounding box of a true mass for the object to be considered as a true-positive detection. The 25% threshold was selected because it was found to match well with true-positive visual identifications. The detected objects were first labeled automatically by the computer with this criterion. All of the true-positive masses were then visually reviewed to make sure that the program highlighted the true lesion and not a neighboring structure. Marks that were found to match neighboring structures were eliminated as true-positive marks.
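The automatic labeling step based on a 25% bounding-box overlap can be sketched as follows. The text does not state how the overlap percentage was normalized, so this example assumes it is the intersection area divided by the area of the true-mass bounding box; the function names and box coordinates are likewise hypothetical. The subsequent visual review of true-positive marks is not modeled.

```python
def overlap_fraction(detected, truth):
    """Fraction of the truth bounding box covered by the detected box.

    Boxes are (x_min, y_min, x_max, y_max) in pixels. Normalizing by the
    truth-box area is an assumption made for this sketch.
    """
    width = max(0, min(detected[2], truth[2]) - max(detected[0], truth[0]))
    height = max(0, min(detected[3], truth[3]) - max(detected[1], truth[1]))
    truth_area = (truth[2] - truth[0]) * (truth[3] - truth[1])
    return (width * height) / truth_area


def is_true_positive(detected, truth, threshold=0.25):
    """Label a detection as true-positive when the overlap reaches 25%."""
    return overlap_fraction(detected, truth) >= threshold


# Hypothetical boxes: a 100 x 100 pixel true mass and two detections.
truth_box = (100, 100, 200, 200)
good_mark = (150, 150, 260, 260)   # covers 50 x 50 of the truth box
stray_mark = (190, 190, 300, 300)  # covers 10 x 10 of the truth box
```

With these boxes, the first detection overlaps exactly 25% of the true-mass box and is accepted, whereas the second overlaps 1% and would be counted as a false-positive mark.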

The number of false-positive marks produced by the algorithm was determined by counting the markings produced in normal cases. We used a total of 251 normal mammograms from group 1 and 252 normal mammograms from group 2 to estimate the marker rate. The true-positive fraction, calculated from the abnormal cases, and the average number of marks per image, calculated from the normal cases, were determined for a fixed set of thresholds at the final texture-classification stage. The true-positive fraction and the average number of marks per mammogram as the decision threshold varied were then used to plot the free-response receiver operating characteristic (FROC) performance curves for malignant and benign masses with the different data sets.
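The construction of an FROC curve from these two quantities can be illustrated with a short sketch. It assumes that each true mass is summarized by the highest discriminant score among the detections that overlap it (or None if no detection overlaps it), and that every mark on a normal mammogram carries a score; the function name, these data structures, and all numbers are hypothetical.

```python
def froc_points(mass_scores, normal_mark_scores, n_normal_images, thresholds):
    """Return (marks per image, true-positive fraction) for each threshold.

    mass_scores:        best overlapping detection score per true mass,
                        or None if the mass was never detected
    normal_mark_scores: scores of all candidate marks on normal mammograms
    """
    points = []
    for t in thresholds:
        detected = sum(1 for s in mass_scores if s is not None and s >= t)
        true_positive_fraction = detected / len(mass_scores)
        marks = sum(1 for s in normal_mark_scores if s >= t)
        points.append((marks / n_normal_images, true_positive_fraction))
    return points


# Hypothetical data: 5 true masses and 6 candidate marks on 4 normal images.
mass_scores = [0.9, 0.8, 0.6, 0.3, None]
normal_mark_scores = [0.95, 0.7, 0.65, 0.5, 0.4, 0.2]
curve = froc_points(mass_scores, normal_mark_scores,
                    n_normal_images=4, thresholds=[0.25, 0.55, 0.85])
```

Raising the decision threshold moves along the curve toward fewer marks per image at the cost of a lower true-positive fraction, which is the tradeoff analyzed in this study.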


    RESULTS
Test performance results are presented on a per mammogram and a per case basis. In the former, the craniocaudal and mediolateral oblique views are considered independently, so that a lesion visible in the craniocaudal view is considered as a true-positive mark and the same lesion visible in the mediolateral oblique view is a second true-positive mark. In the latter, a mass is considered detected if it is detected in either the craniocaudal or the mediolateral oblique view. The per case evaluation takes into consideration that, in clinical practice, once the computer alerts the radiologist to a cancer on one view, it is unlikely that the radiologist will miss the cancer. The per case approach is often used by researchers in reporting CAD performance (5,8,17).

Results are also presented for two different true-positive scoring methods. The individual scoring method considers each mass on a mammogram or in a case as a different true-positive finding. The grouped scoring method considers all malignant masses on a mammogram or in a case as a single true-positive finding (5). The rationale for group scoring is that a radiologist may not need to be alerted to all malignant lesions in a mammogram or case before taking action. Therefore, multiple detections in a mammogram or case may not substantially enhance the power of CAD.
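The difference between per mammogram and per case evaluation, and between individual and grouped scoring, can be made concrete with a small sketch. The data layout (a case as a list of masses, each mass as a mapping from the views in which it is visible to a detected flag) and the function names are assumptions for illustration; the example cases are hypothetical and are assumed to contain only malignant masses.

```python
def per_mammogram_individual(cases):
    """Each mass on each view in which it is visible is scored separately."""
    flags = [hit for case in cases for mass in case for hit in mass.values()]
    return sum(flags), len(flags)


def per_case_individual(cases):
    """A mass counts as detected if it is marked on either view."""
    masses = [mass for case in cases for mass in case]
    return sum(any(mass.values()) for mass in masses), len(masses)


def per_case_grouped(cases):
    """All masses in a case form one finding, detected if any mass is marked."""
    hits = [any(any(mass.values()) for mass in case) for case in cases]
    return sum(hits), len(hits)


# Three hypothetical cases; True means the mass was marked on that view.
cases = [
    [{"CC": True, "MLO": False}],                 # one mass, marked on CC only
    [{"CC": False, "MLO": False}, {"CC": True}],  # two masses, one seen on CC only
    [{"CC": False, "MLO": False}],                # one mass, missed on both views
]
```

For these cases the three functions return (detected, total) counts of 2 of 7, 2 of 4, and 2 of 3, respectively, which shows why grouped per case scoring yields the highest apparent sensitivity for the same set of marks.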

FROC performance curves, which were calculated on the basis of individual mass scoring, are shown in Figure 3 for the group 1 cases. Similar data are presented for the group 2 cases in Figure 4. These figures include per case and per mammogram performance curves for the detection of both malignant and benign masses and are included to show the true-positive fraction achievable for a large range of marker rates.



 
Figure 3. FROC performance curves for group 1 calculated on the basis of scoring of individual masses. Per case and per mammogram performance curves for detection of both malignant and benign masses are depicted. The curves show the true-positive (TP) fraction achievable for a large range of mass marker rates. It is evident that the performance of the algorithm in the group 1 cases was better in detection of malignant versus benign masses, with an approximately constant difference in the true-positive fractions between the two throughout the entire CAD marker range plotted.

 


 
Figure 4. FROC performance curves for group 2 calculated on the basis of scoring of individual masses. Per case and per mammogram performance curves for the detection of both malignant and benign masses are depicted. The curves show the true-positive (TP) fraction achievable for a large range of mass marker rates. The performance of the algorithm in group 2 cases is again better in detection of malignant versus benign masses, but the difference between the two is not constant as a function of CAD marker rate. It is evident that overall performance of the algorithm in group 2 benign masses is much worse than its performance in group 1 benign masses, while the performance difference between its assessment of group 1 malignant masses and its assessment of group 2 malignant masses is small.

 
An approximately constant difference in the true-positive fractions between the malignant and benign masses in group 1 across the entire FROC curve can be observed. The group 1 per case difference is in the range of 9%–12%, from 3.0 to about 0.25 marks per image, with the per mammogram results following the same trend. The difference in true-positive fractions between the malignant and the benign masses is larger for the group 2 database, but the difference is not constant as a function of the CAD marker rate. Here the per case difference starts at about 12% at 3.0 marks per image, increases to 19% at 1.5 marks per image, and then increases to 34% at 1.0 mark per image; the per mammogram performance again follows a similar trend. It is clear that the performance in the group 2 benign cases is much lower than that in the group 1 benign cases. However, the difference in detection performance between the group 1 and group 2 malignant masses is small.

The per case and per mammogram FROC performance curves in malignant masses, calculated on the basis of grouped mass scoring, are shown in Figure 5. These curves show how the true-positive fraction varies as a function of the marker rate for group scoring, which was expected to be our most clinically relevant measure of algorithm performance. It is evident that the algorithm provided consistent malignant mass–detection performance for both independent test sets over a wide range of marker rates.



 
Figure 5. FROC performance curves for group 1 and group 2 calculated on the basis of grouped mass scoring. Per case and per mammogram performance curves for detection of malignant masses are depicted. These curves show how the true-positive (TP) fraction varies as a function of marker rate for group scoring, which is expected to be the most clinically relevant measure of algorithm performance. The algorithm showed consistent malignant mass-detection performance for both independent test sets over a wide range of marker rates.

 
In the group 1 database, 34% (49 of 146) of the malignant and 5% (8 of 159) of the benign masses were spiculated. Thirty-three percent (65 of 197) and 0% (0 of 132) of the masses in the group 2 malignant and benign cases, respectively, were spiculated. In our training set, 49% (63 of 128) of the malignant lesions and 6% (8 of 125) of the benign lesions were judged as spiculated by radiologists. A comparison between the performance of the algorithm in spiculated masses and its performance in nonspiculated masses is shown in Figure 6. The curve for spiculated benign masses is not included because of the small number of lesions in this category. The resulting curves indicate that the detection algorithm is better suited for detecting spiculated masses.



 
Figure 6. Combined group 1 and group 2 FROC performance curves in spiculated and nonspiculated masses calculated on the basis of scoring of individual masses. The benign spiculated mass curve is not shown because of the small number of cases in this category. The curves indicate that the CAD algorithm was more effective in detecting spiculated masses than it was in detecting nonspiculated ones. TP = true-positive.

 
Finally, we analyzed the sensitivity achieved by the mass-detection algorithm at three fixed normal marker rates. These marker rates were selected because they represent potential operating points for clinical implementation of a CAD algorithm, on the basis of the results of previously published studies (7,8). The results at these fixed marker levels are summarized in Table 3. Our best estimates for the clinical performance of our mass-detection program are found in the rows for combined grouped malignant masses in the table, which describe the fact that 87% (135 of 156), 83% (130 of 156), and 77% (120 of 156) of the malignant masses were detected at marker rates of 1.5, 1.0, and 0.5 marks per mammogram, respectively.
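The percentages quoted for the combined grouped malignant cases follow directly from the reported counts, as the short check below shows. The variable names are illustrative; the counts and marker rates are those stated in the text.

```python
# Detected malignant cases (grouped, per case scoring) at each marker rate
# in marks per mammogram, out of 156 malignant cases, as reported in the text.
detected_by_rate = {1.5: 135, 1.0: 130, 0.5: 120}
n_malignant_cases = 156

# Sensitivity as a whole-number percentage for each marker rate.
sensitivity_percent = {
    rate: round(100 * detected / n_malignant_cases)
    for rate, detected in detected_by_rate.items()
}
```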



 
TABLE 3. Summary of per Case Mass-Detection Performance at Marker Rates of 0.5, 1.0, and 1.5 Marks per Image

 

    DISCUSSION
The performance of our mass-detection algorithm, assessed with independent test cases collected at two different institutions, clearly indicates that it is effective in detecting breast masses. Because these results were based on a large independent test set from two institutions, and the algorithm parameters were not adjusted on the basis of any characteristics of the test set, the performance results estimated in this study should be close to the true performance of the algorithm in the patient population.

The estimated performance values for our detection algorithm compare well with published performance results for commercial CAD vendors (an 85.7% true-positive fraction at 0.5 mark per image with the R2 program and an 84% true-positive fraction at 1.1 marks per image with the CADx program [7,8]), as well as with results for other algorithms currently being developed in research laboratories. This indicates that our mass-detection algorithm could be beneficial to radiologists as a second opinion. It also indicates that, although different methods are used in different detection programs, they all may be able to result in effective CAD and may lead to improvements in mammographic screening if the algorithms are properly trained.

We first compared the performance of the algorithm in malignant lesions with its performance in benign lesions and found that the detection performance in malignant masses is better than that in benign masses. One possible cause for the performance differences between benign and malignant mass detection is a difference in lesion subtlety. We observed a small difference in the subtlety ratings between the benign and malignant masses in the group 1 database, with the malignant masses being slightly more obvious than the benign masses. This same trend holds for the group 2 masses as well. However, it should be noted that the subtlety distributions between group 1 and group 2 differ considerably, as will be discussed in the following paragraphs. The observed difference in subtlety ratings between benign and malignant masses for both groups 1 and 2 is not particularly large, so a subtlety difference does not seem to fully explain the large disparities observed in the FROC curves.

Another factor that probably contributed to the observed difference is that malignant masses are more likely to be spiculated than are benign masses; the performance of our algorithm in spiculated masses is superior to its performance in nonspiculated masses. It is evident that the detection algorithm is better suited for detecting spiculated masses, especially at the lower marker rates, although no special efforts were made to train the algorithm to detect spiculated masses. We surmise that the texture-analysis function of the algorithm acquired a higher sensitivity to spiculated masses during the training process because of the relatively large fraction of spiculated lesions in the training set. Even though the detection algorithm had a higher sensitivity in detecting spiculated masses, the large number of nonspiculated masses in the training set (182 of 253) still trained the algorithm to be sensitive to nonspiculated malignant masses. The sizable difference between the curves for detection of malignant and benign nonspiculated masses suggests that some additional, as yet undetermined, factors may also have contributed to the observed performance difference between the assessment of malignant masses and the assessment of benign masses.

We also observed differences in algorithm performance between masses in group 1 and masses in group 2. The performance rates in detecting malignant masses were quite similar between the two groups, but the detection of benign lesions differed considerably between the groups. One potential factor is that 94% (147 of 157) of the benign masses in the group 1 database were later selected for biopsy. This high rate of biopsy of benign lesions suggests that the group 1 masses were judged by the radiologist to be similar enough to malignant masses to warrant biopsy (ie, the vast majority of the lesions were American College of Radiology Breast Imaging Reporting and Data System [BI-RADS] categories 4 and 5). We therefore can expect the detection performance in these benign masses to be somewhat similar to that in the malignant masses for group 1. The number of biopsies of benign lesions was not available in the group 2 database, but it is likely that a smaller fraction of the benign lesions were selected for biopsy, resulting in the presence of a larger fraction of BI-RADS category 2 or 3 lesions in this group. If this is true, then the characteristics of the group 2 benign masses would not have matched the characteristics of the benign masses in our training set very well; the group 2 benign masses therefore may have been more difficult to detect.
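As a quick arithmetic check of the proportions quoted in this discussion, the short sketch below recomputes them from the stated counts. The helper name `percent` is an assumption introduced for illustration; the counts are those given in the text.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Benign masses in group 1 that were later selected for biopsy (147 of 157).
biopsied_benign_group1 = percent(147, 157)

# Nonspiculated masses in the training set (182 of 253).
nonspiculated_training = percent(182, 253)

print(f"Group 1 benign masses selected for biopsy: {biopsied_benign_group1:.1f}%")
print(f"Nonspiculated masses in training set: {nonspiculated_training:.1f}%")
```

The first value rounds to the 94% reported above; the second shows that roughly seven of every ten training masses were nonspiculated.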

Another factor that may have contributed to this performance difference is a difference in the OD ranges of the digitizers used to acquire the cases at each institution. The OD ranges were 0–3.5, 0–4.0, and 0–3.6 for the Lumisys digitizers used to digitize the training, group 1, and group 2 mammograms, respectively. The smaller OD range of the digitizer used to digitize the group 2 mammograms may have caused a decrease in the detection performance for subtle low-density lesions compared with the group 1 performance in similar cases. However, the group 2 digitizer had an advantage in many of the cases because it better matched the OD range of the digitizer used to acquire the training set. Because of the presence of other factors such as case variability, it is difficult to distinguish the relative importance of these competing effects on the performance of the algorithm.

When we compared the subtlety ratings between the group 1 and group 2 databases, we observed a large disparity in the radiologists’ ratings. From the ratings alone, one might conclude that the group 2 cases were easier than the group 1 cases in terms of both malignant and benign masses. However, this does not agree with our detection results. The detection performance in the group 1 benign cases was much better than that in the group 2 benign cases, even though the group 1 lesions were rated as more subtle. The more "obvious" malignant masses in the group 2 database resulted in only a small (1%–2%) gain in the detection performance when compared with the group 1 malignant cases. Likewise, visual comparison of the cases did not reveal such a large difference between the databases. The group 2 subtlety distribution does not match well with what is expected in clinical practice because it is highly skewed toward obvious lesions. One would expect that a randomly drawn sample from the patient population would follow a distribution more similar to the group 1 histogram. Therefore, the subtlety difference was most likely caused by a difference in the subjective criteria used to define lesion subtlety rather than by a true difference in subtlety between the cases. It is likely that the individual radiologists at the different institutions used different scales. The radiologists reading cases from institution 1 appeared to have spread their subtlety ratings across the multiple categories, while the radiologists at institution 2 seemed to have essentially made a binary decision of visible or not visible.
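One simple way to quantify such a disparity between two rating histograms is sketched below. It was not part of this study: the five-level scale, the function names, and the two toy rating lists are assumptions made purely for illustration, with one list spread across the categories and the other concentrated at the ends of the scale.

```python
from collections import Counter
from typing import List, Sequence


def rating_fractions(ratings: Sequence[int], scale: Sequence[int]) -> List[float]:
    """Return the fraction of ratings falling in each level of the scale."""
    counts = Counter(ratings)
    return [counts.get(level, 0) / len(ratings) for level in scale]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Return half the summed absolute difference between two distributions.

    The value is 0 for identical histograms and 1 for histograms that share
    no category.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical scale and ratings, invented for illustration only.
scale = [1, 2, 3, 4, 5]
spread_ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # ratings use every category
binary_ratings = [1, 1, 1, 1, 1, 1, 1, 1, 5, 5]   # ratings use only the extremes

spread = rating_fractions(spread_ratings, scale)
binary = rating_fractions(binary_ratings, scale)
print(f"Distance between histograms: {total_variation(spread, binary):.2f}")
```

A large distance indicates only that the two histograms differ; as argued above, it cannot by itself separate a true difference in lesion subtlety from a difference in how the readers used the scale.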

The results suggest that caution must be exercised when comparing detection results obtained in cases from different databases. Even if subtlety ratings are available, the rating criteria may be subject to large inter- and intraobserver variations. This is especially true if the databases are collected by different institutions. Comparisons between lesions rated at a single institution with a consistent rating criterion (eg, comparing malignant and benign lesions from the same data set) are less problematic.

The preoperative masses evaluated in this preliminary study were all characterized during mammographic evaluation on the basis of multiple readings of the case by a resident or fellow and an attending MQSA-approved radiologist. CAD was not used as an aid to the radiologist during initial case interpretation. The collected data were used only to characterize the expected performance of the algorithm and to provide a benchmark for comparison with other CAD algorithms. Evaluation studies are now under way to estimate how well our mass-detection algorithm performs with mammograms in which the lesions are not initially deemed actionable. Good CAD performance in these cases may lead to earlier cancer detection.

Simply evaluating CAD performance in preoperative and early cases will not directly measure the effectiveness of our algorithm as an aid to the radiologist. The true clinical performance of a CAD scheme must be established through a properly designed prospective clinical study such as the one reported in reference 6. This type of prospective study will be undertaken in the future to determine if our CAD algorithm aids radiologists in detecting breast cancer earlier and if it affects their recall rate. We are also developing new techniques to both improve the detection performance and reduce the marker rate of the algorithm by fusing single-view information and information from different mammographic views of the same breast (18,19).


    ACKNOWLEDGMENTS
 
A special thanks to Christopher Washington, BS, for downloading the USF database and converting the data format so that the cases could be included in this study.


    FOOTNOTES
 
The content of this publication does not necessarily reflect the position of the government, and no official endorsement of any equipment or product should be inferred.

Abbreviations: BI-RADS = Breast Imaging Reporting and Data System, CAD = computer-aided diagnosis, DWCE = density-weighted contrast enhancement, FROC = free-response receiver operating characteristic, MQSA = Mammography Quality Standards Act, OD = optical density, USF = University of South Florida

Author contributions: Guarantor of integrity of entire study, N.P.; study concepts and design, N.P., H.P.C., B.S., M.A.H.; literature research, N.P., H.P.C.; clinical studies, H.P.C., N.P., M.A.H.; data acquisition and analysis/interpretation, all authors; statistical analysis, N.P.; manuscript preparation, N.P.; manuscript definition of intellectual content, N.P., H.P.C., B.S., M.A.H.; manuscript editing, revision/review, and final version approval, all authors.


    REFERENCES

  1. Greenlee RT, Hill-Harmon MB, Murray T, Thun M. Cancer statistics, 2001. CA Cancer J Clin 2001; 51:15-36.
  2. Tabar L, Fagerberg C, Gad A, et al. Reduction in mortality from breast cancer after mass screening with mammography. Lancet 1985; 1:829-832.
  3. Thurfjell EL, Lernevall KA, Taube AAS. Benefit of independent double reading in a population-based mammography screening program. Radiology 1994; 191:241-244.
  4. Beam V, Sullivan D, Layde P. Effect of human variability on independent double reading in screening mammography. Acad Radiol 1996; 3:891-897.
  5. Warren Burhenne LJ, Wood SA, D’Orsi CJ, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000; 215:554-562.
  6. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001; 220:781-786.
  7. Castellino RA, Roehrig J, Zhang W. Improved computer-aided detection (CAD) algorithm for screening mammography (abstr). Radiology 2000; 217(P):400.
  8. Brem RF, Schoonjans JM, Hoffmeister J, Raza S, Baum JK. Evaluation of breast cancer with a computer-aided detection system by mammographic appearance, histology and lesion size (abstr). Radiology 2000; 217(P):400.
  9. Heath M, Bowyer K, Kopans D, et al. Current status of the digital database for screening mammography. In: Karssemeijer N, Thijssen M, Hendriks J, van Erning L, eds. Digital mammography. Dordrecht, the Netherlands: Kluwer, 1998; 457-460.
  10. Russ JC. The image processing handbook. Boca Raton, Fla: CRC, 1992.
  11. Sahiner B, Chan HP, Petrick N, et al. Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images. IEEE Trans Med Imaging 1996; 15:598-610.
  12. Chan HP, Petrick N, Sahiner B. Computer-aided breast cancer diagnosis. In: Jain A, Jain A, Jain S, Jain L, eds. Artificial intelligence techniques in breast cancer diagnosis and prognosis. River Edge, NJ: World Scientific, 2000; 179-264.
  13. Petrick N, Chan HP, Sahiner B, Wei D. An adaptive density-weighted contrast enhancement filter for mammographic breast mass detection. IEEE Trans Med Imaging 1996; 15:59-67.
  14. Petrick N, Chan HP, Sahiner B, Helvie MA. Combined adaptive enhancement and region-growing segmentation of breast masses on digitized mammograms. Med Phys 1999; 26:1642-1654.
  15. Petrick N, Chan HP, Sahiner B, Helvie MA, Paquerault S. Evaluation of an automated computer-aided diagnosis system for the detection of masses on prior mammograms. Proc SPIE 2000; 3979:967-973.
  16. Petrick N, Sahiner B, Chan HP, Helvie MA, Paquerault S. Preclinical evaluation of a CAD algorithm for early detection of breast cancer. Proc IWDM 2000; :328-333.
  17. Birdwell RL, Ikeda DM, O’Shaughnessy KF, Sickles EA. Mammographic characterization of 111 missed cancers later detected by screening mammography (abstr). Radiology 1999; 213(P):240.
  18. Paquerault S, Petrick N, Chan HP, Sahiner B, Dolney AY. Improvement of mammographic lesion detection by fusion of information from different views. Proc SPIE 2001; 4322:1883-1889.
  19. Sahiner B, Petrick N, Chan HP, Paquerault S, Helvie MA, Hadjiiski LM. Recognition of lesion correspondence on two mammographic views: a new method of false-positive reduction for computerized mass detection. Proc SPIE 2001; 4322:649-655.


