Radiology
DOI: 10.1148/radiol.2262011843
(Radiology 2003;226:504-514.)
© RSNA, 2003


Breast Imaging

Breast Lesions on Sonograms: Computer-aided Diagnosis with Nearly Setting-Independent Features and Artificial Neural Networks1

Chung-Ming Chen, PhD, Yi-Hong Chou, MD, Ko-Chung Han, MS, Guo-Shian Hung, MS, Chui-Mei Tiu, MD, Hong-Jen Chiou, MD and See-Ying Chiou, MD

1 From the Institute of Biomedical Engineering, National Taiwan University, 1, Section 1, Jen-Ai Rd, Taipei 100, Taiwan (C.M.C., K.C.H., G.S.H.); Department of Radiology, Division of Ultrasound, Taipei Veterans General Hospital and National Yang Ming University, Taiwan (Y.H.C., C.M.T., H.J.C.); and Department of Radiology, Division of Ultrasound, Taipei Veterans General Hospital (S.Y.C.). Received November 19, 2001; revision requested January 28, 2002; revision received May 17; accepted June 27. Supported by National Science Council grant NSC90-2213-E-002-103, Taiwan. Address correspondence to C.M.C. (e-mail: ming@lotus.mc.ntu.edu.tw).


    ABSTRACT
 
PURPOSE: To develop a computer-aided diagnosis (CAD) algorithm with setting-independent features and artificial neural networks to differentiate benign from malignant breast lesions.

MATERIALS AND METHODS: Two sets of breast sonograms were evaluated. The first set contained 160 lesions and was stored directly on the magnetic optic disks from the ultrasonographic (US) system. Four different boundaries were delineated by four persons for each lesion in the first set. The second set comprised 111 lesions that were extracted from the hard-copy images. Seven morphologic features were used, five of which were newly developed. A multilayer feed-forward neural network was used as the classifier. Reliability, extendability, and robustness of the proposed CAD algorithm were evaluated. Results with the proposed algorithm were compared with those with two previous CAD algorithms. All performance comparisons were based on paired-samples t tests.

RESULTS: The area under the receiver operating characteristic curve (Az) was 0.952 ± 0.014 for the first set, 0.982 ± 0.004 for the first set as the training set and the second set as the prediction set, 0.954 ± 0.016 for the second set as the training set and the first set as the prediction set, and 0.950 ± 0.005 for all 271 lesions. At the 5% significance level, the performance of the proposed CAD algorithm was shown to be extendible from one set of US images to the other set and robust for both small and large sample sizes. Moreover, the proposed CAD algorithm was shown to outperform the two previous CAD algorithms in terms of the Az value.

CONCLUSION: The proposed CAD algorithm could effectively and reliably differentiate benign and malignant lesions. The proposed morphologic features were nearly setting independent and could tolerate reasonable variation in boundary delineation.


Index terms: Breast neoplasms, diagnosis, 00.30 • Breast neoplasms, US, 00.1298 • Computers, diagnostic aid • Computers, neural network • Images, analysis


    INTRODUCTION
 
Breast cancer is one of the leading causes of death for women in many countries (1). For early detection of breast cancer, mammography is currently the most widely used screening modality, but it has a low negative predictive value. Many investigators have found that more than 60% of masses referred for breast biopsy on the basis of mammographic findings are actually benign (2,3).

Breast sonography was shown to be an effective adjunct to mammography in reducing the number of negative biopsy results (4–8). For example, with deliberately devised sonographic features, Stavros et al (7) were able to attain an overall sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of 98.4%, 67.8%, 72.9%, 38%, and 99.5%, respectively. Potentially effective as it is, breast sonography remains controversial for screening because interpretation of the ultrasonographic (US) images is greatly influenced by the scanning techniques and the sonographic features of the suspected abnormality. Breast sonologists with different levels of experience might interpret the sonograms differently. To minimize the effect of the operator-dependent nature inherent in US, many computerized approaches have been proposed to assist differentiation between benign and malignant breast lesions (9–15).

The general idea of computer-aided diagnosis (CAD) for breast sonography is to convert the visually extractable sonographic features into mathematic models and to characterize the lesions with the mathematic features based on the classification schemes. The mathematic features may be categorized into two classes, namely, the regional features and the morphologic features. The regional features characterize the image properties evolved from the intensity distribution (eg, echogenicity, echotexture), whereas the morphologic features describe the shape and contour of the lesion. As an example, with use of the mathematic features that quantify lesion margin, shape, homogeneity, and posterior acoustic attenuation pattern, Giger et al (12) achieved values for the area under the receiver operating characteristic (ROC) curve (Az) of 0.94 and 0.87 for the entire database and the equivocal database on the basis of linear discriminant analysis (LDA).

Although promising performances have been reported, CAD for breast sonography is still impractical for routine use because previous mathematic features depend on either the setting of the US systems or the contour extraction process. It is easy to show that most regional features vary nonlinearly with the system setting. For instance, the co-occurrence matrix used by Garra et al (10) may fluctuate with such system parameters as the time-gain compensation, total gain, and focal depth. To avoid this problem, many previous CAD algorithms necessitate that all breast images be obtained with the same system parameter setting (13–15). This constraint is clinically undesirable. On the other hand, since the morphologic features (eg, the contour gradients [11]) are derived from the contour, they are more susceptible to the contour extraction process than are the regional features. Ideally, this problem may be solved by using automatic contour extraction schemes. However, automatic contour extraction on a US image is a difficult task in general, and no satisfactory approaches exist so far, to our knowledge.

The purpose of this study was to develop a CAD algorithm with setting-independent features and artificial neural networks to differentiate benign from malignant breast lesions. More specifically, this study aimed to design a set of morphologic features that were nearly independent of not only the system setting but also the contour extraction process.


    MATERIALS AND METHODS
 
The proposed CAD algorithm was composed of three essential components, namely, feature extraction, feature selection, and classification. To relax the constraints on the system settings, the morphologic features rather than the regional features were adopted. The potential dependence of the morphologic features on the contour extraction process was minimized by capturing important topologic properties of the lesions, which may not vary drastically with the delineated contour. Feature selection was necessary to alleviate the curse of dimensionality (16). A set of essential morphologic features that yielded the best performance was selected on the basis of stepwise logistic regression (17). Classification was accomplished with a multilayer feed-forward neural network (MFNN) (18) on the basis of the essential morphologic features. The advantage of the MFNN is that arbitrarily complex convex separation surfaces can be approximated.

Study Subjects and Image Acquisition
Two sets of breast US images, selected randomly from the database of a medical center in Taiwan, were used in this study. The image data were collected during a period of 4 years. The institutional review boards agreed that the patient images and clinical information could be used for study without written consent if anonymity was maintained. This regulation was carefully followed in the present study.

The first set of sonograms was obtained from September 9, 1996, to June 6, 2000, in 160 female patients (age range, 16–85 years; mean age, 46 years). The sonograms depicted 160 breast lesions, including 42 cysts, 49 fibroadenomas, and 69 carcinomas, that were pathologically proven. They were stored directly (by using the system built-in function) on a US system (HDI 3000; Advanced Technological Laboratory, Bothell, Wash) equipped with a broadband 5–10-MHz linear electronically focused transducer and cine loop capability.

The second set of sonograms was obtained from January 1, 1997, to December 31, 1998, in 111 women (age range, 18–82 years; mean age, 42 years). The sonograms depicted 111 breast lesions that were pathologically proven, including 40 fibroadenomas and 71 carcinomas. They were obtained with the same US system that was used for the first set. Unlike the first set of US images, the second set comprised hard-copy images. The lesions were extracted from these sonograms by first digitizing these images with film scanners (HP6300C; Hewlett-Packard, Palo Alto, Calif).

No constraint was imposed on the system settings during acquisition of these images. The sonologists were free to adjust the system settings to obtain the best views. In both sets, the lesion boundaries were delineated manually. The first set of lesions served as the primary basis for performance evaluation and comparison of the complete morphologic and regional information preserved in the directly stored US images. To take into account the potential variation of delineation among different persons, the first set of lesions was delineated by four graduate students (K.C.H. and others), and each student was supervised by one of four attending physicians (Y.H.C., C.M.T., H.J.C., S.Y.C.) with 22, 7, 7, and 3 years of experience in breast sonography, respectively. For each lesion, size variation was defined as the ratio of the SD to the mean of the sizes of the four delineated lesion boundaries. The mean ± SD of variations of lesion size for all lesions in the first set was 9.1% ± 7.7%.
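The size-variation measure defined above is a coefficient of variation over the four delineated areas of one lesion. A minimal sketch follows. The area values are invented for illustration, and the use of the sample SD (`ddof=1`) is an assumption, since the text does not say which SD estimator was used.

```python
import numpy as np

def size_variation(areas):
    """Ratio of the SD to the mean of the delineated lesion sizes."""
    areas = np.asarray(areas, dtype=float)
    # Assumption: sample standard deviation (ddof=1); the article does not specify.
    return areas.std(ddof=1) / areas.mean()

# Hypothetical areas (in pixels) of one lesion as outlined by four observers.
example_areas = [100.0, 110.0, 90.0, 100.0]
print(f"size variation: {size_variation(example_areas):.1%}")
```

Averaging this ratio over all lesions in a set would give a summary figure of the kind reported in the text.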

The second set of lesions served as a larger number of samples with which to evaluate the extendability and robustness of the proposed CAD algorithm. The second set of lesions was delineated by only one graduate student (G.S.H.), under the supervision of an attending physician (Y.H.C.) with 22 years of experience in breast sonography. Two of the five graduate students (K.C.H., G.S.H.) were involved in development of the CAD algorithm.

Feature Extraction
Seven morphologic features were extracted from each lesion to account for such sonographic features as shape, contour, and size. Five of these morphologic features were newly developed, including the number of substantial protuberances and depressions (NSPD), lobulation index (LI), elliptic-normalized circumference (ENC), elliptic-normalized skeleton (ENS), and long axis to short axis (L:S) ratio. The other two features were clinically useful indicators (19): depth-to-width (D:W) ratio and size of the lesion.

NSPD.—The spiculation (7) and irregular shape and contour (8) of a lesion are two important sonographic features that characterize a malignant breast lesion. The NSPD is an effective descriptor in a lesion to quantify these two sonographic features. With a geographic analogy, a protuberance and a depression are like a peninsula and a bay, respectively. As an example, Figure 1 shows typical protuberances and depressions in a malignant breast lesion. Since protuberances and depressions may easily result from a wobbly delineation process, as described in the Appendix, only the substantial protuberances and depressions defined by the representative convex and concave points, respectively, were used to characterize a breast lesion.



 
Figure 1. Sonogram shows the inner gray contour as the lesion border and the outer white polygon as the convex hull of the lesion. Protuberances and depressions in the malignant breast lesion are indicated.

 
As derived in the Appendix, given a threshold θp, let Λ = {λ1, λ2,..., λp} and Ω = {ω1, ω2,..., ωd} be the sets of representative convex and concave points of a lesion boundary, where p and d are the numbers of points in each set. The NSPD, denoted by NSPD(θp), is defined as p + d, where θp ∈ {20°, 30°, 40°, 50°, 60°}. Ideally, a malignant breast lesion has a larger NSPD.

LI.—LI was devised to characterize the size distribution of the lobes in a lesion. As illustrated in Figure 2, a lobe is defined as the gray region enclosed by the lesion contour and the dashed line connected by two adjacent representative concave points. The size of the lobe is the area of the gray region. Suppose a breast lesion has Nl lobes and the size of the ith lobe is Ai, i = 1, to Nl. Let Amax and Amin denote the sizes of the largest and the smallest lobes. The LI is then defined as



 
Figure 2. Schematic shows lobes in a lesion, where {ω1, ω2, ω3, ω4} are four representative concave points in Ω and {A1, A2, A3, A4} are the sizes of the lobes.

 
The LI can correctly characterize a benign lesion with multiple large lobes of similar sizes. This type of benign lesion may be misclassified as a malignant lesion with the NSPD.

ENC.—Anfractuosity is a common morphologic characteristic of malignant lesion boundaries that provides at least two visually appreciable geometric features. One feature is the multiple protuberances and depressions that may be well described with the NSPD. The other feature is the lengthened circumference due to the circuitous boundaries that define the protuberances and depressions. Since the boundary of a smaller lesion would appear to be more winding than that of a larger lesion with the same circumference, the circumference itself is not a good descriptor with which to characterize the anfractuosity of the lesion boundary. Alternatively, a more reasonable approach is quantification of the anfractuosity with the percentage of circumference increment relative to a lesion-dependent baseline. An ideal baseline would be a smooth curve such that the lesion boundary would look like twining around the curve.

To quantify the anfractuosity of a lesion contour, the circumference ratio of the lesion and its equivalent ellipse is proposed, which is termed the ENC. The equivalent ellipse of a lesion (20,21) is an ellipse with the same area and center of mass as those of the lesion when the interiors of the lesion and its equivalent ellipse are both set to the same constant gray level. For instance, Figure 3 shows the equivalent ellipse and boundary of a malignant lesion. Perceptually, one can see that the equivalent ellipse roughly captures the shape of the lesion, and the lesion boundary meanders around the equivalent ellipse.
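The ENC described above can be sketched for a lesion contour given as a closed polygon. Several choices here are assumptions that the text does not state: the contour is an array of (x, y) vertices, the axis ratio of the equivalent ellipse is taken from the second-order central moments of the region (the text mentions only equal area and center of mass), and the ellipse circumference uses Ramanujan's approximation. The function names are hypothetical.

```python
import math
import numpy as np

def polygon_moments(pts):
    """Area and covariance matrix of a uniformly filled simple polygon."""
    x, y = pts[:, 0], pts[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)      # next vertex, wrapping around
    cross = x * yn - xn * y
    signed_area = cross.sum() / 2.0
    cx = ((x + xn) * cross).sum() / (6.0 * signed_area)
    cy = ((y + yn) * cross).sum() / (6.0 * signed_area)
    # Second moments about the origin (signed), then shifted to the centroid.
    sxx = ((x * x + x * xn + xn * xn) * cross).sum() / 12.0
    syy = ((y * y + y * yn + yn * yn) * cross).sum() / 12.0
    sxy = ((x * yn + 2 * x * y + 2 * xn * yn + xn * y) * cross).sum() / 24.0
    mxx = sxx / signed_area - cx * cx
    myy = syy / signed_area - cy * cy
    mxy = sxy / signed_area - cx * cy
    return abs(signed_area), np.array([[mxx, mxy], [mxy, myy]])

def equivalent_ellipse(pts):
    """Semi-axes (a >= b) of an ellipse with the lesion's area and moment ratio."""
    area, cov = polygon_moments(pts)
    lam_min, lam_max = np.linalg.eigvalsh(cov)   # eigenvalues in ascending order
    ratio = math.sqrt(lam_max / lam_min)         # a / b for a filled ellipse
    b = math.sqrt(area / (math.pi * ratio))      # from area = pi * a * b
    return ratio * b, b

def ellipse_circumference(a, b):
    """Ramanujan's approximation (an assumption; no formula is given in the text)."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))

def enc(pts):
    """Lesion circumference divided by the equivalent-ellipse circumference."""
    closed = np.vstack([pts, pts[:1]])
    perimeter = np.linalg.norm(np.diff(closed, axis=0), axis=1).sum()
    a, b = equivalent_ellipse(pts)
    return perimeter / ellipse_circumference(a, b)

# A smooth, nearly circular contour should give an ENC close to 1.
angles = np.linspace(0.0, 2.0 * math.pi, 360, endpoint=False)
circle = np.column_stack([5.0 * np.cos(angles), 5.0 * np.sin(angles)])
print(round(enc(circle), 3))
```

Under these assumptions, the ratio `a / b` returned by `equivalent_ellipse` is also the quantity the article calls the L:S ratio.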



 
Figure 3. Sonogram shows the equivalent ellipse (thin line) of a malignant breast lesion, the boundary of which is marked by the thick line.

 
ENS.—A skeleton is an effective representation of a region (22) that is used frequently in such areas as computer vision and pattern recognition. Let R denote a region and BR the set of boundary points of the region R. The skeleton of R is the set of points X such that each skeleton point x ∈ X lies within R and has at least two boundary points, pi and pj, in BR that satisfy d(x, pi) = d(x, pj) = min {d(x, pk) | pk ∈ BR}, where d( · ) is any preferred distance metric (eg, Euclidean, city block). With the malignant breast lesion shown in Figure 3 as an example, the skeleton of the lesion is given in Figure 4.



 
Figure 4. Sonogram shows the skeleton of a malignant breast lesion. The boundary of the lesion is marked by the thick line, and the skeleton is indicated by the thin line segments.

 
The skeleton is sensitive to the anfractuous property of the lesion boundary. The more protuberances and depressions contained in the lesion boundary, the more complex the skeleton is. Therefore, it seems to be reasonable to quantify the shape complexity by the number of points in the skeleton. Nevertheless, the number of skeleton points is also a function of the lesion size. Just as for the ENC, to eliminate the size effect, it is suggested that the number of skeleton points be normalized by the circumference of the equivalent ellipse of the lesion, which gives the ENS.

In addition to these four descriptors (NSPD, LI, ENC, and ENS), which capture the contour and shape characteristics, three more mathematic features are considered to incorporate two clinically useful indicators. The first feature is the D:W ratio of the lesion (11,12,19). The depth and the width of a lesion are the vertical and horizontal edge lengths, respectively, of the minimal circumscribed rectangle of the lesion. The larger the D:W ratio, the more likely the lesion is malignant. Since the D:W ratio may vary with the scanning angle and the compressing pressure, however, we suggest use of another quantity to describe the shape of the lesion, namely, the L:S ratio. The L:S ratio is the length ratio of the major (long) axis to the minor (short) axis of the equivalent ellipse of the lesion. Clearly, the L:S ratio is independent of the scanning angle but may be affected by the compressing pressure. The last feature is the size of the lesion (ie, the area within the lesion boundary). Clinically, the larger the breast lesion, the more likely the lesion is malignant.
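The D:W ratio and the lesion size can be illustrated with a short sketch. It assumes that the contour is a polygon of (x, y) vertices in image coordinates, that the circumscribed rectangle is aligned with the image axes, and that depth runs along the vertical axis. None of these conventions is spelled out in the text, and the function names are hypothetical.

```python
import numpy as np

def depth_width_ratio(contour):
    """D:W ratio from the axis-aligned bounding rectangle of the contour."""
    xs, ys = contour[:, 0], contour[:, 1]
    width = xs.max() - xs.min()   # horizontal extent
    depth = ys.max() - ys.min()   # vertical extent (assumed depth direction)
    return depth / width

def lesion_size(contour):
    """Area enclosed by the contour (shoelace formula)."""
    x, y = contour[:, 0], contour[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical wider-than-tall lesion outline: 4 units wide, 2 units deep.
outline = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0]])
print(depth_width_ratio(outline), lesion_size(outline))
```

Because the bounding rectangle is tied to the image axes, rotating the probe changes this ratio, which is the motivation the text gives for the L:S ratio.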

Feature Selection and Classification
Feature selection is usually used to select a set of features that potentially yield the best performance with the given classifier. These selected features are referred to as the substantial features. The classifier is then trained with the substantial features to determine the mathematic model that describes the relation of these features.

The substantial features were selected for each training data set on the basis of the forward sequential search approach (23) with the logistic discrimination function (17). To minimize the estimation bias (16), the classification accuracy η(Y) for a feature set Y was evaluated by means of the leave-one-out cross-validation strategy for each training data set. More specifically, set the counter nc = 0; for every li in the training data set Φ with m data, construct the logistic discrimination function from the training data Φi = Φ − {li} with the feature set Y. If li is predicted correctly with the derived logistic discrimination function, increase nc by 1. After all li values have been evaluated, compute η(Y) = nc/m.
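The leave-one-out estimate of the classification accuracy and the forward sequential search can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the logistic function is fitted by plain gradient descent with a small ridge penalty, the search stops when no added feature improves the leave-one-out accuracy, and all function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=500, lr=0.1, l2=1e-3):
    """Fit logistic-regression weights (bias first) by gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def predict(w, X):
    """Return 1 (malignant) where the fitted probability exceeds 0.5."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ w > 0).astype(int)

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy: fraction of held-out data predicted correctly."""
    m, n_c = len(y), 0
    Xs = X[:, features]
    for i in range(m):
        keep = np.arange(m) != i                  # training data without datum i
        w = fit_logistic(Xs[keep], y[keep])
        if predict(w, Xs[i:i + 1])[0] == y[i]:
            n_c += 1
    return n_c / m

def forward_select(X, y):
    """Greedy forward search; stops when no candidate improves the accuracy."""
    remaining, selected, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        score, f = max((loo_accuracy(X, y, selected + [f]), f) for f in remaining)
        if score <= best:                         # assumed stopping rule
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
x0 = np.concatenate([np.linspace(-3, -1, 15), np.linspace(1, 3, 15)])
x1 = np.tile([0.5, -0.5], 15)
X_toy = np.column_stack([x0, x1])
y_toy = np.array([0.0] * 15 + [1.0] * 15)
print(forward_select(X_toy, y_toy))
```

On the toy data, the search is expected to pick the informative feature first. Real morphologic features would normally be standardized before fitting.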

Feature selection was performed in two stages for each training data set. In the first stage, the best NSPD value was selected from five candidate NSPD values that corresponded to five θp values (ie, θp ∈ {20°, 30°, 40°, 50°, 60°}). Then, in the second stage, the selected NSPD value along with the other six features were used to select the essential features that yielded the best classification accuracy for the underlying training data set.

The classifier used in the present study is an MFNN. Once the essential features were selected by means of the logistic discrimination function for a set of training data Φ, the training data were used to train the MFNN to divide the training data into benign and malignant categories. As depicted in Figure 5, the MFNN used in this study was a two-layer feed-forward neural network with one hidden layer. The number of inputs for the MFNN was set to be the same as the number of essential features, and the number of neurons in the output layer was set to 1 for the underlying two-class classification. Some suggestions were made previously (eg, the Kolmogorov theorem [24]) to determine the number of neurons in the hidden layer, but none of them led to satisfactory performance. Instead, the number of neurons in the hidden layer was determined through exhaustive experiments over the range of two to 10 neurons. As a result, the number of neurons in the hidden layer was set to two because results with two neurons gave the best performance for almost all cases evaluated.



 
Figure 5. Two-layer feed-forward neural network used for classification. zi = the ith feature, vij = the weight of the synapse that connects the jth input to the ith neuron in the hidden layer, yi = the ith neuron in the hidden layer, wi = the weight of the synapse that connects the ith neuron in the hidden layer to the output, φ = the sigmoidal activation function, and o = the output.

 
The training algorithm used to train the MFNN was the widely used error back-propagation training algorithm (18). The training data with the essential features were fed into the MFNN in a cyclic manner. For each datum, the estimated output o was computed on the basis of the synaptic weights determined in the previous iterations and the sigmoidal activation function φ. The discrepancy between the desired output and the estimated output was back propagated to modify the synaptic weights until the discrepancy was within the acceptable range. Since the final output o was a number between 0 and 1, a threshold TNN, where NN is neural network, was required to assign the datum to the benign or the malignant category. In the present study, TNN was determined on the basis of the value that resulted in a dichotomization with the best classification accuracy for the training data, or TNN varied from 0 to 1 to generate the ROC curve.
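The network and its training loop can be illustrated with a small NumPy sketch. It follows the structure described above (a few hidden sigmoidal neurons, one sigmoidal output, cyclic presentation of the data, error back-propagation), but the bias terms, learning rate, weight initialization, squared-error criterion, and stopping tolerance are all assumptions. The class name `TinyMFNN` is hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class TinyMFNN:
    """One hidden layer of sigmoidal neurons feeding a single sigmoidal output."""

    def __init__(self, n_inputs, n_hidden=2, seed=0):
        rng = np.random.default_rng(seed)
        self.V = rng.normal(0.0, 0.5, (n_hidden, n_inputs))  # input-to-hidden weights
        self.bv = np.zeros(n_hidden)                          # hidden biases (assumed)
        self.w = rng.normal(0.0, 0.5, n_hidden)               # hidden-to-output weights
        self.bw = 0.0                                         # output bias (assumed)

    def forward(self, z):
        y = sigmoid(self.V @ z + self.bv)
        o = sigmoid(self.w @ y + self.bw)
        return y, o

    def train(self, Z, targets, lr=0.5, epochs=2000, tol=1e-3):
        for _ in range(epochs):
            sq_err = 0.0
            for z, t in zip(Z, targets):          # data presented cyclically
                y, o = self.forward(z)
                delta_o = (o - t) * o * (1.0 - o)             # output error signal
                delta_h = delta_o * self.w * y * (1.0 - y)    # back-propagated signal
                self.w -= lr * delta_o * y
                self.bw -= lr * delta_o
                self.V -= lr * np.outer(delta_h, z)
                self.bv -= lr * delta_h
                sq_err += (o - t) ** 2
            if sq_err / len(Z) < tol:             # discrepancy within accepted range
                break

# Toy two-feature data: 0 = benign, 1 = malignant (values are made up).
Z_toy = np.array([[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.5],
                  [1.0, 1.5], [2.0, 1.0], [1.5, 2.0]])
t_toy = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
net = TinyMFNN(n_inputs=2)
net.train(Z_toy, t_toy)
outputs = np.array([net.forward(z)[1] for z in Z_toy])
print(np.round(outputs, 2))
```

Each output lies between 0 and 1, so a cutoff such as TNN is still needed to turn it into a benign or malignant label.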

Comparative Performance Analysis
Five experiments were conducted in the present study for performance analysis. Two performance measures were reported for each analysis. One measure was the Az value, which was calculated by using commercially available statistical software (SPSS for Windows, version 10; SPSS, Chicago, Ill). The other measure was the best classification accuracy (TP + TN)/(TP + TN + FP + FN) along with the associated sensitivity (TP/[TP + FN]), specificity (TN/[TN + FP]), positive predictive value (TP/[TP + FP]), and negative predictive value (TN/[TN + FN]), where TP is the number of true-positive findings (ie, a malignant lesion is considered to be malignant); TN, true-negative; FP, false-positive; and FN, false-negative.
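The performance measures above, and an ROC area obtained by sweeping the output threshold, can be sketched as follows. The article computed Az with SPSS. The trapezoidal area used here is a swapped-in stand-in and will not necessarily reproduce those values. The scores and labels in the usage lines are invented.

```python
import numpy as np

def confusion_counts(scores, labels, threshold):
    """Count TP, TN, FP, FN when outputs >= threshold are called malignant."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)       # True = malignant
    predicted = scores >= threshold
    tp = int(np.sum(predicted & labels))
    tn = int(np.sum(~predicted & ~labels))
    fp = int(np.sum(predicted & ~labels))
    fn = int(np.sum(~predicted & labels))
    return tp, tn, fp, fn

def _ratio(num, den):
    return num / den if den else float("nan")

def performance(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, PPV, and NPV as defined in the text."""
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }

def roc_area(scores, labels):
    """Trapezoidal area under the ROC curve, sweeping the cutoff over the outputs."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    tpr, fpr = [], []
    for cut in cuts:
        tp, tn, fp, fn = confusion_counts(scores, labels, cut)
        tpr.append(_ratio(tp, tp + fn))
        fpr.append(_ratio(fp, fp + tn))
    tpr, fpr = np.array(tpr), np.array(fpr)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up network outputs for two benign (0) and two malignant (1) lesions.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(performance(*confusion_counts(toy_scores, toy_labels, 0.5)))
print(roc_area(toy_scores, toy_labels))
```

Choosing the cutoff that maximizes `accuracy` on the training data corresponds to the "best classification accuracy" operating point described in the text.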

For performance comparison, the Az values were used because the best classification accuracy is not necessarily the preferred criterion for classification. Sometimes, one would rather have a higher sensitivity or specificity than have the best accuracy. Except for the third experiment, all performance measures were derived on the basis of the leave-one-out cross-validation strategy.

To evaluate the reliability of the proposed morphologic features, in the first experiment, denoted as C160, the proposed CAD algorithm was evaluated with the four sets of boundaries drawn independently by four people for each of the 160 breast lesions in the first set of US images. To justify the necessity for feature selection, the second experiment, denoted as C160A, repeated the first experiment but without feature selection (ie, all seven features were used by the MFNN classifier). A paired-samples t test was used to test if incorporation of feature selection would yield a better performance (ie, if C160 was significantly better than C160A), with the significance level set at α = .05.
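A paired-samples t test over the four boundary collections reduces to a one-sample t test on the four paired differences. The sketch below computes the statistic by hand. The Az values are hypothetical placeholders, not results from the study, and the two-sided critical value for 3 degrees of freedom (3.182 at α = .05) is taken from standard t tables. The text does not say whether one-sided or two-sided tests were used.

```python
import math

def paired_t(a, b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n), n - 1

# Hypothetical Az values for four boundary collections under two implementations.
az_with_selection = [0.95, 0.96, 0.94, 0.95]
az_without_selection = [0.93, 0.95, 0.92, 0.94]

T_CRIT_TWO_SIDED_DF3 = 3.182   # standard table value for alpha = .05, df = 3
t_stat, df = paired_t(az_with_selection, az_without_selection)
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > T_CRIT_TWO_SIDED_DF3}")
```

With only four pairs, the test has 3 degrees of freedom, so fairly consistent differences are needed before the null hypothesis is rejected.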

Results of the third experiment validated the extendibility of the proposed CAD algorithm. Since the two sets of US images used in the present study originated from two archiving media, degradation for the boundary definition of the lesions that was caused by the acquisition procedure or the archiving medium was potentially different for each set. Therefore, these two sets of US images might be considered as samples from two different sample spaces. In this experiment, we attempted to investigate how well the classifier derived on the basis of images from one sample space could be extended to those from the other sample space. Two implementations were performed. One implementation was performed with the first set of US images as the training set and the second set as the prediction set, which was denoted as C271f. In reverse, the other implementation was performed with the second set of US images as the training set and the first set as the prediction set, which was denoted as C271r. The training set was trained on the basis of the leave-one-out cross-validation strategy. Recall that only the first set of US images had four sets of boundaries. Paired-samples t tests were used to test if C271f and C271r had the same or better performance than did C160 and if C271f and C271r had the same performance with the significance level set at α = .05.

The fourth experiment, denoted as C271LC, was performed to investigate the robustness of the proposed CAD algorithm. That is, we attempted to evaluate how well the performance achieved in the first experiment with a smaller sample size could be reproduced with a larger number of samples from heterogeneous sample spaces. All 271 breast lesions were involved in the fourth experiment, and the leave-one-out cross-validation strategy was used. To validate the robustness of the proposed CAD algorithm, a paired-samples t test was used to determine if C271LC had the same performance as C160 with the significance level set at α = .05.

For comparative study, in the fifth experiment, the proposed CAD algorithm was compared with two previous CAD algorithms with the first set of breast lesions. The first algorithm was proposed by Giger et al (12), which was denoted as LDAGiger. The Giger algorithm included four mathematic features, namely, normalized radial gradient, D:W ratio, coarseness, and the mean gray-level difference between the region of interest within the lesion and that posterior to the lesion, denoted by µ1 - µ2. The classification scheme was the LDA. Since the MFNN is usually superior to the LDA, for a fair comparison with our approach, as a modified implementation, which was denoted as MFNNGiger, the MFNN was used to replace the LDA. The number of neurons in the hidden layer was also determined by searching in the range of two to 10 neurons. As a result, 10 neurons were used in the hidden layer for the modified Giger algorithm.

The second CAD algorithm to be compared with the proposed CAD algorithm was proposed by Chen et al (13), which was denoted as MFNNChen. The Chen algorithm was based solely on a texture feature (ie, normalized autocorrelation coefficients obtained from a rectangular region of interest that enclosed the lesion). The size of the region of interest was 1–2 mm extended from the lesion margin in all directions. The feature vector contained 5 × 5 autocorrelation coefficients. The classifier used by Chen et al (13) was an MFNN (with 25 inputs, 10 hidden nodes, and one output node) that was also trained with the error back-propagation training algorithm.

To test if C160 had significantly better performance than that of LDAGiger, MFNNGiger, and MFNNChen, paired-samples t tests were applied with the significance level set at α = .05. Moreover, a paired-samples t test was used to determine the relative performance among these three implementations.

In addition to these five experiments, the performance of each individual feature, including the proposed seven features and the Giger features, was evaluated by means of logistic discrimination analysis (17) based on the leave-one-out cross-validation strategy. Paired-samples t tests were used to compare the performances of every pair of individual features. It should be emphasized that all algorithms were evaluated for statistical robustness with four collections of lesion boundaries. As a summary, Table 1 lists the notations for the implementations performed in this study.



 
TABLE 1. Notations Denoting Various Implementations

 

    RESULTS
 
Results of the first experiment for the reliability evaluation of the proposed CAD algorithm are reported in Table 2. For each implementation, the Az value and standard error of the ROC curve are listed, as well as the performance data at the best classification accuracy, including accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. Table 2 shows that the proposed CAD algorithm achieved reasonably high performances for all four sets of lesion boundaries in terms of the Az value and the classification accuracy, the means of which, denoted as µC160, were as high as 0.952 and 92.8%, respectively. Moreover, differences in performance induced by the potential variations in boundary definitions of the four sets of lesion boundaries were not significant. The SDs of the Az values and the best classification accuracies, denoted as σC160, were 0.014 and 1.9%, respectively. On the basis of the experimental results, the proposed CAD algorithm potentially may be capable of tolerating the variations in boundary definition due to manual delineation by different persons.



 
TABLE 2. Results of First Experiment with First Set of Breast Lesions

 
For the second experiment, the mean and SD of the Az value, which are labeled as µC160A and σC160A, respectively, and the performance data at the best classification accuracy are summarized in Table 3. The test hypothesis and result of the paired-samples t test, labeled as T1, to compare C160 with C160A are given in Table 4. Since P = .020 < α = .05, the null hypothesis should be rejected (ie, C160 was superior to C160A at the 5% significance level), which justified the advantage of adopting feature selection in the proposed CAD algorithm. The means and SDs of the Az values and the performance data at the best classification accuracy for C271f, C271r, and C271LC are provided in Table 3. These three implementations attained reasonably high performances, which were greater than 0.954 and 91.4% for the mean Az and the mean best accuracy, respectively.



 
TABLE 3. Means and SDs of Performances Achieved in C160A, C271f, C271r, and C271LC

 


 
TABLE 4. Results of Paired-Sample t Tests to Compare Means of Different Implementations on the Basis of the Proposed CAD Algorithms

 
For pairwise performance comparisons of C160 versus C271f, C160 versus C271r, C271f versus C271r, and C160 versus C271LC, the test hypotheses and results of the four corresponding paired-samples t tests (labeled as T2, T3, T4, and T5, respectively) are listed in Table 4. With P = .019 < α = .05, results of the paired-samples t test T2 suggested that the null hypothesis be rejected, which implied that the performance of C271f was significantly higher than that of C160 at the 5% significance level. Results of the paired-samples t tests T3–T5 suggest that these three null hypotheses be accepted, since all three P values were greater than α = .05. In other words, at the 5% significance level, the performance of C160 was the same as that of C271r and C271LC, and there was no significant difference between the performances of C271f and C271r. From the test results of T2–T4, it might be concluded that, in comparison with use of only the first set of US images, the proposed classifier derived on the basis of the images from one sample space might be generalized to the images from another sample space without performance degradation at the 5% significance level. Moreover, the test result of T5 validated that the proposed CAD algorithm was robust in the sense that its performance would be the same for both small and large sample sizes at the 5% significance level.



 
TABLE 5. Most Frequently Selected Features for Each Collection of Lesion Boundaries

 
The significant features selected for classification varied with the collection of lesion boundaries and, more specifically, with the training set. Table 5 lists the most frequently selected features for each collection of lesion boundaries. Recall that feature selection was performed on the basis of the leave-one-out cross-validation strategy. Given N lesion boundaries in a collection, the frequency of a feature was the number of times that the feature was selected over the N possible training sets in the leave-one-out cross-validation process. From Table 5, it is clear that NSPD was the most important feature, in the form of either NSPD with θp = 40° or NSPD with θp = 50°. Furthermore, ENS, ENC, L:S ratio, or D:W ratio might be combined with NSPD to achieve a better performance.
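The frequency count described above can be sketched as follows. The selection routine itself is not specified in this section, so `select_features` is a hypothetical stand-in passed in by the caller, and the toy selector exists only to make the example runnable.

```python
from collections import Counter


def selection_frequency(n_cases, select_features):
    """Count how often each feature is chosen across leave-one-out folds.

    n_cases: number of lesion boundaries N in the collection.
    select_features: hypothetical callable that takes the list of
        training-case indices and returns the selected feature names.
    """
    counts = Counter()
    for left_out in range(n_cases):
        training = [i for i in range(n_cases) if i != left_out]
        # Each training set contributes at most one count per feature.
        counts.update(set(select_features(training)))
    return counts


def toy_selector(training):
    """Invented rule for illustration only, not a real selection method."""
    chosen = ["NSPD"]
    if 0 in training:
        chosen.append("ENS")
    return chosen


print(selection_frequency(5, toy_selector).most_common())
```

With N folds, a feature's frequency ranges from 0 to N, so a value close to N indicates a feature that is selected almost regardless of which case is left out.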

With the first set of US images, the mean and SD of the Az value and the performance data at the best classification accuracy for LDAGiger, MFNNGiger, and MFNNChen are provided in Table 6. The performances attained with LDAGiger and MFNNChen were substantially lower than those reported in references 12 and 13. For pairwise performance comparisons among C160, LDAGiger, MFNNGiger, and MFNNChen, the test hypotheses and results of the paired-samples t tests are given in Table 7. With P values less than α = .05, all null hypotheses should be rejected. It might be concluded that the relative performances of these four algorithms were MFNNChen < LDAGiger < MFNNGiger < C160, where A < B denotes that A is worse than B at the 5% significance level.



 
TABLE 6. Means and SDs of Performances Achieved in LDAGiger, MFNNGiger, and MFNNChen

 


 
TABLE 7. Paired-Sample t Tests to Compare Means of C160, LDAGiger, MFNNGiger, and MFNNChen

 
To evaluate the performance of each individual feature, Figures 6 and 7 show the mean Az and the mean best accuracy for each proposed feature with the first set of US images and with all 271 US images, respectively. Results of paired-samples t tests suggested that NSPD, LI, ENS, and ENC were better than lesion size, L:S ratio, and D:W ratio at the 5% significance level in both Figures 6 and 7. For example, consider four pairwise comparisons: NSPD versus lesion size, LI versus lesion size, ENC versus lesion size, and ENS versus lesion size. The P values for these four pairs were .01, .043, .025, and .03, respectively, when the first set of US images was used, and they were .005, .02, .017, and .005, respectively, when all 271 US images were used. The null hypothesis for each pair A versus B was µA − µB ≤ 0, where µA and µB stand for the mean Az values attained with features A and B, respectively. Since all these P values were less than α = .05, NSPD, LI, ENC, and ENS were better than lesion size at the 5% significance level. Furthermore, results of paired-samples t tests suggest that the performances of NSPD, LI, and ENC remained the same for different sample sizes at the 5% significance level. That is, NSPD, LI, and ENC were robust. The P values of these three tests were .204, .405, and .587, respectively. Notably, the mean Az values and the mean best accuracies of NSPD in both Figures 6 and 7 were greater than 0.94 and 0.91, respectively. In particular, use of NSPD or ENS alone could outperform LDAGiger, MFNNGiger, and MFNNChen at the 5% significance level.
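A one-sided paired-samples t test of the kind described above, with the null hypothesis that the mean of A minus the mean of B is at most zero, can be sketched with the standard library alone. The Az values below are invented for illustration and are not taken from the study; the critical values are the standard one-tailed 5% points of the t distribution for small degrees of freedom.

```python
import math
from statistics import mean, stdev

# One-tailed 5% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_05 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015}


def paired_t_statistic(a, b):
    """Return (t, df) for paired samples a and b of equal length."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


def reject_null(a, b):
    """Reject H0 (mean of A minus mean of B <= 0) at the 5% level."""
    t, df = paired_t_statistic(a, b)
    return t > T_CRIT_05[df]


# Hypothetical paired Az values for two features (illustration only).
az_feature_a = [0.95, 0.96, 0.94, 0.97]
az_feature_b = [0.90, 0.93, 0.91, 0.92]
print(paired_t_statistic(az_feature_a, az_feature_b))
print(reject_null(az_feature_a, az_feature_b))
```

Comparing the statistic with a tabulated critical value avoids needing the t distribution's cumulative function; reporting an exact P value, as the text does, would require a statistics package.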



 
Figure 6. In bar graph, each pair of bars illustrates the mean Az value and mean best classification accuracy achieved with a proposed morphologic feature with the first set of US images. Error bars indicate 1 SD. The first four features (ie, NSPD, LI, ENS, and ENC) are better than the other three at the 5% significance level.

 


 
Figure 7. In bar graph, each pair of bars depicts the mean Az value and mean best classification accuracy achieved with a proposed morphologic feature when applied to four collections of lesion boundaries. Each collection comprised one of the four sets of lesion boundaries in the first set of US images and the lesion boundaries in the second set of US images. Error bars indicate 1 SD. At the 5% significance level, the first four features (ie, NSPD, LI, ENS, and ENC) are better than the other three. Moreover, the performances of NSPD, LI, and ENC are the same with the first set of US images and with all 271 images.

 
Figure 8 shows the mean Az and mean best accuracy for each Giger feature with the first set of US images. On the basis of results of paired-samples t tests, the proposed NSPD, ENC, and ENS were significantly better than all the Giger features, with P values less than .001 when the first set of US images was used. The proposed LI and lesion size had the same performance as normalized radial gradient at the 5% significance level.



 
Figure 8. In bar graph, each pair of vertical bars indicates the mean Az value and mean best classification accuracy achieved with a Giger feature with the first set of US images. Error bars indicate 1 SD. All these features (NRG = normalized radial gradient) are worse than NSPD, ENS, and ENC at the 5% significance level.

 

    DISCUSSION
 
The performance of radiologists in the interpretation of mammograms and breast sonograms is said to be less than ideal (2,7). Although performance can often be improved by having two radiologists review mammograms or two sonologists review US images, this strategy is not easily available. Therefore, a variety of CAD approaches have been attempted in the imaging evaluation of breast diseases. Artificial neural networks have been applied in mammography and proved to be of potential help in the mammographic decision-making task of distinguishing between benign and malignant lesions (25). To our knowledge, however, only a limited number of articles deal with CAD in breast sonography (9–15), and the previously proposed CAD algorithms are not yet mature enough for clinical use.

On the basis of the high performance mean and low performance variation achieved with four sets of lesion boundaries in the first, third, and fourth experiments in the present study, the proposed CAD algorithm was shown to be an effective and robust approach to differentiation of benign from malignant breast lesions. Results of the third and fourth experiments further validated the extendibility and robustness of the proposed CAD algorithm. At the 5% significance level, the promising results obtained in these two experiments suggest that with the proposed CAD algorithm, the classifier trained by the images directly captured and stored in the electronic storage media may be applied to the hard-copy images and vice versa. Moreover, the proposed CAD algorithm was robust in the sense that performance remained the same for both small and large sample sizes.

The high performance of the proposed CAD algorithm resulted mainly from the effective and reliable morphologic features and the incorporation of feature selection. Results of the evaluation of each individual feature showed that NSPD, ENC, and ENS were better than all the Giger features and that use of NSPD or ENS alone would outperform LDAGiger, MFNNGiger, and MFNNChen at the 5% significance level. In addition, NSPD, LI, and ENC were shown to give reliable performance for different sample sizes. They were intrinsically reliable because a small local variation in contour delineation would not lead to a dramatic change in feature values. For example, the NSPD counts the substantial protuberances and depressions in a lesion boundary. Reasonable variation in contour delineation might alter the shape of the lesion boundary but would not cause a big change in the NSPD. For LI and ENC, any reasonable variation in local delineation would be diluted by their own normalization factors, which were the mean area of the lobes for LI and the circumference of the equivalent ellipse for ENC. Since these normalization factors were usually on the order of 100 or 1,000, the potential value changes in LI and ENC would be small relative to the dynamic ranges of these two features.

On the basis of the effective features, feature selection was a necessary and beneficial step to further integrate the differential power of each individual feature, while accounting for the problem of "curse of dimensionality." The curse of dimensionality suggests that the sampling density of the training data is too low to promise a meaningful estimation of a high-dimensional classification function with all seven features with the available finite number of training data (16). As verified in the second experiment, performance with feature selection (ie, C160) was superior to that without feature selection (C160A). It should be emphasized that feature selection is basically a learning process, and the best features vary as the training data change. This finding means that for a practical CAD system, feature selection should be performed frequently to allow learning from the changing training data sets.
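The sampling-density argument can be made concrete with a small calculation. A common way to express it is that N samples in a d-dimensional feature space give a per-axis density proportional to N^(1/d). The sketch below assumes that the 160 in C160 denotes the number of training images and uses the seven features and 271 images mentioned in the text; the function name is an illustrative choice.

```python
def per_axis_density(n_samples, n_features):
    """Equivalent number of samples per axis: N ** (1 / d)."""
    return n_samples ** (1.0 / n_features)


for n in (160, 271):
    for d in (1, 2, 7):
        print(f"N={n}, d={d}: {per_axis_density(n, d):.2f} samples per axis")
```

With all seven features, even 271 images correspond to little more than two samples along each axis, which is why selecting a small subset of features can help rather than hurt.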

The proposed CAD algorithm was shown to be better than the algorithms of Giger et al (12) and Chen et al (13). Setting dependence is one of the major reasons that the previous CAD algorithms are impractical for clinical use in the differentiation of benign from malignant breast lesions. This problem is particularly serious for those algorithms based on the regional features. For example, the Chen algorithm (13) was able to attain Az values and classification accuracy as high as 0.956 and 95%, respectively, when the system setting was basically fixed. Results of the fifth experiment, however, showed that the Chen algorithm performed poorly with the first set of US images, which were acquired without any constraint imposed on the system setting. Any nonlinear change in the system setting may cause a nonnegligible variation in the normalized autocorrelation coefficients used with the Chen algorithm, even for the same lesion.

Similarly, the Giger algorithm (12) also had the setting-dependence problem because of the two regional features involved (ie, coarseness and mean gray-level difference between the region of interest within the lesion and that posterior to the lesion [µ1 - µ2]). Worse than the normalized autocorrelation coefficients, these two regional features may give different values, even with a linear change in the system setting. On the other hand, the morphologic feature of normalized radial gradient was sensitive to the local delineation as a result of the gradient type of information. That is, a small zigzag in the contour might result in a drastic change in the gradient. The experimental result showed that the performance of normalized radial gradient was substantially worse than that of NSPD, ENS, and ENC. Figure 8 reveals that none of these four features could provide sufficient differential power by itself.

The setting dependence of the regional features and the high sensitivity to the local delineation might account for the discrepancy between the high performance reported by Giger et al (12) and the low performance achieved in our fifth experiment. Although the performance improved with MFNN as the classifier, the best performance of the Giger mathematic features was still inferior to that of the proposed CAD algorithm. Moreover, the reliability of the Giger features remained questionable because of their setting and operator dependence.

The bar graphs in Figures 6 and 7 suggest that the mathematic features based on the aspect ratio of the lesion (ie, D:W ratio and L:S ratio) were not effective for differentiating a malignant from a benign lesion, though D:W ratio is considered as a clinically useful indicator (19). In particular, the L:S ratio was devised to eliminate the dependence on the scanning angle that is inherent in the D:W ratio. The low classification accuracy of the L:S ratio seemed to imply that the aspect ratio of a lesion is not a useful indicator for lesion malignancy.

In conclusion, setting independence is clearly a crucial property for a CAD algorithm to be used in practice. To assist differential diagnosis of benign and malignant breast lesions without imposing constraints on system settings, we propose, on the basis of findings in the present study, a new CAD algorithm with nearly setting-independent morphologic features and an artificial neural network as the classifier. The proposed morphologic features were by no means comprehensive, though the experimental results supported that NSPD, LI, and ENC are effective and reliable. We believe that further exploration of the setting-independent regional features that may faithfully characterize echotexture, sound transmission, and angular margin would be required to form a complete set of mathematic features for CAD of breast lesions.


    APPENDIX
 
Suppose the contour of a breast lesion is composed of $N$ consecutive discrete points. Let $p_i$ stand for the $i$th point in the contour of a breast lesion, assuming that the points in the contour are numbered clockwise. For each point $p_i$, the depression depth $h_i$ is the shortest distance from $p_i$ to the convex hull of the breast lesion. The convex hull of a breast lesion is the smallest convex set of points that encloses the lesion, as depicted in Figure 1. Denote $\vec{n}_i$ as the normal vector of $p_i$ in the contour of a breast lesion. The k-curve angle of $p_i$ in the contour is defined as $\theta_i(k) = \operatorname{sgn}(\vec{n}_{i+k} \times \vec{n}_{i-k}) \cos^{-1}(\vec{n}_{i+k} \cdot \vec{n}_{i-k})$, where $\vec{n}_{i+k} \times \vec{n}_{i-k}$ is the outer product of $\vec{n}_{i+k}$ and $\vec{n}_{i-k}$, $\vec{n}_{i+k} \cdot \vec{n}_{i-k}$ is the inner product of $\vec{n}_{i+k}$ and $\vec{n}_{i-k}$, and sgn takes the polarity of $\vec{n}_{i+k} \times \vec{n}_{i-k}$, which is positive if $\vec{n}_{i+k} \times \vec{n}_{i-k}$ points upward.
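As a rough illustration of the k-curve angle, the sketch below estimates it as the signed turning angle between the chord entering a contour point and the chord leaving it. This chord-based estimate, the function name, and the coordinate convention (y axis pointing up, contour traversed clockwise) are assumptions made for illustration. The definition above is stated in terms of normal vectors; the two agree when the contour is straight on either side of the point and differ for smoothly curved segments.

```python
import math


def k_curve_angle(contour, i, k):
    """Signed turning angle in degrees at contour[i] over a span of k points.

    contour: list of (x, y) points of a closed contour, numbered clockwise
        in a y-up coordinate system (an assumed convention).
    Returns a positive angle at protuberances and a negative angle at
    depressions under that convention.
    """
    n = len(contour)
    x0, y0 = contour[(i - k) % n]
    x1, y1 = contour[i % n]
    x2, y2 = contour[(i + k) % n]
    ax, ay = x1 - x0, y1 - y0  # chord arriving at the point
    bx, by = x2 - x1, y2 - y1  # chord leaving the point
    cross = bx * ay - by * ax  # positive for a clockwise (right) turn
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(cross, dot))


# Clockwise square: corners are convex, edge midpoints are straight.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
print(k_curve_angle(square, 2, 1))  # corner
print(k_curve_angle(square, 1, 1))  # midpoint of an edge
```

Using `atan2` of the cross and dot products gives the sign and magnitude in one step and avoids the numerical trouble that `acos` has when the two directions are nearly parallel.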

A point $p_i$ is defined as a convex point if $\theta_i(k) \geq \theta_p$, where $\theta_p$ is a prespecified positive threshold value. Similarly, a point $p_i$ is defined as a concave point if $h_i \geq \sqrt{2}$ pixels, $h_i$ is the local maximum among the neighborhood of $p_i$, and $\theta_i(k) \leq \theta_d$, where $\theta_d$ is a prespecified negative threshold value. The threshold of $\sqrt{2}$ pixels, which is the distance between two diagonal pixels, is set to eliminate undesirable depressions caused by an unsteady delineation process. If two consecutive convex points do not have any concave point in between, the one with the smaller k-curve angle is eliminated. Likewise, if two consecutive concave points do not have any convex point in between, the one with the smaller depression depth is removed. Let $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_p\}$ and $\Omega = \{\omega_1, \omega_2, \ldots, \omega_d\}$ be the sets of convex and concave points, respectively, that remain after redundant points have been removed, where $p$ and $d$ are the numbers of points in each set. Then, each point $\lambda_j$ in $\Lambda$ is called a representative convex point that defines a substantial protuberance, and each point $\omega_j$ in $\Omega$ is called a representative concave point that defines a substantial depression.
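The classification and pruning rules in this paragraph can be sketched as follows, given precomputed k-curve angles and depression depths for every contour point. Treating the "neighborhood" as the two immediate neighbors, allowing ties in the local-maximum check, and the function and parameter names are all assumptions made for illustration.

```python
import math


def representative_points(theta, depth, theta_p, theta_d,
                          min_depth=math.sqrt(2)):
    """Return (convex_indices, concave_indices) after pruning.

    theta: k-curve angle of each contour point, in degrees.
    depth: depression depth of each contour point, in pixels.
    theta_p: positive convex threshold; theta_d: negative concave threshold.
    """
    n = len(theta)
    candidates = []  # (index, kind, strength), in contour order
    for i in range(n):
        if theta[i] >= theta_p:
            candidates.append((i, "convex", theta[i]))
        elif (theta[i] <= theta_d and depth[i] >= min_depth
              and depth[i] >= depth[i - 1]
              and depth[i] >= depth[(i + 1) % n]):
            candidates.append((i, "concave", depth[i]))

    # Group consecutive candidates of the same kind into runs.
    runs = []
    for cand in candidates:
        if runs and runs[-1][0][1] == cand[1]:
            runs[-1].append(cand)
        else:
            runs.append([cand])
    # The contour is closed, so the last run may continue into the first.
    if len(runs) > 1 and runs[0][0][1] == runs[-1][0][1]:
        runs[0] = runs.pop() + runs[0]

    # Within a run, keep only the strongest point (largest angle or depth).
    kept = [max(run, key=lambda c: c[2]) for run in runs]
    convex = sorted(i for i, kind, _ in kept if kind == "convex")
    concave = sorted(i for i, kind, _ in kept if kind == "concave")
    return convex, concave


# Invented angles and depths for an eight-point contour.
angles = [0, 45, 50, 0, -30, 0, 35, 0]
depths = [0, 0, 0, 1, 3, 1, 0, 0]
convex, concave = representative_points(angles, depths, 40, -20)
print(convex, concave, len(convex) + len(concave))
```

Repeatedly discarding the weaker of two adjacent same-kind points is equivalent to keeping the strongest point of each run, which is what the sketch does. The final sum corresponds to the count of substantial protuberances and depressions.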

Empirically, $\theta_d$ was determined in consideration of two conflicting observations. On one hand, it is common to find a depression with a slowly varying contour (ie, the magnitude of the k-curve angle is small), so the magnitude of $\theta_d$ should be kept as small as possible. On the other hand, a depression with a small k-curve angle and a small depression depth is easily generated by a wobbly delineation process alone and may be considered noise. As a compromise, $\theta_d$ was set to $-20^\circ$, which tolerated $10^\circ$ of aberration from the ideal contour on each side of a concave point.

To determine $k$, consider a depression with the smallest possible depression depth (ie, $h_i = \sqrt{2}$ pixels). One may easily obtain that $k \approx 8$ pixels for $\theta_d = -20^\circ$ by approximating the depression as a triangle and determining $k$ with the Pythagorean theorem. When the discrete property of a digital image is taken into account, it is appropriate to use either $k = 7$ or $k = 8$ to evaluate a depression under the lower-bound condition (ie, given $h_i = \sqrt{2}$ pixels and $\theta_d = -20^\circ$). In this study, $k$ was set to 7 to allow the smaller depressions and protuberances. Since there was no reasonable constraint for $\theta_p$, we decided to determine $\theta_p$ through learning from data. More specifically, five $\theta_p$ values were considered in this study (ie, $\theta_p \in \{20^\circ, 30^\circ, 40^\circ, 50^\circ, 60^\circ\}$), the best of which was determined by using feature selection.
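The estimate of roughly 8 pixels can be checked numerically under one plausible reading of the triangle approximation: each side of the depression deviates by 10° (half of the 20° threshold magnitude) from the ideal straight contour, and the depth is the square root of 2 pixels. Which side of the triangle is taken as k is an assumption here, so both candidates are computed.

```python
import math

depth = math.sqrt(2)              # smallest admissible depression depth
half_angle = math.radians(10)     # deviation of each side from the contour

# Leg of the right triangle lying along the ideal contour.
along_contour = depth / math.tan(half_angle)
# Hypotenuse, ie, the length of one side of the depression.
along_side = depth / math.sin(half_angle)

# The two lengths are tied together by the Pythagorean theorem.
assert abs(along_side ** 2 - (along_contour ** 2 + depth ** 2)) < 1e-9

print(f"along contour: {along_contour:.2f} pixels")
print(f"along side:    {along_side:.2f} pixels")
```

Both lengths come out slightly above 8 pixels, so the choice between them does not change the conclusion that k of about 8 corresponds to the lower-bound condition.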


    FOOTNOTES
 
Abbreviations: Az = area under the ROC curve, CAD = computer-aided diagnosis, D:W = depth to width, ENC = elliptic-normalized circumference, ENS = elliptic-normalized skeleton, LDA = linear discriminant analysis, LI = lobulation index, L:S = long axis to short axis, MFNN = multilayer feed-forward neural network, NSPD = number of substantial protuberances and depressions, ROC = receiver operating characteristic

Author contributions: Guarantors of integrity of entire study, C.M.C., Y.H.C.; study concepts and design, C.M.C., Y.H.C.; literature research, C.M.C., Y.H.C., K.C.H., G.S.H.; clinical studies, Y.H.C., C.M.T., H.J.C., S.Y.C.; experimental studies, C.M.C., K.C.H., C.M.T., H.J.C., S.Y.C.; data acquisition, K.C.H., G.S.H., C.M.T., H.J.C., S.Y.C.; data analysis/interpretation, C.M.C., Y.H.C., K.C.H., G.S.H.; statistical analysis, C.M.C.; manuscript preparation, definition of intellectual content, editing, revision/review, and final version approval, C.M.C., Y.H.C.


    REFERENCES
 

  1. Pisani P, Parkin DM, Ferlay J. Estimates of the worldwide mortality from 18 major cancers in 1985: implications for prevention and projections of future burden. Int J Cancer 1993; 55:891-903.
  2. Brown ML, Houn F, Sickles EA, Kessler LG. Screening mammography in community practice: positive predictive value of abnormal findings and yield of follow-up diagnostic procedures. AJR Am J Roentgenol 1995; 165:1373-1377.
  3. Zheng Y, Greenleaf JF, Gisvold JJ. Reduction of breast biopsies with a modified self-organizing map. IEEE Trans Neural Networks 1997; 8:1386-1396.
  4. Fornage BD, Sneige N, Faroux MJ, Andry E. Sonographic appearance and ultrasound-guided fine-needle aspiration biopsy of breast carcinomas smaller than 1 cm. J Ultrasound Med 1990; 9:559-568.
  5. Bassett LW, Ysrael M, Gold RH, Ysrael C. Usefulness of mammography and sonography in women less than 35 years of age. Radiology 1991; 180:831-835.
  6. Jackson VP. The role of US in breast imaging. Radiology 1990; 177:305-311.
  7. Stavros AT, Thickman D, Rapp CL, Dennis MA, Parker SH, Sisney GA. Solid breast nodules: use of sonography to distinguish between benign and malignant lesions. Radiology 1995; 196:123-134.
  8. Skaane P, Engedal K. Analysis of sonographic features in the differentiation of fibroadenoma and invasive ductal carcinoma. AJR Am J Roentgenol 1998; 170:109-114.
  9. Goldberg V, Manduca A, Ewert DL, Gisvold JJ, Greenleaf JF. Improvement in specificity of ultrasonography for diagnosis of breast tumors by means of artificial intelligence. Med Phys 1992; 19:1475-1481.
  10. Garra BS, Krasner BH, Horii SC, Ascher S, Mun SK, Zeman RK. Improving the distinction between benign and malignant breast lesions: the value of sonographic texture analysis. Ultrason Imaging 1993; 15:267-285.
  11. Ruggiero C, Bagnoli F, Sacile M, Rescinito CG, Sardanelli F. Automatic recognition of malignant lesions in ultrasound images by artificial neural networks. In: Proceedings of the 20th Annual International Conference of the IEEE/EMBS. New York, NY: Institute of Electrical and Electronic Engineers, 1998; 872-875.
  12. Giger M, Al-Hallaq H, Hui Z, et al. Computerized analysis of lesions in US images of the breast. Acad Radiol 1999; 6:665-674.
  13. Chen DR, Chang RF, Huang YL. Computer-aided diagnosis applied to US of solid breast nodules by using neural networks. Radiology 1999; 213:407-412.
  14. Chen DR, Chang RF, Huang YL. Breast cancer diagnosis using self-organizing map for sonography. Ultrasound Med Biol 2000; 26:405-411.
  15. Chang RF, Kuo WJ, Chen DR, Huang YL, Lee JH, Chou YH. Computer-aided diagnosis for surgical office based breast ultrasound. Arch Surg 2000; 135:696-699.
  16. Cherkassky V, Mulier F. Learning from data: concepts, theory, and methods. New York, NY: Wiley, 1998.
  17. Dillon WR, Goldstein M. Multivariate analysis: method and applications. New York, NY: Wiley, 1984.
  18. Zurada JM. Introduction to artificial neural systems. Boston, Mass: PWS, 1992.
  19. Tohno E, Cosgrove DO, Sloane JP. Ultrasound diagnosis of breast diseases. Edinburgh, Scotland: Churchill Livingstone, 1994; 50-73.
  20. Prokop RJ. The technique of standard moments for global feature object representation. Thesis. Ithaca, NY: Cornell University, 1990.
  21. Reeves AP, Prokop RJ, Andrews SE, Kuhl FP. Three-dimensional shape analysis using moments and Fourier descriptors. IEEE Trans PAMI 1988; 10:937-943.
  22. Haralick RM, Shapiro LG. Computer and robot vision. New York, NY: Addison-Wesley, 1993.
  23. He DC, Wang L, Guibert J. Texture discrimination based on an optimal utilization of texture features. Pattern Recognition 1989; 21:141-146.
  24. Tsoukalas LH, Uhrig RE. Fuzzy and neural approaches in engineering. New York, NY: Wiley, 1997.
  25. Wu Y, Giger ML, Doi K, Vyborny CJ, Schmidt RA, Metz CE. Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer. Radiology 1993; 187:81-87.


