Breast Imaging
1 From the Department of Radiology, University of Pittsburgh and Magee-Womens Hospital of the University of Pittsburgh Medical Center, 300 Halket St, Imaging Research, Suite 4200, Pittsburgh, PA 15213-3180. Received March 9, 2004; revision requested May 11; final revision received July 21; accepted August 4. Supported in part by grants CA77850 and CA84241 from the National Cancer Institute, National Institutes of Health. Address correspondence to D.G. (e-mail: gurd@upmc.edu).
| ABSTRACT |
|---|
MATERIALS AND METHODS: Institutional review board approved study protocol, and informed consent was waived. All screening mammograms that had been interpreted by MQSA-qualified radiologists between January 1, 2001, and March 31, 2003, were reviewed. Group recall rates, biopsy rates, and detected cancer rates for nine calendar quarters were computed and attributed to performance date of original screening mammogram. Type of biopsy performed was classified as follows: stereotactic vacuum-assisted biopsy, ultrasonography (US)-guided core biopsy, US-guided fine-needle aspiration biopsy, surgical excision, and multiple biopsies. χ² test for trend (two sided) and linear regression were used to assess trends over time for recall and biopsy rates, biopsy rates according to type of biopsy performed, and percentage of biopsy results positive for cancer.
RESULTS: Group recall rate did not show a statistically significant trend during period studied (P = .59). Biopsy rates increased significantly from 13.02 to 20.12 per 1000 screening examinations (P < .001). A corresponding substantial decrease was seen in percentage of biopsies in which malignancy was found, although this trend was not statistically significant (P = .24). A significant increase (from 4.72 to 9.88 per 1000 screening examinations) was found in rate of stereotactic vacuum-assisted 11-gauge core biopsies performed (P < .001).
CONCLUSION: Observed increase in biopsy rates reinforces the need to carefully select patients for biopsy to achieve efficient, efficacious, and cost-effective programs for early detection of breast cancers.
© RSNA, 2005
| INTRODUCTION |
|---|
Reducing the number of unnecessary breast biopsies lowers the overall cost of screening mammography and lessens patient anxiety and discomfort. Computer-aided diagnosis (CAD) systems are currently being developed to help radiologists determine whether a region that has been identified as suspicious at mammography is likely to represent a benign or a malignant lesion (4–6). Several studies have been conducted to assess factors that may influence reader performance as measured by means of recall rates, biopsy rates, and breast cancer detection rates (7–12). Observer-related factors include reader experience (7,8), reader volume (9), and reader subspecialty (10), among others. In one recent Finnish report (13), it was noted that after 11 years of screening mammography, the ratio of malignant to benign biopsies had quadrupled; this was attributed in part to the increased experience and training of participating radiologists. Other recent studies (14,15) suggest that reader performance is strongly related to a number of factors, including minimum volume, association with academic institutions, access to audit materials, and educational opportunities. There are only limited data examining whether this is actually the case in the clinical environment (16).
Thus, the purpose of our study was to retrospectively evaluate whether the recall and biopsy rates for a group of mammography readers in an academic setting who met the standards of the Mammography Quality Standards Act of 1992 (MQSA) demonstrated any change over time. We also retrospectively evaluated the percentage of positive biopsy results, according to type, during a 27-month period (nine consecutive calendar quarters).
| MATERIALS AND METHODS |
|---|
The data sources used in our analysis are databases of procedure scheduling, procedure completion, radiology reporting, and procedure-related outcomes as determined from relevant pathology reports. These databases have been assembled from original reports for several reasons, including but not limited to quality assurance purposes that are required by the MQSA (19,20). The databases were reviewed by an independent information systems person, whose task is to perform queries on the system databases, and by two members of our research team (L.A.H., A.H.K.). The reporting procedures used in our practice remained the same throughout the study period.
Definitions and Data Collection
For this study, we define recall cases to be those cases for which further evaluation (eg, additional mammographic views, breast US) was recommended during the interpretation of the screening mammograms. These mammograms were classified as Breast Imaging Reporting and Data System (BI-RADS) category 0. We excluded recommendations for recall made for technical reasons (eg, suboptimal exposure, motion artifacts), which account for approximately 1% of the cases. However, recalls due to palpable findings during clinical breast examinations that are routinely performed in all patients by the mammography technologists before screening mammography were included and amount to approximately 1% of recall cases. In general, these cases were recalled regardless of the imaging findings (negative or positive) and were assumed to be proportional to the total volume of mammograms being interpreted during each period. Recalls due to unavailability of previously obtained mammograms for comparison during the scheduled interpretation (<4%) were included in the study because these examinations are interpreted as soon as comparison images are made available, and the number of these examinations is assumed to be distributed proportionately to the volume of mammograms read during each period. Time intervals for the analyses were defined according to calendar quarters.
Recall rates for each quarter were computed directly from mammography interpretation records (ie, number of screening mammograms given a BI-RADS rating of 0 after exclusion of "technical recalls" from the total number of screening mammograms obtained). The cases that ultimately resulted in a biopsy were included in our biopsy count, and a biopsy that resulted in a pathologically verified cancer was attributed to the screening mammographer as a "positive cancer detection." Recalls, biopsies, and detected cancers were attributed to the performance date of the original screening mammogram. In addition, we classified biopsies according to the type of biopsy performed. Biopsy procedures were classified (informatics person, A.H.K.) into the following categories: US-guided 14-gauge core biopsy, US-guided fine-needle (either 21- or 19-gauge) aspiration biopsy (US-guided cyst aspiration for symptomatic relief is not considered a biopsy at our institution and is not included herein), stereotactic (11-gauge) vacuum-assisted core biopsy, surgical excision biopsy, and multiple biopsies in patients who underwent more than one type of invasive procedure, typically on the same day.
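The rate definitions above can be illustrated with a minimal Python sketch. All counts below are hypothetical and are not taken from the study; the class and function names are illustrative only. The sketch also assumes that technical recalls are recorded among the BI-RADS 0 cases and are subtracted before the recall rate is computed, which is one reading of the definition given above.

```python
from dataclasses import dataclass


@dataclass
class QuarterCounts:
    """Counts for one calendar quarter (every value used below is hypothetical)."""
    screens: int            # screening mammograms performed in the quarter
    birads0: int            # screening mammograms rated BI-RADS category 0
    technical_recalls: int  # recalls for technical reasons (excluded from recalls)
    biopsies: int           # biopsies attributed to the screening date
    cancers: int            # pathologically verified cancers among those biopsies


def per_1000(count: int, screens: int) -> float:
    """Express a count as a rate per 1000 screening examinations."""
    return 1000.0 * count / screens


def summarize(q: QuarterCounts) -> dict:
    """Compute the per-quarter rates described in the text."""
    recalls = q.birads0 - q.technical_recalls  # assumption: technical recalls are a subset of BI-RADS 0
    return {
        "recall_rate_pct": 100.0 * recalls / q.screens,
        "biopsy_rate_per_1000": per_1000(q.biopsies, q.screens),
        "cancer_rate_per_1000": per_1000(q.cancers, q.screens),
        "pct_biopsies_malignant": 100.0 * q.cancers / q.biopsies,
    }


# Hypothetical quarter, for illustration only.
example = QuarterCounts(screens=8000, birads0=920, technical_recalls=80,
                        biopsies=120, cancers=36)
summary = summarize(example)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Because every rate shares the same denominator (screening examinations performed in the quarter), the quarterly values can be compared directly even when screening volume changes from quarter to quarter.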
Statistical Analysis
On the basis of the criteria stated earlier, data were collected and tallied for each calendar quarter (A.H.K.). Recall and biopsy rates, percentage of malignant biopsies, and biopsy rates according to type of biopsy were then calculated for each calendar quarter (A.H.K., D.G.). We used the χ² test for trend (two sided) to assess whether a trend existed over time among the recall and biopsy rates, biopsy rates by type, and percentage of malignant biopsies. We used linear regression analyses to illustrate the direction of the trend. Linear regression analyses were done by using Stata software (Intercooled Stata 7.0; Stata, College Station, Tex). We used a Yates-corrected χ² test (two sided) to test for differences between more experienced radiologists (more than 10 years of experience) and less experienced radiologists (10 years of experience or less) with regard to cancer detection rates. All statistical tests were two sided, and a statistically significant difference was indicated when the P value was less than .05. Changes (slopes) were estimated and reported in terms of increases (decreases) per 1000 screening examinations per calendar quarter. We assumed for the purpose of this effort that number of patients lost to follow-up remained constant during the period in question.
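As a rough illustration of the two χ² tests named above, the following Python sketch implements one common form of the χ² test for trend (the Cochran-Armitage form, with equally spaced quarter scores) and the Yates-corrected χ² test for a 2 × 2 table. The text does not state which implementation or scoring was actually used, so both choices are assumptions, and all counts are hypothetical.

```python
import math


def chi2_p_value_df1(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def chi2_trend(events, totals, scores=None):
    """Chi-square test for linear trend in proportions (Cochran-Armitage form).

    events[i] / totals[i] is the proportion in period i; scores default to
    0, 1, 2, ... (equally spaced periods, an assumption of this sketch).
    """
    if scores is None:
        scores = list(range(len(events)))
    n = sum(totals)
    p = sum(events) / n  # pooled proportion under the null hypothesis
    t = sum(s * (r - m * p) for s, r, m in zip(scores, events, totals))
    sum_ns = sum(m * s for m, s in zip(totals, scores))
    sum_ns2 = sum(m * s * s for m, s in zip(totals, scores))
    var = p * (1.0 - p) * (sum_ns2 - sum_ns ** 2 / n)
    chi2 = t * t / var
    return chi2, chi2_p_value_df1(chi2)


def chi2_yates(a, b, c, d):
    """Yates-corrected chi-square test for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)  # continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_p_value_df1(chi2)


# Hypothetical counts: biopsies (events) out of screens (totals) in three periods.
trend_chi2, trend_p = chi2_trend([10, 20, 30], [100, 100, 100])

# Hypothetical 2 x 2 table: cancers found vs not found for two reader groups.
yates_chi2, yates_p = chi2_yates(10, 90, 20, 80)

print(f"trend: chi2={trend_chi2:.2f}, P={trend_p:.4f}")
print(f"Yates: chi2={yates_chi2:.2f}, P={yates_p:.4f}")
```

Both statistics have 1 degree of freedom, so the two-sided P value can be obtained from the complementary error function without any external statistics library.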
| RESULTS |
|---|
With use of the χ² test for trend, we found no significant trend for recall rate during the study period (P = .59). However, the increasing trend for biopsy rate was significant (P < .001), and a corresponding, albeit nonsignificant, decreasing trend was found for the percentage of malignant biopsies (P = .24). For biopsies according to type, the χ² test for trend revealed a significant increasing trend for stereotactic and surgical biopsies (P < .001 and P = .01, respectively). A borderline significant increasing trend was found for US-guided core biopsy (P = .05), and US-guided fine-needle aspiration and multiple biopsies had no significant trend (P = .80 and P = .90, respectively).
A linear least squares fit of biopsy rate, normalized to the number of screening examinations performed in each calendar quarter (Fig 1), had a slope of 0.72 per 1000 screening examinations per quarter (95% confidence interval [CI]: 0.26, 1.19). The corresponding slope for the fraction of malignant biopsies (Fig 2) was negative, at −0.51% per quarter (95% CI: −1.26, 0.25). Table 3 provides the slopes and corresponding 95% CIs for linear least squares fits for each type of biopsy. Stereotactic vacuum-assisted biopsies (Fig 3) had the highest slope of 0.44 per 1000 examinations per quarter (95% CI: 0.16, 0.73). The second highest slope we observed (0.18 per 1000 examinations per quarter) was for US-guided core biopsy. Sixty-one percent (0.44 of 0.72) of the total change was due to an increase in the number of stereotactic vacuum-assisted biopsies. It is interesting that most (nine of 14) of the surgical biopsies performed during the first quarter of 2003 (Table 2) resulted from the detection of microcalcifications, several of which were too small and faint to permit adequate targeting for stereotactic biopsy. This substantial increase in surgical biopsies during the first quarter of 2003 influenced the finding of a significant increasing trend (with use of the χ² test for trend).
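The slopes and confidence intervals reported above come from ordinary least squares fits of a quarterly rate against quarter index. A minimal Python sketch of that calculation follows; the quarterly rates are hypothetical and the function name is illustrative. It assumes nine equally spaced quarters, so the residual degrees of freedom are 7 and the two-sided 95% critical value of Student's t is approximately 2.365.

```python
import math

T_CRIT_DF7 = 2.365  # two-sided 95% critical value of Student's t, 7 degrees of freedom


def slope_with_ci(rates, t_crit):
    """Least squares slope of rate vs quarter index (1, 2, ...) with a confidence interval.

    t_crit must match len(rates) - 2 degrees of freedom; the caller supplies it.
    """
    n = len(rates)
    x = list(range(1, n + 1))
    x_mean = sum(x) / n
    y_mean = sum(rates) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, rates))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, rates))
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope - t_crit * se_slope, slope + t_crit * se_slope


# Hypothetical biopsy rates per 1000 screening examinations for nine quarters.
hypothetical_rates = [13.0, 14.5, 13.8, 15.9, 16.2, 17.0, 16.5, 18.8, 19.6]
slope, ci_low, ci_high = slope_with_ci(hypothetical_rates, T_CRIT_DF7)
print(f"slope = {slope:.2f} per 1000 per quarter (95% CI: {ci_low:.2f}, {ci_high:.2f})")

# Share of the overall slope attributable to one biopsy type, using the
# slopes quoted in the text (0.44 for stereotactic biopsy, 0.72 overall).
stereotactic_share = 0.44 / 0.72
print(f"stereotactic share of total change: {100 * stereotactic_share:.0f}%")
```

A confidence interval that excludes zero corresponds to a slope that differs significantly from zero at the .05 level, which is how the intervals above can be read alongside the trend-test P values.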
With use of the Yates-corrected χ² test (two sided), we found no statistically significant difference in cancer detection rates between radiologists who had 10 years of experience or less and those who had more than 10 years of experience (P = .32).
| DISCUSSION |
|---|
Clearly, an observational-type study such as ours has several inherent limitations. These include, but are not limited to, the population being studied (eg, the fraction of repeat examinations vs first screening procedure), the incomplete data ascertainment due to loss to follow-up if a woman chooses to undergo biopsy at another institution or ignore the recommendation altogether, and the generally changing environment. As important, perhaps, is the limitation that, despite the fact that all of our radiologists can be considered well-trained academic mammographers, the population of radiologists is not constant over time, and the diagnostic procedures that follow a recommendation for recall are often performed by a different radiologist than the one who interpreted the screening mammogram.
Obviously, the data presented herein are related to our own practice and our own group of radiologists and should not be used for extrapolation to other practices. Most of these limitations would likely result in adjustments to the observations reported herein that could be open to criticism. Hence, we chose to report our observations "as is" and not to engage in "hypothetical modeling" and corrections and/or adjustments. In particular, we assumed that the characteristics of our population of screened women remained relatively stable and the "loss to follow-up" was proportional to the volumes being performed and did not change substantially during the period in question. We have shown that, for high-volume well-trained radiologists in our group, cancer detection rates are related to recall rates (18); hence, we expected that an increase in the number of biopsies would result in a similar increase in the number of cancers detected. Because the focus of this work is related to an assessment of relative changes (if any) over time and because our technology, practices, and verification protocols during the period in question were consistent, any biases due to these changes or "incomplete" or "underestimated" cancer rates suggested in a recent editorial (21) are not likely to have a substantial effect on the relative observations we made.
Optimal screening mammography programs should maintain high cancer detection rates (sensitivity) while minimizing the number of women recalled for additional imaging procedures and biopsies because those add substantial financial and psychologic costs. The main topic of this study was assessment of the latter, namely, trends, if any, in biopsy rates. As invasive procedures, biopsies are likely to prompt more anxiety, cause more discomfort, and carry more risk (albeit not as substantial in the breast as in other organs, perhaps) than do noninvasive mammography and breast US. Biopsy of "too few" suspicious findings will decrease the number of breast cancers detected, with the decrease most likely occurring in the detection of small (hence, "earlier stages of") breast cancers. In general, these are the most treatable cancers and are the most desirable to identify in a screening program. Conversely, biopsy of "too many" negative or benign findings is expensive financially and emotionally, without a corresponding gain in cancer detection. Currently, in the United States, guidelines issued by the Agency for Health Care Policy and Research on the interpretive performance of mammography (22) recommend that 25%–40% of biopsies performed should prove to be malignant.
In our study, a statistically significant increase in the group biopsy rate over time was found, without a corresponding increase in cancer detection rate, and a substantial fraction of the increase occurred in the rate of stereotactic (11-gauge) vacuum-assisted core biopsies. We suspect that at least some of this increase may be attributed to the common use of CAD in our clinical screening mammography practice. CAD is highly sensitive in the detection of subtle microcalcifications, and its routine use may have lowered the threshold of the radiologists for recommending further evaluation of subtle calcifications. When radiologists believe that the detected calcifications are sufficiently suspicious to warrant biopsy, at our institution, this is typically done with stereotactic (11-gauge) vacuum-assisted core biopsy. This possible "chasing" of the subtle microcalcification may vary substantially among radiologists. At the same time, the trend we observed toward increasing the number of US-guided core biopsies suggests that the use of CAD is not the only reason for the increase in biopsy rates we observed. We emphasize that this observation was made in one practice with one type of radiologist and for a single population of women; hence, generalization of our results to other practices may be limited. More data are needed in this regard. Although the precise reasons for the observed increase in our biopsy rate may never be known, the mere knowledge that this increase has occurred without the expected corresponding increase in breast cancer detection rates is important. It reinforces the need to carefully select cases for biopsy so that the goal of screening mammography, namely, an efficient, efficacious, and cost-effective practice for early detection of breast cancer, can be achieved.
| FOOTNOTES |
|---|
Authors stated no financial relationship to disclose.
Author contributions: Guarantors of integrity of entire study, D.G., A.H.K.; study concepts, L.P.W., D.G.; study design, D.G., A.H.K., L.A.H., J.H.S.; literature research, A.H.K.; clinical studies, L.P.W., L.A.H., R.S., G.S.A., J.H.S.; experimental studies, A.H.K., L.A.H., D.G.; data acquisition, A.H.K.; data analysis/interpretation, D.G., A.H.K., L.A.H.; statistical analysis, A.H.K., D.G.; manuscript preparation and definition of intellectual content, all authors; manuscript editing, D.G., L.A.H., J.H.S.; manuscript revision/review and final version approval, all authors.