Radiology
DOI: 10.1148/radiol.2292021585
(Radiology 2003;229:534-540.)
© RSNA, 2003


Breast Imaging

Evaluation of Proscriptive Health Care Policy Implementation in Screening Mammography1

Craig A. Beam, PhD, Emily F. Conant, MD, Edward A. Sickles, MD and Susan P. Weinstein, MD

1 From the H. Lee Moffitt Cancer Center & Research Institute, 12902 Magnolia Dr, Tampa, FL 33612-9497. From the 2002 RSNA scientific assembly. Received December 4, 2002; revision requested February 6, 2003; final revision received May 13; accepted May 19. Supported by National Cancer Institute grant CA-74110. Address correspondence to C.A.B. (e-mail: beamca@moffitt.usf.edu).


    ABSTRACT
 
PURPOSE: To evaluate the potential effect of proscriptive health care policies directed toward improving screening mammogram interpretation in the United States.

MATERIALS AND METHODS: Percentiles of accuracy based on a random sample of 110 U.S. radiologists were used to examine the number of radiologists who would need to be restricted from providing mammographic interpretation to increase median accuracy from 66% to 67%, 71%, and 76%. In addition, reading volume data recorded for the sampled readers were used to project the percentage reduction in service volume (mammograms per year) that would result from restriction. Characteristics of participating radiologists were compared with those of nonparticipating radiologists by using χ² testing and analysis of variance to assess the external validity of the results.

RESULTS: To increase median accuracy by 1% (from 66% to 67%) would require prohibiting about 2,200 U.S. radiologists (ie, the 11% in the lowest quantile for accuracy) from performing mammographic interpretation and would result in a reduction of yearly service volume of approximately 10%. An increase in median accuracy of 5% (to 71%) would require prohibiting about 6,000 U.S. radiologists (ie, 30%) from performing this service, with an accompanying volume reduction of 25%. An increase in median accuracy of 10% (to 76%) would require prohibiting about 11,400 practicing U.S. radiologists (ie, 57%) from performing this service and would diminish the national service capacity by 50%.

CONCLUSION: These data show that implementation of proscriptive health care policies based on accuracy would diminish the service capacity of screening mammography in the United States.


Index terms: Breast radiography, utilization, 00.11 • Diagnostic radiology, observer performance • Radiology and radiologists • Radiology and radiologists, departmental management


    INTRODUCTION
 
Results of studies have shown that a breast imager’s experience in terms of the number of cases interpreted per year or his or her participation in specialized mammography training or continuing medical education is just one of the factors involved in determining a radiologist’s interpretative skills (1–3). For example, results of a study performed by Nodine and colleagues (4) suggest that the quality of feedback is as important as the quantity of feedback in learning from experience in mammography. Most likely, the phenomenon of expertise in mammography is multifactorial and complex. Results of a recent population study confirm this assertion (5).

Most recently, the factor of the minimum requirement of cases has been discussed. As compared with that in countries with mass screening programs, the caseload requirement for the American radiologist is minimal. In Sweden, mass screening is performed at select sites, with only expert radiologists (those specializing in breast imaging) interpreting the images (1). In the United Kingdom, where high-volume screening programs exist, the radiologist must interpret a minimum of 5,000 mammograms per year (6). In the Canadian province of British Columbia, the recommended minimum number of cases interpreted per year is 2,500 (3). Although this is half the number required in the United Kingdom, it is more than five times the recommended number in the United States. In the United States, the Food and Drug Administration (FDA), in enforcing the Mammography Quality Standards Act, requires every radiologist to read a minimum of 960 mammograms during a 24-month period (7).

Although volume has been cited as an important factor in improving the sensitivity and specificity of mammography (1,2), it is important to note that radiologists in the United States work in different financial and legal environments than do radiologists in countries with socialized medicine (8). Therefore, a simple comparison of specificity and sensitivity between radiologists in different countries that is based on volume of cases may not be valid. Additionally, patients in the United States demand convenient access to medical care (8) and may not be willing to travel to specialized mammography centers where high volumes of mammograms are batched for interpretation.

Given the uncertain connection between volume and skill, we might wonder whether qualifying radiologists on the basis of volume is a reliable foundation for an effective health care policy. One might ask instead, Why not qualify radiologists directly on the basis of ability?

The idea of qualifying examinations is not new to the field of medicine. However, such examinations rarely serve a proscriptive function. For example, the subspecialty board examination in radiology tests for a basic fund of knowledge in the field, but not passing the examination does not necessarily prohibit the physician from practicing radiology in the United States. The PERFORMS 2 test, administered in the United Kingdom by the National Health Service Breast Screening Program, is taken electively by radiologists and serves as a teaching tool as well as a skill-assessment tool. Recently, the American College of Radiology Committee on Mammography Interpretive Skills Assessment introduced a similar voluntary self-evaluation test with feedback and a scoring system that enables a radiologist to compare his or her scores with those of other radiologists across the country. However, this test was not designed to be used as a proscriptive health care policy tool.

Currently in the United States, proscriptive policies limiting the practice of medicine to only those who pass an examination do not exist. The implementation of proscriptive policies for American radiologists could conceivably improve the accuracy of mammographic interpretations while decreasing both patient recall and false-positive biopsy rates. Yet the very nature of proscription is to restrict access, and we must simultaneously be concerned with the potential for reduction in volume and services that such a policy might bring to mammographic screening in the United States. In sum, restricting the number of radiologists who can interpret mammograms may also restrict the access of American women to mammography. An important question is, therefore, what cost will there be in terms of access to qualify radiologists on the basis of their skill?

The purpose of our study was to evaluate the potential effect of proscriptive health care policies directed toward improving screening mammogram interpretation in the United States.


    MATERIALS AND METHODS
 
Our data come from a national random sample of radiologists who interpret screening mammograms at FDA-accredited facilities in the United States. Our study was approved by our local institutional review board. Each of 110 randomly selected radiologists interpreted the same set of 148 mammographic cases, which included randomly sampled biopsy-proved breast cancers or benign lesions, as well as findings that were confirmed to be normal on the basis of at least 2 years of follow-up. Original film mammograms were interpreted by using Breast Imaging Reporting and Data System (BI-RADS) assessment categories. Results of comparison examinations were provided. Although no formal signed consent process was required by the institutional review board, radiologists were fully informed, prior to their participation in the study, of the study aims and of measures to be followed to ensure their anonymity. Per institutional review board approval, informed consent for inclusion of the mammograms used in our study was waived.

Radiologists
Radiologists were recruited to participate in the Variability in Diagnostic Interpretation, or VIDI, screening mammography study (9). VIDI is a research program devoted to the population-based assessment of interpretation variability in diagnostic medicine. Participants for the screening mammography study were from randomly sampled mammography facilities accredited by the FDA as of January 1, 1998. Stratified random sampling of the 9,916 geographically contiguous accredited facilities ensured approximately equal representation across four geographic regions defined by the U.S. Census Bureau. Facilities in each of the four geographic regions were additionally stratified according to the minority composition of local screening populations to yield a total of eight strata. Minority composition was categorized on the basis of the percentage of minorities in the population in the zip code area of the facility (obtained from U.S. Census Bureau reports) as either less than 50% nonwhite or more than 50% nonwhite. Thus, stratified sampling was performed and yielded approximately equal numbers of facilities within each of the eight strata. On average, each facility we sampled reported having two radiologists involved in mammography, and, thus, we estimated there to be approximately 20,000 radiologists in the U.S. population.

All radiologists at each randomly sampled facility were invited to participate. The procedure for recruitment began with a letter to the lead interpreting physician at a sampled facility that asked him or her to distribute our recruitment material to all radiologists who interpret mammograms for their facility. In this way, we sampled not only permanent faculty members but also locum tenens radiologists. The recruitment material explained the study and the requirements for and benefits of participation in the study and asked the radiologists whether they would be willing to participate if randomly sampled. In all, 412 radiologists were contacted, and 292 (71%) expressed willingness to participate in the study if sampled. These 292 radiologists, grouped by facility, provided our frame for random sampling. Again, we sampled facilities (and, hence, willing radiologists within facilities) within the strata formed by geographic region and minority composition to arrive at approximately equal numbers of radiologists per stratum.

Cases
One hundred forty-eight index mammography cases were randomly selected from the records of a large screening program affiliated with the University of Pennsylvania; these 148 cases represented results of examinations performed between 1993 and 1997. All mammograms selected for this study were reviewed for quality (positioning, compression, exposure level, contrast, and artifacts) by E.F.C., who is director of the Breast Imaging Program at the University of Pennsylvania. No cases were rejected because of poor technical quality.

Sample cases were stratified on the basis of disease status (ie, with cancer or cancer free), which was determined at biopsy or after a minimum follow-up period of 2 years, as well as on the basis of patient age. Stratification was performed by using the electronic patient information and biopsy databases maintained by the Breast Imaging Program. Once the cases were stratified, sampling was performed at random within strata. Differences in case availability prevented us from meeting our initial goal of having equal numbers of cases within strata for each disease status.

Original film mammograms were used in the reading study. To parallel usual clinical practice, comparison original film mammograms were also provided when available. Comparison mammograms were available for 67 cases (45%). Each set of mammograms had been obtained at low-dose screen-film mammography performed with dedicated mammography units and single-emulsion film. Each set consisted of mediolateral oblique and craniocaudal views of each breast. The index examination of a woman was defined as the one whose results led to the first biopsy in those women who underwent biopsy or as the next-to-last examination for those women who were followed up for at least 2 years and did not undergo biopsy. A comparison examination was defined as the screening examination immediately prior to the index examination.

Reading Study
All radiologists interpreted the mammograms in a controlled reading environment during two 3-hour periods. The reading was performed entirely in a room dedicated solely to the study that permitted the investigators to control ambient light. Readers traveled to a central site at which the controlled reading room was located. Eight readers participated at a time.

Case images were mounted in random sequence on dedicated mammography alternators (RADX, Houston, Tex). The only information presented to the reader was the age of the patient. Before reading, radiologists were instructed that the case set did not have the mix expected in a typical screening population (ie, about two to six cancers per 1,000 individuals). Results of pilot studies performed by the authors have established that this instruction adequately controls context bias (10) (details are available upon request to C.A.B.). Readers were oriented by means of supervised hands-on experience; they reviewed a set of practice cases before beginning the review of the study cases. The practice case set did not include any cases used for the reading study.

The reading data were immediately input to a database with laptop computers. A custom computer program operating in real time during the reading session captured the reading data described below and ensured data reliability by way of several programmed checks for completeness and inconsistency.

Readers were asked to (a) identify findings, (b) make a recommendation for further work-up, (c) report what they believed would be the result of additional work-up, and (d) give a subjective assessment of the presence of breast cancer for each case. Responses to item d were reported by using an 11-point scale (in which a score of 0 represented definitely normal and a score of 10 represented definitely cancer). Responses to item c involved use of the BI-RADS scale (11) and were used in the receiver operating characteristic (ROC) curve analysis in this study. This analysis is described later.

Reader Factors
Two surveys were used to collect data about the readers in our study. One survey was used to collect data about each individual reader and another to collect data about the facility with which the radiologist indicated affiliation. Among other things, radiologists were asked to report their recent reading volume, which is the total number of mammograms (both screening and diagnostic) read in the year prior to their participation in the study. All survey items were self reported and not independently verified.

Statistical Analysis
The qualitative characteristics (eg, sex, race) of the radiologists who participated in our study were summarized with percentages and compared, by means of the χ² test, with those of radiologists who were contacted but who did not participate in the study. Quantitative characteristics (eg, age, recent reading volume) of participants were summarized with means, SDs, and ranges. The mean values for the participating and the nonparticipating radiologists were statistically compared with analysis of variance.

We also characterized our reader sample group by computing the performance characteristics (ie, sensitivity and specificity) for each reader who interpreted our case set and then summarizing the reader sample group data with the mean, median, and range of these characteristics. In our study, radiologist sensitivity was computed as the proportion of women with breast cancer who were recommended for recall by the radiologist. Radiologist specificity was computed as the proportion of women without breast cancer who were not recommended for further work-up by the radiologist.
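The per-reader computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the grouping of BI-RADS categories into "further work-up" (categories 0, 3, 4, and 5) that is reported in the Results, and the function name and the six example readings are hypothetical.

```python
# BI-RADS categories treated as a recommendation for further work-up
# (0 = additional imaging, 3 = short-term follow-up, 4 or 5 = biopsy).
# This grouping follows the Results text; categories 1 and 2 mean
# a return to normal screening.
RECALL_CATEGORIES = {0, 3, 4, 5}


def sensitivity_specificity(birads, has_cancer):
    """Return (sensitivity, specificity) for one reader.

    birads     -- BI-RADS category assigned by the reader, one per case
    has_cancer -- verified disease status (True/False), one per case
    """
    tp = fn = tn = fp = 0
    for category, cancer in zip(birads, has_cancer):
        recalled = category in RECALL_CATEGORIES
        if cancer:
            tp += recalled
            fn += not recalled
        else:
            fp += recalled
            tn += not recalled
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical readings for six cases (illustrative only, not study data).
example_birads = [4, 0, 1, 2, 3, 1]
example_truth = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(example_birads, example_truth)
```

In this toy example the reader recalls two of three cancers and one of three cancer-free women, so both values come out at two-thirds.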

The distribution of ages in the study cases was summarized with percentages in each disease group (with cancer or cancer free) to reflect the sampling plan used in selecting the cases.

Radiologist accuracy was measured by using the partial area under the binormal ROC curve (12,13). This measurement can be interpreted as the average sensitivity of the radiologist when he or she is reading with at least 90% specificity (14–16). Technical details about the method of computation are presented in the Appendix.
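To make the "average sensitivity at 90% or greater specificity" reading concrete, the sketch below numerically averages a binormal ROC curve over false-positive fractions from 0 to 0.10. It is a generic illustration under stated assumptions, not the computation given in the Appendix: the conventional binormal form TPF = Φ(a + b·Φ⁻¹(FPF)) is assumed, and the parameters `a` and `b`, the function name, and the midpoint-rule integration are choices made here for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def average_sensitivity(a, b, min_specificity=0.90, steps=10_000):
    """Average sensitivity of a binormal ROC curve over specificity >= min_specificity.

    Assumes the conventional binormal form TPF = Phi(a + b * Phi^-1(FPF)).
    The partial area over FPF in (0, 1 - min_specificity) is divided by the
    width of that interval, which turns it into an average sensitivity.
    """
    max_fpf = 1.0 - min_specificity
    width = max_fpf / steps
    total = 0.0
    for i in range(steps):
        # Midpoint rule: never evaluates at FPF = 0, where Phi^-1 is undefined.
        fpf = (i + 0.5) * width
        total += _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))
    return total / steps


# Hypothetical readers: a = 0, b = 1 is the chance diagonal (TPF = FPF).
chance_reader = average_sensitivity(0.0, 1.0)
better_reader = average_sensitivity(1.5, 1.0)
```

For the chance diagonal the average sensitivity over this range is 0.05, which gives a quick sanity check on the integration.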

Results in our sample group of radiologists yielded an estimate of quantiles of accuracy in the U.S. population of radiologists. A quantile of accuracy represents the accuracy value associated with a cumulative proportion of the population. For example, the median of a population is the value such that 0.50 of the population is less than or equal to that value. The median is, therefore, the 0.50 quantile of the population distribution.

The quantiles of accuracy estimated from our data were then used to estimate the number of radiologists who would need to be restricted from providing mammographic interpretation to increase median accuracy by 1%, 5%, and 10%. In the Appendix, we show that, to shift the median upward across a certain proportion of the population, denoted by p%, the lower 2p% of the population has to be restricted. A detailed example of this computation is given in the next section.
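The doubling rule can be illustrated with a short sketch. If the target accuracy currently sits at quantile q and the lowest fraction f of readers is removed, the target sits at (q - f) / (1 - f) of the remaining readers; setting this to 0.5 gives f = 2q - 1, which is twice the distance from the 0.50 quantile. The function names and the ten-reader sample below are hypothetical and are not study data.

```python
def quantile_of(accuracies, target):
    """Empirical share of readers whose accuracy is at or below `target`."""
    return sum(1 for a in accuracies if a <= target) / len(accuracies)


def fraction_to_restrict(target_quantile):
    """Share of lowest-ranked readers to remove so that the value now at
    `target_quantile` becomes the median of those who remain.

    Solving (q - f) / (1 - f) = 0.5 for f gives f = 2q - 1.
    Targets at or below the current median require no restriction.
    """
    return max(0.0, 2.0 * target_quantile - 1.0)


# Hypothetical partial ROC areas for ten readers (illustrative only).
sample = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
q = quantile_of(sample, 0.70)   # share of readers at or below 0.70
f = fraction_to_restrict(q)     # share of readers to restrict
```

With a target at the 0.75 quantile, as in the Figure 1b example, the same rule gives a restricted fraction of 0.50.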

Reading volume data recorded from the sampled readers were used to project the percentage reduction in service volume (ie, mammograms read per year) that would result from restriction. This was accomplished by computing the proportion of total reading volume attributed to each reader in the sample and then summing these proportions for the readers in the percentage of radiologists who would be restricted from reading mammograms in the proposed proscriptive policy.
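A minimal sketch of this projection follows, assuming each sampled reader is represented by an (accuracy, yearly volume) pair and that the restricted readers are those with the lowest accuracy. The function name, the rounding of the restricted head count, and the four example readers are assumptions made for illustration.

```python
def projected_volume_reduction(readers, restricted_fraction):
    """Share of total reading volume lost when the lowest-accuracy readers
    are restricted.

    readers             -- list of (accuracy, yearly_volume) tuples
    restricted_fraction -- share of readers (0 to 1) to be restricted
    """
    ranked = sorted(readers, key=lambda reader: reader[0])  # lowest accuracy first
    n_restricted = round(restricted_fraction * len(ranked))
    total_volume = sum(volume for _, volume in ranked)
    if total_volume == 0:
        raise ValueError("total reading volume is zero")
    lost_volume = sum(volume for _, volume in ranked[:n_restricted])
    return lost_volume / total_volume


# Hypothetical readers (illustrative only, not study data).
example_readers = [(0.8, 4000), (0.5, 1000), (0.7, 2000), (0.6, 3000)]
reduction = projected_volume_reduction(example_readers, 0.5)
```

Here restricting the lower half of readers removes 4,000 of 10,000 yearly mammograms, a 40% reduction. A reader with zero reported volume contributes nothing to the projected loss.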


    RESULTS
 
One hundred ten radiologists interpreted mammograms from the same 148 screening cases. Tables 1 and 2 summarize the characteristics of these radiologists. The radiologists were sampled from a large sampling frame of U.S. radiologists that we constructed for this study. About 16% of the radiologists were women. Ages ranged from 33 to 71 years, with a mean of 48.4 years. Almost all participants were board certified in radiology. Most (90.9%) interpreted screening mammograms on a part-time basis. The average radiologist in our sample had 13.8 years of experience reading mammograms and reported reading nearly 2,000 mammograms in the year prior to this study. There were no statistically significant differences in any of the characteristics summarized in Tables 1 and 2 between the radiologists who participated in this study and those who did not. Thus, we conclude that our sample is nationally representative.



 
TABLE 1. Characteristics of 110 Participating Radiologists

 


 
TABLE 2. Age and Experience of Participating Radiologists

 
Table 3 summarizes the performance characteristics of this sample group of readers. Sensitivity refers to the percentage of women with breast cancer given a recommendation for further work-up. Further work-up consisted of short-term follow-up (for BI-RADS category 3 findings), additional imaging (for BI-RADS category 0 findings), or biopsy (for BI-RADS category 4 or 5 findings). Specificity refers to the percentage of women without breast cancer who were not recommended for further work-up (ie, they were given a BI-RADS category 1 or 2 rating, for which a return to normal screening is advised). The median sensitivity for our sampled radiologists was 94% (range, 59%–100%), and the median specificity was 70% (range, 35%–98%). These values fall within the range of other published performance data (1,4,5,9) and provide assurance that our reading protocol was scientifically appropriate.



 
TABLE 3. Interpretive Performance Characteristics of Sampled Radiologists

 
Table 4 summarizes the age distribution and disease status in our 148-case sample. Although we attempted an equal split, our sampling resulted in a case mix enriched with 64 (43%) cases of cancer. Ages ranged from 40 to 85 years, with a mean of 58 years. Women with cancer tended to be older than women without cancer (P = .011, χ² test). This situation reflects differences in the availability of original mammograms after the already age-stratified cases were randomly selected. The mammograms of younger women with breast cancer tended more often to be in clinical use than the mammograms of older women with breast cancer.



 
TABLE 4. Age Distribution and Disease Status in Case Sample

 
Figure 1a depicts our estimate of the distribution of accuracy in the U.S. population of radiologists on the basis of the quantiles of accuracy observed in our study sample. Values of the partial ROC area are plotted along the horizontal axis, and quantiles are plotted along the vertical axis. We see, for example, that the sample values ranged from less than 0.40 (with an associated quantile of 0.0 on the vertical axis) to slightly less than 0.90 (with an associated quantile of 1.0 on the vertical axis). Figure 1a also demonstrates the use of such a plot to estimate median accuracy in the U.S. population of radiologists. The median is, by definition, the 50th percentile of the distribution. This is represented by the value of 0.50 on the vertical axis of the graph. Following the arrow lines across and down the graph, we observe that the partial ROC area of approximately 0.66 is the median value of accuracy in the sample.



Figure 1a. Graph shows quantiles of accuracy in U.S. radiologists (as extrapolated from data in the study sample). The median is the 0.50 quantile. By following the horizontal line to the right to where it intersects the graph, and then down to where it intersects the horizontal axis, we estimate median accuracy (measured as partial ROC area of the reader) in U.S. radiologists to be approximately 0.66.

 


Figure 1b. Graph shows implications of a health care policy goal of increasing median accuracy by 10%. To move the population median from 0.66 to 0.76 (an amount indicated by the right-pointing arrow under the x axis), we follow the arrows upward to the distribution and then to the left to determine that the target value of accuracy of 0.76 is approximately the 0.75 quantile in U.S. radiologists. That is, about 75% of U.S. radiologists have an accuracy (partial ROC area) less than or equal to 0.76. Thus, to achieve an increase in median accuracy of 10%, we must shift the median upward from the 0.50 quantile to the 0.75 quantile (ie, across about 25% of the population). The fact that accomplishing this shift requires restricting 50% of the currently practicing U.S. radiologists from interpreting screening mammograms is outlined in the Appendix.

 
Figure 1b visually demonstrates how we used this distribution to estimate the percentage of radiologists who would have to be restricted from interpreting mammograms to increase the median accuracy in the United States by 10% (from 0.66 to 0.76). (More precise numbers appear in the following paragraph.) From Figure 1b it can be seen that to move the median from 0.66 to 0.76 requires shifting the middle of the distribution upward to where the vertical arrow intersects the plot. By following the left-facing arrow, it can be seen that this action equates to shifting the middle of the distribution to the value that is currently approximately the 75th percentile in the population (ie, 0.75 on the vertical axis). In other words, this health care policy goal requires shifting the median up past 25% of the data. As shown in the Appendix, for each 1% of the population across which the median is to be shifted, 2% of the population must be eliminated from service. Therefore, our data suggest that to increase the median accuracy in the United States by 10% by using proscription would require the restriction of approximately 50% (two times the 25% shift) of presently active interpreting radiologists.

Figure 2 summarizes the implications of the previous analysis for proscriptive health care policies designed to achieve various improvements in median accuracy: To increase median accuracy by 1% (from 66% to 67%) would require restricting 11% of U.S. radiologists in the lowest quantile for accuracy (about 2,200 in an approximate U.S. population of 20,000 radiologists who interpret mammograms) and would result in a reduction in yearly service volume of approximately 10%. An increase in median accuracy of 5% (to 71%) would require restricting 30% of radiologists (about 6,000 physicians), with an accompanying volume reduction of 25%. An increase in median accuracy of 10% (to 76%) would require the restriction of 57% of practicing U.S. radiologists (about 11,400 physicians) and would diminish the national service capacity by 50%.
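The head counts quoted above follow from simple arithmetic on the estimated population of interpreting radiologists. The sketch below reproduces that arithmetic from figures stated in the text (9,916 accredited facilities, about two radiologists per facility, and a rounded population of 20,000); the variable and function names are illustrative.

```python
# Figures taken from the text.
FACILITIES = 9_916
RADIOLOGISTS_PER_FACILITY = 2
POPULATION = 20_000  # rounded estimate used in the text

# Unrounded estimate behind the "approximately 20,000" figure.
unrounded_population = FACILITIES * RADIOLOGISTS_PER_FACILITY

# Gain in median accuracy (%) -> (% of radiologists restricted, % of volume lost).
scenarios = {1: (11, 10), 5: (30, 25), 10: (57, 50)}


def restricted_head_count(pct_restricted, population=POPULATION):
    """Number of radiologists restricted, using integer arithmetic."""
    return population * pct_restricted // 100


head_counts = {
    gain: restricted_head_count(pct_restricted)
    for gain, (pct_restricted, _pct_volume_lost) in scenarios.items()
}
```

In every scenario the projected share of volume lost is somewhat smaller than the share of radiologists restricted (10% versus 11%, 25% versus 30%, 50% versus 57%).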



 
Figure 2. Implications of proscriptive health care policy.

 

    DISCUSSION
 
Our data and analysis provide objective evidence that proscriptive health care policies based on accuracy would likely turn out to be prohibitively costly and inefficient methods for improving screening mammogram interpretation in the United States. Results of our analysis suggest that a dramatic reduction in access to mammography would result from such health care policy. To be successful, there would first need to be in place a preexisting system to redistribute a large volume of mammography interpretation services to those radiologists who pass the qualifying examination. These radiologists would also have to be willing and able to pick up the slack. Given the many disincentives to read mammograms already, the requirements of having to qualify and then take on a much increased caseload would likely lead to increased radiologist attrition. If great enough, such attrition could create a cascade of increased reading demand for those radiologists wishing to continue interpreting mammograms and, hence, even greater disincentive.

Efforts should focus on improving the tools and skills of currently practicing radiologists. In addition, more advanced practice management paradigms ought to be considered. For example, one management paradigm might be to achieve the goal of maintaining access to quality mammography through the planned redistribution of caseload within the facility. With this strategy, mammographic interpretation would be shifted away from those radiologists who did not meet the proscriptive threshold for performance to those who did. The former individuals would then pick up the resulting slack in other modalities and disease areas. The mammography service volume provided by the facility would be maintained. Another way to implement this management goal would be to require double reading of cases initially read by radiologists who did not meet the proscriptive threshold. The radiologist performing the second reading would be provided with appropriate acknowledgment of status and remuneration for their senior role. Some of the data required for this management paradigm should already be available, since current Mammography Quality Standards Act rules require each mammography facility to track outcomes of abnormal mammograms separately for each radiologist. This approach to auditing (ie, at the level of the radiologist) should be extended beyond collection of biopsy outcomes to include additional clinically useful performance parameters (eg, cancer detection rate, recall rate, etc).

Our study had several limitations that must be kept in mind when interpreting our findings. We did not attempt to measure or account for statistical error or uncertainty in our estimates and, hence, in our findings. Although this is a limitation of our study, we believe this was appropriate because the goal of this analysis was to investigate the potential implications of proscriptive health care policies. We believe that the point we make with this analysis is sufficient to dissuade serious consideration of proscriptive policies. However, should it be decided that such policies deserve further consideration, we point out that the next step must then be to specify the desired increase in median accuracy. This specification is, to say the least, enormously difficult because it involves specifying universally acceptable societal-level valuations of health care outcome. It is beyond the intended scope of this article to offer such a valuation.

One might also be concerned about the representativeness of our findings. However, we have provided data confirming that the physician sample was representative of the entire U.S. population of interpreting radiologists. Our case set was randomly sampled and hence should be representative of cases in typical screening populations, having approximately equal representation of women from each of the three age groups. Of course, the case mix was enriched with cancers, but this should not bias our assessment of ROC curves because such measurements are conditional on disease status and, therefore, yield estimates of sensitivity and specificity in an independent and unbiased fashion. One might be concerned with how well our measurement of accuracy reflects actual screening performance in the field. This is indeed a concern and a limitation of any study in which performance measured in the laboratory is used to estimate performance achieved in real practice. Finally, recent reading volume was used in our projections of future effect on service volume. Therefore, any changes in reading volume that occurred immediately after our study are not incorporated into our estimate of the effect of the proscriptive health care policy on total service capacity. For example, three randomly selected radiologists reported zero reading volume because one had been a resident and two had just returned to performing mammography. Our projection of the loss to total service volume from the restriction of any of these three radiologists would therefore be zero as well.

On the other hand, the experimental methods we used are well established and yield what might be considered to be the most optimistic appraisal because extraneous sources of variation were minimized. This is because our experimental conditions optimized reader performance—the radiologists read in optimal lighting conditions with state-of-the-art mammography alternators. They could focus entirely on the task and were not interrupted with the usual things that interfere with a radiologist’s concentrated reading of mammograms. Furthermore, our use of the ROC curve controlled for any overreading or underreading that the radiologists might have performed in response to being subjects in an experiment. Thus, we believe our data capture the essential state of practice. And, despite these conditions, the spread in average sensitivity among radiologists was great enough to provide compelling evidence against proscriptive health care policy approaches for improving mammographic interpretation in the United States.

It is also important to point out that average sensitivity in our study referred to average sensitivity in the context of screening. In screening, the central decision is whether or not to conduct additional work-up (ie, the callback decision). It is not the goal of screening mammographic interpretation to provide a definitive diagnosis or to recommend biopsy without further consideration. Thus, a true-positive result in screening occurs whenever a woman with breast cancer is given a recommendation for additional work-up. However, in our study, determination of true-positive results was performed without reference to correct localization of the cancer by the radiologist, and, therefore, our estimates of radiologist screening sensitivity might be positively biased.

Authors of articles published within the past 5 years have reported that there is appreciable variability in the interpretative skills of the radiologist reading mammograms (2,3,9). This finding has even reached the lay press. In a recent front-page New York Times article (17), radiologists were cited as the weak link in mammographic screening programs. The result of these developments is, naturally, a desire to implement far-reaching interventions to improve the interpretation of screening mammograms for all American women. Logically, two options are available: One option is to restrict radiologists from interpreting screening mammograms on the basis of their skill (proscription), and the other option is to develop interventions targeted at improving the skills of those practicing radiologists most in need of training (prescription). On the basis of our study data, our conclusion is that prescription is by far preferable to proscription because the cost of the latter, in terms of reduced access to mammography for American women, is, we believe, unacceptable. In sum, we conclude that efforts should focus on improving the tools and skills of practicing radiologists while maintaining the access of American women to screening mammography.


    APPENDIX
 
Technical Details of Partial ROC Curve Area Estimation
We fit an ROC curve to each radiologist by using the binormal model of Metz (12). The algorithm for this fitting is that described by Beam (18). Using this fitted ROC curve, we then estimated the area under the partial ROC curve over the interval (0,0.1) for each radiologist by using an approximate integral function provided by S-Plus (Seattle, Wash). The supplied function implements adaptive 15-point Gauss-Kronrod quadrature based on the Fortran functions dqage and dqagie from QUADPACK (19) in NETLIB (20). The integrand was a suitably normalized Gaussian density function, having parameters reflective of the individual radiologist’s binormal ROC curve slope and intercept.
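The partial-area calculation described above can be illustrated with a minimal sketch. It assumes the standard binormal form TPF = Phi(a + b * Phi^-1(FPF)), where a is the intercept and b the slope on normal-deviate axes. It is not the S-Plus routine used in the study: a plain midpoint rule is swapped in for the adaptive Gauss-Kronrod quadrature, and the function names, parameter names, and step count are hypothetical choices made for illustration only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def binormal_tpf(fpf, a, b):
    """True-positive fraction on a binormal ROC curve at a given FPF.

    a is the intercept and b the slope on normal-deviate axes
    (illustrative parameter names, not taken from the article).
    """
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))


def partial_auc(a, b, upper=0.1, steps=2000):
    """Approximate the area under the binormal ROC curve for FPF in (0, upper).

    Uses the midpoint rule (an assumption; the study used adaptive
    Gauss-Kronrod quadrature). Midpoints never touch FPF = 0, where the
    inverse normal CDF is undefined.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        midpoint = (i + 0.5) * width
        total += binormal_tpf(midpoint, a, b)
    return total * width
```

As a sanity check, a reader with no discriminating ability (a = 0, b = 1) has the chance-line ROC curve TPF = FPF, so the partial area over (0, 0.1) should be close to 0.1 squared divided by 2, that is, 0.005. Any curve lying above the chance line yields a larger value, bounded above by 0.1.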

Determining Population Percentage That Would Need to be Restricted to Achieve Health Care Policy Goals
We first establish some terminology (Figure A1). Problem: We wish to move the median accuracy of the parent population upward by restricting some percentage of its lower tail. What percentage should be restricted?



 
Figure A1. Graph depicts concepts and defines terminology involved in determining percentage of population that must be restricted to achieve health care policy goals expressed in terms of median accuracy.

 
Solution: Let Q.50 represent the median (50%) of the parent population, and let Qp represent the quantile of the parent population that is to become the new median. Let N represent the size of the parent population. If Qp is to be the new median, then the N(1 - p) members of the parent population lying above Qp will represent 50% of the new population, so that 2N(1 - p) is the size of the new population. That is, 2N(1 - p) is the part of the parent population that is to be retained.

Therefore, [N - 2N(1 - p)]/N = 1 - 2(1 - p) is the proportion of the parent population that must be restricted to move the median from Q.50 to Qp.

Now, let d = p - .50, where d represents the amount by which we wish to shift the population median, in terms of proportions in the parent population. And, therefore, p = .50 + d.

Then, the quantity 1 - 2(1 - p) can be reexpressed as 1 - 2[1 - (.50 + d)] = 1 - 2(.50 - d) = 2d.

In other words, to shift the median upward by the amount d(100%), the lower 2d(100%) of the parent population must be restricted.
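The 2d rule can be checked numerically with a short sketch. The function names and the toy population of 1,000 evenly spaced scores are hypothetical and serve only to illustrate the arithmetic. They are not the study data.

```python
import statistics


def fraction_to_restrict(d):
    """Fraction of the parent population to restrict so that the median
    moves up by d, where d is a proportion of the parent population."""
    if not 0 <= d < 0.5:
        raise ValueError("d must satisfy 0 <= d < 0.5")
    return 2 * d


def restrict_lower_tail(scores, d):
    """Return the scores that remain after dropping the lowest 2d fraction."""
    ordered = sorted(scores)
    n_drop = round(fraction_to_restrict(d) * len(ordered))
    return ordered[n_drop:]


# Toy parent population (an assumption for illustration): scores 0..999.
parent = list(range(1000))
kept = restrict_lower_tail(parent, 0.05)   # target: move the median to Q.55

print(len(kept))                 # 900 retained, i.e. 10% restricted
print(statistics.median(kept))   # 549.5, the 55th percentile of the parent
```

In this toy case the parent median is 499.5. Dropping the lowest 10% of scores moves the median to 549.5, which sits at the 55th percentile of the parent population, a shift of d = .05.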


    ACKNOWLEDGMENTS
 
We acknowledge Charles Metz, PhD, for his insight that, to increase a median by d%, 2d% must be eliminated from the lower tail of the distribution. We also acknowledge the generous support of RADX for providing the mammography alternators.


    FOOTNOTES
 
Abbreviations: BI-RADS = Breast Imaging Reporting and Data System, FDA = Food and Drug Administration, ROC = receiver operating characteristic

Author contributions: Guarantor of integrity of entire study, C.A.B.; study concepts, C.A.B., E.F.C., E.A.S.; study design, C.A.B.; literature research, C.A.B., S.P.W.; data acquisition, C.A.B.; data analysis/interpretation, C.A.B., E.F.C., E.A.S.; statistical analysis, C.A.B.; manuscript preparation and definition of intellectual content, C.A.B.; manuscript editing, revision/review, and final version approval, all authors


    REFERENCES
 

  1. Esserman L, Cowley H, Eberle C, et al. Improving the accuracy of mammography: volume and outcome relationships. J Natl Cancer Inst 2002; 94:369-375.
  2. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002; 224:861-869.
  3. Kan L, Olivotto IA, Warren Burhenne LJ, Sickles EA, Coldman AJ. Standardized abnormal interpretation and cancer detection ratios to assess reading volume and reader performance in a breast screening program. Radiology 2000; 215:563-567.
  4. Nodine CF, Kundel HL, Mello-Thoms C, et al. How experience and training influence mammography expertise. Acad Radiol 1999; 6:575-585.
  5. Beam CA, Conant EF, Sickles EA. Association of volume and volume-independent factors with accuracy in screening mammograms. J Natl Cancer Inst 2003; 95:282-290.
  6. National Health Service Breast Screening Radiologists Quality Assurance Committee. Quality assurance guidelines for radiologists. Sheffield, England: NHSBSP Publications, 1997. National Health Service Breast Screening Programme Publication no. 15.
  7. U.S. Department of Health and Human Services. An overview of the final regulations implementing the Mammography Quality Standards Act of 1992. Rockville, Md: U.S. Department of Health and Human Services, 1997; 16-19.
  8. Elmore JG, Carney PA. Does practice make perfect when interpreting mammography? J Natl Cancer Inst 2002; 94:321-323.
  9. Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists: findings from a national sample. Arch Intern Med 1996; 156:209-213.
  10. Egglin TK, Feinstein AR. Context bias: a problem in diagnostic radiology. JAMA 1996; 276:1752-1755.
  11. D’Orsi CJ, Bassett LW, Deig SA, et al. Illustrated Breast Imaging Reporting and Data System: illustrated BI-RADS. 3rd ed. Reston, Va: American College of Radiology, 1998.
  12. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.
  13. Hanley JA, McNeil BJ. The meaning and use of the area under an ROC curve. Radiology 1982; 143:29-35.
  14. Wieand S, Gail MH, James KL, James BR. A family of nonparametric statistics for comparing diagnostic tests with paired or unpaired data. Biometrika 1989; 76:585-592.
  15. McClish DK. Analyzing a portion of the ROC curve. Med Decis Making 1989; 9:190-195.
  16. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201:745-750.
  17. Moss M. Spotting breast cancer: doctors are weak link. The New York Times 2002; Jun 27:sect A:1(col 1).
  18. Beam CA. Two stage ROC curve regression model when sampling a population of diagnosticians. Proc SPIE 2002; 4686:236-246.
  19. Piessens R, DeDoncker-Kapenga E, Uberhuber C, Kahaner D. QUADPACK: a subroutine package for automatic integration. Berlin, Germany: Springer, 1983.
  20. Dongarra JJ, Grosse E. Distribution of mathematical software via electronic mail. Commun ACM 1987; 30:403-407.



This article has been cited by other articles:


E. S. Burnside, J. M. Park, J. P. Fine, and G. A. Sisney. The Use of Batch Reading to Improve the Performance of Screening Mammography. Am. J. Roentgenol., September 1, 2005; 185(3): 790-796.

W. E. Barlow, C. Chi, P. A. Carney, S. H. Taplin, C. D'Orsi, G. Cutter, R. E. Hendrick, and J. G. Elmore. Accuracy of Screening Mammography Interpretation by Characteristics of Radiologists. J Natl Cancer Inst, December 15, 2004; 96(24): 1840-1850.

