Radiology
DOI: 10.1148/radiol.2303030297
(Radiology 2004;230:606-612.)
© RSNA, 2004


Statistical Concepts Series

Sample Size Estimation: A Glimpse beyond Simple Formulas¹

John Eng, MD

¹ From the Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University, Central Radiology Viewing Area, Room 117, 600 N Wolfe St, Baltimore, MD 21287. Received February 21, 2003; revision requested April 10; revision received July 18; accepted July 21. Address correspondence to the author (e-mail: jeng@jhmi.edu).


    ABSTRACT
Small increments in the complexity of clinical studies can readily take sample size estimation and statistical power analysis beyond the capabilities of simple mathematic formulas. In this article, the method of simulation is presented as a general technique with which sample size may be calculated for complex study designs. Applications of simulation for determining sample size requirements in studies involving correlated data and comparisons of receiver operating characteristic curves are discussed.


Index terms: Radiology and radiologists, research • Receiver operating characteristic (ROC) curve • Statistical analysis


    INTRODUCTION
In a previous article in this series (1), I discussed the fundamental concepts involved in determining the appropriate number of subjects that should be included in a clinical investigation. This number is known as the sample size. That article explained why sample size must be considered, how certain study design characteristics affect sample size (Fig 1), and how to calculate sample size for several simple study designs. It also described how sample size is related to statistical power, which is the sensitivity of a comparative study for detecting a statistically significant difference when a difference is truly present.



Figure 1. Study design characteristics that affect sample size and statistical power.

 
In this article, I will first discuss some important consequences of sample size and power calculations, then go on to discuss issues that arise when these basic principles are applied to real clinical investigations, which are often more complex than the simple situations covered in the previous article and in introductory biostatistics textbooks. My intent is to provide an overview and appreciation of some of the advanced statistical methods for handling some of the complex situations that arise. Since advanced statistical methods for sample size or power calculations cannot receive comprehensive treatment in the setting of an overview article, an investigator needing such methods is advised to seek help from a statistician early in the research project. However, I hope the material herein will at least help bridge the knowledge gap between investigator and statistician so that their interaction can be more productive.


    CONSEQUENCES OF SAMPLE SIZE CALCULATIONS
Academic and Ethical Importance
In conjunction with a well-defined research question (2), an adequate sample size can help ensure an academically interesting result, whether or not a statistically significant difference is eventually found in the study. The investigator does not have to be overly concerned that the study will only be interesting (and worth the expenditure of resources) if its results are "positive." For example, suppose a study is conducted to see if a new imaging technique is better than the conventional one. Obviously, the study would be interesting if a statistically significant difference was found between the two techniques. But if no statistically significant difference is found, an adequate sample size allows the investigator to conclude that no clinically important difference was found rather than wonder whether an important difference is being hidden by an inadequate sample size.

An inadequate sample size also has ethical implications. If a study is not designed to include enough individuals to adequately test the research hypothesis, then the study unethically exposes individuals to the risks and discomfort of the research even though there is no potential for scientific gain. Although the connection between research ethics and adequate sample size has been recognized for at least 25 years (3), the performance of clinical trials with inadequate sample sizes remains widespread (4).

Practical Consequences of Mathematic Properties
A more intuitive understanding of the determinants of sample size can be obtained through closer inspection of the formulas for sample size. We saw in the previous article (1) that when the outcome variable of a comparative study is a continuous value for which means are compared, the appropriate sample size (5) is given by

$$N = \frac{4\sigma^2 (z_{\mathrm{crit}} + z_{\mathrm{pwr}})^2}{D^2} \qquad (1)$$

where $N$ is the total sample size (ie, the total of the two comparison groups), $D$ is the smallest meaningful difference between the two means being compared, $\sigma$ is the SD of each group, and $z_{\mathrm{crit}}$ and $z_{\mathrm{pwr}}$ are constants determined by the specified significance criterion (eg, .05) and desired statistical power (eg, .8), respectively. Since $z_{\mathrm{crit}}$ and $z_{\mathrm{pwr}}$ are independent of the properties of the data, sample size depends only on the ratio between the smallest meaningful difference and the SD (Fig 2).



Figure 2. Graph shows the relationship between sample size and the ratio of meaningful difference to SD for studies in which means are compared. The graph was created by using Equation (1) and illustrates the fact that sample size rises steeply as the ratio decreases, in inverse proportion to the square of the ratio.

 
Furthermore, because sample size is inversely proportional to the square of the ratio of meaningful difference to SD, anything that can be done to decrease the SD or increase the meaningful difference can substantially reduce the required sample size. The SD could be decreased by reducing measurement variability (eg, by using more precise instruments or procedures) and/or by selecting a more homogeneous study population. The meaningful difference could be increased by employing more sensitive instruments or procedures.

Another property of the comparison of means is that for a given SD, only the arithmetic difference between the comparison groups affects the sample size. For example, the sample size would be the same for detecting a systolic blood pressure difference of 10 mm Hg whether it is to be measured in normotensive individuals (eg, 110 vs 120 mm Hg) or hypertensive individuals (eg, 170 vs 180 mm Hg).
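As a quick numerical check of these two properties, the following Python sketch evaluates the comparison-of-means formula, N = 4σ²(z_crit + z_pwr)²/D², by using only the standard library. The function name `total_sample_size` and the SD of 15 mm Hg are illustrative assumptions, and a two-sided significance criterion with equal group sizes is assumed.

```python
import math
from statistics import NormalDist


def total_sample_size(diff, sd, alpha=0.05, power=0.8):
    """Total N (both groups combined) for comparing two means.

    Implements N = 4 * sd^2 * (z_crit + z_pwr)^2 / diff^2, assuming a
    two-sided significance criterion and equal group sizes.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_pwr = NormalDist().inv_cdf(power)           # about 0.84 for power = .8
    return math.ceil(4 * sd**2 * (z_crit + z_pwr) ** 2 / diff**2)


# Only the arithmetic difference matters, not the level of the means
# (the SD of 15 mm Hg is a hypothetical value).
n_normotensive = total_sample_size(diff=120 - 110, sd=15)
n_hypertensive = total_sample_size(diff=180 - 170, sd=15)

# Halving the ratio diff/sd roughly quadruples the required sample size.
n_ratio_half = total_sample_size(diff=2, sd=4)     # ratio 0.5
n_ratio_quarter = total_sample_size(diff=1, sd=4)  # ratio 0.25

if __name__ == "__main__":
    print(n_normotensive, n_hypertensive, n_ratio_half, n_ratio_quarter)
```

Rounding up with `math.ceil` is a design choice made here so that the returned N never falls short of the computed requirement.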

When proportions are being compared, a common task in clinical imaging research, the sample size depends on both the smallest meaningful difference between the proportions and the size of the proportions themselves. In contrast to the comparison of means, the difference alone does not determine the sample size. The sample size increases dramatically as the meaningful difference between proportions is made smaller (Fig 3). For a given difference, the sample size also increases as the mean of the two proportions being compared approaches 0.5.
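The dependence on the size of the proportions can be illustrated with a common normal-approximation formula for comparing two proportions, N = 4(z_crit + z_pwr)² p̄(1 − p̄)/D², where p̄ is the mean of the two proportions and D is their difference. This formula is not given in the present article and is assumed here for illustration only; the function name and the example proportions are hypothetical.

```python
import math
from statistics import NormalDist


def total_n_proportions(p1, p2, alpha=0.05, power=0.8):
    """Approximate total N for comparing two proportions (assumed formula).

    Uses N = 4 * (z_crit + z_pwr)^2 * p_bar * (1 - p_bar) / (p1 - p2)^2,
    with a two-sided significance criterion and equal group sizes.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    return math.ceil(4 * z**2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2)


# Same difference (0.10), different mean proportion.
n_mid = total_n_proportions(0.45, 0.55)    # mean proportion 0.5
n_high = total_n_proportions(0.85, 0.95)   # mean proportion 0.9

# Same mean proportion (0.5), smaller difference (0.05).
n_small_diff = total_n_proportions(0.475, 0.525)

if __name__ == "__main__":
    print(n_mid, n_high, n_small_diff)
```

Under this assumed formula, the required N is largest when the mean proportion is 0.5 and grows rapidly as the difference shrinks.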



Figure 3. (a) Graph shows relationship between sample size and proportions being compared in a study involving comparison of proportions. Sample size increases dramatically as the meaningful difference decreases. Sample size also increases if the proportions being compared (ie, the mean of the two proportions) are near 0.50. (b) Magnified view of the circled corner of the graph in a, with x and y axes enlarged; this corner corresponds to a region that is of particular interest to the design of clinical investigations.

 
Retrospective Power Analysis
In sample size calculations, appropriate values for the smallest meaningful difference and the estimated SD are often difficult to obtain. Therefore, the formulas are sometimes applied after the study is completed, when the difference and SD actually observed in the study can be substituted in the appropriate sample size formula. Since sample size is also known after the study is completed, the formula will yield statistical power. In this case, power refers to the sensitivity of the study to enable detection of a statistically significant difference of the magnitude observed in the study. This activity, known as retrospective power analysis, is sometimes performed to aid in the interpretation of the statistical results of a study. If the results were not statistically significant, the investigator might explain the result as being due to a low power.

However, it can be shown that the retrospective power, which is essentially an observed quantity, is inversely related to the observed P value (6). The retrospective power tends to be large in any study with a small (statistically significant) observed P value. Conversely, the retrospective power tends to be small in any study with a large (statistically nonsignificant) observed P value. Therefore, the observed retrospective power cannot provide any information in addition to the observed P value (7,8). The important point is that the smallest meaningful difference is not the same as the observed difference: The former must be set before the study is conducted and is not determined after the study is completed.
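The one-to-one relationship between the observed P value and the retrospective power can be seen in the simplest case, a two-sided z test in which the observed effect is substituted for the true effect. The following sketch makes that simplifying assumption (the cited references treat more general tests), and the function name is illustrative.

```python
from statistics import NormalDist

_norm = NormalDist()


def retrospective_power(p_value, alpha=0.05):
    """Power of a two-sided z test when the observed effect is taken as true."""
    z_obs = _norm.inv_cdf(1 - p_value / 2)  # |z| statistic that yields this P value
    z_crit = _norm.inv_cdf(1 - alpha / 2)   # rejection threshold for |z|
    # Probability that a new |z| exceeds z_crit if the true effect equals z_obs.
    return _norm.cdf(z_obs - z_crit) + _norm.cdf(-z_obs - z_crit)


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05, 0.2, 0.5):
        print(f"P = {p:<5}  retrospective power = {retrospective_power(p):.2f}")
```

Under these assumptions, a result exactly at P = .05 corresponds to a retrospective power of about .5, and every larger P value corresponds to a lower retrospective power, so the calculation only restates the P value.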

Even though calculating the retrospective power is problematic, it remains important to consider the issue of adequate sample size when one is faced with a study whose results indicate there is no difference between comparison groups. Fortunately, several statistical approaches are available to guide the reader in terms of whether or not to "believe" a study that yields negative results (9). These approaches involve calculating CIs or performing $\chi^2$ tests.
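As one illustration of the CI approach, the sketch below computes a normal-approximation (Wald) 95% CI for a difference between two proportions and checks whether a prespecified smallest meaningful difference lies outside it. The counts, the threshold values, and the function name are hypothetical, and the Wald interval is an assumption chosen for brevity rather than the specific method of reference 9.

```python
import math
from statistics import NormalDist


def diff_ci(successes_a, n_a, successes_b, n_b, level=0.95):
    """Wald CI for the difference between two independent proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d = p_a - p_b
    return d - z * se, d + z * se


# Hypothetical negative study: 80 of 100 vs 78 of 100 correct diagnoses.
low, high = diff_ci(80, 100, 78, 100)

# A meaningful difference outside the CI is effectively ruled out;
# one inside the CI is not, so the study is inconclusive at that threshold.
rules_out_015 = not (low <= 0.15 <= high)
rules_out_010 = not (low <= 0.10 <= high)

if __name__ == "__main__":
    print(f"95% CI: {low:.3f} to {high:.3f}")
    print("rules out 0.15:", rules_out_015, " rules out 0.10:", rules_out_010)
```

In this hypothetical case, the same negative result is convincing if the smallest meaningful difference was set at 0.15 but inconclusive if it was set at 0.10.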


    USE OF SIMULATION TO DETERMINE SAMPLE SIZE FOR COMPLEX STUDY DESIGNS
Despite the importance of considering sample size and statistical power, relatively few formulas exist for calculating them (10). The simplest formulas, such as those discussed previously (1) and in many introductory biostatistics textbooks, concern the estimation and comparison of means and proportions and are, fortunately, applicable to many situations in clinical radiology research. Beyond these simple formulas, methods have been established to determine sample size for general fixed-effect linear statistical models (of which the t test, ordinary linear regression, and analysis of variance are special cases), two-way contingency tables (of which the analysis of a 2 × 2 table with the $\chi^2$ test is a special case), correlation coefficient analysis, and simple survival analysis (10). Approximations exist for some other statistical models, most notably logistic regression, but the accuracy of these approximations may be difficult to establish in all situations.

Thus, the list of all statistical tests for which exact sample size calculation methods exist is much smaller than the list of all statistical tests. When no formula exists, as often happens for moderately complex statistical designs, the investigator may try to perform a sample size analysis for a simplified version of the study design and hope that the sample size can be extrapolated to the actual (more complex) study design being planned.

For situations without corresponding formulas, it is becoming more common to estimate sample size by using the technique of simulation (11). The simulation approach is powerful because it can be applied to almost any statistical model, regardless of the model’s complexity. In simulation, a mathematic model is used to generate a synthetic data set simulating one that might be collected in the study being planned. The mathematic model contains the dependent and independent variables being measured, along with estimates of each variable’s SD. The synthetic data set contains the same number of subjects as the planned sample size.

The planned statistical analysis is performed with this synthetic data set, and a P value is determined. As usual, the null hypothesis is rejected if the P value is less than a certain criterion value (eg, P < .05). This process is repeated a large number of times (perhaps hundreds or thousands of times) by using the mathematic model to generate a different synthetic data set for each iteration. The statistical power is equal to the percentage of these data sets in which the null hypothesis is rejected. In effect, simulation employs a mathematic model to virtually repeat the study an arbitrarily large number of times, allowing the statistical power to be determined essentially by direct measurement.

Since a real data set would contain random statistical error, random statistical error must be modeled in the synthetic data sets. To accomplish this in simulation, a random-number generator is used to add random error ("noise") to each synthetic data set. Because of their heavy reliance on random-number generators, simulation methods are also known as Monte Carlo methods, after the city in which random numbers also play an important role.

Let us consider a simple example. Suppose we are planning a clinical study to compare the contrast-to-noise ratio (CNR) between two magnetic resonance imaging pulse sequences used to examine each subject in a group of subjects. We would like to know the statistical power of the study to detect a smallest meaningful CNR difference of 2. We would like to plan a study with a power of .8 for detecting this smallest meaningful difference. We have resources to comfortably recruit and evaluate approximately 12 subjects. Suppose that from our previous experience with the pulse sequences, we estimate the SD of the CNR difference to be 4.

The statistical model for this study is

$$\mathrm{CNR}_i = D + \varepsilon_i \qquad (2)$$

where $\mathrm{CNR}_i$ is the observed CNR difference for subject $i$ (of the 12 subjects), $D$ is the true CNR difference (in this example, 2), and $\varepsilon_i$ is the random error associated with the observation of subject $i$. To run the simulation, we use a random-number generator that produces a different normally distributed value of $\varepsilon_i$ for each of the 12 observations. The mean of the numbers generated by the random-number generator is 0 and the SD is 4, which we estimated on the basis of previous experience. With these 12 random numbers, we can generate a synthetic data set of 12 observations by using Equation (2). The simulated data set is then subjected to a t test, and the resulting P value is recorded.

The entire simulation process is then repeated, say, 1,000 times. The P value is recorded after each iteration. After completing the iterations, the P values are examined to determine what proportion of the iterations resulted in the detection of a statistically significant difference (indicated by P < .05); this proportion is equal to the power. The simulation for this example was performed with Stata version 7.0 (Stata, College Station, Tex), and the results are shown in the first line of the Table. In this example, the null hypothesis is rejected in 343 of the 1,000 iterations. Therefore, the statistical power of the t test, given the conditions of this example, is .34 (Table).
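The procedure just described can be sketched in a few lines of Python. This is not the Stata program used for the article; it is a minimal standard-library sketch in which the function names, the random seed, and the numerical integration used to obtain the P value of the t test are all assumptions made for illustration.

```python
import math
import random
import statistics


def two_sided_p(t, df, steps=200):
    """Two-sided P value for a t statistic (Simpson integration of the t density)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = total * h / 3            # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)


def simulate_power(n_subjects=12, true_diff=2.0, sd=4.0,
                   alpha=0.05, iterations=1000, seed=1):
    """Fraction of synthetic data sets in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(iterations):
        # One synthetic data set: true difference plus normally distributed error.
        data = [true_diff + rng.gauss(0, sd) for _ in range(n_subjects)]
        se = statistics.stdev(data) / math.sqrt(n_subjects)
        p = two_sided_p(statistics.fmean(data) / se, n_subjects - 1)
        if p < alpha:
            rejections += 1
    return rejections / iterations


if __name__ == "__main__":
    print(f"Estimated power: {simulate_power():.2f}")
```

With 1,000 iterations the estimate varies by a few percentage points from one seed to another, so it should land near, but not exactly on, the value of .34 reported above.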


Table. Results of Simulations of Hypothetical Study in Which Difference between Two Imaging Techniques Is Being Compared within Each Subject

 
Obviously, it would have been easier to use the formula for comparison of means. But the advantage of simulation is the ability to consider more complex statistical models for which there are no simple sample size formulas. This ability is especially important because seemingly small changes in study design can cause the simple sample size formulas to become invalid.

Returning to the example, we note that the estimated power of our study is lower than desired. The only way to improve the power, given our assumptions, is to increase the number of observations. (For the moment, we only have resources to study 12 subjects.) So, we decide to make four measurements of CNR difference per subject. This strategy will increase the number of observations by a factor of four and will result in an increase in power. However, it is important to realize that this data collection strategy is not the same as increasing the number of subjects by a factor of four, because the four observations within each subject are not independent of one another. Within each subject, the observations are likely to be more similar to each other than to the observations in the other subjects. In statistical terms, this lack of independence is called correlation.

Because of correlation, an additional observation in the same subject does not provide as much additional information as an additional observation in a different subject. The more similar the observations within each subject are, the less additional information will be provided by the repeated observation. If the observations within each subject are identical (100% correlated), then the study would have the same results (and sample size) as it would without the repeated observations, so there would be no benefit from repeating the measurement for the same subjects. Conversely, if the repeated observations within each subject were completely uncorrelated (0% correlation), then the results (and sample size) would be identical to those of a study with the same total number of observations but with enough additional subjects that only one observation per subject is used.
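These two extremes can be made concrete with the standard design-effect approximation for equally sized clusters, in which n subjects with m observations each and a within-subject correlation coefficient rho are worth n·m / (1 + (m − 1)·rho) independent observations. This formula is not part of the article; it is assumed here only for illustration, and the function name is hypothetical.

```python
def effective_observations(n_subjects, per_subject, rho):
    """Number of independent observations equivalent to the correlated data.

    Uses the design effect 1 + (m - 1) * rho for m observations per subject.
    """
    design_effect = 1 + (per_subject - 1) * rho
    return n_subjects * per_subject / design_effect


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        n_eff = effective_observations(12, 4, rho)
        print(f"rho = {rho:.1f}: 48 observations are worth about {n_eff:.1f}")
```

With 12 subjects and four observations each, complete correlation leaves the equivalent of 12 observations, no correlation leaves all 48, and intermediate correlation falls between these two limits.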

Simulation can easily account for the correlation of the four observations within each subject. The statistical model used is a slight variation of Equation (2):

$$\mathrm{CNR}_{ij} = D + \varepsilon_{ij} \qquad (3)$$

where $\mathrm{CNR}_{ij}$ is the observed CNR difference for subject $i$ (of the 12 subjects) and repetition $j$ (of the four repetitions), $D$ is the true CNR difference, and $\varepsilon_{ij}$ is the random error associated with each of the 48 observations. As in Equation (2), $\varepsilon_{ij}$ is generated by a normally distributed random-number generator having a mean of 0 and an SD of 4. In Equation (3), however, $\varepsilon_{ij}$ is calculated in such a way that each error term $\varepsilon_{ij}$ is correlated with the other error terms within the same subject. Correlation of the error terms is the mathematic mechanism for generating correlation in the observations. The amount of correlation is indicated by the correlation coefficient $\rho$. In this example, we assume a moderate amount of correlation ($\rho$ = 0.5) between observations made within each subject. The results of the simulation are shown in the Table.

With an ordinary t test, there appears to be enough power in the proposed study design (Table). But an ordinary t test is inappropriate in this case because it treats each of the 48 observations as independent, ignoring the correlation between the four observations within each subject. An appropriate method that accounts for correlation is linear regression with an adjustment for clustering. When this type of linear regression is applied instead of the t test, the simulation reveals that the power is actually .5 (Table), which is lower than the desired power of .8. Results of further simulations indicate that increasing the number of subjects from 12 to 22 would result in adequate power (Table).
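The contrast between the two analyses can be reproduced approximately with the sketch below. Several choices here are assumptions rather than the article's method: correlated errors are built from a shared per-subject component plus an independent component, the P value is obtained by numerical integration, and a t test on per-subject means serves as a simple stand-in for linear regression with an adjustment for clustering (the two analyses are related for equal numbers of observations per subject but are not identical). All function names are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(t, df, steps=200):
    """Two-sided P value for a t statistic (Simpson integration of the t density)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 1 - 2 * total * h / 3)


def one_sample_p(values):
    """Two-sided P value of a one-sample t test of mean = 0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return two_sided_p(statistics.fmean(values) / se, n - 1)


def simulate(n_subjects=12, reps=4, true_diff=2.0, sd=4.0, rho=0.5,
             alpha=0.05, iterations=1000, seed=1):
    """Return (power of naive t test, power of t test on subject means)."""
    rng = random.Random(seed)
    naive = clustered = 0
    for _ in range(iterations):
        subjects = []
        for _ in range(n_subjects):
            shared = rng.gauss(0, 1)   # error component common to one subject
            # Each error has SD sd and correlation rho with its subject-mates.
            subjects.append([
                true_diff + sd * (math.sqrt(rho) * shared
                                  + math.sqrt(1 - rho) * rng.gauss(0, 1))
                for _ in range(reps)
            ])
        all_obs = [x for subj in subjects for x in subj]
        subj_means = [statistics.fmean(subj) for subj in subjects]
        naive += one_sample_p(all_obs) < alpha          # ignores correlation
        clustered += one_sample_p(subj_means) < alpha   # one value per subject
    return naive / iterations, clustered / iterations


if __name__ == "__main__":
    print("12 subjects (naive, cluster-aware):", simulate())
    print("22 subjects (naive, cluster-aware):", simulate(n_subjects=22))
```

Under these assumptions, the naive analysis should show noticeably higher apparent power than the cluster-aware analysis, and increasing the number of subjects should raise the cluster-aware power, in line with the pattern described above.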

A discussion of statistical tests that adjust for correlation within subjects is beyond the scope of this article. However, without a simple formula for sample size, and even without extensive knowledge of the statistical test to be used, simulation still enabled the accurate determination of power in the preceding example; this demonstrates the utility and generalizability of simulation. In addition, simulation made it possible to examine the effect of using a potentially inappropriate statistical analysis.

One of the barriers to performing simulation is the requirement of iterative computation, which in turn requires fast computers. This barrier is becoming much less important as the speed of commonly available computers continues to increase. Even when the barrier of computational speed is overcome, simulation is successful only if the assumed statistical model accurately describes the study design being evaluated. Therefore, appropriate attention must be paid to establishing the model’s validity. Fortunately, it is often easier to develop a mathematic model for a statistical situation (from which it is a straightforward process to determine power and sample size with simulation) than to search for a specific method or formula, if one even exists, for calculating sample size. In the preceding example, the introduction of correlation substantially increased the complexity of the analysis from a statistical point of view but caused only a minor change in the mathematic model and the subsequent simulation.


    SAMPLE SIZE CALCULATIONS FOR READER STUDIES
A reader study involving the calculation of receiver operating characteristic (ROC) curves is another kind of study design that is fairly common in radiology and for which sample size and statistical power are difficult to determine. The area under the ROC curve (Az) is commonly used as an indicator of the accuracy of a diagnostic test. A typical ROC study involves multiple readers interpreting images obtained in the same group of subjects who have all undergone imaging with two or more imaging techniques or conditions. The purpose of the study is to compare the imaging techniques. The difficulty in determining sample size and statistical power is a result of the fairly complicated computational process required to calculate Az and the complicated correlations among the observations. In such a study, each reader generates multiple observations, and, likewise, each subject has a part in multiple observations. Therefore, correlation can occur simultaneously among the observations (readings) within the same reader and among the observations within the same subject.

One approach to sample size analysis in complex ROC studies involves an approximation performed by using the F distribution (12,13). Sample size tables created by using this method have been published (14); this method can also be used to calculate sample sizes for situations not addressed by such tables (Appendix). The method may be used to examine the trade-off between sample size, smallest meaningful difference, and number of readers (Fig 4). For most clinical investigations, it is likely to be difficult to include more than 10 readers or 100 cases. Given these constraints, we see that any ROC study will require at least four readers, even with a large meaningful difference of 0.15 in Az. At the other extreme, the smallest meaningful difference in Az that can be detected with 10 readers and 100 cases is 0.07. These two generalizations are based on many assumptions (Fig 4). More cases or readers are required if the interobserver variability or intraobserver variability is higher than assumed. Fewer cases or readers are required if the average Az (ie, accuracy of the readers) is higher than assumed.



Figure 4. Graph shows the relationship between sample size, smallest meaningful difference, and number of readers in an ROC study in which Az is used as the index of accuracy. Sample sizes were calculated by using the method described in the Appendix. The values used for all variables except $J$ and $\Delta$ are the same as those in the example in Table A1.

 
Another major method for analyzing data from an ROC study with multiple readers and multiple cases is the jackknifed pseudovalue method (15). In this method, the data are mathematically transformed into pseudovalues that can be analyzed by using a relatively straightforward analysis of variance. Reducing the problem of ROC analysis to a more manageable analysis of variance is a strength of the pseudovalue method. A disadvantage is the lack of exact, or even approximate, formulas for determining sample size and statistical power. The performance and validity of the pseudovalue method have been examined with simulations (16,17), so simulation could also provide a viable method for determining sample size and power for the pseudovalue method.


    CONCLUSION
In contrast to the wide variety of statistical tools available to the clinical investigator, relatively few formulas exist for the exact calculation of the sample size and statistical power of a given study design. As demonstrated by the example of correlated data, a frequent occurrence in clinical research, it is relatively easy to construct a study design for which no simple formula exists. The availability of fast computers makes the iterative process of simulation a viable general method for performing sample size and power analysis for complex study designs.

At first glance, simulation may appear artificial and therefore suspicious because it relies on an equation and many assumptions about the terms in the equation, particularly the terms related to the variability of the components of the model. It should be noted, however, that similar (although perhaps less complex) mathematic models are the foundation of most statistical analyses—even simple ones like comparing means with a t test. Furthermore, estimates of variance are also required in sample size and power analysis for simple analyses like the t test. The reason for the large number of assumptions in simulations has more to do with the complexity of the data set being simulated than the method of simulation itself.

In addition to the factors usually mentioned as affecting sample size, correlation among observations within groups due to nonindependent sampling can also increase sample size and decrease statistical power. Therefore, when planning the sample size, one should take care to account for potential correlation in the data set.


    APPENDIX
Although tables for the determination of sample size in ROC studies are available (14), a practical presentation of the equations underlying these tables may be helpful for situations not addressed by the published tables. The equations necessary for calculating sample size in an ROC study that involves a number of readers interpreting images obtained in the same group of subjects who have all undergone imaging with two techniques are as follows (13,14):



[Equations (A1)–(A4)]

Note that Equations (A1) and (A3) have been algebraically rearranged from their published form to isolate the dependent variables for more convenient calculation. All symbols are defined in Table A1. To calculate sample size with these equations, first assign values to the variables in the first section of Table A1, then sequentially substitute the values into Equations (A1)–(A4), using the suggested values of the variables in the second section of Table A1 and values from Tables A2 and A3 where indicated.


TABLE A1. Definition of Variables in Calculation of Sample Size for ROC Study in Which a Number of Readers Interpret Same Set of Cases Obtained by Using Two Different Imaging Techniques

 

TABLE A2. Noncentrality Parameter ($\lambda$) of the Noncentral F Distribution Corresponding to a Significance Criterion ($\alpha$) of .05 and a Power of .8

 

TABLE A3. Multiplier for Converting the Range of a Set of Observations into an Estimate of the Variance

 
For example, Table A1 shows all the values involved in the calculation of sample size for a study that includes four readers and is designed to examine the Az difference between two imaging techniques. On the basis of preliminary study results, the expected average Az ($\theta$) of the two techniques is 0.75. The smallest meaningful difference ($\Delta$) between the Az values for the two techniques is set to 0.15. Each reader interprets each case once ($K$ = 1), and the study involves an equal number of positive and negative cases ($R$ = 1). The difference in Az ($w_b$) between the most accurate and least accurate observers is estimated to be 0.05. The values for $w_w$, $r_1$, $r_2$, $r_3$, and $r_b$ given in Table A1 are those suggested by published reports (14). The calculated sample size ($N$) is 76.


    FOOTNOTES
 
Abbreviations: Az = area under ROC curve, CNR = contrast-to-noise ratio, ROC = receiver operating characteristic


    REFERENCES

  1. Eng J. Sample size estimation: how many individuals should be studied? Radiology 2003; 227:309-313.
  2. Eng J, Siegelman SS. Improving radiology research methods: what is being asked and who is being studied? Radiology 1997; 205:651-655.
  3. Newell DJ. Type II errors and ethics (letter). BMJ 1978; 4:1789.
  4. Halpern SD, Karlawish JHT, Berlin JA. The continuing unethical conduct of underpowered clinical trials. JAMA 2002; 288:358-362.
  5. Rosner B. Fundamentals of biostatistics. 5th ed. Pacific Grove, Calif: Duxbury, 2000; 308.
  6. Lenth RV. Some practical guidelines for effective sample size determination. Am Stat 2001; 55:187-193.
  7. Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat 2001; 55:19-24.
  8. Thomas L. Retrospective power analysis. Conserv Biol 1997; 11:276-280.
  9. Detsky AS, Sackett DL. When was a "negative" clinical trial big enough? How many patients you need depends on what you found. Arch Intern Med 1985; 145:709-712.
  10. Castelloe JM. Sample size computations and power analysis with the SAS system. Proceedings of the 25th Annual SAS Users Group International Conference, April 9–12, 2000, Indianapolis, Ind. Cary, NC: SAS Institute, 2000.
  11. Feiveson AH. Power by simulation. Stata J 2002; 2:107-124.
  12. Obuchowski NA. Multireader receiver operating characteristic studies: a comparison of study designs. Acad Radiol 1995; 2:709-716.
  13. Zhou XH, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York, NY: Wiley, 2002; 298-304.
  14. Obuchowski NA. Sample size tables for receiver operating characteristic studies. AJR Am J Roentgenol 2000; 175:603-608.
  15. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992; 27:723-731.
  16. Roe CA, Metz CE. Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation. Acad Radiol 1997; 4:298-303.
  17. Dorfman DD, Berbaum KS, Lenth RV, Chen YF, Donaghy BA. Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design. Acad Radiol 1998; 5:591-602.
  18. Abramowitz M, Stegun IA. Handbook of mathematical functions with formulas, graphs, and mathematical tables. Washington, DC: U.S. Department of Commerce, National Bureau of Standards, 1964. Applied Mathematics Series No. 55.
  19. Feinstein AR. Principles of medical statistics. Boca Raton, Fla: Chapman & Hall, 2002; 115-116.
  20. Lenth RV. Java applets for power and sample size. Available at: www.stat.uiowa.edu/~rlenth/Power/index.html. Accessed January 5, 2004.
  21. Snedecor GW, Cochran WG. Statistical methods. 8th ed. Ames, Iowa: Iowa State University Press, 1989; 469.




