Sample Size Calculation in Health Sciences: Recommendations and a Practical Guide
Abstract
Sample size calculation is one of the most important aspects of planning most research studies, because an insufficient sample can render the research itself useless. Sample size calculations have traditionally been based on power, but precision-based calculations are now beginning to be adopted. This paper presents recommendations for sample size calculations for randomised clinical trials, multiple linear and logistic regression models, reproducibility analyses, and multivariable predictive models, along with practical examples of their implementation. It also offers considerations on conducting pilot studies and using their data when planning a sample size calculation.
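The contrast between power-based and precision-based planning mentioned above can be illustrated with a minimal sketch for a two-group comparison of means. It is not taken from the paper: it assumes a normal approximation, equal group sizes, and a common known standard deviation, and the function names and input values (`delta`, `sigma`, `half_width`) are hypothetical choices made for illustration only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_power(delta, sigma, alpha=0.05, power=0.80):
    """Power-based n per group to detect a mean difference `delta`.

    Assumes a two-sided test, equal group sizes, a common standard
    deviation `sigma`, and a normal approximation (illustrative only).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def n_per_group_precision(half_width, sigma, conf_level=0.95):
    """Precision-based n per group so that the confidence interval for
    the mean difference has the requested half-width.

    Same simplifying assumptions as above (illustrative only).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(2 * z ** 2 * sigma ** 2 / half_width ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: SD of 10 units, target difference or
    # half-width of 5 units.
    print(n_per_group_power(delta=5, sigma=10))
    print(n_per_group_precision(half_width=5, sigma=10))
```

With these hypothetical inputs the power-based formula gives 63 participants per group and the precision-based formula gives 31. The two numbers answer different questions (detecting a difference versus estimating it with a given interval width), so they should not be read as interchangeable targets. Dedicated tools such as G*Power or the `presize` R package handle the exact, non-approximate cases.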