Research Methodology
9 August 2023

El Cálculo del Tamaño Muestral en Ciencias de la Salud: Recomendaciones y Guía Práctica [Sample Size Calculation in Health Sciences: Recommendations and Practical Guide]

Ruben Fernandez-Matias
Research Unit, Hospital Universitario Fundación Alcorcón, Madrid, Spain
Keywords: Sample Size, Statistics, Methodology
Vol. 5 No. 1 (2023): June

Abstract

Sample size calculation is one of the most important aspects of planning most research studies, since an insufficient sample can render the research itself useless. Power-based sample size calculations have traditionally been used, but precision-based calculations are now beginning to be adopted. This paper presents a series of recommendations for sample size calculations for randomised clinical trials, multiple linear and logistic regression models, reproducibility analyses, and multivariable prediction models, along with some practical examples of their implementation. It also offers some considerations on conducting pilot studies and using their data when planning a sample size calculation.

How to cite

Fernandez-Matias R. El Cálculo del Tamaño Muestral en Ciencias de la Salud: Recomendaciones y Guía Práctica. MOVE [Internet]. 2023 Aug 9 [cited 2024 May 19];5(1):481-503. Available from: https://publicaciones.lasallecampus.es/index.php/MOVE/article/view/915
