In this post, I will show you a way of determining the sample size needed to obtain a precise estimate of the slope $\beta_1$ of the simple linear regression equation $Y = \beta_0 + \beta_1X + \varepsilon$. The basic ingredients we need for sample size planning are a measure of precision, a way to determine the quantiles of the sampling distribution of that measure, and a way to calculate sample sizes.
The distribution of the margin of error of the regression slope
The margin of error (MOE) of the slope estimate is $t_{.975}\sigma_{\hat{\beta}_1}$, where $t_{.975}$ is the .975 quantile of the $t$-distribution with $N - 2$ degrees of freedom, and $\sigma_{\hat{\beta}_1}$ is the standard error of the estimate of $\beta_1$.

The sampling variance of the estimate of $\beta_1$ equals $\frac{\sigma^2_{Y|X}}{\sum{(X_i - \bar{X})}^2}$. The variance of the residuals $\sigma^2_{Y|X}$ equals $\sigma^2_Y(1 - \rho^2_{YX})$ and is estimated as $\frac{\sum{(Y - \hat{Y})^2}}{df_e}$, with $df_e = N - 2$. The estimated MOE is therefore

(1) $\quad \widehat{MOE} = t_{.975}\sqrt{\dfrac{\sum{(Y - \hat{Y})^2}/df_e}{\sum{(X_i - \bar{X})^2}}}.$

Since the MOE is itself an estimate, it varies from sample to sample, and we need its sampling distribution to plan with assurance (Cumming, 2012). The estimated residual variance is proportional to a $\chi^2$ variate with $df_e = N - 2$ degrees of freedom. Writing $df = N - 1$, the sum of squares of $X$ equals $\sum{(X - \bar{X})^2} = df\,\hat{\sigma}^2_X$, and the estimated variance of $X$ is likewise proportional to a $\chi^2$ variate, with $df = N - 1$ degrees of freedom. Multiplying the ratio of the two variance estimates by $\frac{df}{df}$ turns it into a ratio of two $\chi^2$ variates, each divided by its degrees of freedom ($df_e = N - 2$ in the numerator and $df = N - 1$ in the denominator), and that ratio follows an $F$ distribution. The value of the MOE that is not exceeded with probability (assurance) $\gamma$ is therefore

(2) $\quad MOE_\gamma = t_{.975}\sqrt{\dfrac{\sigma^2_Y(1 - \rho^2_{YX})\,F_\gamma(df_e, df)}{df\,\sigma^2_X}}.$

For example, with $\sigma^2_Y = 1$, $\sigma^2_X = 1$, $\rho = .50$, and $N = 100$, the MOE with 80% assurance equals:
```r
vary = 1
varx = 1
rho = .5
N = 100
dfe = N - 2
dfx = N - 1
assu = .80
t = qt(.975, dfe)
MOE.80 = t*sqrt(vary*(1 - rho^2)*qf(assu, dfe, dfx)/(dfx*varx))
MOE.80
```
```
## [1] 0.1880535
```
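Equation (2) can be evaluated at any assurance level, which makes it easy to see how the MOE quantile grows with the assurance we demand. A small sketch reusing the same planning values (unit variances, $\rho = .50$, $N = 100$):

```r
# MOE of the slope estimate at assurance levels .50, .80, and .95,
# with vary = varx = 1, rho = .50, and N = 100 as above
vary = 1; varx = 1; rho = .5; N = 100
dfe = N - 2  # error degrees of freedom
dfx = N - 1  # degrees of freedom of the sum of squares of X
t = qt(.975, dfe)
sapply(c(.50, .80, .95), function(assu) {
  t*sqrt(vary*(1 - rho^2)*qf(assu, dfe, dfx)/(dfx*varx))
})
```

The middle value reproduces the MOE of 0.1880535 obtained above; the more assurance we demand, the larger the MOE we must plan for.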
What does a quick simulation study tell us?
```r
library(MASS)
set.seed(355)
m = c(0, 0)
# note: s below is the variance-covariance matrix. In this case,
# rho and cov(y, x) have the same value.
# Otherwise: rho = cov(x, y)/sqrt(varY*varX) (to be used in the
# functions that calculate MOE); equivalently,
# cov(x, y) = rho*sqrt(varY*varX) (to be used in the specification
# of the variance-covariance matrix for generating bivariate
# normal variates)
s = matrix(c(1, .5, .5, 1), 2, 2)
se <- rep(0, 10000)
for (i in 1:10000) {
  theData <- mvrnorm(100, m, s)
  mod <- lm(theData[,1] ~ theData[,2])
  se[i] <- summary(mod)$coefficients[4]  # standard error of the slope
}
MOE = qt(.975, 98)*se
quantile(MOE, .80)
```
```
##       80% 
## 0.1878628
```
Planning for precision
```r
vary = 1
varx = 1
rho = .5
assu = .80
tMOE = .10

MOE.assu = function(n, vary, varx, rho, assu) {
  varY.X = vary*(1 - rho^2)
  dfe = n - 2
  dfx = n - 1
  t = qt(.975, dfe)
  q.assu = qf(assu, dfe, dfx)
  MOE = t*sqrt(varY.X*q.assu/(dfx * varx))
  return(MOE)
}

cost = function(x, tMOE) {
  cost = (MOE.assu(x, vary=vary, varx=varx, rho=rho, assu=assu) - tMOE)^2
}

# note: sample size is at least 40, at most 5000.
# Since we already know that N = 100 is not enough, we might just
# as well set N = 100 instead of 40 as the lower limit of the interval.
(samplesize = ceiling(optimize(cost, interval=c(40, 5000), tMOE = tMOE)$minimum))
```
```
## [1] 321
```
```r
# check the result:
MOE.assu(samplesize, vary, varx, rho, assu)
```
```
## [1] 0.09984381
```
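Since MOE.assu() decreases monotonically in n, minimizing a squared cost is not the only option: we can also solve MOE.assu(n) = tMOE directly with uniroot(). A self-contained sketch (it redefines the function and the planning values so it can be run on its own):

```r
# Find the sample size by root finding: MOE.assu(n) - tMOE changes sign
# exactly once on the interval, because MOE decreases with n
vary = 1; varx = 1; rho = .5; assu = .80; tMOE = .10
MOE.assu = function(n, vary, varx, rho, assu) {
  dfe = n - 2
  dfx = n - 1
  qt(.975, dfe)*sqrt(vary*(1 - rho^2)*qf(assu, dfe, dfx)/(dfx*varx))
}
f = function(n) MOE.assu(n, vary, varx, rho, assu) - tMOE
(samplesize = ceiling(uniroot(f, interval = c(40, 5000), tol = 1e-6)$root))
```

This should give the same sample size as the optimize() solution above, since the minimum of the squared cost function is attained at the root.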
Let’s simulate with the proposed sample size
```r
set.seed(355)
m = c(0, 0)
# s is the variance-covariance matrix; as before, rho and cov(y, x)
# have the same value because both variances equal 1
s = matrix(c(1, .5, .5, 1), 2, 2)
se <- rep(0, 10000)
samplesize = 321
for (i in 1:10000) {
  theData <- mvrnorm(samplesize, m, s)
  mod <- lm(theData[,1] ~ theData[,2])
  se[i] <- summary(mod)$coefficients[4]  # standard error of the slope
}
MOE = qt(.975, samplesize - 2)*se
quantile(MOE, .80)
```

```
##      80% 
## 0.099862
```
References
Cumming, G. (2012). Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New York: Routledge.
Cumming, G., & Calin-Jageman, R. (2017). Introduction to the New Statistics: Estimation, Open Science, and Beyond. New York: Routledge.
Wilcox, R. (2017). Understanding and Applying Basic Statistical Methods using R. Hoboken, New Jersey: John Wiley and Sons.