In this post, I will show you a way of determining a sample size for obtaining a precise estimate of the slope of the simple linear regression equation. The basic ingredients we need for sample size planning are a measure of precision, a way to determine the quantiles of the sampling distribution of that measure, and a way to calculate sample sizes.
The distribution of the margin of error of the regression slope
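The margin of error (MOE) of the slope estimate is the half-width of its 95% confidence interval, that is, the .975 quantile of the t distribution times the estimated standard error of the slope:

$$\text{MOE} = t_{.975}(df_e)\sqrt{\frac{s^2_{Y|X}}{df_x\, s^2_X}}, \tag{1}$$

where $df_e = N - 2$, $df_x = N - 1$, $s^2_{Y|X}$ is the estimated residual variance and $s^2_X$ is the sample variance of the predictor. With bivariate normal data, quantile $\gamma$ of the sampling distribution of the MOE follows from the F distribution with $df_e$ and $df_x$ degrees of freedom:

$$\text{MOE}_{\gamma} = t_{.975}(df_e)\sqrt{\frac{\sigma^2_Y(1 - \rho^2)\,F_{\gamma}(df_e, df_x)}{df_x\,\sigma^2_X}}. \tag{2}$$

For example, with $\sigma^2_Y = \sigma^2_X = 1$, $\rho = .5$ and $N = 100$, the .80 quantile of the MOE is: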
vary = 1
varx = 1
rho = .5
N = 100
dfe = N - 2   # error degrees of freedom
dfx = N - 1   # degrees of freedom of the predictor variance
assu = .80
t = qt(.975, dfe)
MOE.80 = t*sqrt(vary*(1 - rho^2)*qf(.80, dfe, dfx)/(dfx*varx))
MOE.80
## [1] 0.1880535
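One way to see where the F quantile comes from is to build the squared standard error of the slope from its two chi-square components: the residual variance estimate behaves like a chi-square variate with N - 2 degrees of freedom divided by N - 2, and the sample variance of the predictor like a chi-square variate with N - 1 degrees of freedom divided by N - 1. The following is a minimal sketch of that check, assuming bivariate normal data; the seed, the number of draws and the variable names are my own choices for illustration.

```r
set.seed(1)  # arbitrary seed, chosen for this sketch only

N <- 100
dfe <- N - 2
dfx <- N - 1
vary <- 1
varx <- 1
rho <- .5

# ratio of two independent scaled chi-square variates: F with (dfe, dfx) df
f_draws <- (rchisq(1e5, dfe) / dfe) / (rchisq(1e5, dfx) / dfx)

# MOE implied by each draw of the ratio
moe_draws <- qt(.975, dfe) * sqrt(vary * (1 - rho^2) * f_draws / (dfx * varx))

# empirical .80 quantile next to the value from the F quantile
moe_sim <- unname(quantile(moe_draws, .80))
moe_formula <- qt(.975, dfe) *
  sqrt(vary * (1 - rho^2) * qf(.80, dfe, dfx) / (dfx * varx))
c(simulated = moe_sim, formula = moe_formula)
```

The two numbers should agree closely, without fitting a single regression model.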
What does a quick simulation study tell us?
library(MASS)
set.seed(355)
m = c(0, 0)
# Note: s below is the variance-covariance matrix. Because both variances
# equal 1 here, rho and cov(y, x) have the same value.
# In general: rho = cov(x, y)/sqrt(varY*varX) (to be used in the
# functions that calculate MOE); equivalently,
# cov(x, y) = rho*sqrt(varY*varX) (to be used in the specification of
# the variance-covariance matrix for generating bivariate normal variates).
s = matrix(c(1, .5, .5, 1), 2, 2)
se <- numeric(10000)  # one slot per replication
for (i in 1:10000) {
  theData <- mvrnorm(100, m, s)
  mod <- lm(theData[,1] ~ theData[,2])
  # row 2, column 2 of the coefficient table: standard error of the slope
  se[i] <- summary(mod)$coefficients[2, 2]
}
MOE = qt(.975, 98)*se
quantile(MOE, .80)
##       80%
## 0.1878628
Planning for precision
vary = 1
varx = 1
rho = .5
assu = .80
tMOE = .10

# MOE that will not be exceeded with probability assu, for sample size n
MOE.assu = function(n, vary, varx, rho, assu) {
  varY.X = vary*(1 - rho^2)
  dfe = n - 2
  dfx = n - 1
  t = qt(.975, dfe)
  q.assu = qf(assu, dfe, dfx)
  MOE = t*sqrt(varY.X*q.assu/(dfx * varx))
  return(MOE)
}

# squared distance between the assured MOE and the target MOE
cost = function(x, tMOE) {
  (MOE.assu(x, vary=vary, varx=varx, rho=rho, assu=assu) - tMOE)^2
}

# Note: the search interval for the sample size runs from 40 to 5000.
# Since we already know that N = 100 is not enough, we might just as
# well use 100 instead of 40 as the lower limit of the interval.
(samplesize = ceiling(optimize(cost, interval=c(40, 5000), tMOE = tMOE)$minimum))
## [1] 321
# check the result:
MOE.assu(samplesize, vary, varx, rho, assu)
## [1] 0.09984381
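Because sample sizes are whole numbers, the planned value can also be cross-checked by evaluating the assured MOE for every candidate sample size and picking the first one that meets the target. This is only a sketch of an alternative to the `optimize` call: `moe_assu` restates the formula used in this post with default arguments, and `smallest_n` is a hypothetical helper introduced here for illustration.

```r
# assured MOE for sample size n (same formula as in the post)
moe_assu <- function(n, vary = 1, varx = 1, rho = .5, assu = .80) {
  dfe <- n - 2
  dfx <- n - 1
  qt(.975, dfe) * sqrt(vary * (1 - rho^2) * qf(assu, dfe, dfx) / (dfx * varx))
}

# hypothetical helper: smallest integer n in [lower, upper]
# whose assured MOE is at or below the target
smallest_n <- function(target, lower = 40, upper = 5000) {
  candidates <- lower:upper
  ok <- moe_assu(candidates) <= target
  if (!any(ok)) stop("target MOE not reached within the search interval")
  candidates[which(ok)[1]]
}

n_min <- smallest_n(.10)
n_min
# assured MOE just below and at the selected sample size
moe_assu(c(n_min - 1, n_min))
```

The last line shows the assured MOE on either side of the cut-off, which makes it easy to see how close the target is to being met with one participant fewer.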
Let’s simulate with the proposed sample size
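set.seed(355)
m = c(0, 0)
# Note: s below is the variance-covariance matrix. Because both variances
# equal 1 here, rho and cov(y, x) have the same value.
# In general: rho = cov(x, y)/sqrt(varY*varX) (to be used in the
# functions that calculate MOE); equivalently,
# cov(x, y) = rho*sqrt(varY*varX) (to be used in the specification of
# the variance-covariance matrix for generating bivariate normal variates).
s = matrix(c(1, .5, .5, 1), 2, 2)
se <- numeric(10000)  # one slot per replication
samplesize = 321
for (i in 1:10000) {
  theData <- mvrnorm(samplesize, m, s)
  mod <- lm(theData[,1] ~ theData[,2])
  # row 2, column 2 of the coefficient table: standard error of the slope
  se[i] <- summary(mod)$coefficients[2, 2]
}
# the error degrees of freedom must match the sample size: 321 - 2 = 319
MOE = qt(.975, samplesize - 2)*se
quantile(MOE, .80)
##       80%
## 0.1007269
Note that the reported value of 0.1007269 was computed with 98 degrees of freedom for the t quantile, as in the simulation with N = 100. With samplesize - 2 = 319 degrees of freedom, the t quantile is slightly smaller (about 1.967 instead of 1.984), which rescales the .80 quantile of the MOE to roughly 0.0999, in line with the planned value.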
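Another way to read the simulation is to estimate the assurance directly, as the proportion of replications in which the MOE is at or below the target of .10. The sketch below assumes the same population values as above; the seed, the smaller number of replications and the variable names are my own choices and not part of the original analysis.

```r
library(MASS)  # for mvrnorm
set.seed(1)    # arbitrary seed, not the one used above

n <- 321
target <- .10
reps <- 2000   # fewer replications than above, to keep the run short
s <- matrix(c(1, .5, .5, 1), 2, 2)

moe <- replicate(reps, {
  d <- mvrnorm(n, c(0, 0), s)
  y <- d[, 1]
  x <- d[, 2]
  se_slope <- summary(lm(y ~ x))$coefficients["x", "Std. Error"]
  qt(.975, n - 2) * se_slope
})

# proportion of replications with an MOE at or below the target
assurance_hat <- mean(moe <= target)
assurance_hat
```

If the planning worked, this proportion should be close to the intended assurance of .80.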
References
Cumming, G. (2012). Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New York: Routledge.
Cumming, G., & Calin-Jageman, R. (2017). Introduction to the New Statistics: Estimation, Open Science, and Beyond. New York: Routledge.
Wilcox, R. (2017). Understanding and Applying Basic Statistical Methods using R. Hoboken, New Jersey: John Wiley and Sons.