In this post, I will show you a way of determining a sample size for obtaining a precise estimate of the slope of a simple linear regression equation. The basic ingredients we need for sample size planning are a measure of precision, a way to determine the quantiles of the sampling distribution of that measure, and a way to calculate sample sizes. As the measure of precision I use the margin of error (MOE) of the slope: half the width of its 95% confidence interval.
The distribution of the margin of error of the regression slope
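(1) \[\hat{MOE} = t_{.975}(N-2)\sqrt{\frac{s^{2}_{y.X}}{(N-1)s^{2}_{X}}},\]
where $s^{2}_{y.X}$ is the residual variance of the regression of y on X and $s^{2}_{X}$ is the sample variance of X. Under bivariate normality, the ratio of these two variances follows a scaled F distribution, so that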
(2) \[\hat{MOE}\sim t_{.975}(N-2)\sqrt{\frac{\sigma_{y}^{2}(1-\rho^{2})F(N-2,N-1)}{(N-1)\sigma_{X}^{2}}}. \]
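The step from the sample variances to the F distribution in (2) can be sketched as follows. This assumes that X and y are bivariate normal and treats $t_{.975}(N-2)$ as a constant; $s^{2}_{y.X}$ denotes the residual variance and $s^{2}_{X}$ the sample variance of X.

```latex
\frac{(N-2)\,s^{2}_{y.X}}{\sigma^{2}_{y}(1-\rho^{2})} \sim \chi^{2}(N-2),
\qquad
\frac{(N-1)\,s^{2}_{X}}{\sigma^{2}_{X}} \sim \chi^{2}(N-1)

% the two are independent, so the ratio of each divided by its df is F-distributed:
\frac{s^{2}_{y.X}/[\sigma^{2}_{y}(1-\rho^{2})]}{s^{2}_{X}/\sigma^{2}_{X}} \sim F(N-2,\,N-1)
\quad\Longrightarrow\quad
\frac{s^{2}_{y.X}}{(N-1)\,s^{2}_{X}} \sim \frac{\sigma^{2}_{y}(1-\rho^{2})\,F(N-2,N-1)}{(N-1)\,\sigma^{2}_{X}}
```

Taking the square root of the right-hand side and multiplying by $t_{.975}(N-2)$ gives (2). The independence holds because, given normal errors, the residual sum of squares does not depend on the observed X values.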
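For example, with $\sigma^{2}_{y} = \sigma^{2}_{X} = 1$, $\rho = .5$, and N = 100, the MOE that will not be exceeded with 80% assurance (the 80th percentile of the distribution in (2)) is:
vary = 1    # variance of y
varx = 1    # variance of X
rho = .5    # correlation between X and y
N = 100
dfe = N - 2
dfx = N - 1
assu = .80  # assurance
t = qt(.975, dfe)
# 80th percentile of the sampling distribution of the MOE, equation (2)
MOE.80 = t*sqrt(vary*(1 - rho^2)*qf(assu, dfe, dfx)/(dfx*varx))
MOE.80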
## [1] 0.1880535
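To see how the choice of assurance affects the planned precision, the same formula can be evaluated for several assurance levels at once, because `qf()` is vectorized over its probability argument. This is a sketch using the planning values of the example above; the assurance levels other than .80 are arbitrary choices for illustration.

```r
# planning values from the example above
vary = 1; varx = 1; rho = .5; N = 100
dfe = N - 2
dfx = N - 1
t = qt(.975, dfe)

# quantiles of the MOE distribution in equation (2) for several assurance levels
assu = c(.50, .80, .90, .95)
MOE.q = t*sqrt(vary*(1 - rho^2)*qf(assu, dfe, dfx)/(dfx*varx))
names(MOE.q) = assu
round(MOE.q, 4)
```

Higher assurance means a larger MOE at the same N, so reaching a given target MOE with higher assurance requires a larger sample.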
What does a quick simulation study tell us?
library(MASS)  # for mvrnorm()
set.seed(355)
m = c(0, 0)
# note: s below is the variance-covariance matrix. In this case,
# rho and cov(y, x) have the same value (both variances are 1);
# otherwise: rho = cov(x, y)/sqrt(varY*varX) (to be used in the
# functions that calculate MOE)
# equivalently, cov(x, y) = rho*sqrt(varY*varX) (to be used
# in the specification of the variance-covariance matrix for
# generating bivariate normal variates)
s = matrix(c(1, .5, .5, 1), 2, 2)
se <- numeric(10000)
for (i in 1:10000) {
  theData <- mvrnorm(100, m, s)
  mod <- lm(theData[,1] ~ theData[,2])
  # standard error of the slope: row 2, column 2 of the coefficient table
  se[i] <- summary(mod)$coefficients[2, 2]
}
MOE = qt(.975, 98)*se
quantile(MOE, .80)
##       80% 
## 0.1878628
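The standard error taken from `summary(mod)` is the square root of the residual variance divided by (N - 1) times the sample variance of X. The sketch below checks this against `lm()` on one simulated data set. It also builds the variance-covariance matrix for unequal variances with cov(x, y) = rho*sqrt(varY*varX), as described in the comments above. The variances 2 and 3 and the seed are hypothetical values chosen only for this check.

```r
library(MASS)  # for mvrnorm()

set.seed(1)                        # arbitrary seed for this sketch
vary = 2; varx = 3; rho = .5       # hypothetical population values
covxy = rho*sqrt(vary*varx)        # covariance implied by rho
s = matrix(c(vary, covxy, covxy, varx), 2, 2)  # column 1: y, column 2: x
N = 100
theData = mvrnorm(N, c(0, 0), s)
y = theData[, 1]
x = theData[, 2]

# slope, intercept and residual variance computed by hand
b1 = cov(x, y)/var(x)
b0 = mean(y) - b1*mean(x)
s2.yx = sum((y - b0 - b1*x)^2)/(N - 2)

# SE of the slope: sqrt(residual variance / ((N - 1) * variance of x))
se.formula = sqrt(s2.yx/((N - 1)*var(x)))

# the same quantity as reported by lm()
se.lm = summary(lm(y ~ x))$coefficients[2, 2]
c(formula = se.formula, lm = se.lm)
```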
Planning for precision
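vary = 1    # variance of y
varx = 1    # variance of X
rho = .5    # correlation between X and y
assu = .80  # assurance: probability that the obtained MOE is at most the target
tMOE = .10  # target MOE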
MOE.assu = function(n, vary, varx, rho, assu) {
varY.X = vary*(1 - rho^2)
dfe = n - 2
dfx = n - 1
t = qt(.975, dfe)
q.assu = qf(assu, dfe, dfx)
MOE = t*sqrt(varY.X*q.assu/(dfx * varx))
return(MOE)
}
# squared distance between the assured MOE and the target MOE;
# optimize() searches for the n that minimizes it
cost = function(x, tMOE) {
  (MOE.assu(x, vary=vary, varx=varx, rho=rho, assu=assu) - tMOE)^2
}
# note: the sample size is at least 40 and at most 5000.
# Since we already know that N = 100 is not enough, we might just
# as well set the lower limit of the interval to 100 instead of 40
(samplesize = ceiling(optimize(cost, interval=c(40, 5000),
  tMOE = tMOE)$minimum))
## [1] 321
# check the result:
MOE.assu(samplesize, vary, varx, rho, assu)
## [1] 0.09984381
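Because `optimize()` minimizes a squared distance, it does not by itself guarantee that the rounded-up n satisfies the target. An alternative, sketched here with the same planning values, is a direct search for the smallest n whose assured MOE does not exceed the target. `MOE.assu()` is repeated so that the block runs on its own. The search relies on `qt()` and `qf()` being vectorized over the degrees of freedom.

```r
# MOE.assu() as defined above: the assu-quantile of the MOE in equation (2)
MOE.assu = function(n, vary, varx, rho, assu) {
  varY.X = vary*(1 - rho^2)
  dfe = n - 2
  dfx = n - 1
  t = qt(.975, dfe)
  q.assu = qf(assu, dfe, dfx)
  t*sqrt(varY.X*q.assu/(dfx*varx))
}

# planning values from the example
vary = 1; varx = 1; rho = .5; assu = .80; tMOE = .10

# evaluate every candidate sample size at once and keep the smallest n
# whose assured MOE does not exceed the target
candidates = 40:5000
moe = MOE.assu(candidates, vary, varx, rho, assu)
n.direct = min(candidates[moe <= tMOE])
n.direct
```

With these planning values the direct search should agree with the `optimize()` result above.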
Let’s simulate with the proposed sample size
set.seed(355)
m = c(0, 0)
# s is the variance-covariance matrix (see the note in the first
# simulation; with unit variances, cov(y, x) equals rho)
s = matrix(c(1, .5, .5, 1), 2, 2)
samplesize = 321
se <- numeric(10000)
for (i in 1:10000) {
  theData <- mvrnorm(samplesize, m, s)
  mod <- lm(theData[,1] ~ theData[,2])
  se[i] <- summary(mod)$coefficients[2, 2]  # SE of the slope
}
# the error df are samplesize - 2; the output below was obtained with
# qt(.975, 98), which is slightly larger than qt(.975, 319), so with the
# correct df the 80% quantile is about 0.0999
MOE = qt(.975, samplesize - 2)*se
quantile(MOE, .80)
##       80% 
## 0.1007269
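The simulation checks the 80th percentile of the MOE. The question can also be turned around: for a given N, what is the probability, according to (2), that the MOE will be at most the target? Since MOE ≤ tMOE is equivalent to F(N-2, N-1) ≤ tMOE²(N-1)σ²_X/(t²σ²_y(1-ρ²)), this probability is a single `pf()` call. This is a sketch that, like the rest of the post, treats the t quantile as fixed. The function name `assurance()` is hypothetical.

```r
# probability that the MOE does not exceed tMOE, according to equation (2)
assurance = function(n, vary, varx, rho, tMOE) {
  dfe = n - 2
  dfx = n - 1
  t = qt(.975, dfe)
  pf(tMOE^2*dfx*varx/(t^2*vary*(1 - rho^2)), dfe, dfx)
}

vary = 1; varx = 1; rho = .5; tMOE = .10
a = assurance(c(100, 320, 321), vary, varx, rho, tMOE)
names(a) = c(100, 320, 321)
round(a, 3)
```

With the simulated MOEs from the block above, `mean(MOE <= .10)` gives an empirical counterpart of this probability.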
