### Planning for precise contrast estimates in between subjects designs

Here I would like to explain the procedure for sample size planning for one-way and two-way (factorial) between subjects designs. We will consider examples based on and described in Haans (2018).

## The first example: one-way design

The first example considers the effect of seating location  of students on their educational performance. Seating location is defined as distance from the teacher and operationalized in terms of the row the student is seated in, with first row being the closest to the teacher and the fourth row being the furthest away. 20 Students are randomly assigned to one of the four possible rows, so N = 20, n = 5. The dependent variable is the course grade of the student. (Note: the data and study are hypothetical).

As Haans (2018) explains, one psychological theory explaining the effect of seating position on educational performance is based on social influence. This theory posits that due to the social influence of the teacher, the students that are seated closest to the teacher find themselves in a state of undivided attention. This undivided attention causes their educational performance to be better than the students who are seated further away.

In operational terms, then, we may expect that first row students will have a better average grade than students seated on the other rows. So, the quantitative research question we are interested in is:

“How much do the average grades differ between students seated first row and the students seated on other rows?”

We can estimate this quantity with a Helmert Contrast, where we assign a contrast weight of 1 to mean of the first row grades and weights -1/3 to the means of the grades in the other rows.

Haans (2018) gives us the following results. The contrast estimate equals 2.00 , 95% CI [0.27, 3.73]. In order to interpret this more easily, we divide this estimate by the square root Mean Square Error, to obtain the standardized estimate and standardized confidence interval (not to be confused with the confidence interval of the standardized estimate, but that’s a different story. The result is: 1.26, 95% CI [0.17, 2.36].

To answer the research question, the estimated difference equals 1.26 standard deviations, which according to rule-of-thumbs frequently used in psychology is a large difference. The CI shows the enormous amount of uncertainty of this estimate: population values between 0.17 (small) and 2.36 (very large) are also consistent with the observed data and our statistical assumptions. So, it seems safe to conclude that it looks like there is a positive effect of seating position, but the wide range of the CI makes it clear that the data do not tell us enough about the size of the effect, the precision is simply too low.

The precision is f = 1.09, which according to my rules-of-thumb is very imprecise (I consider f = 0.65, to be barely tolerable).

So, let’s plan for a replication study with a reasonably precise estimate of  f = 0.40, with 80% assurance. (Note: for some advice on setting target Moe: Planning with assurance, with assurance. ) I’ve used the app: https://gmulder.shinyapps.io/PlanningFactorialContrasts/ with the default values for a single factor between subjects design with 4 conditions.  According to the app, we need n = 36 participants per condition (making a total of  N = 144).

(For more detailed information considering sample size planning for contrast analysis see: http://small-s.science/?p=10 and for some guidelines for setting target MoE: http://small-s.science/?p=14)

## The second example: factorial design

Our second example is also taken from Haans (2018). It considered the same phenomenon, the effect of students’ seating distance from the teacher and the educational performance of the students.

A second theory explaining the effect is that the effect is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back in the lecture hall.

To test that theory, a experiment was conducted with N = 72 participants attending a lecture. The lecture was given to two independent groups of 36 pariticpant. The first group attended the lecture while the teacher was wearing dark sunglasses, the second group attented the lecture while the teacher was not wearing sunglasses,. Again, all participants were randomly assigned to 1 of 4 possible rows. The dependent variable was the score on a 10-item questionnaire about the contents of the lecture.

Now, if the eyecontact of the teachter is the causal variable, we may expect that in this experimental setup the difference between the average score of the persons seated on the first row and the averages of the other rows will be smaller for the condition where the teacher wears sunglasses than for the condition in which the teacher does not wear these glasses, as wearing sunglasses prevents eye-contact between the teacher and the students. Our quantitative question is therefore:

“How much does the contrast between the first row and the others rows differ between the conditions with and without sunglasses?”

In other words, we are interested in the size of the interaction effect.

I’ve downloaded the dataset from http://pareonline.net/sup/v23n9.zip (between2by4data.sav) and specified the following syntax in SPSS:

UNIANOVA retention BY sunglasses location
/LMATRIX = “Interaction contrast” sunglasses*location 1 -1/3 -1/3 – 1/3 -1 1/3 1/3 1/3 intercept 0
/DESIGN= sunglasses*location.

The result of the analysis is that the contrast estimate equals 1.0, 95% CI [-0.33, 2.33]. If we standardize this with the within condition variance (the condition being the combination of the levels of the two factors), we get 0.82, 95% CI [-0.27, 1.90].

So, it appears that the difference between the means of the first row and that of the other rows is on average 1.0 points larger in the condition without sunglasses than in the condition with sunglasses. This corresponds to a large difference (dwith = .82). However, the CI also contains negative population difference (albeit that they are smallish), so even though the results are promising for the theory (eyecontact), these negative effects will not persuade a critical reviewer of the study. Indeed, these negative effects contradict the substantive hypothesis.

Again, the confidence interval is so wide, that effects ranging from small negative effects to huge positive effects are considered plausible. Since the results are promising for the theory, a replication study with more precision may be needed to persuade the critics. Let’s plan for a precision of f = .25 with 95% assurance.

I’ve used the app: https://gmulder.shinyapps.io/PlanningFactorialContrasts/ specifying that we have a factorial design with a = 2 levels and b = 4 levels. The result is that for the interaction contrast  with f = .25 and assurance = .95, we need 175 participants per combination of the two factors. This means, that a total of N = 1400 must be recruited.

I’ve taken this from the following output.

 Figure 1: Output of sample size planning

I’ve looked at the  “Contrast Summary Tab” to check that interaction A1B1 is the correct one (see Figure 2).

 Figure 2. Summary of contrast weights.

What’s important in the above figure is that the set of weights for A1B1 matches the set of weights used to get the contrast estimate in SPSS (In the LMATRIX-subcommand), so that’s how we know that A1B1 is the contrast we want.  (Note: if you switch the number of levels in the app, that is, use 4 levels for A and 2 for B, the interaction weights will match perfectly).

Reference
Haans, Antal (2018). Contrast Analysis: A Tutorial. Practical Assessment, Research, & Education, 23(9). Available online: http://pareonline.net/getvn.asp?v=23&n=9

### Planning for precise contrasts in two-way factorial designs: a Tutorial

I’ve created a first version of Shiny App for sample size planning for precise contrast estimates in one and two-way designs. So, if you want to plan for interaction contrasts for two-way designs, take a look here: https://gmulder.shinyapps.io/PlanningFactorialContrasts/

Important: this is a first version that contains no error checking (but you know what you’re doing, so that’s not a problem). Also, I have not tested the results of the app with simulation studies. As soon as I have done that, I will share the code of the app.

Update: The values for the factorial two-way within subjects designs are checked with simulation studies (for f = .40, assurance = .80, and correlation rho = .50).

Note: if you have a single factor design, you may also consider looking here: http://small-s.science/?p=19

Note: if you have a two-groups independent design, go here for an introduction to sample size planning and its relation to the power of the t-test: http://small-s.science/?p=25

See for more detailed information about sample size planning for single factor and factorial designs (between, within, and mixed): http://small-s.science/?p=10 and for guidelines for setting target MoE: http://small-s.science/?p=14)

## How does the app work?

### 1. Specifying Target MoE and Assurance

Target MoE should be specified in a number of standard deviations (usually a fraction; for details see Cumming, 2012; Cumming & Calin-Jageman, 2017). The symbol f will be used to refer to this standardized MoE. Target MoE (f) must be larger than zero (f will be automatically set to .05 if you accidentally fill in the value 0).
I suggest using the following guidelines for target MoE (f):
Description f
Extremely Precise .05
Very Precise .10
Precise .25
Reasonably Precise .40
Borderline Precise .65
You should only use these guidelines if you lack the information you need for specifying a reasonable value for Target MoE.

Assurance is the probability that (to be) obtained MoE will be no larger than Target MoE. I suggest setting Assurance minimally at .80.

#### Assumptions of the App

The app uses the within condition standard deviation as the standardizer for MoE. For the factorial designs this is the variance within each combination of the factor levels. The app assumes equal variances and, for the mixed and within subject designs equal covariances as well.

### 2. Specifying number of factors, design, correlation and levels of each factor

You can plan for designs with one factor (between and within designs) and two factors (between, within, and mixed designs).
If you have a mixed design, the first factor (Factor A) is considered to be the between subjects factor and the second factor (Factor B) the within subjects factor.
The app also requires a value for the cross-condition correlation in the within or mixed designs.

### 3. Specifying contrasts

#### Main comparisons

The default contrasts for main comparisons are Helmert contrasts, but you can specify any contrast you like. Use commas to separate the contrast weights and use semi-colons to separate the weights of different contrasts. For example, the “1, -1/2, -1/2; 0, 1, -1” indicates two contrasts, the first contrast has weights {1, -1/2, -1/2}, the second contrast has weights {0, 1, -1}.
It is recommended that the absolute values of the contrast weights of each contrast sum to 2.0 (except for interaction contrasts, where the absolute values should sum to 4.0).
You will only have to specify contrasts for the marginal means of each factor. The app calculates appropriate values for the contrast weights of each cell in the design based on the weights for the marginal means. For example, in a 2×2 design the weights for the marginal means for each factor may be {1, -1]. The app translates this to {0.5, 0.5, -0.5, -0.5}, to account for the fact that the contrast etimates involves the combination of each of the 2×2 cell means.

#### Interaction contrasts

The app calculates the interaction contrasts on the basis of the contrasts specified for the main comparisons. So, if you want to plan for your favorite interaction, simply type in the weights for the two main comparisons involved. Suppose you have a 2×2 design, for example, and you want to plan for an interaction contrasts with weigths {1, -1, -1, 1}, type in “1, -1” for factor A and “1, -1” for factor B.
As another example, suppose factor A has two levels and factor B has three, and you want to estimate the extent to which the difference between the first level of B and the means of the other two levels differes between the levels of A. You type in “1, -1” for factor A, and  “1, -1/2, -1/2” for factor B, and the app will calculate the interaction contrast with weights {1, -1/2, -1/2, -1, 1/2, 1/2}.
After planning the results: you can check whether the contrasts are what you intended by looking at the “Contrast Summary” output-tab.

### 4. Output

The output contains sample sizes  per  treatment (combination) and total samples required for target MoE and assurance. You will get samples sizes per contrast.
On the “Contrasts Summary” tab the app shows information about the contrast weights.

### Planning for Precise Contrasts: Tutorial for single factor designs

This is a tutorial for  a planning for precision  of contrasts estimates. The application is here: https://gmulder.shinyapps.io/PlanningContrasts/.

NOTE: For a (beta) version of planning for factoral designs: http://small-s.science/?p=18

NOTE: I’ve updated the app with a few corrections, so there is a new version. (The November version has corrected degrees of freedom  for the 3 and 4 condition within design).

If you like to run the app in R, install the shiny and devtool packages and run the following:

library(shiny)
library(devtools)
source_url("https://git.io/fpI1R")
shinyApp(ui = ui, server = server)


#### Specifying Target MoE and Assurance

Target MoE should be specified in a number of standard deviations (usually a fraction; for details see Cumming, 2012; Cumming & Calin-Jageman, 2017). The symbol f will be used to refer to this standardized MoE. Target MoE (f) must be larger than zero (f will be automatically set to .05 if you accidentally fill in the value 0).
I suggest using the following guidelines for target MoE (f):
Description f
Extremely Precise .05
Very Precise .10
Precise .25
Reasonably Precise .40
Borderline Precise .65
You should only use these guidelines if you lack the information you need for specifying a reasonable value for Target MoE.

Assurance is the probability that (to be) obtained MoE will be no larger than Target MoE. I suggest setting Assurance minimally at .80.

#### Specifying the Design

The app works with independent and dependent designs for 2, 3, and 4 conditions.  With 2 conditions, the analysis is equivalent to the independent and dependent t-tests, with more than two conditions the analysis is equivalent to one-way independent ANOVA or dependent ANOVA.

### Specifying the Cross-Condition correlation

If you choose the dependent design, you also need to specify a value for the cross-condition correlation. This value should be larger than zero. One of the assumptions underlying the app, is that there is only 1 observation per participant (or any other unit of analysis). That is why I like to think of this correlation as (conceptually related to) the reliability of the participant scores (averaged over conditions). From that perspective, a correlation around .60 would be borderline acceptable and around .80 would be considered good enough. So, for worst-case scenarios use a correlation smaller than .60, and for optimistic scenario’s correlations of .80 or larger.

Note: for technical reasons a correlation of 1 will be automatically changed to .99.

For independent designs the correlation should equal 0. (And the above story about reliability does no longer make sense; but we also do not need it).

### Specifying Contrasts

#### Contrasts must obey the following rules.

1. The sum of the contrast weights must equal zero;
2. The sum of the absolute values of the contrast must be equal to two.
If the contrast weights confirm to these rules the resulting estimate is a difference between two or more means expressed on the scale of the variable (see Kline, 2013 for more information on contrasts).
The contrast estimate is simply the sum of the condition means multiplied by the contrast weights. For instance, with four condition means M1, M2, M3, M4, and contrast weights {0.5, 0.5, -0.5, -0.5}, the value of the contrast estimate is the sum 0.5M1 + 0.5M2 + -0.5M3 + -0.5M4 = (.5M1 + .5M2) – (.5M3 + .5M4) = ( M1 + M2) / 2 – (M3 + M4) / 2:  the value of the contrast estimate is the difference between the mean of the first two conditions and the mean of the last two conditions.
With more than 2 conditions, the app let’s you choose between “Custom contrast” and “Helmert Contrasts”.
If you choose “Custom contrast” the app plans for precision of just that contrast. You will get the sample size needed and a figure of the expected results (see below). The default values give you the weights for a pair wise comparison of two of the conditions. You can simply type over these default values.
If you choose “Helmert contrasts” the app will give you an orthogonal set of Helmert contrasts as default values.  You can simply type over these default values to get any contrast you like, but you cannot specify more contrasts than the number of conditions minus one.
If you choose “Helmert contrasts” the app will plan for the sample size of the contrast with the lowest precision. If you use the default values this will be the contrast specifying a pairwise comparison.  For a set of contrasts the pair wise comparison estimate will be the least precise so if you know the sample size needed for a precise pairwise comparison, you know that the precision you will get for the other contrasts will be just as precise  or more precise. The planning results will show the expected value for MoE for all contrasts, but the figure will only display expected results for the least precise contrast estimate.

#### Examples

Two groups independent design

I use the default values (see Figures 1 and 2). And click the “Get Sample size ” button.
 Figure 1. Values for Target MoE, assurance and design
 Figure 2: Standard contrasts for comparison of two conditions

The output is as follows:

The results give you the sample size for each condition (n), and information about target MoE (f), assurance (assu), the number of conditions (k), and the cross-condition correlation (cor; the value is zero, as it should be in the independent design). With n = 55, there is a 80% probability that f will not be larger than .40.

If you use 55 participants per group the expected MoE equals 0.38.

Of course, using 55 participants per group, makes the total sample size equal to 110.

The output also included a plot of the expected results (what you can expect to happen on average). See Figure 3.

 Figure 3: Expected results using n = 55 participants per group in the two groups independent design

This output helps you to consider whether the Expected MoE is small enough. Suppose, for instance, that true difference equals .5 standard deviations, i.e. a medium effect. The figure shows that the expected contrast estimate is a medium effect, and the confidence interval shows that on average values ranging from small to large effects [.12, .88] will be included in the interval. If the difference between small, medium and large effects is important, an expected precision of f = .38 may not be enough, although small and large effects are at the limits of the confidence interval.

A four groups dependent design

Technical Note:
The app assumes that the sum of squares of the Error Variance can be decomposed in (k – 1) equal parts, where k is the number of conditions. I will change this restriction in a future version of the app. For a custom contrast it is assumed that the contrast is part of an orthogonal set.

Suppose your major interest is the comparison between the average of two groups and the average of two other groups. You have a dependent (repeated measures) design in which participants will be exposed to each of the four treatment conditions. Let’s plan for a target MoE of f = 0.25, with 80 % assurance and let’s suppose our cross-condition correlation equals r = .70. I choose a custom contrast with weights {1/2, 1/2, -1/2, -1/2} (see Figure 4).

 Figure 4. Input for sample size planning

The output is as follows.

So, we need 26 participants to have 80% assurance that obtained MoE will not be larger than f = 0.25. Expected MoE is equal to .22. According to the guidelines above, this is a precise estimate.

If you choose “Helmert Contrasts” instead, and press the button without changing anything, the output is as follows.

Under Expected Moe you will see for each of the three contrasts c1, c2, and c3, the weights and expected MOE. The 46 participants give an expected MoE smaller than target MoE, for the least precise estimate (c3; the pairwise comparison) the other expected MoE’s are smaller than that. The Expected Results Figure will display the results for the contrasts with the largest expected MoE.

### Custom contrasts for the one-way repeated measures design using Lmer

Here is some code for doing one-way repeated measures analysis with lme4 and custom contrasts. We will use a repeated measures design with three conditions of the factor Treat and 20 participants. The contrasts are Helmert contrasts, but they differ from the built-in Helmert contrasts in that the sum of the absolute values of the contrasts weights equals 2 for each contrast.
The standard error of each contrast equals the square root of the product of the sum of the squared contrast weight w and the residual variance divided by the number of participants n.

The residual variance equals the within treatment variance times 1 minus the correlation between conditions. (Which equals the within treatment variance minus 1 times the covariance ) .

In the example below, the within treatment variance equals 1 and the covariance 0.5 (so the value of the correlation is .50 as well). The residual variance is therefore equal to .50.

For the first contrasts, the weights are equal to {-1, 1, 0}, so the value of the standard error of the contrasts should be equal to the square root of 2*.50/20 = 0.2236.

library(MASS)
library(lme4)
# setting up treatment and participants factors
nTreat = 3
nPP = 20
Treat <- factor(rep(1:nTreat, each=nPP))
PP <- factor(rep(1:nPP, nTreat))

# generate some random
# specify means

means = c(0, .20, .50)

# create variance-covariance matrix
# var = 1, cov = .5
Sigma = matrix(rep(.5, 9), 3, 3)
diag(Sigma) <- 1

# generate the data; using empirical = TRUE
# so that variance and covariance are known
# set to FALSE for "real" random data

sco = as.vector(mvrnorm(nPP, means, Sigma, empirical = TRUE))

#setting up custom contrasts for Treatment factor
myContrasts <- rbind(c(-1, 1, 0), c(-.5, -.5, 1))
contrasts(Treat) <- ginv(myContrasts)

#fit linear mixed effects model:
myModel <- lmer(sco ~ Treat + (1|PP))

summary(myModel)

## Linear mixed model fit by REML ['lmerMod']
## Formula: sco ~ Treat + (1 | PP)
##
## REML criterion at convergence: 157.6
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.6676 -0.4869  0.1056  0.6053  1.9529
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  PP       (Intercept) 0.5      0.7071
##  Residual             0.5      0.7071
## Number of obs: 60, groups:  PP, 20
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   0.2333     0.1826   1.278
## Treat1        0.2000     0.2236   0.894
## Treat2        0.4000     0.1936   2.066
##
## Correlation of Fixed Effects:
##        (Intr) Treat1
## Treat1 0.000
## Treat2 0.000  0.000


### Distribution of the difference between two binomial variables

For a project I’m working on, I needed to work with the probabilities associated with the difference between two binomial variables. I thought I’d share the code for four functions for calculating the probability mass and the cumulative probabilites.

The functions d.diff.binom (probability mass) and p.diff.binom (cumulative distribution) are functions for calculating the distribution of the differences for equal sample sizes N and equal “succes” probability.  The arguments of the functions are the quantiles k of the distribution of the differences, the number of trials N and the probability of succes p.

The functions d.diff.binom.un (probability mass) and p.diff.binom.un (cumulative distribution) are functions for calculating the distribution of the differences for sample sizes and succes probabilities that may (or may not) differ between the two populations. The arguments of the functions are the quantiles k of the distribution of the differences, the number of trials N.1 and N.2 for the two groups and the probability of succes p.1 and p.2 for each group.

#equal N and p:
#probability mass:
d.diff.bin = function(k, N, p) {
diff = outer(0:N, 0:N, "-")
prob = outer(dbinom(0:N, N, p, log=TRUE),
dbinom(0:N, N, p, log=TRUE), "+")
p = sum(exp(prob))
return(p)
}
#cumulative probability:
p.diff.bin = function(k, N, p) {
diff = outer(0:N, 0:N, "-")
prob = outer(dbinom(0:N, N, p, log=TRUE),
dbinom(0:N, N, p, log=TRUE), "+")
p = sum(exp(prob))
return(p)
}
#examples
d.diff.bin(0, 30, .50)

## [1] 0.1025782

p.diff.bin(0, 30, .50)

## [1] 0.5512891

#mean of distribution:
N = 30
m = sum(sapply(-N:N, d.diff.bin, N = 30, p= .50)*(-N:N))
m

## [1] -3.084642e-20

all.equal(0, m)

## [1] TRUE

#(un)equal N and p:
#probability mass:
d.diff.bin.un = function(k, N.1, N.2, p.1, p.2) {
diff = outer(0:N.1, 0:N.2, "-")
prob = outer(dbinom(0:N.1, N.1, p.1, log=TRUE),
dbinom(0:N.2, N.2, p.2, log=TRUE), "+")
p = sum(exp(prob))
return(p)
}

#cumulative distribion:
p.diff.bin.un = function(k, N.1, N.2, p.1, p.2) {
diff = outer(0:N.1, 0:N.2, "-")
prob = outer(dbinom(0:N.1, N.1, p.1, log=TRUE),
dbinom(0:N.2, N.2, p.2, log=TRUE), "+")
p = sum(exp(prob))
return(p)
}

#examples
d.diff.bin.un(0, 30, 20, .60, .70)

## [1] 0.05896135

p.diff.bin.un(0, 30, 20, .60, .70)

## [1] 0.1497043


### A rule of thumb for setting target MOE

One of the most difficult aspects of sample size planning for precision is the specification of a target Margin of Error (MoE). Here, I would like to introduce a simple rule of thumb, in the hope that it helps you in determining a reasonable target MoE.
Here, the rule of thumb is applied to obtaining an estimate of the difference between two independent group means, where the two populations are normally distributed with equal variances.

## Goal 1: Assessing the direction of an effect

Sample size planning starts with formulating a goal for the research. A very common goal is to try to determine the direction of an effect. For the goal of assessing the direction of an effect, it helps if the confidence interval of the difference contains only positive or negative values. That is, you want a confidence interval that exludes the value 0, for if that value is included, you would probably conclude that the estimate is consistent with both positive and negative effects. Thus, our first goal is to obtain a confidence interval of the mean difference that excludes the value 0.

Now, a confidence interval excludes 0, if obtained MOE is at most equal to the obtained effect size estimate. Suppose that the estimate equals the true effect of say, 0.50, we want MOE to be at most very close to 0.50, otherwise 0 will be included in the interval. But if our estimate underestimates the true effect, say the estimate equals 0.30, we want MOE to be at most very close to 0.30. Likewise, if we overestimate the effect, MOE can be larger than 0.50.

This means that we cannot say, for instance, we expect that the true effect is .50, so let’s plan for a target MOE that with 80% assurance is at most .50, because this target MOE may be too large for underestimates of the true effect, depending on the extent to which the effect is underestimated. So, in specifying target MOE, we should take into account that underestimates of the effect size occur. (Actually, these underestimates occur with a relative frequency of 50% in a huge collection of direct replications). We can say that we do not only want to exclude zero from the interval, but also that we want that to occur in a large proportion of direct replications. This will be our second goal. I will call the probabiity associated with our second goal, the probability of exclusion (PE)

The rule of thumb is that if we want 80% probability that a random confidence interval excludes zero, we should plan for an expected MOE equal to f = d / √2. (the square root sign is unreadable in my browser; so in words: the effect size divided by the square root of 2; with mathjax: $f = d / sqrt{2}$). Since there is 50% probability that obtained MOE will be larger than expected MOE, this is equal to planning for target MOE = f = d / √2, with 50% assurance or simply without assurance. You can do this in the ESCI-software, but also with the R-functions provided below.

The first example in the code below, is an illustration of planning for assessing the direction of the effect, with true effect size d = .50. If we want 80% assurance to have only positive values in our confidence interval, we should plan for a target MoE = expected MoE = f = d / √2 = 0.3535. Using the SampleSize-function below, this gives a sample size n = 63, or total sample size = N = 2*63 = 126. The probability that the confidence interval excludes 0 equals approximately 80% (p = 0.7951). So, the rule of thumb of planning for d / √2, seems to work pretty good.

## Goal 2: distinguishing between effect sizes

If your research goal is to estimate the value of the effect size in stead of its direction, the rule of thumb can be used as follows. Suppose we do not know the true effect size, but want to have 80% assurance that we have a high probability to be able to distinguish between small (d = .20) and large effects (d = .80). That is, if the true effect is .20 we want the value .80 to be excluded from the confidence interval and if the true effect is .80, we want the value .20 to be excluded from the confidence interval.

We can proceed as follows, the difference between the effect sizes is .80 – .20 = .60. We use this value to determine target MOE. Thus, if we now plan for a target MoE = expected Moe = d / √2), we should have approximately 80% PE that obtained MoE will exclude 0.80 if the true effect is 0.20 and vice versa. The functions below give sample size n = 44, and the probablity of exclusion equals .7947. So, our rule of thumb, seems to work pretty good again. See example 2 in the code below.

Alternatively, we could take the region of practical equivalence (ROPE) into account. Suppose, our equivalence range equals .10 sigma. If we want to have enough precision to distinguish large from small effects, we should plan as follows. We take the difference between a large effect and the upper equivalence value of a small effect or, equivalently, the difference between a small effect and the lower equivalence vaue of a large effect, i.e. .50, and plan for f = .50 / √2. If the effect is large we expect a confidence interval that excludes the equivalence range for the small effect (and vice versa), with 80% probability of exclusion.

But we could also take the difference between the lower equivalence value of a large effect and the upper equivalence value of a small effect, i.e. .40, and plan for f = .40/√2. (See the third example in the code below) This will give us 80% PE that any true value within the ROPE of the one effect will exclude values in the ROPE of the other. For example, if the true effect is .70, and expected MOE equals .40/√2 = .2828, there is approximately 80% probability that the 95% CI excludes .30, which is in the ROPE of a small effect. The expected CI will be .70 +/- .2828 = [0.4172, 0.9828]. Note that the lower limit is larger than the upper limit of the ROPE for d = .20, as we want it to be. Note, however, that if the true effect is small (d = .20), the CI will exclude effects equivalent to large effects, which is consistent with our research goal, but it will not exlude the value 0 or effects equivalent to a medium effect. Indeed, the expected CI will be [-0.0828, 0.4828]. (This is not a problem, of course, since this was not the purpose of our research)

As a final example, suppose we want sufficient precision to distinguish small from medium effects (or large from medium effects). If we take the ROPE perspective, with an equivalence range of +/- .10 sigma, the lower equivalence value of the medium effect equals .50 – .10 = .40 and the upper limit of the small effect equals .30. If we want 80% assurance that the CI will be small enough to distinguish small from medium effects, we should plan for expected MOE f = (.40 – .30)/√2 = 0.0707. Using the functions below, this requires a sample size n = 1538. (See the final example in the code below).

## Setting target MOE: conclusion

In summary, the rule of thumb is to divide the effect size d by √2 and plan for an expected MoE equal to this value. This will give you a sample size that gives approximately 80% assurance that the CI will not contain 0. In the case of distinguishing effect sizes, one option is to divide the difference between the lower equivalence value of the larger effect and the upper equivalence value of the smaller effect by the square root of 2 and plan for an expected MoE equal to this value. This will give you a sample size that gives approximately 80% PE that the CI of the estimated true value of one effect excludes the values in the ROPE of the other effect.

Do you want at least 90% PE? Use the square root of three, in stead of the square root of two, in determining target MoE.

eMoe = function(n) {
eMoe = qt(.975, 2*(n - 1))*sqrt(2/n)
return(eMoe)
}

cost <- function(n, tMoe) {
(eMoe(n) - tMoe)^2
}

sampleSize <- function(tMoe) {
PS <- TukeyHSD(mod)$gr[,4] RejectHSD[i] = sum(PS <=.05) } #probability type I error F-test sum(Reject)/length(Reject)  ## [1] 0.0515  #probability at least one type I error Tukey HSD sum(RejectHSD > 0) / length(RejectHSD)  ## [1] 0.0503  #probability at least one type I error given F-tests Rejects sum(RejectHSD[Reject==TRUE] > 0) / length(RejectHSD)  ## [1] 0.0424  Even though a single (relatively tiny) simulation (which, by the way, takes a long time to run, nonetheless), is not necessarily convincing, it does illustrate the main points of this post. First, the probability of at least one incorrect rejection using the TukeyHSD function is close to .05. With this particular random seed it even performs a little better than the ANOVA F-test: .0503 versus .0515. This illustrates that even without considering whether the omnibus test is significant the main demand of not rejecting too many true null-hypotheses is completely satisfied. So, in practical terms, you can safely ignore the omnibus test if your concerns are about . Second, the probability of incorrectly rejecting at least one true pair-wise null-hypothesis after the ANOVA F-test is significant is estimated to be .0424. This shows, that the two-step procedure leads to a larger decrease in the actual type I error probability than is wanted. Even though this may seem good news from the perspective of avoiding type I errors, the down side is that pair wise null-hypotheses that are false (and potentially important) may not be detected. ## Conclusion Common wisdom and practice suggest that multiple comparisons procedures should be done only after a significant omnibus test. We have seen that this is not at all necessary if we use a multiple comparisons procedure that is designed to control the type I error probability. To my knowledge, most of the procedures conventionally thought of as post hoc tests are designed in this manner, the exception being the LSD procedure which does require a significant F-test. For practical purposes, then, do not bother with the omnibus test (note the exception) if you are planning to pair wise compare all the treatment means. This practical advice does not mean, of course, that I am suggesting you spend your time comparing all treatment means. Most of the time, focused comparisons are a more fruitful way of analysing your data. But I’ll leave that topic for another time. ### References Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics. 4th Edition. London: Sage. Wilcox, R. (2017). Understanding and Applying Basic Statistical Methods Using R. Hoboken, NJ: Wiley, ### Planning for a precise slope estimate in simple regression In this post, I will show you a way of determining a sample size for obtaining a precise estimate of the slope of the simple linear regression equation . The basic ingredients we need for sample size planning are a measure of the precision, a way to determine the quantiles of the sampling distribution of our measure of precision, and a way to calculate sample sizes. As our measure of precision we choose the Margin of Error (MOE), which is the half-width of the 95% confidence interval of our estimate (see: Cumming, 2012; Cumming & Calin-Jageman, 2017; see also www.thenewstatistics.com). ### The distribution of the margin of error of the regression slope In the case of simple linear regression, assuming normality and homogeneity of variance, MOE is , where , is the .975 quantile of the central t-distribution with degrees of freedom, and is the standard error of the estimate of . An expression of the squared standard error of the estimate of is (Wilcox, 2017): the variance of Y given X divided by the sum of squared errors of X. The variance equals , the variance of Y multiplied by 1 minus the squared population correlation between Y and X, and it is estimated with the residual variance , where . The estimated squared standard error is given in (1) (1) With respect to the sampling distribution of MOE, we first note the following. The distribution of estimates of the residual variance in the numerator of (1) is a scaled -distribution: thus Second, we note that where , therefore Alternatively, since , and multiplying by 1 (). In terms of the sampling distribution of (1), then, we have the ratio of two (scaled) distributions, one with degrees of freedom, and one with degrees of freedom. Or something like: which means that the sampling distribution of MOE is: (2) This last equation, that is (2), can be used to obtain quantiles of the sampling distribution of MOE, which enables us to determine assurance MOE, that is the value of MOE that under repeated sampling will not exceed a target value with a given probability. For instance, if we want to know the .80 quantile of estimates of MOE, that is, assurance is .80, we determine the .80 quantile of the (central) F-distribution with N – 2 and N – 1 degrees of freedom and fill in (2) to obtain a value of MOE that will not be exceeded in 80% of replication experiments. For instance, suppose , , , , and assurance is .80, then according to (2), 80% of estimated MOEs will not exceed the value given by: vary = 1 varx = 1 rho = .5 N = 100 dfe = N - 2 dfx - N - 1 assu = .80 t = qt(.975, dfe) MOE.80 = t*sqrt(vary*(1 - rho^2)*qf(.80, dfe, dfx)/(dfx*varx)) MOE.80  ## [1] 0.1880535  ### What does a quick simulation study tell us? A quick simulation study may be used to check whether this is at all accurate. And, yes, the estimated quantile from the simulation study is pretty close to what we would expect based on (2). If you run the code below, the estimate equals 0.1878628. library(MASS) set.seed(355) m = c(0, 0) # note: s below is the variance-covariance matrix. In this case, # rho and the cov(y, x) have the same values # otherwise: rho = cov(x, y)/sqrt(varY*VarX) (to be used in the # functions that calculate MOE) # equivalently, cov(x, y) = rho*sqrt(varY*varX) (to be used # in the specification of the variance-covariance matrix for #generating bivariate normal variates) s = matrix(c(1, .5, .5, 1), 2, 2) se <- rep(10000, 0) for (i in 1:10000) { theData <- mvrnorm(100, m, s) mod <- lm(theData[,1] ~ theData[,2]) se[i] <- summary(mod)$coefficients[4]
}
MOE = qt(.975, 98)*se
quantile(MOE, .80)

##       80%
## 0.1878628


### Planning for precision

If we want to plan for precision we can do the following. We start by making a function that calculates the assurance quantile of the sampling distribution of MOE described in (2). Then we formulate a  squared cost function, which we will optimize for the sample sizeusing the optimize function in R.
Suppose we want to plan for a target MOE of .10 with 80% assurance.We may do the following.
vary = 1
varx = 1
rho = .5
assu = .80
tMOE = .10

MOE.assu = function(n, vary, varx, rho, assu) {
varY.X = vary*(1 - rho^2)
dfe = n - 2
dfx = n - 1
t = qt(.975, dfe)
q.assu = qf(assu, dfe, dfx)
MOE = t*sqrt(varY.X*q.assu/(dfx * varx))
return(MOE)
}

cost = function(x, tMOE) {
cost = (MOE.assu(x, vary=vary, varx=varx, rho=rho, assu=assu)
- tMOE)^2
}

#note samplesize is at least 40, at most 5000.
#note that since we already know that N = 100 is not enough
#in stead of 40 we might just as well set N = 100 at the lower
#limit of the interval
(samplesize = ceiling(optimize(cost, interval=c(40, 5000),
tMOE = tMOE)$minimum))  ## [1] 321  #check the result: MOE.assu(samplesize, vary, varx, rho, assu)  ## [1] 0.09984381  ### Let’s simulate with the proposed sample size Let’s check it with a simulation study. The value of estimated .80 of estimates of MOE is 0.1007269 (if you run the below code with random seed 335), which is pretty close to what we would expect based on (2). set.seed(355) m = c(0, 0) # note: s below is the variance-covariance matrix. In this case, # rho and the cov(y, x) have the same values # otherwise: rho = cov(x, y)/sqrt(varY*VarX) (to be used in the # functions that calculate MOE) # equivalently, cov(x, y) = rho*sqrt(varY*varX) (to be used # in the specification of the variance-covariance matrix for # generating bivariate normal variates) s = matrix(c(1, .5, .5, 1), 2, 2) se <- rep(10000, 0) samplesize = 321 for (i in 1:10000) { theData <- mvrnorm(samplesize, m, s) mod <- lm(theData[,1] ~ theData[,2]) se[i] <- summary(mod)$coefficients[4]
}
MOE = qt(.975, 98)*se
quantile(MOE, .80)

##       80%
## 0.1007269


### References

Cumming, G. (2012). Understanding the New Statistics. Effect Sizes, Confidence Intervals, and Meta-Analysis. New York: Routledge
Cumming, G., & Calin-Jageman, R. (2017). Introduction to the New Statistics: Estimation, Open Science, and Beyond. New York: Routledge.
Wilcox, R. (2017). Understanding and Applying Basic Statistical Methods using R. Hoboken, New Jersey: John Wiley and Sons.