## Contrast Analysis with R for factorial designs: A Tutorial

In this post, I want to show how to do contrast analysis with R for factorial designs. We focus on a 2-way between subjects design. A tutorial for factorial within-subjects designs can be found here: https://small-s.science/2019/01/contrast-analysis-with-r-repeated-measures/ . A tutorial for mixed designs (combining within and between subjects factors can be found here: https://small-s.science/2019/04/contrast-analysis-with-r-mixed-design/.

I want to show how we can use R for contrast analysis of an interaction effect in a 2 x 4 between subjects design. The analysis onsiders the effect of students’ seating distance from the teacher and the educational performance of the students: the closer to the teacher the student is seated, the higher the performance. A “theory “explaining the effect is that the effect is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back in the lecture hall.

To test that theory, a experiment was conducted with N = 72 participants attending a lecture. The lecture was given to two independent groups of 36 participants. The first group attended the lecture while the teacher was wearing dark sunglasses, the second group attented the lecture while the teacher was not wearing sunglasses. All participants were randomly assigned to 1 of 4 possible rows, with row 1 being closest to the teacher and row 4 the furthest from the teacher The dependent variable was the score on a 10-item questionnaire about the contents of the lecture. So, we have a 2 by 4 factorial design, with n = 9 participants in each combination of the factor levels.

Here we focus on obtaining an interaction contrast: we will estimate the extent to which the difference between the mean retention score of the participants on the first row and those on the other rows differs between the conditions with and without sunglasses.

## The interaction contrast with SPSS

I’ve downloaded a dataset from the supplementary materials accompanying Haans (2018) from http://pareonline.net/sup/v23n9.zip (Between2by4data.sav) and I ran the following syntax in SPSS:

UNIANOVA retention BY sunglasses location
/LMATRIX = "Interaction contrast"
sunglasses*location 1 -1/3 -1/3 -1/3 -1 1/3 1/3 1/3 intercept 0
/DESIGN= sunglasses*location.

Table 1 is the relevant part of the output.

So, the estimate of the interaction contrasts equals 1.00, 95% CI [-0.332, 2.332]. (See this post for optimizing the sample size to get a more precise estimate than this).

## Contrast analysis with R for factorial designs

Let’s see how we can get the same results with R.

library(MASS)
library(foreign)

theData <- as.data.frame(theData)

attach(theData)

# setting contrasts
contrasts(sunglasses) <- ginv(rbind(c(1, -1)))
contrasts(location)  <- ginv(rbind(c(1, -1/3, -1/3, -1/3),
c(0, 1, -1/2, -1/2), c(0, 0, 1, -1)))

# fitting model

myMod <- lm(retention ~ sunglasses*location)

The code above achieves the following. First the relevant packages are loaded. The MASS package provides the function ginv, which we need to specify custom contrasts and the Foreign package contains the function read.spss, which enables R to read SPSS .sav datafiles.

Getting custom contrast estimates involves calculating the generalized inverse of the contrast matrices for the two factors. Each contrast is specified on the rows of these contrast matrices. For instance, the contrast matrix for the factor location, which has 4 levels, consists of 3 rows and 4 columns. In the above code, the matrix is specified with the function rbind, which basically says that the three contrast weight vectors c(1, -1/3, -1/3, -1/3), c(0, 1, -1/2, -1/2), c(0, 0, 1, -1) form the three rows of the contrast matrix that we use as an argument of the ginv function. (Note that the set of contrasts consists of orthogonal Helmert contrasts).

The last call is our call to the lm-function which estimates the contrasts. Let’s have a look at these estimates.

summary(myMod)
##
## Call:
## lm(formula = retention ~ sunglasses * location)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##     -2     -1      0      1      2
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)             5.3750     0.1443  37.239  < 2e-16 ***
## sunglasses1             1.2500     0.2887   4.330 5.35e-05 ***
## location1               2.1667     0.3333   6.500 1.39e-08 ***
## location2               1.0000     0.3536   2.828  0.00624 **
## location3               2.0000     0.4082   4.899 6.88e-06 ***
## sunglasses1:location1   1.0000     0.6667   1.500  0.13853
## sunglasses1:location2   3.0000     0.7071   4.243 7.26e-05 ***
## sunglasses1:location3   2.0000     0.8165   2.449  0.01705 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.225 on 64 degrees of freedom
## Multiple R-squared:  0.6508, Adjusted R-squared:  0.6126
## F-statistic: 17.04 on 7 and 64 DF,  p-value: 1.7e-12

For the present purposes, we will consider the estimate of the first interaction contrast, which estimates the difference between the means of the first and  the other rows between the with and without sunglasses conditions. So, we will have to look at the sunglasses1:location1 row of the output.

Unsurprisingly, the estimate of the contrast and its standard error are the same as in the SPSS ouput in Table 1. The estimate equals 1.00 and the standard error equals 0.6667.

Note that the residual degrees of freedom equal 64. This is equal to the product of the number of levels of each factor, 2 and 4, and the number of participants (9) per combination of the levels minus 1: df =  2*4*(9 – 1) = 64. We will use these degrees of freedom to obtain a confidence interval of the estimate.

We will calculate the confidence interval by first extracting the contrast estimate and the standard error,  after which we multiply the standard error by the critical value of t with df = 64 and add the result to and substract it from the contrast estimate:

estimate = myMod\$coefficients["sunglasses1:location1"]

se = sqrt(diag(vcov(myMod)))["sunglasses1:location1"]

df = 2*4*(9 - 1)

# confidence interval

estimate + c(-qt(.975, df), qt(.975, df))*se
## [1] -0.3318198  2.3318198

Clearly, we have replicated all the estimation results presented in Table 1.

Reference
Haans, Antal (2018). Contrast Analysis: A Tutorial. Practical AssessmentResearch& Education, 23(9). Available online: http://pareonline.net/getvn.asp?v=23&n=9

## Planning for precise contrast estimates in between subjects designs

Here I would like to explain the procedure for sample size planning for one-way and two-way (factorial) between subjects designs. We will consider examples based on and described in Haans (2018).

## The first example: one-way design

The first example considers the effect of seating location  of students on their educational performance. Seating location is defined as distance from the teacher and operationalized in terms of the row the student is seated in, with first row being the closest to the teacher and the fourth row being the furthest away. 20 Students are randomly assigned to one of the four possible rows, so N = 20, n = 5. The dependent variable is the course grade of the student. (Note: the data and study are hypothetical).

As Haans (2018) explains, one psychological theory explaining the effect of seating position on educational performance is based on social influence. This theory posits that due to the social influence of the teacher, the students that are seated closest to the teacher find themselves in a state of undivided attention. This undivided attention causes their educational performance to be better than the students who are seated further away.

In operational terms, then, we may expect that first row students will have a better average grade than students seated on the other rows. So, the quantitative research question we are interested in is:

“How much do the average grades differ between students seated first row and the students seated on other rows?”

We can estimate this quantity with a Helmert Contrast, where we assign a contrast weight of 1 to mean of the first row grades and weights -1/3 to the means of the grades in the other rows.

Haans (2018) gives us the following results. The contrast estimate equals 2.00 , 95% CI [0.27, 3.73]. In order to interpret this more easily, we divide this estimate by the square root Mean Square Error, to obtain the standardized estimate and standardized confidence interval (not to be confused with the confidence interval of the standardized estimate, but that’s a different story. The result is: 1.26, 95% CI [0.17, 2.36].

To answer the research question, the estimated difference equals 1.26 standard deviations, which according to rule-of-thumbs frequently used in psychology is a large difference. The CI shows the enormous amount of uncertainty of this estimate: population values between 0.17 (small) and 2.36 (very large) are also consistent with the observed data and our statistical assumptions. So, it seems safe to conclude that it looks like there is a positive effect of seating position, but the wide range of the CI makes it clear that the data do not tell us enough about the size of the effect, the precision is simply too low.

The precision is f = 1.09, which according to my rules-of-thumb is very imprecise (I consider f = 0.65, to be barely tolerable).

So, let’s plan for a replication study with a reasonably precise estimate of  f = 0.40, with 80% assurance. (Note: for some advice on setting target Moe: Planning with assurance, with assurance. ) I’ve used the app: https://gmulder.shinyapps.io/PlanningFactorialContrasts/ with the default values for a single factor between subjects design with 4 conditions.  According to the app, we need n = 36 participants per condition (making a total of  N = 144).

(For more detailed information considering sample size planning for contrast analysis see: http://small-s.science/?p=10 and for some guidelines for setting target MoE: http://small-s.science/?p=14)

## The second example: factorial design

Our second example is also taken from Haans (2018). It considered the same phenomenon, the effect of students’ seating distance from the teacher and the educational performance of the students.

A second theory explaining the effect is that the effect is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back in the lecture hall.

To test that theory, a experiment was conducted with N = 72 participants attending a lecture. The lecture was given to two independent groups of 36 pariticpant. The first group attended the lecture while the teacher was wearing dark sunglasses, the second group attented the lecture while the teacher was not wearing sunglasses,. Again, all participants were randomly assigned to 1 of 4 possible rows. The dependent variable was the score on a 10-item questionnaire about the contents of the lecture.

Now, if the eyecontact of the teachter is the causal variable, we may expect that in this experimental setup the difference between the average score of the persons seated on the first row and the averages of the other rows will be smaller for the condition where the teacher wears sunglasses than for the condition in which the teacher does not wear these glasses, as wearing sunglasses prevents eye-contact between the teacher and the students. Our quantitative question is therefore:

“How much does the contrast between the first row and the others rows differ between the conditions with and without sunglasses?”

In other words, we are interested in the size of the interaction effect.

I’ve downloaded the dataset from http://pareonline.net/sup/v23n9.zip (between2by4data.sav) and specified the following syntax in SPSS:

UNIANOVA retention BY sunglasses location
/LMATRIX = “Interaction contrast” sunglasses*location 1 -1/3 -1/3 – 1/3 -1 1/3 1/3 1/3 intercept 0
/DESIGN= sunglasses*location.

The result of the analysis is that the contrast estimate equals 1.0, 95% CI [-0.33, 2.33]. If we standardize this with the within condition variance (the condition being the combination of the levels of the two factors), we get 0.82, 95% CI [-0.27, 1.90].

So, it appears that the difference between the means of the first row and that of the other rows is on average 1.0 points larger in the condition without sunglasses than in the condition with sunglasses. This corresponds to a large difference (dwith = .82). However, the CI also contains negative population difference (albeit that they are smallish), so even though the results are promising for the theory (eyecontact), these negative effects will not persuade a critical reviewer of the study. Indeed, these negative effects contradict the substantive hypothesis.

Again, the confidence interval is so wide, that effects ranging from small negative effects to huge positive effects are considered plausible. Since the results are promising for the theory, a replication study with more precision may be needed to persuade the critics. Let’s plan for a precision of f = .25 with 95% assurance.

I’ve used the app: https://gmulder.shinyapps.io/PlanningFactorialContrasts/ specifying that we have a factorial design with a = 2 levels and b = 4 levels. The result is that for the interaction contrast  with f = .25 and assurance = .95, we need 175 participants per combination of the two factors. This means, that a total of N = 1400 must be recruited.

I’ve taken this from the following output.

 Figure 1: Output of sample size planning

I’ve looked at the  “Contrast Summary Tab” to check that interaction A1B1 is the correct one (see Figure 2).

 Figure 2. Summary of contrast weights.

What’s important in the above figure is that the set of weights for A1B1 matches the set of weights used to get the contrast estimate in SPSS (In the LMATRIX-subcommand), so that’s how we know that A1B1 is the contrast we want.  (Note: if you switch the number of levels in the app, that is, use 4 levels for A and 2 for B, the interaction weights will match perfectly).

Reference
Haans, Antal (2018). Contrast Analysis: A Tutorial. Practical Assessment, Research, & Education, 23(9). Available online: http://pareonline.net/getvn.asp?v=23&n=9