## Linear Trend Analysis with R and SPSS

This is an introduction to contrast analysis for estimating the linear trend among condition means with R and SPSS. The tutorial focuses on obtaining point estimates and confidence intervals. The contents of this introduction are based on Maxwell, Delaney, and Kelley (2017) and Rosenthal, Rosnow, and Rubin (2000). I have taken the (invented) data from Haans (2018). The estimation perspective on statistical analysis is aimed at obtaining point and interval estimates of effect sizes. Here, I will use the frequentist perspective of obtaining a point estimate and a 95% confidence interval of the relevant effect size. For linear trend analysis, the relevant effect size is the slope coefficient of the linear trend, so the purpose of the analysis is to estimate the value of the slope and the 95% confidence interval of the estimate. We will use contrast analysis to obtain the relevant data.

[Note: A PDF file that differs only slightly from this blog post can be found on my ResearchGate page. I suggest Haans (2018) for an easy-to-follow introduction to contrast analysis, which should really help in understanding what is said below.]

The references cited above are clear about how to construct contrast coefficients (lambda coefficients) for linear trends (and non-linear trends, for that matter) that can be used to perform a significance test for the null hypothesis that the slope equals zero. Maxwell, Delaney, and Kelley (2017) describe how to obtain a confidence interval for the slope and make clear that, to obtain interpretable results from the software we use, we should consider how the linear trend contrast values are scaled. That is, standard software (like SPSS) gives us a point estimate and a confidence interval for the contrast estimate, but depending on how the coefficients are scaled, these estimates are not necessarily interpretable in terms of the slope of the linear trend, as I will make clear momentarily.

So the goal of our data analysis is to obtain a point and interval estimate of the slope of the linear trend, and the purpose of this contribution is to show how to obtain output that is interpretable as such.
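To make the scaling issue concrete, here is a minimal sketch in R. The condition means are invented purely for illustration (they are not the Haans data) and the variable names are my own. It compares centered coefficients, the commonly tabled integer coefficients, and coefficients rescaled so that the contrast estimate equals the slope.

```r
# Hypothetical example: four equally spaced levels of X with invented means
x     <- c(1, 2, 3, 4)
means <- c(2.0, 3.5, 5.0, 6.5)   # exactly linear, slope 1.5 per unit of X

# Centered coefficients: lambda_j = X_j - mean(X)
lambda  <- x - mean(x)
psi_raw <- sum(lambda * means)          # a contrast estimate, not yet the slope

# Tabled integer coefficients for a linear trend with four levels
lambda_int <- c(-3, -1, 1, 3)
psi_int    <- sum(lambda_int * means)   # value depends on the arbitrary scaling

# Rescaled coefficients: dividing the centered coefficients by their sum
# of squares makes the contrast estimate equal to the slope
lambda_slope <- lambda / sum(lambda^2)
slope        <- sum(lambda_slope * means)

# Regressing the means on X gives the same slope
slope_lm <- unname(coef(lm(means ~ x))[2])

c(psi_raw = psi_raw, psi_int = psi_int, slope = slope, slope_lm = slope_lm)
```

For these invented means the three sets of coefficients give 7.5, 15 and 1.5. All three lead to the same significance test, but only the last value can be read as the slope.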

## Contrast analysis with R: Tutorial for factorial mixed designs

In this tutorial I will show how contrast estimates can be obtained with R. Previous posts focused on the analyses in factorial between and within designs; now I will focus on a mixed design with one between-participants factor and one within-participants factor. I will discuss how to obtain an estimate of an interaction contrast using a dataset provided by Haans (2018).

I will illustrate two approaches. The first approach uses transformed scores in combination with one-sample t-tests, and the other uses the univariate mixed model approach. As was explained in the previous tutorial, the first approach tests each contrast against its own error variance, whereas the mixed model approach uses a common error variance (which requires statistical assumptions that will probably not apply in practice; the advantage of the mixed model approach, if its assumptions apply, is that the Margin of Error of the contrast estimate is somewhat smaller).

Again, our example is taken from Haans (2018; see also this post). It considers the effect of students’ seating distance from the teacher on the educational performance of the students: the closer to the teacher the student is seated, the higher the performance. A “theory” explaining the effect is that it is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back in the lecture hall. To test that theory, an experiment was conducted with N = 9 participants in a factorial mixed design (also called a split-plot design), with two fixed factors: the between-participants factor Sunglasses (with or without), and the within-participants factor Location (row 1 through row 4). The dependent variable was the score on a 10-item questionnaire about the contents of the lecture. So, we have a 2 by 4 mixed factorial design, with n = 9 participants in each combination of the factor levels.

We will again focus on obtaining an interaction contrast: we will estimate the extent to which the difference between the mean retention score on the first row and those on the other rows differs between the conditions with and without sunglasses.

## Contrast Analysis for Within Subjects Designs with R: a Tutorial.

In this post, I illustrate how to do contrast analysis for within-subjects designs with R. A within-subjects design is also called a repeated measures design. I will illustrate two approaches. The first is to simply use the one-sample t-test on the transformed scores. This will replicate a contrast analysis done with SPSS GLM Repeated Measures. The second is to make use of linear mixed-effects modeling with the lmer function from the lme4 package.

Conceptually, the major difference between the two approaches is that in the latter approach we make use of a single shared error variance and covariance across conditions (we assume compound symmetry), whereas in the former each contrast has a separate error variance, depending on the specific conditions involved in the contrast (these conditions may have unequal variances and covariances).
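The first approach can be sketched in a few lines of base R. This is a simplified illustration under my own assumptions: the data are randomly generated (not the Haans data), there is a single within-subjects factor with four conditions rather than the 2 by 4 design, and all object names are hypothetical.

```r
set.seed(1)  # invented data, for illustration only

# Hypothetical scores: 9 participants (rows) by 4 within-subject conditions (columns)
scores <- matrix(rnorm(9 * 4, mean = 5, sd = 1.5), nrow = 9, ncol = 4)

# Contrast weights: first condition versus the average of the other three
w <- c(1, -1/3, -1/3, -1/3)

# One transformed score per participant: the weighted sum of that
# participant's four scores
transformed <- as.vector(scores %*% w)

# One-sample t-test on the transformed scores: the mean is the contrast
# estimate, and the error variance is that of the transformed scores only
res <- t.test(transformed)
res$estimate   # contrast estimate
res$conf.int   # 95% confidence interval

# The same interval computed by hand
n   <- length(transformed)
moe <- qt(.975, n - 1) * sd(transformed) / sqrt(n)
mean(transformed) + c(-1, 1) * moe
```

Because the standard error is computed from the transformed scores alone, each contrast gets its own error variance and n - 1 degrees of freedom, which is the property that distinguishes this approach from the one with a shared error variance.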

As in the previous post (https://small-s.science/2018/12/contrast-analysis-with-r-tutorial/), we will focus our attention on obtaining an interaction contrast estimate.

Again, our example is taken from Haans (2018; see also this post). It considers the effect of students’ seating distance from the teacher on the educational performance of the students: the closer to the teacher the student is seated, the higher the performance. A “theory” explaining the effect is that it is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back in the lecture hall.

To test that theory, an experiment was conducted with N = 9 participants in a completely within-subjects design (also called a fully-crossed design), with two fixed factors: sunglasses (with or without) and location (row 1 through row 4). The dependent variable was the score on a 10-item questionnaire about the contents of the lecture. So, we have a 2 by 4 within-subjects design, with n = 9 participants in each combination of the factor levels.

We will again focus on obtaining an interaction contrast: we will estimate the extent to which the difference between the mean retention score on the first row and those on the other rows differs between the conditions with and without sunglasses.

## Contrast Analysis with R for factorial designs: A Tutorial

In this post, I want to show how to do contrast analysis with R for factorial designs. We focus on a 2-way between-subjects design. A tutorial for factorial within-subjects designs can be found here: https://small-s.science/2019/01/contrast-analysis-with-r-repeated-measures/. A tutorial for mixed designs (combining within- and between-subjects factors) can be found here: https://small-s.science/2019/04/contrast-analysis-with-r-mixed-design/.

I want to show how we can use R for contrast analysis of an interaction effect in a 2 x 4 between-subjects design. The analysis considers the effect of students’ seating distance from the teacher on the educational performance of the students: the closer to the teacher the student is seated, the higher the performance. A “theory” explaining the effect is that it is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back in the lecture hall.

To test that theory, an experiment was conducted with N = 72 participants attending a lecture. The lecture was given to two independent groups of 36 participants. The first group attended the lecture while the teacher was wearing dark sunglasses; the second group attended the lecture while the teacher was not wearing sunglasses. All participants were randomly assigned to 1 of 4 possible rows, with row 1 being closest to the teacher and row 4 the furthest from the teacher. The dependent variable was the score on a 10-item questionnaire about the contents of the lecture. So, we have a 2 by 4 factorial design, with n = 9 participants in each combination of the factor levels.

Here we focus on obtaining an interaction contrast: we will estimate the extent to which the difference between the mean retention score of the participants on the first row and those on the other rows differs between the conditions with and without sunglasses.
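The weights of such an interaction contrast can be built as the outer product of the weights for the two factors. The sketch below shows this in base R; the cell means are invented for illustration (they are not the Haans data) and the object names are my own.

```r
# Contrast weights for each factor
w_sunglasses <- c(1, -1)                 # one sunglasses condition versus the other
w_location   <- c(1, -1/3, -1/3, -1/3)   # row 1 versus the mean of rows 2 to 4

# Interaction weights: outer product, one weight per cell of the 2 x 4 design
W <- outer(w_sunglasses, w_location)
W

# Hypothetical cell means (invented numbers); rows are the sunglasses
# conditions, columns are rows 1 to 4 of the lecture hall
cell_means <- rbind(c(7, 5, 4, 3),
                    c(6, 5, 5, 5))

# Interaction contrast estimate: sum over cells of weight times cell mean
psi <- sum(W * cell_means)
psi
```

Read row by row, the weights in W are 1, -1/3, -1/3, -1/3, -1, 1/3, 1/3, 1/3. With the invented means, the difference between row 1 and the other rows is 3 in the first condition and 1 in the second, so the interaction contrast equals 2.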

## The interaction contrast with SPSS

I’ve downloaded a dataset from the supplementary materials accompanying Haans (2018) from http://pareonline.net/sup/v23n9.zip (Between2by4data.sav) and I ran the following syntax in SPSS:

```
UNIANOVA retention BY sunglasses location
  /LMATRIX = "Interaction contrast"
    sunglasses*location 1 -1/3 -1/3 -1/3 -1 1/3 1/3 1/3 intercept 0
  /DESIGN= sunglasses*location.
```

Table 1 is the relevant part of the output.

So, the estimate of the interaction contrast equals 1.00, 95% CI [-0.332, 2.332]. (See this post for optimizing the sample size to get a more precise estimate than this.)

## Contrast analysis with R for factorial designs

Let’s see how we can get the same results with R.

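```r
library(MASS)     # provides ginv() for the generalized inverse
library(foreign)  # provides read.spss() for reading SPSS .sav files

# read the SPSS data file
# (assumes Between2by4data.sav is in the working directory)
theData <- read.spss("Between2by4data.sav")
theData <- as.data.frame(theData)

attach(theData)

# setting contrasts
contrasts(sunglasses) <- ginv(rbind(c(1, -1)))
contrasts(location)   <- ginv(rbind(c(1, -1/3, -1/3, -1/3),
                                    c(0, 1, -1/2, -1/2),
                                    c(0, 0, 1, -1)))

# fitting model
myMod <- lm(retention ~ sunglasses*location)
```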

The code above achieves the following. First, the relevant packages are loaded. The MASS package provides the function ginv, which we need to specify custom contrasts, and the foreign package contains the function read.spss, which enables R to read SPSS .sav data files.

Getting custom contrast estimates involves calculating the generalized inverse of the contrast matrices for the two factors. Each contrast is specified on the rows of these contrast matrices. For instance, the contrast matrix for the factor location, which has 4 levels, consists of 3 rows and 4 columns. In the above code, the matrix is specified with the function rbind, which basically says that the three contrast weight vectors c(1, -1/3, -1/3, -1/3), c(0, 1, -1/2, -1/2), c(0, 0, 1, -1) form the three rows of the contrast matrix that we use as an argument of the ginv function. (Note that the set of contrasts consists of orthogonal Helmert contrasts).
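The following sketch inspects the contrast matrix for location and its generalized inverse. The object names are my own. It checks two properties: that the rows of the contrast matrix are mutually orthogonal, and that multiplying the contrast matrix by its generalized inverse gives an identity matrix.

```r
library(MASS)  # ginv()

# Contrast matrix for location: one contrast per row, one column per level
L <- rbind(c(1, -1/3, -1/3, -1/3),
           c(0,    1, -1/2, -1/2),
           c(0,    0,    1,   -1))

# Orthogonality: the off-diagonal elements of L %*% t(L) are the products
# of pairs of contrasts, and they are all zero
LLt <- L %*% t(L)
round(LLt, 10)

# Coding matrix that is handed to contrasts(): 4 rows (levels) by 3 columns
C <- ginv(L)
dim(C)

# L %*% C is the 3 x 3 identity matrix
round(L %*% C, 10)
```

Because L %*% C is the identity matrix and the weights in each row of L sum to zero, the regression coefficients that lm estimates for the columns of C can be read as estimates of the contrasts in the rows of L.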

The last call is our call to the lm-function which estimates the contrasts. Let’s have a look at these estimates.

```r
summary(myMod)
##
## Call:
## lm(formula = retention ~ sunglasses * location)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##     -2     -1      0      1      2
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)             5.3750     0.1443  37.239  < 2e-16 ***
## sunglasses1             1.2500     0.2887   4.330 5.35e-05 ***
## location1               2.1667     0.3333   6.500 1.39e-08 ***
## location2               1.0000     0.3536   2.828  0.00624 **
## location3               2.0000     0.4082   4.899 6.88e-06 ***
## sunglasses1:location1   1.0000     0.6667   1.500  0.13853
## sunglasses1:location2   3.0000     0.7071   4.243 7.26e-05 ***
## sunglasses1:location3   2.0000     0.8165   2.449  0.01705 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.225 on 64 degrees of freedom
## Multiple R-squared:  0.6508, Adjusted R-squared:  0.6126
## F-statistic: 17.04 on 7 and 64 DF,  p-value: 1.7e-12
```

For the present purposes, we will consider the estimate of the first interaction contrast, which estimates how much the difference between the mean of the first row and the mean of the other rows differs between the conditions with and without sunglasses. So, we will have to look at the sunglasses1:location1 row of the output.

Unsurprisingly, the estimate of the contrast and its standard error are the same as in the SPSS output in Table 1. The estimate equals 1.00 and the standard error equals 0.6667.

Note that the residual degrees of freedom equal 64. This is the product of the numbers of levels of the two factors, 2 and 4, and the number of participants per combination of levels minus 1 (9 - 1 = 8): df = 2*4*(9 - 1) = 64. We will use these degrees of freedom to obtain a confidence interval of the estimate.
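As a check on these numbers, the degrees of freedom and the standard error of the interaction contrast can be reproduced from the design and the residual standard error in the output. This is a sketch with my own object names; it uses the rounded residual standard error as printed (1.225), so the result agrees with the output only up to rounding.

```r
# Design: 2 x 4 cells with 9 participants per cell
n_per_cell <- 9
n_cells    <- 2 * 4
df_resid   <- n_cells * (n_per_cell - 1)
df_resid

# Weights of the interaction contrast, one per cell
w <- as.vector(outer(c(1, -1), c(1, -1/3, -1/3, -1/3)))

# Standard error of a contrast with equal cell sizes:
# residual SD times sqrt(sum of squared weights / n per cell)
s_resid <- 1.225   # residual standard error as printed in the output (rounded)
se <- s_resid * sqrt(sum(w^2) / n_per_cell)
round(se, 4)
```

The resulting standard error agrees, up to rounding, with the value of 0.6667 reported for sunglasses1:location1.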

We will calculate the confidence interval by first extracting the contrast estimate and the standard error, after which we multiply the standard error by the critical value of t with df = 64 and add the result to and subtract it from the contrast estimate:
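A minimal sketch of this calculation follows. So that it runs without the data file, it uses the estimate and standard error as printed in the summary output instead of extracting them from the fitted model; the object names are my own.

```r
# Values as printed in the sunglasses1:location1 row of the summary output.
# With the fitted model available they could be extracted with
#   coef(summary(myMod))["sunglasses1:location1", c("Estimate", "Std. Error")]
estimate <- 1.0000
se       <- 0.6667
df       <- 64

# Critical value of t for a 95% confidence interval
t_crit <- qt(.975, df)

# Margin of error and confidence interval
moe <- t_crit * se
ci  <- estimate + c(-1, 1) * moe
round(ci, 3)
```

Rounded to three decimals, the limits match the SPSS interval reported earlier, 95% CI [-0.332, 2.332].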

```r
## [1] 246.4563
##
## $objective
## [1] 8.591375e-18
```

Thus, according to the optimize function we need 247 participants (per group; total N = 988) to get an expected MOE equal to our target MOE. The expected MOE equals 0.4553, which you can confirm by using the MOE function we made above.

#### Planning with assurance

Although the expected MOE is close to our target MOE, there is a probability of 50% that the obtained MOE will be larger than our target MOE. In other words, in repeated sampling about half of the obtained MOEs will be larger than what we want. That is to say, we have 50% assurance that our obtained MOE will be at least as small as our target MOE.

Planning with assurance means that we aim for a certain specified assurance that our obtained MOE will not exceed our target MOE. For instance, we may want to have 80% assurance that our obtained MOE will not exceed our target MOE.

Basically, what we need to do is take the sampling distribution of the estimate of Mean Square Error into account. We use the following formula, written out here for the specific situation as it is implemented in the R function below (see also my post introducing the Precision App for the general formulae: https://the-small-s-scientist.blogspot.nl/2017/04/planning-for-precision.html):

$$MOE_{\gamma} = 2\, t_{.975}(df) \sqrt{\frac{3.324}{n} \cdot \frac{\chi^2_{\gamma}(df)}{df}}, \qquad df = 4(n - 1),$$

where $\gamma$ is the assurance expressed as a probability between 0 and 1, and $\chi^2_{\gamma}(df)$ is the corresponding quantile of the chi-square distribution.

Let’s do it in R. Again, the function that calculates assurance MOE is tailored for the specific situation, but it is relatively easy to formulate these functions in a generally applicable way.

```r
MOE.gamma = function(n) {
  df = 4*(n-1)
  MOE = 2*qt(.975, df)*sqrt(3.324/n*qchisq(.80, df)/df)
}

loss <- function(n) {
  (MOE.gamma(n) - 0.4558)^2
}

optimize(loss, c(100, 1000))
## $minimum
## [1] 255.576
##
## $objective
## [1] 2.900716e-18
```


Thus, according to the results, we need 256 persons per group (N = 1024 in total) to have an 80% probability of obtaining a MOE not larger than our target MOE. In that case, our expected MOE will be 0.4472.
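A more generally applicable version of the assurance MOE function could look like the sketch below. The function and argument names are hypothetical. It also rests on an assumption of mine: that the factor 2 in the MOE.gamma function is the square root of the sum of squared contrast weights, as it would be for a contrast with weights 1, -1, -1, 1 on four groups.

```r
# Assurance MOE of a contrast estimate with n participants per group.
# Hypothetical general version; 'weights' are the contrast weights and
# 'mse' is the assumed Mean Square Error.
moe_assurance <- function(n, weights, mse, n_groups,
                          assurance = 0.80, conf_level = 0.95) {
  df     <- n_groups * (n - 1)
  t_crit <- qt(1 - (1 - conf_level) / 2, df)
  t_crit * sqrt(sum(weights^2) * mse / n * qchisq(assurance, df) / df)
}

# Smallest n per group whose assurance MOE does not exceed the target,
# found with optimize() on the same search interval as above
plan_n <- function(target, interval = c(100, 1000), ...) {
  loss <- function(n) (moe_assurance(n, ...) - target)^2
  ceiling(optimize(loss, interval)$minimum)
}

# The situation above: four groups, MSE = 3.324, target MOE = 0.4558
n_needed <- plan_n(0.4558, weights = c(1, -1, -1, 1),
                   mse = 3.324, n_groups = 4)
n_needed
```

With these inputs the function performs the same computation as MOE.gamma, so plan_n returns the 256 participants per group found above. Other contrasts, error variances, or assurance levels only require different arguments.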