# The independent and paired T-test in jamovi

I have downloaded the dataset from the Mathematics and Statistics Help (MASH) site. The dataset contains data on 78 persons following one of three diets. I will use the dataset to show you how to estimate the difference between two means with the independent t-test analysis and the dependent t-test analysis in jamovi. I will ignore the different diets and focus on gender differences and pre- and post-weight differences instead.

I will focus on two substantive research questions.

- To what extent does following a diet lead to weight loss?
- To what extent does the weight lost differ between females and males?

Let’s start with the first research question. Statistical analysis can be useful for answering it, but first we have to translate the substantive question into a statistical one. Now, it seems reasonable to assume that if following a diet leads to weight loss, the typical weight before the diet differs from the typical weight after following the diet. If we furthermore assume that the mean provides a good representation of what is typical, then it is plausible that the mean weight before the diet differs from the mean weight after the diet.

If we are interested in the extent to which means differ, we are – statistically speaking – interested in the extent to which expected values differ. So, our analysis will focus on finding out what our data have to say about the difference between the expected values. More concretely: we focus on the difference between the expected values for weight (measured in kg) before and after the diet. Using conventional symbols, we aim at uncovering quantitative information about the difference \mu_{pre\ weight} - \mu_{post\ weight}.

Since all persons were measured pre and post diet, the measurements are likely to be correlated. Indeed, the sample correlation equals r = .96. We need to take this correlation into account and that is why we use the statistical techniques for estimation and testing that are available in the paired t-test analysis in jamovi.

In a real research situation, we would of course start with descriptive analyses to figure out what the data seem to suggest about the extent to which a diet leads to weight loss. But now we are just looking at how to obtain the relevant inferential information from jamovi.

I have chosen the following options for the analysis.

Since we are interested in the extent to which following a diet leads to weight loss, it is important to realize that the t-test by itself does not necessarily provide useful information. Why not? Because the t-test only gives us input for the decision whether or not to reject the null-hypothesis that the expected values are equal; it provides only indirect information about what we are actually interested in: the extent to which the expected values differ. We want quantitative information! The more useful information is provided by the estimate of the mean difference and its 95% confidence interval.

The relevant output for the t-test and the estimation results are presented in Figure 2.

Let’s start with the t-test. The conventional null-hypothesis is that the expected values of the two variables are equal (\mu_{pre\ weight} = \mu_{post\ weight}, or, equivalently, \mu_{pre\ weight} - \mu_{post\ weight} = 0). The alternative hypothesis is that the expected values are not equal. Following convention, we use a significance level of \alpha = .05, so our decision rule is to reject the null-hypothesis if the p-value is smaller than .05, and to neither reject nor accept the null-hypothesis if the p-value is .05 or larger.

The result of the t-test is t(77) = 13.3, p < .001. Since the p-value is smaller than .05 we reject the null-hypothesis and we decide that the expected values are not equal. In other words, we decide that the population means are not equal. As we said above, this does not answer our research question, so we’d better move on to the estimation results.

The estimated difference between the expected values equals 3.84 kg, 95% CI [3.27, 4.42]. So, we estimate that the difference in expected weights after 10 weeks of following the diet is somewhere between 3.27 and 4.42 kilograms.

Jamovi also provides point and interval estimates for Cohen’s d. This version of Cohen’s d is derived by standardizing the mean difference using the standard deviation of the difference scores. Figure 3 presents the relevant output. The estimated value of Cohen’s d equals 1.51, 95% CI [1.18, 1.83]. Using rules of thumb for the interpretation of Cohen’s d, these results suggest that there may be a very large difference between the mean weights of the pre- and post-diet measurements.
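For a paired design, this difference-score-standardized d equals the t-statistic divided by the square root of the number of pairs, so the jamovi output above can be checked by hand. A quick sketch in Python (values taken from the output reported above):

```python
import math

t, n = 13.3, 78            # t-statistic and number of pairs from the output above
d_z = t / math.sqrt(n)     # d based on the SD of the difference scores
print(round(d_z, 2))       # 1.51, matching the jamovi estimate
```

This follows from the paired t-statistic t = M_diff / (SD_diff / √n): dividing both sides by √n leaves M_diff / SD_diff.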

There is an alternative conceptualisation of the standardized mean difference. Instead of using the SD of the difference scores, we may use the average of the SDs of the two measurements. See https://small-s.science/2020/12/cohens-d-for-paired-designs/ for an explanation and R-code for the calculation of the CI.

To answer the second research question, we will have to reframe the substantive question into a statistical one. Just like we assumed above, we will consider the difference between the expected values (or population means) to be the statistical quantity of interest.

The conventional statistical null-hypothesis of the t-test is that the expected value of the variable does not differ between the groups or conditions. In other words, the null-hypothesis is that the two population means are equal. If the test result is significant, we will reject the null-hypothesis and decide that the population means are not equal. Note that this does not really answer the research question. Indeed, we are interested in the extent to which the expected values differ and not in whether we can decide that the difference is not zero. For that reason, estimation results are usually more informative than the results of a significance test.

I have chosen the following options for the independent t-test analysis in jamovi.

The above options will give you the results of the independent t-test and, more importantly, the estimation results, both unstandardized and standardized (Cohen’s d).

The relevant output is presented in Figure 5.

The result of the t-test is t(74) = -0.21, p = 0.84. This test result is clearly not significant, so we cannot decide that the population means differ. Importantly, we can also not decide that the population means are equal. That would be an instance of accepting the null-hypothesis and that is not allowed in NHST.

The estimation results (i.e. -0.12 kg, 95% CI [-1.29, 1.04]) make it clear that we should not necessarily believe that the population means are equal. Indeed, even though the estimated difference is only 0.12 kilograms, the CI shows the data to be consistent with differences of up to roughly 1 kg in either direction, i.e. with women showing either more or less average weight loss than men.

We can also find the standardized mean difference and its CI in the independent t-test output: Cohen’s d = -0.08, 95% CI [-0.50, 0.41]. In this case, Cohen’s d is based on the pooled standard deviation. According to rules-of-thumb frequently used in psychology, the estimated effect is negligible to small, but the CI shows the data to be consistent with medium-sized effects in either direction.


# Chi-square test with summary data in jamovi

This example is based on a question from an assignment I use in my Applied Statistics course (the assignment itself is from the instructor resources of the book Introduction to the New Statistics (first edition)).

The introductory text to the question is as follows.

To what extent might feeling powerful make you less considerate of the perspective of others? In one study (Galinsky et al., 2006), participants were manipulated to feel either powerful (High Power) or powerless (Low Power). They were then asked to write an ‘E’ on their forehead with a washable marker. Those who wrote the ‘E’ to be correctly readable from their own perspective (looking from inside the head) were considered ego-centric (Ego); those who wrote it to be readable to others were considered to be non-ego-centric (Non-Ego).

Table 1 contains the data of the original study.

| | Ego | Non-Ego | Total |
| --- | --- | --- | --- |
| High Power | 8 | 16 | 24 |
| Low Power | 4 | 29 | 33 |
| Total | 12 | 45 | 57 |
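Jamovi computes the χ² test of association from these summary counts, but the statistic is easy to cross-check by hand. A minimal sketch in Python, using only the cell counts from Table 1 (the erfc identity gives the upper tail of a χ² with 1 degree of freedom):

```python
import math

# observed counts from Table 1 (rows: High/Low Power; columns: Ego/Non-Ego)
obs = [[8, 16], [4, 29]]
row = [sum(r) for r in obs]             # row totals: [24, 33]
col = [sum(c) for c in zip(*obs)]       # column totals: [12, 45]
n = sum(row)                            # grand total: 57

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi2 = sum((obs[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))
p = math.erfc(math.sqrt(chi2 / 2))      # upper-tail p for chi-square with df = 1
print(round(chi2, 2), round(p, 3))      # chi-square ≈ 3.76
```

The result (no continuity correction) should match the χ² that jamovi reports for this table.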

All you need to do is create a dataset with three variables. The first two variables are nominal; they define the rows and columns of your contingency table. Here, I opted for the variable Power, with levels 1 = High Power and 2 = Low Power, and the variable Perspective, with levels 1 = Ego and 2 = Non-Ego.

The third variable is the variable Counts (which can be nominal, ordinal, or continuous, as far as I can tell). The counts variable contains the number of observations for each combination of the two categorical variables.

This is what the dataset looks like:

If you have the dataset, the rest is super easy as well. Just choose Frequencies on the Analyses tab, followed by Independent Samples. Now place your row, column, and counts variables in the right spots, as in Figure 2. That’s all!


# Comparing the quantiles of two groups

Traditionally, the comparison of two groups focuses on comparing means or medians. But, as Wilcox (2012) explains, there are many more features of the distributions of two groups that we may compare in order to shed light on how the groups differ. An interesting approach is to estimate the difference between the quantiles of the two groups. Wilcox (2012, pp. 138-150) shows us an approach that is based on the shift function. The procedure boils down to estimating the quantiles of both groups, and plotting the quantiles of the first group against the difference between the quantiles.
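The core of the procedure, matching each quantile of one group to the corresponding quantile of the other and taking the difference, can be sketched in a few lines. A minimal Python illustration (the function name is mine; it omits the simultaneous confidence band that Wilcox' sband adds):

```python
import numpy as np

def quantile_shift(x, y):
    # sort both groups; the i-th order statistic of x estimates the
    # quantile qhat = i / len(x)
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    qhat = np.arange(1, len(x) + 1) / len(x)
    # index of the matching order statistic in y (clipped to a valid range)
    idx = np.clip(np.floor(qhat * len(y) + 0.5).astype(int) - 1, 0, len(y) - 1)
    return y[idx] - x  # delta: y-quantile minus x-quantile

# a pure location shift: every quantile of y lies 2 units above x,
# so all deltas equal 2
x = np.arange(1.0, 11.0)
deltas = quantile_shift(x, x + 2)
print(deltas)
```

For distributions that differ in more than location, the deltas vary across quantiles, and that variation is exactly what the shift-function plot displays.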

In order to aid in comparing the quantiles of the two groups, I’ve created an R function that plots the comparison between the two groups. The function uses the ggplot2 package and the WRS package (which can be found here: WRS: A package of R.R. Wilcox’ robust statistics functions version 0.24 from R-Forge (rdrr.io); see also: Installation of WRS package (Wilcox’ Robust Statistics) | R-bloggers (r-bloggers.com)).

```
library(WRS)
library(ggplot2)

plotSband <- function(x, y, x.name = "Control") {
  x <- sort(x[!is.na(x)])
  y <- sort(y[!is.na(y)])
  # estimated quantiles of x and the matching order statistics of y
  qhat <- 1:length(x) / length(x)
  idx.y <- floor(qhat * length(y) + .5)
  idx.y[idx.y <= 0] <- 1
  idx.y[idx.y > length(y)] <- length(y)
  delta <- y[idx.y] - x
  # simultaneous confidence band from Wilcox' sband
  cis <- WRS::sband(x, y, plotit = FALSE)$m[, c(2, 3)]
  check.missing <- apply(cis, 2, function(x) sum(is.na(x)))
  if (sum(check.missing == length(x)) > 1) {
    stop("All CI limits equal to - or + Infinity")
  }
  ylims <- c(min(cis[!is.na(cis[, 1]), 1]) - .50,
             max(cis[!is.na(cis[, 2]), 2]) + .50)
  # push undefined (infinite) limits far outside the plotting region
  cis[is.na(cis[, 1]), 1] <- ylims[1] * 5
  cis[is.na(cis[, 2]), 2] <- ylims[2] * 5
  thePlot <- ggplot(mapping = aes(x)) +
    xlab(x.name) +
    geom_smooth(aes(x = x, y = delta), se = FALSE, col = "blue") +
    ylab("Delta") +
    # mark the .25, .50, and .75 quantiles of x on the x-axis
    geom_point(aes(x = quantile(x, c(.25, .50, .75)),
                   y = rep(ylims[1], 3)), pch = c(3, 2, 3), size = 2) +
    geom_ribbon(aes(ymin = cis[, 1], ymax = cis[, 2]), alpha = .20) +
    coord_cartesian(ylim = ylims)
  suppressMessages(print(thePlot))
}
```

Let’s look at an example. Figure 1 presents data from an experiment investigating the persuasive effect of narratives on intentions of adopting a healthy lifestyle (see Boeijinga, Hoeken, and Sanders (2017) for details). The plotted data are the differences in intention between the quantiles of a group of participants who read a narrative focusing on risk-perception (detailing the risks of unhealthy behavior) and a group of participants who read a narrative focusing on action-planning (here called the control group), which details how the healthy behavior may actually be implemented by the participant.

Figure 1 shows the following. The triangle is the median of the data in the control group, and the plusses are the .25 and .75 quantiles. The shaded regions define the simultaneous 95% confidence intervals for the differences between the quantiles of the two groups. Here, these regions appear quite ragged because of the discrete nature of the data. For values below 2.5 and above 3.5, the limits (respectively the lower and upper limits of the 95% CIs) equal infinity, so these values extend beyond the limits of the y-axis. (The sband function returns NA for these limits.) The smoothed regression line should help interpret the general trend.

How can we interpret Figure 1? First of all, if you think it is important to look at statistical significance, note that none of the 95% intervals exclude zero, so none of the differences reach traditional significance at the .05 level. But none of them exclude differences as large as -0.50 either, so we should not be tempted to conclude that, because zero is in the interval, we should adopt zero as the point estimate. For instance, if we look at x = 2.5, we see that the 95% CI equals [-1.5, 0.0]; the value zero is included in the interval, but so is the value -1.5. It would be illogical to conclude that zero is our best estimate when so many other values are included in the interval.

The loess regression line suggests that the difference in quantiles between the two groups is relatively steady for the lower quantiles of the distribution (up to x = 3.0 or so, or at least below the median), but for quantiles larger than the median the effect gets smaller and smaller, until the regression line crosses zero at x = 3.75. This value is approximately the .88 quantile of the distribution of scores in the control condition (this is not shown in the graph).

The values on the y-axis are the differences between the quantiles. A negative delta means that the quantile of the control condition has a larger value than the corresponding quantile in the experimental condition. The results therefore suggest that participants in the control condition with a relatively low intention score, would have scored even lower in the other condition. To give some perspective: expressed in the number of standard deviations of the intention scores in the control group a delta of -0.50 corresponds to a 0.8 SD difference.

Note, however, that due to the limited number of observations in the experiment, the uncertainty about the direction of the effect is very large, especially in the tails of the distribution (roughly below the .25 and above the .75 quantile). So, even though the data suggest that Action Planning leads to more positive intentions, especially for the lower quantiles but still considerably for the .75 quantile, a (much) larger dataset is needed to obtain more convincing evidence for this pattern.


# A confidence interval for the correlation coefficient

The following steps yield an approximate 95% confidence interval for the population correlation \rho, based on the Fisher z-transformation of the sample correlation r.

- Transform r to a standard normal deviate Z:

Z_{xy} = \frac{1}{2}ln\left(\frac{1 + r}{1 - r}\right), \tag{1}

which is equal to:

Z_{xy} = arctanh(r). \tag{2}
- Determine the standard error for Z:

s_Z = \sqrt\frac{1}{N - 3}. \tag{3}
- Calculate the Margin of Error (MoE) for Z:

MoE_Z = 1.96 s_Z. \tag{4}
- Add MoE to and subtract MoE from Z to obtain a 95% Confidence Interval for Z.
- Transform the upper and lower limits of the CI for Z to obtain the corresponding limits for \rho, using:

r_Z = \frac{e^{2Z} - 1}{e^{2Z} + 1}, \tag{5}

which is equal to:

r_Z = tanh(Z). \tag{6}

The following R-code does all the work:

```
conf.int.rho <- function(r, N) {
  # Fisher z-transform, add/subtract MoE, and transform back
  lims.rho <- tanh(atanh(r) + c(qnorm(.025), qnorm(.975)) * sqrt(1 / (N - 3)))
  return(lims.rho)
}
```

So, if you have r = .50 and N = 50, just run the above function in R to obtain a confidence interval for the correlation coefficient.

```
conf.int.rho(.50, 50)
## [1] 0.2574879 0.6832563
```
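The same arithmetic is easy to reproduce outside R as a cross-check. A sketch in Python (the function name is mine; the critical value is written with more decimals to match R's qnorm(.975)):

```python
import math

def conf_int_rho(r, n, conf_crit=1.959964):
    # Fisher z-transform, add/subtract the margin of error, transform back
    z = math.atanh(r)
    se = math.sqrt(1 / (n - 3))
    return math.tanh(z - conf_crit * se), math.tanh(z + conf_crit * se)

lo, hi = conf_int_rho(.50, 50)
print(round(lo, 4), round(hi, 4))  # ≈ 0.2575 0.6833, matching the R output
```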


# Cohen’s d for paired designs

Cohen’s d for a paired design can be defined as

d_{av} = \frac{M_1 - M_2}{s_{av}}, \tag{1}

where s_{av} equals

s_{av}= \sqrt{\frac{1}{2}(S^2_1+S^2_2)}. \tag{2}

Now, (1) is of course an estimate of (3), the population value of Cohen’s d for paired designs, but we need not only a point estimate, but also a confidence interval.

The following R-code can be used to obtain a 95% confidence interval for the estimate of \delta_{av}, the population mean difference standardized by using the average of the two standard deviations:

\delta_{av} = \frac{\mu_1 - \mu_2}{\sqrt{\frac{1}{2}(\sigma_1^2 + \sigma_2^2)}}. \tag{3}

This procedure uses the approximate approach by Algina & Keselman (2003), which is also used by ESCI (Cumming, 2012; Cumming and Calin-Jageman, 2017), as Kline (2013) explains. The following steps are performed to obtain the 95% confidence interval.

- Use the obtained t-value of the paired t-test to estimate the non-centrality parameter \lambda. Steps 2 and 3 are for calculating a 95% confidence interval for the non-centrality parameter.
- Use an iterative procedure to find the non-centrality parameter of the t-distribution for which the observed t-value is the .025 quantile. This is the upper limit of the confidence interval for the non-centrality parameter.
- Use an iterative procedure to find the non-centrality parameter of the t-distribution for which the observed t-value is the .975 quantile. This is the lower limit of the confidence interval for the non-centrality parameter.
- To obtain a CI for \delta_{av}, multiply the limits of the confidence interval for the non-centrality parameter by the value \sqrt{\frac{2S_D^2}{n(S_1^2+S_2^2)}}, where n is the sample size, S_D^2 is the variance of the difference scores, and S_1^2 and S_2^2 are the variances of the two variables.
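The steps above can also be sketched in Python, assuming SciPy is available (the function and variable names are mine, and brentq replaces R's optimize as the root-finder):

```python
from scipy.optimize import brentq
from scipy.stats import nct

def ci_d_av(t, n, s1, s2, s_diff, conf=0.95):
    """Approximate CI for delta_av via the non-central t distribution
    (the Algina & Keselman, 2003, approach)."""
    df = n - 1
    alpha = 1 - conf
    multiplier = (2 * s_diff**2 / (n * (s1**2 + s2**2))) ** 0.5
    # upper limit: the ncp for which the observed t is the alpha/2 quantile
    hi = brentq(lambda ncp: nct.cdf(t, df, ncp) - alpha / 2,
                t, t + 5 * abs(t) + 5)
    # lower limit: the ncp for which the observed t is the 1 - alpha/2 quantile
    lo = brentq(lambda ncp: nct.cdf(t, df, ncp) - (1 - alpha / 2),
                t - 5 * abs(t) - 5, t)
    return lo * multiplier, hi * multiplier

# with s1 = s2 = s_diff the standardizer reduces to the SD of the difference
# scores, so for the paired t-test t(77) = 13.3 from the first post above,
# this should come close to the jamovi interval [1.18, 1.83]
lo, hi = ci_d_av(13.3, 78, 1.0, 1.0, 1.0)
print(round(lo, 2), round(hi, 2))
```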

The following R-function does all the work. Note that with large potential values for the non-centrality parameter, R issues warnings that “full precision has not been achieved in ‘pnt{final}'”. These warnings can be ignored (I checked many examples against ESCI’s output), but in order to prevent them, I let the optimize function search from the observed value of t to maximally five times its value, and I have included an option to suppress the warnings or not (just set warn = TRUE to get the warnings).

```
ci.d.av <- function(t, n, s.1, s.2, s.diff, warn = FALSE) {
  # search interval for the non-centrality parameter
  if (abs(t) < 1) {
    lims <- c(-5, 5)
  } else {
    lims <- c(-5 * abs(t), 5 * abs(t))
  }
  df <- n - 1
  multiplier <- sqrt((2 * s.diff^2) / (n * (s.1^2 + s.2^2)))
  # squared distance between the non-central t CDF at t and the target probability
  loss <- function(x, prob) (pt(t, df, x) - prob)^2
  if (warn == FALSE) {
    ul <- suppressWarnings(optimize(loss, lims, prob = .025))$minimum
    ll <- suppressWarnings(optimize(loss, lims, prob = .975))$minimum
  } else {
    ul <- optimize(loss, lims, prob = .025)$minimum
    ll <- optimize(loss, lims, prob = .975)$minimum
  }
  return(round(c(ll, ul), 4) * multiplier)
}
```

The arguments of the function are t, the t-value of the t-test testing the hypothesis of equal population means, the sample size (n), the standard deviations (s.1 and s.2) of the two variables and the standard deviation of the difference scores (s.diff).

Here is a quick example.

```
library(MASS)
set.seed(1234)
# generate random multivariate normal data with sample size n = 20
theData <- mvrnorm(20, c(.5, .0), matrix(c(1, .8, .8, 1), 2, 2))
# calculate the standard deviations
sds <- apply(theData, 2, sd)
# calculate the standard deviation of the difference scores
sDiff <- sd(theData[, 1] - theData[, 2])
# get the t-value and a value for d.av
# here I use the output of the t-test in R to obtain t and the mean
# difference score (needed for calculating d.av)
theTest <- t.test(theData[, 1], theData[, 2], paired = TRUE)
t <- theTest$statistic
# standardize with s.av = sqrt((S1^2 + S2^2) / 2), as in (2)
d.av <- theTest$estimate / sqrt(mean(sds^2))
ci.d.av(t = t, n = 20, s.1 = sds[1], s.2 = sds[2], sDiff)
```

The results are that the estimate equals d_{av} = 0.87, 95\% \text{CI} [0.51, 1.22].

Alternatively, we can make use of the conf.limits.nct function of the MBESS package (Kelley, 2007a, 2007b), and proceed as follows (using the data generated above).

```
library(MBESS)
ci.d.av.2 <- function(t, n, s.1, s.2, s.diff) {
  df <- n - 1
  multiplier <- sqrt((2 * s.diff^2) / (n * (s.1^2 + s.2^2)))
  unlist(conf.limits.nct(t, df)[c(1, 3)]) * multiplier
}
ci.d.av.2(t = t, n = 20, s.1 = sds[1], s.2 = sds[2], s.diff = sDiff)
```

Algina, J. & Keselman, H. J. (2003). Approximate confidence intervals for effect sizes. *Educational and Psychological Measurement*, *63*, 721-734.

Cumming, G. (2012). Understanding the New Statistics. Effect Sizes, Confidence Intervals, and Meta-Analysis. New York: Routledge.

Cumming, G. & Calin-Jageman, R. (2017). Introduction to the New Statistics. Estimation, Open Science, and Beyond. New York: Routledge.

Kelley, K. (2007a). Methods for the Behavioral, Educational, and Social Sciences: An R Package. *Behavior Research Methods*, *39*, 979–984.

Kelley, K. (2007b). Confidence Intervals for Standardized Effect Sizes: Theory, Application, and Implementation. *Journal of Statistical Software*, *20*(8), 1–24.

Kline, R. B. (2013). Beyond Significance Testing. Statistics Reform in the Behavioral Sciences. (Second Edition). Washington: APA.


# Linear Trend Analysis with R and SPSS

[Note: A pdf-file that differs only slightly from this blog post can be found on my Researchgate page: here; I suggest Haans (2018) for an easy-to-follow introduction to contrast analysis, which should really help in understanding what is said below.]

The references cited above are clear about how to construct contrast coefficients (lambda coefficients) for linear trends (and non-linear trends, for that matter) that can be used to perform a significance test for the null-hypothesis that the slope equals zero. Maxwell, Delaney, and Kelley (2017) describe how to obtain a confidence interval for the slope and make clear that, to obtain interpretable results from the software we use, we should consider how the linear trend contrast values are scaled. That is, standard software (like SPSS) gives us a point estimate and a confidence interval for the contrast estimate, but depending on how the coefficients are scaled, these estimates are not necessarily interpretable in terms of the slope of the linear trend, as I will make clear momentarily.

So the goal of our data analysis is to obtain point and interval estimates of the slope of the linear trend, and the purpose of this contribution is to show how to obtain output that is interpretable as such.

Let us have a look at an example of a linear trend to make clear what exactly we are talking about. To keep things simple, we suppose the following context. We have an experimental design with a single factor and a single dependent variable. The factor we are considering is quantitative and its values are equally spaced. This may (or may not) differ from the usual experiment, where the independent variable is a qualitative, nominal variable. An example from Haans (2018) is the variable location, which is the row in the lecture room where students attending the lecture are seated. There are four rows and the distance between the rows is equal. Row 1 is the row nearest to the lecturer, and row 4 is the row farthest from the lecturer. We will assign the values 1 through 4 to the rows.

We hypothesize that the distance between the student and the lecturer, operationalized as the row where the student is seated, shows a negative linear trend with the mean exam scores of the students in each row. The purpose of the data analysis is to estimate how large the (negative) slope of this linear trend is. Let us first suppose that there is a perfect negative linear trend, in the sense that each unit increase in the location variable is associated with a unit decrease in the mean exam score. Let us suppose that the means are 4, 3, 2, and 1, respectively.

The negative linear trend is depicted in the following figure.

Figure 1: Negative Linear Trend with slope \beta_1 = -1.0

The equation for this perfect linear relation between location and mean exam score is \bar{Y} = 5 + (-1)X, that is, the slope of the negative trend equals −1. So, suppose the pattern in our sample means follows this perfect negative trend, we want our slope estimate to equal −1.

Now, following Maxwell, Delaney, and Kelley (2017), with equal sample sizes, the estimated slope of the linear trend is equal to

\hat{\beta}_1 = \frac{\hat{\psi}_{linear}}{\sum{\lambda_j^2}}, \tag{1}

which is the ratio of the linear contrast estimate \hat{\psi} to the sum of the squared contrast weights, also called the lambda weights. (Again, for a gentle introduction to contrast analysis, see Haans (2018).) The lambda weight \lambda_j corresponding to each value of the independent variable X is \lambda_j = X_j - \bar{X}.

For the intercept of the linear trend equation we have

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}. \tag{2}

Since the mean of the X values equals 2.5, remember that we numbered our rows 1 through 4, we have lambda weights \boldsymbol{\Lambda} = [-1.5,-0.5, 0.5, 1.5]. The value of the linear contrast estimate equals \hat{\psi} = −1.5 * 4 + −0.5 * 3 + 0.5 * 2 + 1.5 * 1 = −5 (note that this is simply the sum of the product of each lambda weight and the corresponding sample mean), the sum of the squared lambda weights equals 5, so the slope estimate equals \hat{\beta_1} = \frac{-5}{5} = -1, as it should.

The importance of scaling becomes clear if we use the standard recommended lambda weights for estimating the negative linear trend. These standard weights are \boldsymbol{\Lambda} = [-3, -1, 1, 3]. Using those weights leads to a contrast estimate of −10 and, since the sum of the squared weights now equals 20, to a slope estimate of −0.50, which is half the value we are looking for. For significance tests of the linear trend this difference in results doesn’t matter, but for the interpretation of the slope it clearly does. Since getting the “correct” value for the slope estimate requires an additional calculation (albeit a very elementary one), I recommend sticking to setting the lambda weights to \lambda_j = X_j - \bar{X}.
## Estimating the slope

Let us apply the above to the imaginary data provided by Haans (2018) and see how we can estimate the slope of the linear trend. The data are reproduced in Table 1.

The group means of the four rows are \boldsymbol{\bar{Y}} = [7, 7, 6, 2]. The lambda weights are \boldsymbol{\Lambda} = [-1.5, -0.5, 0.5, 1.5]. The value of the contrast estimate equals \hat{\psi}_{linear} = -8, the sum of the squared lambda weights equals \sum_{j=1}^{k}\lambda_{j}^{2} = 5, so the estimated slope equals -1.6. The equation for the linear trend is therefore \hat{\mu}_j = 9.5 - 1.6X_j. Figure 2 displays the obtained means and the estimated means based on the linear trend.
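The contrast arithmetic is easy to verify by hand; a quick Python sketch using the numbers from the text above:

```python
# group means and centered contrast weights from the Haans (2018) example
means = [7, 7, 6, 2]
X = [1, 2, 3, 4]
xbar = sum(X) / len(X)                        # 2.5
lam = [x - xbar for x in X]                   # [-1.5, -0.5, 0.5, 1.5]

psi = sum(l * m for l, m in zip(lam, means))  # linear contrast estimate: -8.0
slope = psi / sum(l * l for l in lam)         # -8 / 5 = -1.6, as in (1)
intercept = sum(means) / len(means) - slope * xbar  # 5.5 + 1.6 * 2.5 = 9.5, as in (2)
print(psi, slope, intercept)                  # -8.0 -1.6 9.5
```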

Figure 2: Obtained group means and estimated group means (unfilled dots) based on the linear trend.

If we estimate the linear trend contrast with SPSS, we will get a point estimate of the contrast value and a 95% confidence interval estimate. For instance, if we use the lambda weights \boldsymbol{\Lambda} = [-1.5, -0.5, 0.5, 1.5] and the following syntax, we get the output presented in Figure 3. (Note: I have created an SPSS dataset with the variables row and score; see the data in Table 1.)

Figure 3: SPSS Output Linear Trend Contrast

Figure 3 makes it clear that the 95% CI is of the linear trend contrast estimate, and not of the slope. But it is easy to obtain a confidence interval for the slope estimate by using (1) on the limits of the CI of the contrast estimate. Since the sum of the squared lambda weights equals 5.0, the confidence interval for the slope estimate is 95% CI [-11.352/5, -4.648/5] = [-2.27, -0.93]. Alternatively, divide the lambda weights by the sum of the squared lambda weights and use the results in the specification of the L-matrix in SPSS:

Using the syntax above leads to the results presented in Figure 4.

Figure 4: SPSS Output Slope Estimate with adjusted contrast weights
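The rescaling itself is just a division by the sum of the squared lambda weights; a one-line check of the numbers quoted above:

```python
# contrast CI limits from the SPSS output, rescaled to the slope metric
ci_contrast = (-11.352, -4.648)
sum_lambda_sq = 5.0
ci_slope = tuple(round(lim / sum_lambda_sq, 2) for lim in ci_contrast)
print(ci_slope)  # (-2.27, -0.93)
```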

The following R-code accomplishes the same goals. (Note: I make use of the emmeans package, so you need to install that package first; I have created an RData-file named betweenData.Rdata using the data in Table 1, using the variable names location and examscore).

```
# load the dataset
load('~/betweenData.RData')
# load the functions of the emmeans package
library(emmeans)
# set options for the emmeans package to get only confidence
# intervals; set infer = c(TRUE, TRUE) for both CI and p-value
emm_options(contrast = list(infer = c(TRUE, FALSE)))
# specify the contrast (note: divide by the sum of squared contrast weights)
myContrast <- c(-1.5, -0.5, 0.5, 1.5) / 5
# fit the model (this assumes the data are available in the workspace)
theMod <- lm(examscore ~ location)
# get the estimated marginal means
theMeans <- emmeans(theMod, "location")
contrast(theMeans, list("Slope" = myContrast))
```

```
## contrast estimate        SE df  lower.CL   upper.CL
## Slope        -1.6 0.3162278 16 -2.270373 -0.9296271
## 
## Confidence level used: 0.95
```

The estimate of the slope of the linear trend equals \hat{\beta}_1 = -1.60, 95% CI [-2.27, -0.93]. This means that with each increase in row number (from a given row to a row one position farther from the lecturer) the estimated exam score decreases on average by 1.6 points, but any value between -2.27 and -0.93 is considered a relatively plausible candidate value, with 95% confidence. (Of course, we should probably not extrapolate beyond the rows that were actually observed; otherwise students seated behind the lecturer would be expected to have a higher population mean than students seated in front of the lecturer.)

In order to aid interpretation, one may convert these numbers to a standardized version (resulting in a standardized confidence interval for the slope estimate) and use rules-of-thumb for interpretation. The square root of the within-condition variance may be a suitable standardizer. The value of this standardizer is S_{W} = 1.58 (I obtained the value \text{MS}_{within} = 2.5 from the SPSS ANOVA table). The standardized estimates are therefore -1.0, 95% CI [-1.43, -0.59], suggesting that the negative effect of moving one row farther from the lecturer is somewhere between medium and very large, with the point estimate corresponding to a large negative effect.

Haans, Antal (2018). Contrast Analysis: A Tutorial. *Practical Assessment, Research & Evaluation*, *23*(9). Available online: http://pareonline.net/getvn.asp?v=23&n=9

Maxwell, S.E., Delaney, H. D., & Kelley, K. (2017). Designing Experiments and Analyzing Data. A Model Comparison Perspective. (Third Edition). New York/ London: Routledge.

Rosenthal, R., Rosnow, R.L., & Rubin, D.B. (1993). Contrasts and Effect Sizes in Behavioral Research. A Correlational Approach. Cambridge, UK: Cambridge University Press.


# Planning for Precise Contrast Estimates: Introduction and Tutorial (Preprint)

The preprint is available on ResearchGate: Click (but I am just as happy to send it to you if you like; just let me know).

The post Contrast analysis with R: Tutorial for factorial mixed designs appeared first on The small S scientist.

I will illustrate two approaches. The first approach is to use t-tests on transformed (contrast) scores; the other approach uses the univariate mixed model. As was explained in the previous tutorial, the first approach tests each contrast against its own error variance, whereas the mixed model approach uses a common error variance (which requires statistical assumptions that will probably not apply in practice; the advantage of the mixed model approach, if its assumptions do apply, is that the margin of error of the contrast estimate is somewhat smaller).

Again, our example is taken from Haans (2018; see also this post). It considers the effect of students' seating distance from the teacher on the educational performance of the students: the closer to the teacher the student is seated, the higher the performance. A "theory" explaining the effect is that the effect is mainly caused by the teacher having decreased levels of eye contact with the students sitting farther to the back of the lecture hall. To test that theory, an experiment was conducted with N = 18 participants in a factorial mixed design (also called a split-plot design), with two fixed factors: the between-participants factor Sunglasses (with or without) and the within-participants factor Location (row 1 through row 4). The dependent variable was the score on a 10-item questionnaire about the contents of the lecture. So, we have a 2 by 4 mixed factorial design, with n = 9 participants in each level of the between-participants factor.

We will again focus on obtaining an interaction contrast: we will estimate the extent to which the difference between the mean retention score on the first row and those on the other rows differs between the conditions with and without sunglasses.

Let us first have a look at obtaining the estimate with the per-contrast error variance approach.

I have used the following SPSS syntax to obtain the estimate of the interaction contrast. I used the SPSS file Mixed2by4data.sav, which is available from the supplementary materials of Haans (2018).
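The syntax itself did not survive in this version of the post, but the analysis can be specified with GLM using an MMATRIX for the within-participants contrast (row 1 versus the mean of the later rows) and an LMATRIX for the between-participants contrast. The sketch below is a reconstruction, not the original syntax; it assumes the wide-format variables are named row1 through row4, so check the variable names and the order of the sunglasses levels in your own copy of Mixed2by4data.sav before running it.

```
GLM row1 row2 row3 row4 BY sunglasses
  /MMATRIX = "row 1 vs later rows" row1 1 row2 -1/3 row3 -1/3 row4 -1/3
  /LMATRIX = "without vs with sunglasses" sunglasses 1 -1
  /DESIGN = sunglasses.
```

Combining the LMATRIX and MMATRIX in this way makes GLM report the interaction contrast estimate (with its confidence interval) in the Contrast Results (K Matrix) table.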

The result is as follows.

Figure 1: SPSS interaction contrast estimate in a factorial mixed design.

The contrast estimate equals 1.0, 95% CI [-0.53, 2.53], indicating that the difference between the first row mean and the mean of the other rows is estimated to be 1 point larger (on the retention scale) when the teacher does not wear sunglasses than when he does. The width of the confidence interval suggests that this estimate is rather imprecise, which is to be expected with such a small sample. (For more detailed information concerning sample size planning for contrast analysis, see: http://small-s.science/?p=10)

If you want to use the mixed model approach, you can do the following. (Make sure you have restructured the data into long format; download my version here).
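Again, the original syntax was lost in this version of the post. A reconstruction with the MIXED procedure might look along these lines (this is a sketch, not the original: it assumes long-format variables named retention, sunglasses, location, and person, and the eight /TEST weights must match the order of the factor-level combinations in your output, so verify that order before interpreting the estimate):

```
MIXED retention BY sunglasses location
  /FIXED = sunglasses location sunglasses*location
  /RANDOM = INTERCEPT | SUBJECT(person)
  /TEST = "interaction" sunglasses*location 1 -1/3 -1/3 -1/3 -1 1/3 1/3 1/3
  /PRINT = SOLUTION.
```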

Running this syntax will render the following result.

Figure 2: SPSS mixed linear model contrast estimate.

The contrast estimate equals 1.0, 95% CI [-0.28, 2.28]. The interpretation is the same as above, of course.

Let’s do the same as above and start with estimating the contrast and the confidence interval with a per-contrast error variance. (For background details with respect to the calculation of the contrastScores, see the post on the analysis of the within-subjects design).

```r
library(foreign)
theData = as.data.frame(read.spss(file="./Mixed2by4data.sav"))

# contrast for within participants factor:
myContrast = c(1, -1/3, -1/3, -1/3)
contrastScores = as.matrix(theData[, 3:6]) %*% myContrast
t.test(contrastScores ~ sunglasses, data=theData)
```

```
## 
##  Welch Two Sample t-test
## 
## data:  contrastScores by sunglasses
## t = 1.3846, df = 15.875, p-value = 0.1853
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5320236  2.5320236
## sample estimates:
## mean in group Without glasses    mean in group With glasses 
##                      2.666667                      1.666667
```

```r
# assuming equal variances:
t.test(contrastScores ~ sunglasses, var.equal=TRUE, data=theData)
```

```
## 
##  Two Sample t-test
## 
## data:  contrastScores by sunglasses
## t = 1.3846, df = 16, p-value = 0.1852
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5310427  2.5310427
## sample estimates:
## mean in group Without glasses    mean in group With glasses 
##                      2.666667                      1.666667
```

Note that using the t-test with the equal variances assumption reproduces the results of the GLM analysis with SPSS exactly.

For the mixed model approach I will use the emmeans package. I will show how it can be used in combination with the aov function from the stats package and in combination with the lmer function from the lme4 package. Note that this requires a dataset in long format. (Download my datafile here.)

```r
library(foreign)   # for read.spss
library(lme4)      # not necessary if you want to use aov
library(emmeans)

# set options for emmeans to get both p-values and CIs
emm_options(contrast = list(infer = c(TRUE, TRUE)))

theData = as.data.frame(read.spss(file="./Mixed2by4dataLong.sav"))
theData$person = factor(theData$person)
attach(theData)

myContrast = c(1, -1/3, -1/3, -1/3, -1, 1/3, 1/3, 1/3)

# using the lmer model
myMod.lme = lmer(retention ~ sunglasses*location + (1|person))

# note: in the model formula I used between * within (i.e.
# sunglasses * location). In the call to emmeans below, I have used
# the reverse notation: location*sunglasses. This has to do with
# the way I formulated the contrast: the first 4 weights are for
# the condition without sunglasses and the last 4 weights are for
# the condition with sunglasses. So, the weights are (so to say)
# specified as a 4 x 2 design, and the model as a 2 x 4 design (which
# is more or less the traditional way of specifying the anova model:
# first the between factor(s), then the within factor(s), followed by
# all the interactions).
condMeans = emmeans(myMod.lme, ~location*sunglasses)
contrast(condMeans, list("interaction" = myContrast))
```

```
## contrast    estimate        SE df lower.CL upper.CL t.ratio p.value
## interaction        1 0.6358624 48 -0.278487 2.278487   1.573  0.1224
## 
## Confidence level used: 0.95
```

```r
# using the aov model:
myMod.aov = aov(retention ~ sunglasses*location + Error(person/location))
condMeans = emmeans(myMod.aov, ~location*sunglasses)
contrast(condMeans, list("interaction" = myContrast))
```

**Reference**

Haans, Antal (2018). Contrast Analysis: A Tutorial. Practical Assessment, Research, & Education, 23(9). Available online: http://pareonline.net/getvn.asp?v=23&n=9

The post The Anatidae Principle appeared first on The small S scientist.

If it looks like a duck, and quacks like a duck, we have at least to consider the possibility that we have a small aquatic bird of the family Anatidae on our hands.

– Douglas Adams

I like to teach my students how they can apply what I call the Anatidae Principle (or the Principle of the Duck) to their data analysis. (The name is obviously inspired by the above quote from Douglas Adams's *Dirk Gently's Holistic Detective Agency*.)

For the purpose of data-analysis, the Anatidae Principle simply boils down to the following: If it looks like you found a relation, difference, or effect in your sample you should at least consider the possibility that there indeed is a relation, difference or effect. That is, look at your data, summarize, make figures, and think (hard) about what your data potentially mean for the answer to your research question, hypotheses, hunches, whatever you like. Do this before you start calculating p-values, confidence intervals, Bayes Factors, Posterior distributions, etc., etc.

In my experience, researchers too often violate the Anatidae Principle: they calculate a p-value, and if it is not significant they simply ignore their sample results. Never mind that, as they predicted, group A outperforms group B: if the difference is not significant, they will claim they found no effect. And, worse still, believe it.

Kline (2013, p. 117) gives solid advice:

“Null hypothesis rejections do not imply substantive significance, so researchers need other frames of reference to explain to their audiences why the results are interesting or important. A start is to learn to describe your results without mention of statistical significance at all. In its place, refer to descriptive statistics and effect sizes and explain why those effect sizes matter in a particular context. Doing so may seem odd at first, but you should understand that statistical tests are not generally necessary to detect meaningful or noteworthy effects, which should be obvious to visual inspection of relatively simple kinds of graphical displays (Cohen, 1994). The description of results at a level closer to the data may also help researchers to develop better communication skills.”

The post Planning with assurance, with assurance appeared first on The small S scientist.

Cumming and Calin-Jageman (2017, p. 277) propose a strategy for determining your target MoE (margin of error). You can use this strategy if your research goal is to provide strong evidence that the effect size is non-zero. The strategy is to divide the expected value of the difference by two and to use that result as your target MoE.

Let’s restrict our attention to the comparison of two means. If the expected difference between the two means is Cohen's d = .80, the proposed strategy is to set your target MoE at f = .40, which means that your target MoE is set at .40 standard deviations. If you plan for this value of target MoE with 80% assurance, the recommended sample size is n = 55 participants per group. These results are guaranteed to be true if it is known for a fact that Cohen's d is .80 and all statistical assumptions apply.

But it is generally not known for a fact that Cohen’s d has a particular value and so we need to answer a non-trivial question: what effect size can we reasonably expect? And, how can we have assurance that the MoE will not exceed half the **unknown** true effect size? One of the many options we have for answering this question is to conduct a pilot study, estimate the plausible values of the effect size and use these values for sample size planning. I will describe a strategy that basically mirrors the sample size planning for power approach described by Anderson, Kelley, and Maxwell (2017).

The procedure is as follows. In order to plan with approximately 80% assurance, estimate, on the basis of your pilot study, the 80% confidence interval for the population effect size, and use half the value of its lower limit for sample size planning with 90% assurance. This will give you 81% assurance that MoE is no larger than half the **unknown** true effect size.

There are two “problems” we need to consider when estimating the true effect size. The first problem is that there is at least a 50% probability of obtaining an overestimate of the true effect size. If that happens, and we take the point estimate of the effect size as input for sample size planning, what we “believe” to be a sample size sufficient for 80% assurance will, at least 50% of the time, actually be a sample size with less than 80% assurance. So, using the point estimate gives us a MoE no larger than half the unknown effect size with less than 50% assurance.

To make it more concrete: suppose the true effect size equals .80 and we use n = 25 participants in both groups of the pilot study; the probability is then approximately 50% that the point estimate is above .80. This implies, of course, that we will plan for a value of f > .40 approximately 50% of the time, and so the sample size we get will give us 80% assurance only 50% of the time.
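This claim is easy to check with a small simulation. The sketch below (my own illustration, not part of the original post) assumes normal populations with unit standard deviation and uses the pooled-SD estimate of Cohen's d:

```r
# Simulate many pilot studies with true Cohen's d = .80 and n = 25 per
# group, and count how often the sample estimate overshoots the true value.
set.seed(1234)
d_hat <- replicate(10000, {
  x <- rnorm(25, mean = 0.8, sd = 1)
  y <- rnorm(25, mean = 0.0, sd = 1)
  (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)  # pooled-SD estimate of d
})
mean(d_hat > 0.8)  # proportion of overestimates: approximately .50
```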

The second problem is that the small sample sizes we normally use for pilot studies may give highly imprecise estimates. For instance, with n = 25 participants per group, the expected MoE is f = 0.5687. So, even if we accept 50% assurance, it is highly likely that the point estimate is rather imprecise.

Since we are considering a pilot study, one of the obvious solutions, increasing the sample size so that expected MoE is likely to be small, is not really an option. But what we can do is to use an estimate that is unlikely to be an overestimate of the true effect size. In particular, we can use as our estimate the lower limit of a confidence interval for the effect size.

Let me explain by considering the 80% CI of the effect size estimate. From basic theory it follows that the true value of the effect size will be smaller than the lower limit of the 80% confidence interval with a probability equal to 10%. That is, if we calculate a huge number of 80% confidence intervals, each time on the basis of new random samples from the population, the true value of the effect size will be below the lower limit in 10% of the cases. This also means that the lower limit of the interval has a 90% probability of **not** overestimating the true effect size.

This means that if we take the lower limit of the 80% CI of the pilot estimate as input for our sample size calculations, and if we plan with 90% assurance, we will have 90% × 90% = 81% assurance that the sample size we get from our calculations will give a MoE no larger than half the true effect size. (Note that for 80% CIs with negative limits you should choose the upper limit.)

A student of mine recently did a pilot study. This was a pilot for an experiment investigating the effect of the fluency of delivery of a spoken message in a video on Comprehensibility, Persuasiveness, and viewers' Appreciation of the video. The pilot study used two groups of size n = 10: one group watched the fluent video (without ‘eh’) and the other group watched the disfluent video, in which the speaker used ‘eh’ a lot. The dependent variables were measured on 7-point scales.

Let’s look at the results for the Appreciation variable. The (biased) estimate of Cohen’s d (based on the pooled standard deviation) equals 1.09, 80% CI [0.46, 1.69] (I calculated this using the ci.smd function from the MBESS package). According to the rules-of-thumb for interpreting Cohen’s d, this can be considered a large effect. (For communication effect studies it can be considered an insanely large effect.) However, the CI shows the large imprecision of the result, which is of course what we can expect with sample sizes of n = 10. (Average MoE equals f = 0.95, and according to my rules-of-thumb that is well below what I consider to be borderline precise.)
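For those who want to reproduce the interval: a call along the following lines should do it (this sketch assumes the MBESS package is installed; smd is the pooled-SD estimate reported above):

```r
library(MBESS)

# 80% CI for the standardized mean difference from the pilot study
# (n = 10 participants per group; smd = 1.09 as reported above)
ci.smd(smd = 1.09, n.1 = 10, n.2 = 10, conf.level = 0.80)
```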

If we use the lower limit of the interval (d = 0.46), sample size planning with 90% assurance for half that effect (f = 0.23) gives us a sample size equal to n = 162. (Technical note: I planned for the half-width of the standardized CI of the unstandardized effect size, not for the CI of the standardized effect size; I used my Shiny App for planning assuming an independent groups design with two groups). As explained, since we used the lower limit of the 80% CI of the pilot and used 90% assurance in planning the sample size, the assurance that MoE will not exceed half the unknown true effect size equals 81%.
