Planning for Precision: Introduction to variance components

Both the theory underlying the Precision application and the use of the app in practice rely to a large extent on specifying variance components. In this post, I will give you some more details about what these components are, and how they relate to the analysis of variance model underlying the app.

What is variance?

Let’s start with a relatively simple conceptual explanation of variance. The key ideas are expected value and error.  Suppose you randomly select a single score from a population of possible values. Let’s suppose furthermore that the population of values can be described with a normal distribution. There is actually no need to suppose a normal distribution, but it makes the explanation relatively easy to follow.

As you probably know, the normal distribution is centered around its mean value, which is (equal to) the parameter μ. We call this parameter the population mean.

Now, we select a single random value from the population. Let’s call this value X. Because we know something about the probability distribution of the population values, we are also in the position to specify an expected value for the score X. Let’s use the symbol E(X) for this expected value. The value of E(X) is (and can be proven to be) equal to the parameter μ. (Conceptually, the expectation of a variable can be considered as its long-run average.)

Of course, the actual value obtained will in general not be equal to the expected value, at least not if we sample from continuous distributions like the normal distribution. Let’s call the difference between the value X and its expectation E(X) = μ an error, deviation, or residual: e = X – E(X) = X – μ.

We would like to have some indication of the extent to which X differs from its expectation, especially when E(X) is estimated on the basis of a statistical model. Thus, we would like to have something like E(X – E(X)) = E(X – μ). The variance gives us such an indication, but does so in squared units, because working with the expected error itself always leads to the value 0: E(X – μ) = E(X) – E(μ) = μ – μ = 0. (This simply says that on average the error is zero; the standard explanation is that negative and positive errors cancel out in the long run.)

The variance is the expected squared deviation (mean squared error) between X and its expectation: E((X – E(X))²) = E((X – μ)²), and the symbol for the population value is σ².

Some examples of variances (remember we are talking conceptually here):
– the variance of the mean: the expected squared deviation between a sample mean and its expectation, the population mean.
– the variance of the difference between two means: the expected squared deviation between the sample difference and the population difference between the two means.
– the variance of a contrast: the expected squared deviation between the sample value of the contrast and the population value of the contrast.

It’s really not that complicated, I believe.

Note: in the calculation of t-tests (and the like), or in obtaining confidence intervals, we usually work with the square root of the variance: the root mean squared error (RMSE), also known as the standard deviation (mostly used when talking about individual scores) or standard error (when talking about estimating parameter values such as the population mean or the differences between population means).
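
To make these ideas a bit more concrete, here is a minimal R sketch that approximates the expected error, the variance, the standard deviation, and the standard error of the mean by simulation. The values μ = 100, σ = 15, and n = 25 are made up for the illustration.

```r
# Approximating expectations by long-run averages over many random draws
set.seed(123)
mu <- 100; sigma <- 15
x <- rnorm(1e5, mean = mu, sd = sigma)   # many draws from the population

mean(x - mu)               # expected error: close to 0
mean((x - mu)^2)           # expected squared error: close to sigma^2 = 225
sqrt(mean((x - mu)^2))     # RMSE / standard deviation: close to 15

# the variance and standard error of the mean of n = 25 scores
n <- 25
xbar <- replicate(1e4, mean(rnorm(n, mu, sigma)))
mean((xbar - mu)^2)        # close to sigma^2 / n = 9
sqrt(mean((xbar - mu)^2))  # standard error: close to 15 / sqrt(25) = 3
```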

What are variance components?  

In order to appreciate what the concept of a variance component entails, imagine an experiment with nc treatment conditions in which np participants respond to ni items (or stimuli) in all conditions. This is called a fully-crossed experimental design.

Now consider the variable Xcpi: a random score in condition c, of participant p, responding to item i. The variance of this variable is σ²(Xcpi) = E((Xcpi – μ)²). But note that just as the single score is influenced by, for example, the actual treatment condition and the particular person or item, so too can this variance be decomposed into components reflecting the influence of these factors. Crucially, the total variance σ²(Xcpi) can be considered as the sum of independent variance components, each reflecting the influence of some factor or interaction of factors.
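
A minimal R sketch of this idea: build a score from independent effects and check that the total variance is (approximately) the sum of the component variances. I use only three made-up components here (a person effect, an item effect, and error) rather than the full set of components of the fully crossed design, but the principle is the same.

```r
# Total variance as the sum of independent variance components (illustrative values)
set.seed(123)
n <- 1e5
person <- rnorm(n, 0, sqrt(10))   # variance component for persons: 10
item   <- rnorm(n, 0, sqrt(4))    # variance component for items: 4
error  <- rnorm(n, 0, sqrt(6))    # residual variance component: 6
x <- 50 + person + item + error   # a score: grand mean plus independent effects

var(x)   # close to 10 + 4 + 6 = 20
```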

The following figure represents the components of the total variance in the fully crossed design (as you can see, the participants are now promoted to actual persons…).

The symbols used in this figure represent the following. Θ² is used to indicate that a component is considered to be a so-called fixed variance component (the details of which are beyond the scope of this post), and the symbol σ² is used to indicate components associated with random effects. (The ANOVA model contains a mixture of fixed and random effects, which is why we call such models mixed effects models or mixed model ANOVAs.) Components with a single subscript represent variances associated with main effects, components with two subscripts represent two-way interactions, and the only component with three subscripts represents a three-way interaction confounded with error.

Let’s consider one of these variance components (you can also refer to them simply as variances) to see in more detail how they can be interpreted. Take σ²(p), the person variance. Note that this is an alternative symbol for the variance: in the figure, the p is in the subscript instead of between brackets. (I am sorry for the inconvenience of switching symbols, but I do not want to rely too much on MathJax ($\sigma^2_p = \sigma^2(p)$) and I do not want to change the symbols in the figure.)

The variance σ²(p) is the expected squared deviation between the score of an idealized randomly selected person and the expectation of this score, the population mean. This score is the person’s score averaged over the conditions of the experiment and over all of the items that could have been selected for the experiment. (This conceptualization is from Generalizability Theory, as is the Venn-diagram representation; the person score is also called the universe score of the person.)

The component represents E((μp – μ)²), the expected squared person effect. Likewise, the variance associated with items, σ²(i), is the expected squared item effect, and σ²(cp) is the expected squared interaction effect of condition and person.

The figure indicates that the total variance is (modeled as) the sum of seven independent variance components. The Precision app asks you to supply the values for six of these components (the components associated with the random effects), and now I hope it is a little clearer why these components are also referred to as expected squared effects.
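
Purely by way of illustration, here is how such a set of six random variance components might be written down in R. The names and the values are my own choices, used in the sketches further on; they are not necessarily the labels or the input format that the app uses.

```r
# Made-up values for the six random variance components (names are mine)
comps <- c(p = 10, i = 4, cp = 3, ci = 2, pi = 5, cpi_e = 6)
sum(comps)   # total variance contributed by the random effects under these choices
```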

Relative error variance of a treatment mean

The basic goal of contrast analysis is to compare one (or more) treatment means with one (or more) other treatment means. That is, we are interested in the relative position of a mean compared to the rest. Due to sampling error, our estimate of the relative position of a mean differs from the ‘true’ relative position. The relative error variance of a treatment mean is the expected squared deviation between the obtained relative position and the expected relative position.
Let’s consider the Venn-diagram above again. In particular, take a look at the condition circle and the components contained in it. A total of four variance components are included in the circle. The component Θ²(c) is the component associated with the treatment effect (μc – μ). All the other components contained in the circle contribute to the relative error variance. Thus, the interaction of treatment and participant, the interaction of treatment and stimulus, and the error contribute to the deviation between the true effect (relative position) and the estimated effect. In other words, the relative error variance of the treatment mean consists of the three variance components associated with the treatment-by-participant interaction, the treatment-by-stimulus interaction, and error.

But these components specify the variance in terms of individual measurements, whereas the treatment mean is obtained by averaging over the np*ni measurements we have in the corresponding treatment condition. So let’s see how we can take the number of measurements into account.

Unfortunately, things get a little complicated to explain, but I’ll have a go nonetheless. The explanation takes two steps: 1) we consider how to specify the expected mean square in terms of the components contained in the condition circle; 2) we see how to get from the expected mean square to the relative error variance of a treatment mean.

Obtaining an expected mean square

I will use the term Mean Square (MS) to refer to a variance estimate. For instance, MST is an estimate of the variance associated with the treatment condition. The expected MS (EMS) is the average value of these estimates.

We can use the Venn-diagram to obtain an EMS for the treatment factor and for all the other factors in the design, but we will focus on the EMS associated with treatment. However, we cannot use the variance components directly, because the mixed ANOVA model I have been using for the application contains sum-to-zero restrictions on the treatment effects and on the two-way interaction effects of treatment and participant and of treatment and stimulus (item). The consequence of this is that we have to multiply the variance components associated with the treatment-by-participant and treatment-by-stimulus interactions by a constant equal to nc / (nc – 1), where nc is the number of conditions. (This is the hard part to explain; I have not really explained it here, but simply stated it.)

The second step in obtaining the EMS is to multiply the components by the numbers of participants and items, as follows: multiply a component by a sample size if and only if the subscript of the component does not contain a reference to the factor associated with that sample size. For instance, multiply a component by the number of participants np if and only if the subscript of the component does not contain the subscript associated with participants.

This leads to the following:

E(MST) = np*ni*Θ²(c) + ni*(nc / (nc – 1))*σ²(cp) + np*(nc / (nc – 1))*σ²(ci) + σ²(cpi, e).
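
As a small illustration, the expected mean square for treatment can be written as an R function. The component values in the example call are made up; np, ni, and nc are the numbers of participants, items, and conditions.

```r
# Sketch of E(MS_T) for the fully crossed design, following the formula above
ems_treatment <- function(theta2_c, s2_cp, s2_ci, s2_cpi_e, np, ni, nc) {
  k <- nc / (nc - 1)   # constant implied by the sum-to-zero restrictions
  np * ni * theta2_c + ni * k * s2_cp + np * k * s2_ci + s2_cpi_e
}
ems_treatment(theta2_c = 2, s2_cp = 3, s2_ci = 2, s2_cpi_e = 6,
              np = 48, ni = 24, nc = 4)
```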

Obtaining the relative error variance of the treatment mean

Notice that ni*(nc / (nc – 1))*σ²(cp) + np*(nc / (nc – 1))*σ²(ci) + σ²(cpi, e) contains the components associated with the relative error variance. Because the treatment mean is based on np*ni scores, we divide by np*ni to obtain the relative error variance for the treatment mean.

Relative error variance of the treatment mean = (nc / (nc – 1))*σ²(cp) / np + (nc / (nc – 1))*σ²(ci) / ni + σ²(cpi, e) / (np*ni).
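
In R, a sketch of this relative error variance might look as follows (again with made-up component values).

```r
# Relative error variance of a treatment mean: the error part of E(MS_T) divided by np*ni
rel_error_var <- function(s2_cp, s2_ci, s2_cpi_e, np, ni, nc) {
  k <- nc / (nc - 1)
  k * s2_cp / np + k * s2_ci / ni + s2_cpi_e / (np * ni)
}
rel_error_var(s2_cp = 3, s2_ci = 2, s2_cpi_e = 6, np = 48, ni = 24, nc = 4)
```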

As an aside, in the post describing the app, I have used the symbol σ²(αβ) to refer to (nc / (nc – 1))*σ²(cp), and σ²(αγ) to refer to (nc / (nc – 1))*σ²(ci).

Comparing means: the error variance of a contrast 

From the relative error variance of a treatment mean, we can get to the error variance of a contrast simply by multiplying the relative error variance by the sum of the squared contrast weights. For instance, if we want to compare two treatment means, we can do so by estimating the contrast ψ = 1*μ1 + (-1)*μ2, where the values 1 and (-1) are the contrast weights. The sum of the squared contrast weights equals 2, so the error variance of the contrast is 2*[(nc / (nc – 1))*σ²(cp) / np + (nc / (nc – 1))*σ²(ci) / ni + σ²(cpi, e) / (np*ni)].

Note that the latter gives us the expected squared deviation between the estimated contrast value and the true contrast value (see also the explanation of the concept of variance above).
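
Here is a short R sketch of the error variance of a contrast, using the two-mean comparison with weights {1, -1} as an example (the component values are the same made-up ones as before).

```r
# Error variance of a contrast: sum of squared weights times the relative error variance
contrast_error_var <- function(weights, s2_cp, s2_ci, s2_cpi_e, np, ni, nc) {
  k <- nc / (nc - 1)
  rel_var <- k * s2_cp / np + k * s2_ci / ni + s2_cpi_e / (np * ni)
  sum(weights^2) * rel_var
}
# comparing two means: weights {1, -1}, so the sum of squared weights equals 2
contrast_error_var(weights = c(1, -1), s2_cp = 3, s2_ci = 2, s2_cpi_e = 6,
                   np = 48, ni = 24, nc = 4)
```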

It should be noted that for the calculation of a 95% confidence interval for the contrast estimate (or for the Margin of Error, the half-width of the confidence interval) we make use of the square root of the error variance of the contrast. This square root is the standard error of the contrast estimate. The calculation of MOE also requires a value for the degrees of freedom. I will write about forming a confidence interval for a contrast estimate in one of the next posts.
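
For completeness, here is a sketch of how MOE would follow from the error variance of the contrast. The error variance and the degrees of freedom below are placeholders, not the actual values for this design; the appropriate degrees of freedom are the topic of a later post.

```r
# MOE of a contrast estimate: t-critical value times the standard error of the contrast
contrast_var <- 0.35               # placeholder error variance of the contrast
se_contrast  <- sqrt(contrast_var) # standard error of the contrast estimate
df  <- 40                          # placeholder degrees of freedom
moe <- qt(.975, df) * se_contrast  # half-width of the 95% confidence interval
moe
```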

Planning for Precision: simulation results for four designs with four conditions

This is the third post about the Planning for Precision app (in the future I’ll explain the difference between Planning for Precision and Precision for Planning). Some background information about the application can be found here: http://the-small-s-scientist.blogspot.com/2017/04/planning-for-precision.html. 

In this post, I want to present the simulation results for 4 designs with 4 conditions. The designs are: the counterbalanced design (see the previous post), the fully-crossed design, the stimulus-within-condition design, and the stimulus-and-participant-within-condition design (the both-within-condition design). I have not included the participants-within-condition design, because it is simply the mirror image (so to say) of the stimulus-within-condition design.

In one of my next posts, I will describe some more background information about planning for precision, but some of the basics are as follows. We have a design with 4 treatment conditions, and what we want to do is estimate differences between these condition means by using contrasts. For instance, we may be interested in the difference between the first mean (maybe because it is a control condition) and the average of the other three conditions: μ1 – (μ2 + μ3 + μ4)/3 = 1*μ1 – 1/3*μ2 – 1/3*μ3 – 1/3*μ4. The values {1, -1/3, -1/3, -1/3} are the contrast weights, and for the result we use the symbol ψ.
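
In R, the contrast estimate is simply the weighted sum of the condition means; the means below are made up for the illustration.

```r
# Estimating psi from (made-up) condition means with weights {1, -1/3, -1/3, -1/3}
means   <- c(5.0, 4.2, 4.5, 4.1)
weights <- c(1, -1/3, -1/3, -1/3)
psi_hat <- sum(weights * means)
psi_hat   # 5.0 - (4.2 + 4.5 + 4.1) / 3
```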

The value of ψ is estimated on the basis of estimates of the population means, that is, the sample means or condition means. Due to sampling error, the contrast estimate varies from sample to sample and the amount of sampling error can be expressed by means of a confidence interval. Conceptually, the confidence interval expresses the precision of the estimate: the wider the confidence interval, the less precise the estimate is.

The Margin of Error (MOE) of an estimate is the half-width of the confidence interval, so the confidence interval is the estimate plus or minus MOE. We will take MOE as an expression of the precision of the estimate (the lower the value of MOE, the more precise the estimate). Now, if you want to estimate an effect size, more precision (a lower value of MOE; a narrower confidence interval) is better than less precision (a higher value of MOE; a wider confidence interval). The app lets you specify the design and the contrast weights and helps you find the minimum required sample sizes (for participants and stimuli) for a given target MOE. (You can also play with the designs to see which design gives you the smallest expected MOE.)

Crucially, if you plan for precision, you also want to have some assurance that the MOE you are likely to obtain in your actual experiment will not be larger than your target MOE. Compare this with power: 80% power means that the probability that you will reject the null hypothesis is 80%. Likewise, an assurance MOE of 80% means that there is an 80% probability that your obtained MOE will be no larger than the assurance MOE.
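
In terms of simulation output, assurance MOE is just a quantile of the distribution of obtained MOEs. The sketch below uses a gamma distribution purely as a stand-in for a vector of simulated MOE values.

```r
# Assurance MOE as a quantile of the distribution of simulated MOEs
set.seed(123)
moes <- rgamma(1e4, shape = 50, rate = 100)    # stand-in for 10000 simulated MOEs
quantile(moes, probs = c(.80, .90, .95, .99))  # e.g. the .80 quantile is 80% assurance MOE
```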

The simulations (with N = 10000 replications) estimate Expected MOE as well as Assurance MOE for assurances of .80, .90, .95, and .99, for the 4 designs with 4 treatment conditions, with a total of 48 participants and 24 stimuli (items). The MOEs are given for three standard contrasts: 1) the difference between the first mean and the mean of the other three, with weights {1, -1/3, -1/3, -1/3}; 2) the difference between the second mean and the mean of conditions three and four, with weights {0, 1, -1/2, -1/2}; 3) the difference between the third and fourth condition means, with weights {0, 0, 1, -1}.

I will present the results in separate tables for the 4 designs considered and include the percentage differences between the expected values of assurance MOE and the estimated values.

The fully crossed design 

The results are in the following table.

The percentage differences between the expected quantiles (= assurance MOEs for a given assurance; i.e., q.80 is the expected or estimated 80% assurance MOE) and the estimated quantiles are: .80: 0.11%; .90: 0.05%; .95: -0.14%; .99: -0.05%.

The counterbalanced design

The results are presented in the following table. 

The percentage differences between the expected quantiles and the estimated quantiles are: .80: 0.03%; .90: 0.13%; .95: 0.09%; .99: -0.23%.

The stimulus-within-condition design 

The following table contains the details. 
The percentage differences between the expected quantiles and the estimated quantiles are: .80: -0.11%; .90: -0.33%; .95: -0.55%; .99: -0.70%.

Both-participant-and-stimulus-within-condition design 

Here is the table. 
And the percentage differences are: .80: -0.34%; .90: -0.59%;  .95: -0.82%;  .99: -1.06%. 

Conclusion

The results show that the simulation results are quite consistent with the expected values based on the mixed model ANOVA. We can see that the differences between expected and estimated values increase as the number of participants and items per condition decreases. For instance, in the both-within-condition design, each of the four treatment conditions has its own 12 participants responding to its own 6 stimuli. The fact that even with these small sample sizes the results agree to an acceptable degree is (to my mind) encouraging. Note that with small samples the expected assurance MOEs are slightly lower than the estimates, but the largest difference is -1.06% (see the MOE for 99% assurance).

Planning for Precision: first simulation results

In this post, I want to share the results of the first simulation study to “test” my Planning for Precision app. More details about the app can be found in a previous post: here.

I have included the basic logic of the simulations (including R code) in a document that you can download: https://drive.google.com/open?id=0B4k88F8PMfAhSlNteldYRWFrQTg.

The simulation study simulates responses from a four-condition counterbalanced design, with p = 48 participants and q = 24 stimuli/items. Here, we will focus on expected and assurance MOE for three contrasts. The first contrast estimates the difference between the first mean and the average of the other three, the second contrast the difference between the second mean and the average of the third and fourth means, and the final contrast the difference between the means of the third and fourth conditions.

Expected MOE is compared to the mean of the estimated MOE for each of the contrasts (based on 10000 replications). Assurance MOE is judged for assurance of .80, .90, .95 and .99, by comparing the calculations in the app with the corresponding quantile estimates of the simulated distributions.

Results 

Note that in the above table, the Expected Mean MOE is what I have called Expected MOE, and the q.80 through q.99 are quantiles of the distribution of MOE. As an example, q.80 is the quantile corresponding to assurance MOE with 80% assurance, Expected q.80 is the value of assurance MOE calculated with the theoretical approach, and Estimated q.80 is the estimated quantile based on the simulation studies. 
Importantly, we can see that most of the figures agree to a satisfying degree. If we look at the relative differences, expressed in percentages for the assurance MOEs, we get 0.0325% for q.80,  0.1260% for q.90, 0.0933% for q.95, and  -0.2324% for q.99. 
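
For what it is worth, the percentage differences reported here are of the following form; the numbers and the sign convention in this sketch are illustrative, not the actual values from the table.

```r
# Relative difference between a theoretical and a simulated assurance MOE, in percent
expected_q80  <- 0.400    # assurance MOE calculated with the theoretical approach (made up)
estimated_q80 <- 0.4001   # .80 quantile estimated from the simulations (made up)
100 * (estimated_q80 - expected_q80) / expected_q80
```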

Conclusion

The first simulation results seem promising. But I still have a lot of work to do for the rest of the designs.