Tutorial 2 - Understanding fMRI Statistics, especially the General Linear Model (GLM)

Goals

  • To develop an intuitive understanding of how the General Linear Model (GLM) is used for univariate, single-subject, single-run fMRI data analysis

  • To understand the significance of correlation coefficients, beta weights and p-values in time course analyses

  • To learn to perform single-subject correlational and GLM analysis in BrainVoyager

  • To learn how to use contrasts in statistical analyses

Relevant Lectures

General Linear Model

Accompanying Data

Tutorial 2 & 3 Data

Background and Overview

This lecture assumes that you have an understanding of basic statistics. If you have taken a stats course in Psychology and worked with behavioral data, many of the statistical tests that can be employed for fMRI will be familiar to you — correlations, t tests, analyses of variance (ANOVAs) and so forth. These tests can be applied to fMRI data as well, using similar logic. As one example, you can compute a correlation between an expected time course and the actual time course from a voxel or a cluster of voxels in a specific brain region to see how well your expectations were met. As another example, you can extract an estimate of how active the voxel or region was in different conditions and then simply perform a t test to examine whether activation levels differ between two conditions. Alternatively, if you have a factorial design, such as our Main Experiment, which tested Faces/Hands Categories x Left/Centre/Right Directions (a 2 x 3 design), you can run an ANOVA to look for main effects and interactions.

All of these common statistical tests can be applied through a General Linear Model (GLM). The GLM is the "Swiss Army knife" of statistics, allowing you to accomplish many different goals with one tool.

Any statistical test needs two pieces of information:

  1. signal: how much variance in the data can be explained by a model (e.g., a correlation, a difference between two conditions in a t test, or a difference between a combination of conditions in a factorial design that is analyzed by an ANOVA)
  2. noise: how much variance remains unexplained

Statistical significance -- the degree of confidence that an effect is "real" -- depends on the ratio of explained variance to total variance (explained + unexplained variance). Put simply, if you have a big difference between conditions and little variance within each condition, you can be quite confident there is a legitimate difference; whereas, if you have a small effect in a very noisy data set, you would have little confidence that there was really a difference.

For fMRI data, there are two levels at which we can compute variance.

  1. Variability over time. If we only have data from one participant, we can look at the variability of the time course in a voxel or region. This temporal variability can be parsed into effects due to the expected changes between conditions vs. noise. In this case, data points are time points.
  2. Variability over participants. If we have data from multiple participants, we can look at the average size of an activation difference across participants relative to the variability among participants. In this case, data points are participants.

In this tutorial, we will examine the first case -- how statistics can be used in the data from a single participant to estimate effect sizes relative to variability over time. We will begin with a Pearson correlation, which is one of the simplest and most straightforward statistical tests to understand. Then we will see how the GLM can be used to perform a correlation and also to generate more complex models that better explain the data.

In a later tutorial, we will learn how to employ the second approach, in which we can combine data between participants and look at the consistency of effects across individuals.

Understanding fMRI statistics for data over time

Time Series: Data, Signal, and Noise

Let's begin by looking at how we can statistically analyze data that changes over time. A time course is shown below. This time course depicts fMRI activation over time... or it could be any kind of time series (e.g., daily temperature over a two-year period).

We might be able to explain some of this variability with a model that captures much of the structure in the data, here a sinusoidal model of changes over time. The variance that can be accounted for by the model is the signal. However, the model cannot explain all of the data. The error or noise (the part of the variance we cannot explain or model) can be seen in the residuals, which are computed by subtracting the model from the data.

These three parts are depicted in the figure below.

Signal-to-noise.PNG
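
The decomposition of data into signal and residuals can be sketched numerically. This is a minimal illustration with synthetic data (the sinusoidal model and noise level are invented, not taken from the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0.0, 100.0)                 # time points (arbitrary units)
model = np.sin(2 * np.pi * t / 20)        # sinusoidal model of the signal
noise = rng.normal(0.0, 0.5, size=t.size)
data = model + noise                      # observed time series

residuals = data - model                  # data - model = unexplained part
explained = np.var(model)                 # variance captured by the model
unexplained = np.var(residuals)           # variance left in the residuals
```

Plotting `data`, `model`, and `residuals` on top of each other reproduces the three panels of the figure.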

Applying correlations to fMRI data

In order to determine whether a voxel responds reliably to a certain stimulus, we must first measure how reliably it changes with that stimulus/condition. To do this, we can use correlations to assess how closely the pattern of a predictor is related to the pattern of a signal (in this case, a time course). A high correlation between a predictor and a time course indicates that the activity of the voxel (or group of voxels) is strongly related to the type of stimuli presented, while a low correlation indicates that its activity is not related to that specific type of stimuli.

Another way of thinking about it is in terms of the signal-to-noise ratio. One way of quantifying this is to calculate the percentage of explained variance (explained variance/total variance). This percentage can easily be calculated by squaring the correlation coefficient (r²). A high signal-to-noise ratio (lots of signal with little noise) yields a greater percentage of explained variance, while a lower signal-to-noise ratio (little signal with lots of noise) results in a smaller percentage of explained variance.
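
To make this concrete, here is how a correlation coefficient and the percentage of explained variance could be computed for a hypothetical predictor and time course (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical on/off predictor and a noisy voxel time course
predictor = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0], dtype=float)
noise = np.array([0.1, -0.2, 0.05, 0.1, -0.1, 0.2, -0.05, 0.0, 0.1, -0.1])
timecourse = 2.0 * predictor + noise

r = np.corrcoef(predictor, timecourse)[0, 1]  # Pearson correlation coefficient
explained_pct = 100 * r**2                    # percent of variance explained
```

Because the noise here is small relative to the predictor-driven changes, `r` lands close to 1 and the model explains nearly all of the variance.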

Let's examine a few simple cases.

Activity #1

Open Widget 1 in a new window by clicking on the plot. (Note: if you have trouble opening the animation, please see our troubleshooting page).

Widget 1 shows a simulated 'data' time course in grey and a single predictor in purple, using arbitrary units (y axis) over time in seconds (x axis). The predictor can be multiplied by a beta weight (scaling factor), and a constant can be added, using the sliders (more will be said about these two parameters later in the tutorial). Below the time course is the residual time course, which we obtain by subtracting the predictor/model from our data at each point in time (i.e., data - predictor = residuals).

Adjust the sliders for Voxel 1 so that the purple line (predictor) overlaps the black line (signal).

Widget 1.

Question 1:

a) What is the value of the correlation coefficient? What does this value mean? Hint: consider the pattern (shapes) of the purple and black lines.

b) What percentage of the total variance does the current model (purple predictor) explain? What percentage of the total variance would the best model account for?

c) Does the correlation coefficient change when moving the sliders (i.e., changing the beta weights)? Why or why not?

In practice, we don't expect to find a perfect fit between predictors and data. Voxel 2 is a more realistic example of noisy data.

Select Voxel 2 and adjust the sliders until you find the beta weight and constant that best fit the data.

Question 2:

a) What is the new correlation coefficient? Why is it different from Voxel 1?

b) What percentage of the total variance does the current model (purple predictor) explain for Voxel 2? What can we say about the difference in signal-to-noise ratio between Voxel 1 and Voxel 2?

Now let's look at some real time courses by considering the simplest possible example: one run from one participant with two conditions. Note that we always need at least two conditions so that we can examine differences between conditions using subtraction logic. This is because measures of the BOLD signal in a single condition are affected not only by neural activation levels but also by extraneous physical factors (e.g., the coil sensitivity profile) and biological factors (e.g., the vascularization of an area). As such, we always need to compare conditions to look at neural activation (indirectly) while subtracting out other factors.

In this case, we will look at the first localizer run. Recall that this run is composed of four visual conditions (faces, hands, bodies and scrambled) and one baseline condition. For now, we will group the four visual conditions into one visual condition, leaving us with only two conditions (visual vs. baseline).

Activity #2

Open the widget in a new window and adjust the sliders to fit the visual predictor (in pink) to Voxel A's signal as closely as possible.

Question 3:

a) What is the correlation coefficient for Voxel A? How would you interpret this coefficient in terms of the strength of the relationship and the voxel's preferred type of stimuli? What is the percentage of explained variance?

b) What are the correlation coefficients for Voxels B, C, and D? Compare the signal-to-noise ratios of Voxels A, B, C, and D. From this information, which voxels respond specifically to visual stimuli? Bonus question: Where are these voxels more likely to be located?

c) Go to Voxel B, adjust the sliders to get the best-fitting line, and click on the Linear drift checkbox. What happens to the signal when you correct for linear drift? Does the correlation coefficient change? In what direction? Why?

Bonus question: Voxel D has a correlation of -0.38. Looking at the time course, how can we tell whether this is a true BOLD signal or not? Why might the correlation be so high even though this voxel doesn't look related to the visual stimuli?

What about p-values?

We learned that we can use correlations to measure the strength of the relationship between a time course and a predictor. However, we often want to know whether the relationship is statistically significant. This is where p-values come in. The p-value is the probability of obtaining a correlation coefficient this large in the data if the true (or "real") correlation coefficient were equal to zero.

Consider the relationship between the visual predictor and Voxel A. Let's pretend that, in reality, there is no relationship between Voxel A and the visual predictor: the time course we recorded for Voxel A is due to random variation, and the real correlation coefficient is zero. However, we measured a correlation coefficient of 0.78. If we calculate the p-value, the probability of obtaining a correlation coefficient of 0.78 when the real correlation coefficient is 0 is 6.9 x 10^-70 (or one in 1.5 duovigintillion, 10^69). Given this very small probability, either there is no effect and you are just extremely lucky, OR there is a true relationship between the predictor and time course (the "real" correlation is not equal to zero). In this case, we conclude that the time course is most likely related to the visual predictor.

Although the previous example is fairly obvious (i.e., you can see the signal pattern despite the noise), these relationships are not always so straightforward. Consider Voxel B, which has the lowest correlation coefficient (0.33). At first glance, there doesn't appear to be a clear relationship between the predictor and the time course. However, the p-value indicates that the probability of obtaining a correlation of 0.33 completely by chance (i.e., if there were no relationship between activity and predictor and the "real" correlation coefficient were 0) is 3.87 x 10^-10 (or one in 2.6 billion), which again seems too unlikely to attribute to chance.
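
The logic of a p-value can be illustrated with a simple permutation test on synthetic data (the effect size and noise level below are arbitrary): shuffle the predictor many times and count how often a correlation as large as the observed one arises purely by chance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 240                                        # e.g., 240 time points in a run
predictor = np.tile(np.repeat([0.0, 1.0], 10), n // 20)   # on/off blocks
timecourse = 0.5 * predictor + rng.normal(0.0, 1.0, n)    # weak but real effect

r_obs = np.corrcoef(predictor, timecourse)[0, 1]

# Null distribution: correlations between the time course and shuffled predictors
null_rs = np.array([np.corrcoef(rng.permutation(predictor), timecourse)[0, 1]
                    for _ in range(2000)])
p_value = np.mean(np.abs(null_rs) >= abs(r_obs))  # two-tailed p-value estimate
```

A tiny `p_value` means a correlation this strong almost never arises from a shuffled (unrelated) predictor, which is exactly the reasoning applied to Voxels A and B above.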

Now that we understand that p-values represent a probability, how do we know whether a voxel's activity is significantly related to a stimulus or condition (the predictor)? We use an arbitrary threshold, often p = 0.05 or 0.01 (or its corresponding correlation coefficient). In other words, we say there is a statistically significant effect when the correlation coefficient between the time course of a voxel and the predictor has less than a 0.05 (5%) probability of having occurred by chance (but more on this when we talk about controlling for multiple comparisons).

Question 4:

a) If we obtain a p-value of 0.06 for a correlation x, what is the probability of having obtained that correlation coefficient due to chance?

b) Is a p-value always associated with the same correlation coefficient? Why or why not?

c) If we use a threshold of p = 0.05 and examine the correlation between a predictor and a time course for 100 different voxels (that is, we calculate 100 correlations), how many voxels might be statistically significant due to chance? What if we used a threshold of p = 0.01? What type of error (Type I or II) would this be?

II. Whole-brain Voxelwise Correlations in BrainVoyager

In the previous sections, we learned how to quantify the strength of the relationship between a predictor and a time course for a single voxel using correlation coefficients and p-values. However, we have over 300,000 voxels in our data set. How can we get the bigger picture?

One way is to do a "Voxelwise" analysis, meaning that we run not just one correlation but one correlation for every voxel. This is sometimes also called a "whole-brain" analysis because typically the scanned volume encompasses the entire brain (although this is not necessarily the case, for example if a researcher scanned only part of the brain at high spatial resolution).
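
Conceptually, a voxelwise analysis just repeats the single-voxel correlation at every voxel. Here is a vectorized sketch in Python with toy sizes and synthetic data (a real VTC has on the order of 300,000 voxels, and the "responsive" voxels are planted by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
n_timepoints, n_voxels = 240, 1000
predictor = np.tile(np.repeat([0.0, 1.0], 10), n_timepoints // 20)
data = rng.normal(0.0, 1.0, (n_timepoints, n_voxels))
data[:, :50] += 0.8 * predictor[:, None]   # make the first 50 voxels "responsive"

# One Pearson r per voxel: dot products of mean-centered columns
x = predictor - predictor.mean()
y = data - data.mean(axis=0)
r_map = (x @ y) / (np.sqrt(x @ x) * np.sqrt((y**2).sum(axis=0)))
```

Each value in `r_map` could then be thresholded, which is what BrainVoyager does when it colours voxels in the heat map.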

Instructions

1) Load the anatomical volume and the functional data. This procedure is explained in Tutorial 1, but is summarized here as a reminder:

  • Anatomical data: Select File/Open... , navigate to the BrainVoyager Data folder (or wherever the files are located), and then select and open sub-10_ses-01_T1w_BRAIN_IIHC.vmr

  • Functional data: Select Analysis/Link Volume Time Course (VTC) File... . In the window that opens, click Browse , and then select and open sub-10_ses-01_task-Localizer_run-01_bold_256_sinc3_2x1.0_NATIVEBOX.vtc , and then click OK . The VTC should already be linked to the protocol file. You can verify this by looking at the VTC properties (follow the steps in Tutorial 1).

Now we can conduct our voxel-wise correlation analysis.

2) Select Analysis/Compute Linear Correlation Maps... . In the window that opens, click Load RTC... , and then select and open sub-10_ses-01_run-01_corr-using-corr.rtc . Notice that our predictor appears on the protocol graph. Then, click Close . BrainVoyager will compute the correlation between this predictor and the time course of each voxel in the brain, producing a correlation coefficient and p-value for each voxel.

OpenComputeCorrelations.jpg

Question 5: What kind of stimuli does the predictor represent? In other words, what are your conditions for this linear correlation?

Once BrainVoyager is done computing, you will obtain a statistical "heat map" overlaid on the anatomical scan (but remember that the map was generated from the functional scans). An example is shown below, where significant voxels are colored in either warm colors (red/yellow) or cold colors (green/blue). If you hover your mouse over the voxels, you will see the correlation coefficient (r) and p-value for any given voxel.

Note: If you want the maps in your version of BrainVoyager to match the appearance of those in the tutorials, select File/Settings..., and in the GUI tab, turn off Dark mode and change the Default Map Look-up Table to "Classic Map Colors (pre-v21)".

HoverStats.jpg

Does this mean that all of the colored voxels are relevant to our analysis? Not necessarily! Remember that, although we can calculate a correlation coefficient and p-value for every voxel, statistical significance is determined by the threshold we use. There are a few ways to modify the threshold.

3) First, you can manually adjust the threshold by using the icons (circled in red) in the left side bar.

Question 6: Look at the color scale on the right of the coronal view.

a) What do the numbers on the scale represent? How do the numbers on the scale change when decreasing and increasing the threshold manually?

b) What do the colors in the scale represent? (Hint: Think about the valence of the correlation coefficients.) How can those colors be interpreted in the context of this Localizer run?

c) Given the nature of the predictor, which threshold would be most appropriate? In other words, what threshold highlights mostly relevant voxels for this type of stimulus? (Hint: Where would you expect activation?)

d) How many degrees of freedom are there in this analysis?

ThreshldIcons.jpg

Setting thresholds manually is not an optimal way of going about it. Luckily, we have a useful tool that allows us to set very precise thresholds.

VolumeMapsStatistics.jpg

4) Select Analysis/Overlay Volume maps... . This will open the Volume Maps window. Next, click on the Statistics tab. Here, we can set the maximum and minimum correlation threshold manually in the Correlation range section. Try changing the min and max values and observe the result on the statistical maps.

Question 7:

a) Set the minimum threshold to 0.30 and the maximum threshold to 0.80. What happens when you increase the minimum threshold while keeping the maximum threshold constant? Consider the number of significant voxels and their location. If you keep increasing the minimum threshold, what type of error might you commit?

b) Now decrease the threshold until nearly every voxel has been coloured in. What is that threshold equivalent to? What does this map show us? How can you interpret the colours in this heatmap – does a voxel being coloured in orange-yellow vs. blue-green tell you anything about how much it activates to visual stimuli vs baseline?

c) What are two hints that a coloured voxel might not be "important" even though it is coloured/statistically significant? What type of error might this situation be linked to?

VolumeMapsOptions2.jpg

5) Now, click on the Map Options tab. Here, we can set a threshold using p-values instead of correlation coefficients.

Question 8: Set the threshold p-value to 0.05 and click Apply. Given that a p-value threshold of .05 is the usual statistical standard, why is it suboptimal for voxelwise fMRI analyses?

We will discuss issues concerning multiple comparisons later; for now, we will leave it like this. Simply remember that you can set the threshold using either the correlation coefficients or the p-values of individual voxels.

So far, we have talked about how we can quantify the relationship between our data and a predictor, as well as quantify explained variance, using correlation. However, we have not discussed how we arrive at (or rather, model) those predictors. This is what we will discuss in the next section.

Modeling the Hemodynamic Response Function (HRF)

Recall that fMRI is an indirect measure of brain activity. The expected neural activation is often modeled using a boxcar function. We then use a mathematical procedure called convolution to model the hemodynamic response function (HRF).
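
The convolution step can be sketched as follows. Note that this uses a simple single-gamma HRF shape for illustration only; BrainVoyager's default is a two-gamma HRF with specific parameters, so the exact shape, TR, and epoch timings here are assumptions:

```python
import numpy as np

TR = 2.0                                   # repetition time in seconds (assumed)
n_vols = 120
boxcar = np.zeros(n_vols)                  # expected neural activation: on/off
boxcar[10:20] = 1.0                        # two hypothetical stimulation epochs
boxcar[50:60] = 1.0

t = np.arange(0.0, 30.0, TR)               # HRF support: 0-30 s
hrf = t**5 * np.exp(-t)                    # gamma-like shape peaking ~5 s
hrf /= hrf.sum()                           # normalize to unit area

predictor = np.convolve(boxcar, hrf)[:n_vols]  # delayed, smoothed HRF predictor
```

Compared with the boxcar, the convolved predictor rises a few seconds after each epoch onset and decays gradually after offset, matching the sluggishness of the BOLD response.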

In the following activity, fit the boxcar function to the data using the sliders. Then press the Convolve button to transform the boxcar function into the HRF.

Question 9: Based on the effect of convolution on your predictors, which type of Hemodynamic Response Function was employed in the convolution?

Now that we have understood how to model the HRF, let's put this into practice with BrainVoyager.

1) First, we will remove our previous model (skip this step if already done). Go back to the Browse tab in the Volume Maps window, click Delete all, and then Close.

ModellingVisPred2.jpg

2) Select Analysis/Compute Linear Correlation Maps... . However, instead of loading an RTC as in the previous section, we will build our model directly on the graph showing the conditions. For each epoch (interval of time with one condition), left-click to indicate no activation and right-click to indicate activation.

3) Once every condition has been covered, click on the HRF button. This will convolve our boxcar model into an HRF model. Finally, click Go .

You will notice that the maps are exactly the same as in the previous section.

Question 10: Repeat the same procedure, but now model a predictor for Faces. Include a screenshot of your model in your assignment.

III. Applying the GLM to Single Voxel Time Courses

Beta Weights, Residuals and the Best Fit

In the first part of the tutorial, we discussed how we can know whether a voxel reliably activates to a certain stimulus. However, that does not tell us how strongly the voxel is activated. Luckily, the GLM can be used to estimate these levels of activation.

The GLM can be used to model data using the following form: y = β0 + β1X1 + β2X2 + ... + βnXn + e where:

  • y is the observed or recorded data,
  • β0 is a constant or intercept adjustment,
  • β1 ... βn are the beta weights or scaling factors,
  • X1 ... Xn are the independent variables or predictors, and
  • e is the error (residuals).

In the case of fMRI analyses, you can think of this as modelling a data series (time course) with a combination of predictor time courses, each scaled by beta weights, plus the "junk" time course (residuals) that is left over once you've modelled as much of the variance as you can. These three components of the GLM are illustrated for a very simple signal in the figure below.
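
As a sketch of what finding the betas means computationally, here is a least-squares GLM fit on synthetic data (the design matrix, epoch layout, and beta values are invented for illustration; BrainVoyager solves the same minimization internally):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 240
X = np.zeros((n, 4))                       # four condition predictors (boxcars)
for i in range(4):
    X[i * 20:i * 20 + 10, i] = 1.0         # one epoch per condition per half-run
    X[120 + i * 20:120 + i * 20 + 10, i] = 1.0
design = np.column_stack([np.ones(n), X])  # constant (beta0) plus predictors

true_betas = np.array([100.0, 3.0, 1.0, 0.5, 0.2])
y = design @ true_betas + rng.normal(0.0, 1.0, n)   # simulated time course

betas, *_ = np.linalg.lstsq(design, y, rcond=None)  # minimizes squared residuals
residuals = y - design @ betas
```

The fitted betas land close to the true ones, and the residuals hold whatever variance the predictors cannot explain.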

GLMEquation6.PNG

Using this model, we can try to find the best fit. The best-fitting model is found by finding the parameters (beta weights and constant) that maximize the explained variance (signal), which is equivalent to minimizing the residuals.

Estimating Beta Weights

In the following animation, adjust the sliders until you think you have optimized the model for Voxel A (i.e., minimized the squared error). Make sure you understand the effect that the four betas and the constant are having on your model. You can test how well you did by clicking the Optimize GLM button, which will find the mathematically ideal solution.

Question 11:

a) How can you tell when you've chosen the best possible beta weights and constant? Consider (1) the similarity of the weighted predictor (purple) and the data (black) in the top panel; (2) the appearance of the residual plot in the bottom panel; and (3) the squared error and sum of residuals.

b) What can you conclude about Voxel A's selectivity for images of faces, hands, bodies and/or scrambled images?

c) Do the same for Voxels E and F. What can you conclude about their selectivity for images of faces, hands, bodies and/or scrambled images?

Question 12: Define what a beta weight is and what it is not. In other words, what is the difference between a beta weight and a correlation?

What about the baseline?

You might have noticed that our models only have predictors (beta weights) for conditions and none for the baseline, and you might be wondering why that is. To explain this, let's go back to basic algebra.

Imagine you are trying to find the value of x in the following equation: 2x + 3 = 5. You might have guessed the answer is x = 1.

Now imagine you are trying to find x for the following equation: 2x + 3y = 5. In this case, there are an infinite number of solutions, making it impossible to know which is the correct one!

If we use as many predictors as there are conditions (on top of the constant), we have a similar problem with time courses in fMRI analyses. In other words, having an equal number of conditions and predictors makes the model redundant. Let's try it out.
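
You can see this redundancy directly in the design matrix. In the sketch below (toy on/off predictors, chosen for illustration), adding a baseline predictor alongside the constant makes the columns linearly dependent, so the matrix loses a rank and the betas are no longer uniquely determined:

```python
import numpy as np

n = 100
visual = np.zeros(n)
visual[::2] = 1.0                 # toy on/off visual predictor
baseline = 1.0 - visual           # baseline predictor: on whenever visual is off
constant = np.ones(n)

# constant == visual + baseline, so one column is redundant
X_redundant = np.column_stack([constant, visual, baseline])
X_ok = np.column_stack([constant, visual])   # drop the baseline predictor

rank_redundant = np.linalg.matrix_rank(X_redundant)  # 2, not 3
rank_ok = np.linalg.matrix_rank(X_ok)                # 2: full column rank
```

With rank 2 but three unknowns, infinitely many beta combinations fit equally well, which is exactly the 2x + 3y = 5 problem above. Dropping the baseline predictor restores a unique solution.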

Adjust the sliders until you think you have optimized the model while keeping the constant at 0.

Question 13:

a) What betas did you find? Now compare these to the optimal betas for a single visual predictor. What do you notice?

b) How many possible combinations of betas and constant are possible for the model with baseline and visual predictors? Why is this suboptimal?

IV. Whole-brain Voxelwise GLM in BrainVoyager

In the previous section, we explained how the GLM can be used to model data in a single voxel by finding the best fit (in other words, the optimal beta weights and constant). Now, we can apply this to the whole brain by fitting one GLM for every voxel in the brain.

Conducting a GLM analysis

1) Again, we will remove our previous model (skip this step if already done). Go back to the Browse tab in Volume Maps window, click Delete all, and then Close.

2) Select Analysis/General Linear Model: Single Study. By single study, BrainVoyager means that we will be applying the GLM to a single run.

Click Options and make sure that Exclude last condition ("Rest") is checked (the default seems to be that Exclude first condition ("Rest") is checked).

3) Click Define Preds to define the predictors in terms of the PRT file opened earlier, and tick the Show All checkbox to visualize them. Notice the shape of the predictors – they are identical to the ones we used earlier in Question 2. Now click GO to compute the GLM in all the voxels in our functional data set.

GLMmodel2.JPG
GLMmodelOptions2.JPG

After the GLM has been fit, you should see a statistical "heat map". This initial map defaults to showing statistical differences between the average of all conditions vs. the baseline. More on this later.

In order to create this heat map, BrainVoyager has computed the optimal beta weights at each voxel such that, when multiplied with the predictors, maximal variance in the BOLD signal is explained (under certain assumptions made by the model). An equivalent interpretation is that BrainVoyager is computing beta weights that minimize the residuals. For each voxel, we then ask, “How well does our model of expected activation fit the observed data?” – which we can answer by computing the ratio of explained to unexplained variance.

It is important to understand that, in our example, BrainVoyager is computing four beta weights for each voxel – one for the Face predictor, one for the Hand predictor, one for the Bodies predictor and one for the Scrambled images predictor. For each voxel, residuals are obtained by computing the difference between the observed signal and the modelled (predicted) signal – which is simply the predictors vertically scaled by the beta weights. This is the same as you did manually in Question 2.

But we don't simply want to estimate activation levels for each condition by computing beta weights; we also want to be able to tell if activation levels differ statistically between conditions!

Comparing Activation between Conditions

Just as with statistics on other data (such as behavioral data), we can use the same types of tests (e.g., t-tests, ANOVAs) to understand brain activation differences, and the choice of test depends on our experimental design (e.g., use a t-test to compare two conditions; use an ANOVA to compare multiple conditions or factorial designs). Statistics provides a way to determine how reliable the differences between conditions are, taking into account how noisy our data are. For fMRI data from single participants, we can estimate how noisy our data are based on the residuals.

For example, we can use a hypothesis test to determine whether activation for the Face condition (i.e., the beta weight for Faces) is significantly higher than for Hands, Bodies and Scrambled images (i.e., the beta weights for Hands, Bodies and Scrambled images). Informally, for each voxel we ask, “Was activation for Faces significantly higher than activation for Hands, Bodies and Scrambled images?”

To answer this, it is insufficient to consider the beta weights alone. We also need to consider how noisy the data are, as reflected by the residuals. Intuitively, we can expect that the relationship between beta weights is estimated more accurately when the residuals are small.

Question 14: Why can we be more confident about the relationship between beta weights when the residual is small? If the residual were 0 at all time points, what would this say about our model, and about the beta weights? Think about this in terms of the examples in Questions 1 and 2.

We can perform these kinds of hypothesis tests using contrasts. Contrasts allow you to specify the relationship between beta weights that you want to test. Let's do an example where we wish to find voxels with significantly different activation for Faces vs. Hands.

4) Select Analysis/Overlay General Linear Model... , then click the box next to Hands until it changes to [-], set Faces to [+] and the others to [].

Note: In BrainVoyager, the Face, Hand, Bodies and Scrambled predictor labels appear black while the Constant predictor label appears gray. This is because we are not usually interested in the value of the Constant predictor so it is something we can call a "predictor of no interest" and it is grayed out to make it less salient. Later, we'll see other predictors like this.

Question 15: In the lecture, we discussed contrast vectors. What is the contrast vector for the contrast you just specified?

Contrasts-Face-Hands2.JPG
Hands-FacesHeatMap.jpg

Click OK to apply the contrast. Use the mouse to move around in the brain and search for hotspots of activation. By doing this, you are specifying a hypothesis test of whether the beta weight for Faces is significantly different from (and greater than) the beta weight for Hands. This hypothesis test is applied over all voxels, and the resulting t-statistic and p-value determine the colour intensity in the resulting heat map.
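
Under the hood, a contrast test combines the betas with the contrast vector and scales the result by the noise estimated from the residuals. Here is a numerical sketch on synthetic data (the design, timings, and beta values are invented; the formula is the standard GLM contrast t-test, though BrainVoyager's exact options may differ):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 240
X = np.zeros((n, 4))                       # Faces, Hands, Bodies, Scrambled boxcars
for i in range(4):
    X[i * 20:i * 20 + 10, i] = 1.0
    X[120 + i * 20:120 + i * 20 + 10, i] = 1.0
design = np.column_stack([np.ones(n), X])

# Simulate a face-selective voxel: Faces beta = 4, the others = 1
y = design @ np.array([100.0, 4.0, 1.0, 1.0, 1.0]) + rng.normal(0.0, 1.0, n)

betas, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ betas
dof = n - design.shape[1]                  # degrees of freedom
sigma2 = resid @ resid / dof               # noise variance from the residuals

c = np.array([0.0, 1.0, -1.0, 0.0, 0.0])   # contrast vector: Faces > Hands
se = np.sqrt(sigma2 * c @ np.linalg.pinv(design.T @ design) @ c)
t_stat = (c @ betas) / se                  # large t means a reliable difference
```

Notice that the same beta difference produces a much smaller `t_stat` if the residuals (and thus `sigma2`) are larger, which is why noisy voxels need bigger effects to reach significance.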

Question 16: How can you interpret the colours in this heatmap – does a voxel being coloured in orange-yellow vs. blue-green tell you anything about how much it activates to Faces vs. Hands? Can you find any blobs that may correspond to the fusiform face area or the hand-selective region of the lateral occipitotemporal cortex?

Question 17: Imagine you wanted to use contrasts to find voxels that have significantly greater activation to Faces than to Hands, Bodies and Scrambled. What are two possible contrast vectors you could use to find these voxels? Which one is more liberal and which one is stricter?