Tutorial 9 - Multivariate Analyses Methods
This tutorial was adapted from a class project developed by Geoffrey Laforge, Kathleen Lyons, and Clara Stafford. Thanks!!!
Goals
To understand the difference between approaches that focus on multivariate (multivoxel) analyses rather than univariate (subtraction) analyses
To understand the method of Representational Similarity Analysis
To understand how Multidimensional Scaling can be applied to RSA data
To understand how theoretical models can be tested with RSA data
Relevant Lectures
Lecture 08a: Intro to MVPA
Lecture 08d: Representational Similarity Analysis (RSA)
Lecture 08f: Review of RSA
Accompanying Data
There is no fMRI data for this tutorial. There are two Excel files showing how Data RSMs are generated.
RSA-Example_sub-01_MainExp_Right-FFA_NoSplit.xlsx
RSA-Example_sub-01_MainExp_Right-FFA_OddEvenSplit.xlsx
Multi-voxel pattern analysis (MVPA)
Multivoxel pattern analysis (MVPA) is an analysis technique that allows us to ask questions about how information is represented spatially in different areas of the brain (Norman et al., 2006). Instead of focusing on BOLD activity averaged across an entire brain region and across all participants, MVPA investigates the spatial pattern of activity across individual voxels in individual participants to determine whether different cognitive states or "representations" of stimuli can be distinguished within those voxels. The different approaches of MVPA ask whether the patterns of activity elicited by one condition or set of conditions differ from, or resemble, the patterns of activity elicited by another condition or set of conditions in the same region (Mur et al., 2009). MVPA differs from a univariate approach to analyzing fMRI data because MVPA analyzes the relationship between experimental conditions and the pattern of activity across voxels; thus, it can characterize how multiple conditions (>2) are related to one another (Davis et al., 2014). For example, using a univariate approach, a researcher might determine that there is greater activation for faces than hands in the fusiform face area (FFA), suggesting that the FFA is sensitive to faces and not hands; but using an MVPA approach, we may be able to determine that the FFA carries information not only about faces and hands but about a wide variety of other stimulus categories (e.g., chairs). This approach was pioneered by Haxby and colleagues (2001, Science), who showed that even areas in the ventral temporal cortex that show peak activation to one stimulus category over others (such as the FFA) still carry information about those other categories (e.g., chairs, shoes) in their voxel-wise patterns of activity.
There are two main "flavours" of MVPA:
1) Classifiers, which will not be discussed here
2) Correlations, especially representational similarity analysis, which is the most common correlational approach and will be explored further in this tutorial
Representational similarity analysis (RSA)
RSA is a type of multi-voxel pattern analysis (MVPA) for fMRI data. This method is based on the idea that populations of neurons within a brain region jointly represent information about stimuli in a specific population code, or pattern of activity (Diedrichsen & Kriegeskorte, 2017; Kriegeskorte & Kievit, 2013). Specifically, representations of content are understood as points in a high-dimensional space, and when the brain perceives this content, this leads to a specific pattern of firing of neurons that represents this point in representational space (Davis & Poldrack, 2013). Furthermore, we can characterize the information these populations of neurons are representing using representational geometry, or how far apart the points in a high-dimensional pattern space are for different represented stimuli. An important point in this theory is that downstream neurons can read out the information being encoded from the neurons representing this information, and thus it is possible to decode represented information based on this population code (Diedrichsen & Kriegeskorte, 2017; Kriegeskorte & Kievit, 2013).
A note for the philosophically inclined: "Representations" is always a controversial term. Researchers who use RSA operationally define the term as spatial patterns. Note that the brain may "represent" things in other ways (e.g., at a resolution finer than we can scan or by connections this approach can't measure). Be sure you don't take the term too literally.
Implementation and calculation of approaches
For RSA in fMRI, spatial activity patterns are computed for each ROI (or in searchlights, for each point in the brain). These activity patterns will often be beta weights for each voxel for each condition (but could alternatively be %signal change values, t values, etc.)
Comparisons of spatial patterns across conditions can be represented either in terms of the similarity of the voxel patterns or their dissimilarity. The first RSA studies used a simple Pearson correlation, r, to evaluate similarity. With this metric, dissimilarity is simple to compute: 1 - r.
Geek note: other metrics of distance have been proposed -- Euclidean distance, Mahalanobis distance, and cross-validated Mahalanobis or "crossnobis" distance. Crossnobis distances are preferred but are more computationally intensive to calculate.
To keep things simple, because r is a relatively intuitive statistic, we will use r values for similarity and 1-r values for dissimilarity.
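To make this concrete, here is a minimal sketch in Python (with made-up beta values for six hypothetical voxels) of how similarity (r) and dissimilarity (1 - r) are computed for a pair of voxel patterns:

```python
import numpy as np

def pattern_similarity(pattern_a, pattern_b):
    """Pearson correlation between two voxel activity patterns."""
    return np.corrcoef(pattern_a, pattern_b)[0, 1]

# Hypothetical beta weights across 6 voxels for two conditions
faces = np.array([1.2, 0.8, 2.1, 0.3, 1.7, 0.9])
hands = np.array([0.4, 1.5, 0.2, 1.9, 0.6, 1.1])

r = pattern_similarity(faces, hands)
dissimilarity = 1 - r   # the 1 - r metric used throughout this tutorial
```

Note that correlating a pattern with itself gives r = 1 (dissimilarity = 0), which is why the diagonal of an unsplit RSM is fixed.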
Computation of Representational Similarity Matrices
Unsplit Data
Open the Excel file: RSA-Example_sub-01_MainExp_Right-FFA_NoSplit.xlsx
This file shows how a Representational Similarity Matrix is computed when the data are combined, i.e., NOT split. These data are from one individual participant (sub-01).
Examine the first tab: Raw Betas.
Question 1: What do columns reflect? How would you use a GLM to generate the values in this tab? What do those values measure?
Question 2: Would you spatially smooth the data? Why or why not?
Step through each of the four tabs -- Step 1-4. Examine the computations in the cells and try to understand the series of steps performed.
Geek note: During Step 2, we normalized each voxel to its own mean, by subtracting the voxel’s overall mean activation from each condition’s activation, so that the overall mean is zero. This is often done to deal with the ‘common activation pattern’ that arises because some voxels are generally more active or less active than others, which will be shared across some or all conditions and will impact the correlation across conditions (Diedrichsen & Kriegeskorte, 2017). There is some debate whether this is a valid way to correct for this problem (see Garrido et al., 2013), but a discussion of this is beyond this tutorial.
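The Step 2 mean-normalization can be sketched in a few lines of numpy (the beta values here are made up; rows are voxels, columns are conditions):

```python
import numpy as np

# Hypothetical betas: rows = voxels, columns = conditions
betas = np.array([[2.0, 3.0, 4.0],
                  [0.5, 0.7, 0.9],
                  [5.0, 4.0, 6.0]])

# Subtract each voxel's own mean across conditions (Step 2)
normalized = betas - betas.mean(axis=1, keepdims=True)

# Each voxel's mean is now zero, removing the common activation pattern
print(normalized.mean(axis=1))  # → [0. 0. 0.]
```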
Question 3: Using one sentence per tab, explain each of the four steps in words.
Question 4: Why is it important to perform RSA on each individual participant rather than on averaged data across participants?
Understanding RSMs
The figure on the left depicts an example of a non-split representational similarity matrix (RSM) displaying the correlations between the patterns of activity representing each stimulus condition in the left FFA. Sometimes we will refer to such matrices derived from the data as RSMdata.
Note: Warmer colours indicate higher correlations. This lookup table for r values is different from the colour coding in the Excel spreadsheet.
Question 5: How would you interpret this data?
Let’s look at the RSMdata for the hand region of primary motor cortex (M1).
Question 6: How do you interpret this RSM and in what ways does it differ compared to the FFA RSM?
The data above were based on unsplit data. You may wish to refer back to the Excel file, RSA-Example_sub-01_MainExp_Right-FFA_NoSplit.xlsx
Question 7: Why do the diagonals have a value of similarity = 1, dissimilarity = 0 in unsplit data?
Computation of Representational Similarity Matrices
Split data
Open the Excel file: RSA-Example_sub-01_MainExp_Right-FFA_OddEvenSplit.xlsx
Now the data are split into even and odd runs. Note that for a single split, this is the simplest and best approach. For real data, a better approach is to do multiple splits and average the results. For example, with eight runs, we could do eight splits (comparing the correlations between each of the eight runs and the remaining seven runs) and average the correlation matrices.
For simplicity, we will just examine the even-odd split.
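The mechanics of a split RSM can be sketched as follows, with simulated patterns standing in for real odd- and even-run betas (the voxel and condition counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_conds = 50, 6

# Hypothetical per-condition patterns estimated from odd and even runs:
# the same underlying signal plus independent noise in each half
signal = rng.normal(size=(n_voxels, n_conds))
odd  = signal + 0.5 * rng.normal(size=(n_voxels, n_conds))
even = signal + 0.5 * rng.normal(size=(n_voxels, n_conds))

# Split RSM: correlate each odd-run pattern with each even-run pattern
rsm = np.array([[np.corrcoef(odd[:, i], even[:, j])[0, 1]
                 for j in range(n_conds)]
                for i in range(n_conds)])
```

Unlike unsplit data, the diagonal now reflects the test-retest reliability of each condition's pattern (below 1), and the matrix is no longer forced to be symmetric.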
Question 8: Examining Step 1 of the spreadsheet, what is the key difference between unsplit and split data?
Question 9: How has the correlation matrix changed with split data? Consider (a) the diagonal; (b) the symmetry of the matrix about the diagonal; and (c) how noisy the correlation data will be when data are split vs. unsplit.
RSM displaying the correlations between the patterns of activity representing each stimulus condition split into odd and even runs in the left FFA. Note: Warmer colours indicate higher correlations.
Question 10: What is the value added of seeing actual (dis)similarity metrics along the diagonal?
Multidimensional Scaling applied to RSA Data
We can visually represent the similarities between conditions using multidimensional scaling (MDS). Multidimensional scaling provides a spatial visualization of the relationship between conditions by preserving the similarities in a high-dimensional space and projecting them into a lower-dimensional representation. For example, the MDS for the left FFA is shown below.
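Under the hood, classical (Torgerson) MDS can be computed from a dissimilarity matrix with a few lines of numpy. The condition labels and 1 - r values below are invented for illustration (two face conditions, two hand conditions):

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: embed a dissimilarity matrix in
    n_dims dimensions, preserving distances as well as possible."""
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (dissim ** 2) @ J             # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]   # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))

# Hypothetical 1 - r dissimilarities for four conditions
labels = ["face_L", "face_R", "hand_L", "hand_R"]
dissim = np.array([[0.0, 0.2, 0.9, 0.8],
                   [0.2, 0.0, 0.8, 0.9],
                   [0.9, 0.8, 0.0, 0.2],
                   [0.8, 0.9, 0.2, 0.0]])

coords = classical_mds(dissim)   # 4 conditions x 2 dimensions, ready to plot
```

In the resulting 2-D plot, the two face conditions land close together and far from the two hand conditions, mirroring the structure of the dissimilarity matrix.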
Question 11:
a) How many dimensions were in the original data? How many dimensions are represented in the MDS plots?
b) What are the advantages and disadvantages of reducing dimensionality?
Question 12: How would you explain the MDS plots for Left FFA and Left M1 Hand? Can you tell from MDS plots whether conditions are quantitatively (i.e., statistically) different? Why or why not?
Statistical inferences in RSA
RSM Model Matrices (RSMmodel)
Prior to computing any of the processing steps described above, we want to start by defining some theoretical models that we think could represent the data. Keep in mind, these models are not data-driven, but hypothesis-driven, based on our expectation of the relationships between conditions. For the rest of this tutorial, green will indicate higher correlations and red will indicate lower correlations.
As a “sanity check”, one theoretical model can test whether each condition is more similar to itself than to any other condition.
Based on this class project, we would also hypothesize that faces would be more similar to faces and hands would be more similar to hands; however, faces would be relatively dissimilar to hands.
Question 13: What other hypothesis-driven model could we potentially investigate using RSA in our data set? What would this model look like as an RSM?
Now that we’ve not only settled on some theoretical models but have also computed ROI-driven RSMs and MDSs, we can move on to the final step of RSA: statistical inference.
Comparing RSMdata to RSMmodel
Compare the RSMs of each ROI to each theoretical model. The first step is to compute the correlation between the RSMdata from a particular brain region and an RSMmodel. Because this is a correlation between actual and hypothetical correlations, Jody called this a metacorrelation. Usually, and in the case of the course data, a rank-order (Spearman) correlation is used because we do not want to assume a linear relationship between the data RSM and model RSM values. Indeed, because the model RSM has ordinal values (e.g., +1 vs. -1) rather than continuous values, a Spearman correlation is more appropriate.
For this step, the correlations will be equivalent if we use similarity metrics (r) or dissimilarity metrics (1-r). Either way, a higher correlation indicates a better fit between the RSMdata and the RSMmodel.
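A minimal sketch of the metacorrelation, assuming hypothetical data and model RSMs for three conditions (two face conditions and one hand condition), and using only the off-diagonal upper triangle:

```python
import numpy as np
from scipy.stats import spearmanr

def metacorrelation(rsm_data, rsm_model):
    """Spearman correlation between data and model RSMs,
    computed over the off-diagonal upper triangle only."""
    iu = np.triu_indices_from(rsm_data, k=1)   # k=1 excludes the diagonal
    rho, _ = spearmanr(rsm_data[iu], rsm_model[iu])
    return rho

# Hypothetical data RSM (similarities) and a Faces-vs-Hands model RSM
# (+1 = same category predicted similar, -1 = different category)
rsm_data = np.array([[1.0, 0.7, 0.1],
                     [0.7, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
rsm_model = np.array([[ 1,  1, -1],
                      [ 1,  1, -1],
                      [-1, -1,  1]])

rho = metacorrelation(rsm_data, rsm_model)
```

Because Spearman operates on ranks, using dissimilarities (1 - r) for both matrices would simply flip both rank orders and yield the same rho.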
We would compute the metacorrelations for each region and each model in each participant.
Since we have computed similarity, in the first model (Diagonal) we can see that most of the reference RSMs have positive mean correlations with the candidate RSM, confirming our sanity check that each individual condition will be more similar to itself than to any other condition in the majority of the ROIs. In all but three regions this mean correlation is above zero. However, note that the correlations within a stimulus category do not reach a value of 1.0.
Question 14: What could be reasons for the correlation not reaching 1?
Question 15: What do you conclude when inspecting the Hands vs. Face results?
RSA allows us to develop hypothesis-driven theoretical models that are designed to more accurately/plausibly capture the response similarity between different regions for the same (or different) stimulus categories. Overall, the majority of the mean correlations between the reference RSMs and the candidate RSMs in our ROIs were substantially higher than those observed using the diagonal model alone, particularly in the left and right FFA, OFA, and LOTC Hand regions.
Question 16: What does the fact that Hand vs. Face is a better model than the diagonal model tell you about how these areas represent these stimuli?
Question 17:
a) How could adding error bars allow you to determine whether the models account for significant variance in the patterns? What should those error bars represent?
b) Statistically, how could you determine whether one model performs significantly better than another?
If your brain is fried, you can stop here, but if you are going to use RSA in your own work, work through the next sections before you start analyzing data.
Geek Notes (Optional): Noise Ceiling
If participants' Data RSMs are highly consistent, then a Model RSM can explain the patterns well. However, if participants' Data RSMs are all very different from one another, then no model will perform well.
Brain regions differ in how consistent participants' data can be. Generally "primary" brain areas are more consistent than "association areas". For example, if we are examining primary visual cortex (V1) in a visual task, participants' data RSMs will be very consistent; whereas, if we are examining parietal cortex in a sensorimotor task, Data RSMs may be quite idiosyncratic.
The noise ceiling provides an estimate of how well the best possible model could perform. The noise ceiling is actually a range with a lower bound and an upper bound. These bounds are computed by correlating the Data RSM for each participant with a group-averaged Data RSM and averaging these values across participants. For the lower bound, the group-average Data RSM excludes the participant in question; for the upper bound, it includes the participant in question.
If a Data-Model metacorrelation falls within the range of the noise ceiling, we can conclude that the model does as well as any model could be expected to perform.
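A sketch of the noise-ceiling computation, using Pearson correlations for simplicity (Spearman is often used in practice) and simulated per-subject vectors of off-diagonal RSM values:

```python
import numpy as np

def noise_ceiling(subject_rsms):
    """Lower/upper noise-ceiling bounds from per-subject Data RSMs,
    each supplied as a flattened vector of off-diagonal values."""
    subject_rsms = np.asarray(subject_rsms)
    n = len(subject_rsms)
    group_mean = subject_rsms.mean(axis=0)
    lower, upper = [], []
    for s in range(n):
        # Upper bound: correlate with the group mean INCLUDING this subject
        upper.append(np.corrcoef(subject_rsms[s], group_mean)[0, 1])
        # Lower bound: correlate with the group mean EXCLUDING this subject
        loo_mean = subject_rsms[np.arange(n) != s].mean(axis=0)
        lower.append(np.corrcoef(subject_rsms[s], loo_mean)[0, 1])
    return np.mean(lower), np.mean(upper)

# Hypothetical off-diagonal RSM values for four subjects:
# a shared "true" RSM plus subject-specific noise
rng = np.random.default_rng(1)
true_rsm = rng.normal(size=15)
subjects = [true_rsm + 0.3 * rng.normal(size=15) for _ in range(4)]

lo, hi = noise_ceiling(subjects)   # lower bound <= upper bound
```

The more idiosyncratic the subjects' RSMs, the lower (and wider) this range becomes.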
Examine the differences in the noise ceiling for the two brain regions shown below.
Geek Notes (Optional): Fisher Transforming r Values
Correlation values (Pearson r, Spearman rho) are not normally distributed, especially for absolute values greater than .5. This violates the assumptions of many statistical tests (like a one-sample t-test to determine whether Data-Model metacorrelations are significantly greater than zero). One way to normalize correlation coefficients is to apply a Fisher (z) transformation. You should do this before plotting r values and doing stats tests, especially if r > .5.
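The Fisher transformation is simply the inverse hyperbolic tangent, available directly in numpy:

```python
import numpy as np

# Fisher z transformation: z = arctanh(r) = 0.5 * ln((1 + r) / (1 - r))
r_values = np.array([0.1, 0.5, 0.9])
z_values = np.arctanh(r_values)

# The transform stretches the tails so z is approximately normal:
# r = 0.9 moves much further from its original value than r = 0.1 does
```

Apply the transform to each participant's metacorrelation before averaging or running t-tests, then (if desired) back-transform the mean with np.tanh for plotting.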
Geek Notes (Optional): Dealing with the Diagonal
Because unsplit data forces the Data RSM diagonal to have similarity = 1 (dissimilarity = 0), this can create problems for certain types of statistical testing of the metacorrelations between Data RSMs and Model RSMs. For a more detailed discussion, see Ritchie & Op de Beeck (2017).
One problem is that if the diagonal isn't excluded, Data-Model metacorrelations become spuriously high.
One easy solution to this problem is to exclude the diagonal.
The approach of excluding the diagonal works well for non-factorial designs. For example, Kriegeskorte's classic paper had 92 stimuli (human and animal faces & bodies, natural and manmade objects). This was NOT a factorial design.
Our course experiment, however, is a 2 Category (Faces/Hands) x 3 Orientation (Left/Centre/Right) factorial design. If we only discard the diagonal, the factorial contrast becomes unbalanced. For example, in a contrast of Same Category vs. Different Category with only the diagonal removed, the Same Category correlation values always have different orientations, whereas the Different Category correlation values can have the same or different orientations. Thus, if there is a significant effect of Orientation regardless of category, this could erroneously lead to a negative correlation between the RSM data and the RSM model for category.
One solution to the problem with factorial designs is to also exclude off-diagonals to ensure that contrasts for one factor remain properly balanced over the other factor.
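One way to build such a balanced mask for a 2 x 3 design like ours (the condition ordering below is assumed for illustration) is to keep only cells whose two conditions differ in orientation:

```python
import numpy as np

# Hypothetical condition ordering for the 2 Category x 3 Orientation design:
# face_L, face_C, face_R, hand_L, hand_C, hand_R
categories   = np.array([0, 0, 0, 1, 1, 1])   # 0 = face, 1 = hand
orientations = np.array([0, 1, 2, 0, 1, 2])   # 0 = left, 1 = centre, 2 = right

# Keep only cells where the two conditions differ in orientation, so
# Same-Category and Different-Category cells are balanced on orientation
same_orientation = orientations[:, None] == orientations[None, :]
mask = ~same_orientation          # also excludes the diagonal automatically

same_category = categories[:, None] == categories[None, :]

# Cell counts entering a Same vs. Different Category contrast
n_same = np.sum(mask & same_category)
n_diff = np.sum(mask & ~same_category)
```

With this mask, both sides of the category contrast contain only different-orientation cells (and equal numbers of them), so an orientation effect can no longer masquerade as a category effect.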
This all becomes rather complicated so a nicer solution is to analyze data with split data (ideally split across multiple permutations rather than just an even-odd split).