Across-Trials Within Session Testing with glhmm toolbox

In this tutorial, we are going to look at how to implement across-trials within-session testing using the glhmm toolbox. This test assesses effect differences between trials in one or more experimental sessions and is therefore ideal for examining trial-by-trial variability within a session.

In real-world scenarios, one would typically fit a Hidden Markov Model (HMM) to an actual dataset. However, to demonstrate the concept of statistical testing, we use synthetic data for both the independent variable and the dependent variable of the across_trials_within_session test.

We create synthetic data using the toolbox Genephys, developed by Vidaurre (2023; accessible at https://doi.org/10.7554/eLife.87729.2). Genephys makes it possible to simulate electrophysiological data in the context of a psychometric experiment. Hence, it can create scenarios where, for example, a subject is exposed to one or multiple stimuli while EEG or MEG data are simultaneously recorded.

While preparing the data requires some explanation, executing the test (across_trials_within_session) itself is straightforward: simply input the variables D and R and define the specific method you wish to apply. The methods include permutation using regression and permutation using correlation, both described in Vidaurre et al. 2023.

Table of Contents

  1. Load and prepare data

    • Look at data

    • Prepare data for HMM

  2. Load or initialise and train HMM

    • Data restructuring

  3. Across-trials within session testing

    • Across-trials within sessions testing - Regression

    • Across-trials within sessions testing - Correlation

Install necessary packages

If you don't have the GLHMM package installed, run the following command in your terminal:

pip install glhmm

To use the functions in glhmm.statistics you also need to install the following libraries:

pip install statsmodels
pip install tqdm
pip install -U scikit-image

Import libraries

Let’s start by importing the required libraries and modules.

[2]:
import os
import numpy as np
from glhmm import glhmm, graphics, preproc, statistics, auxiliary, io

1. Load and prepare data

First, we’ll load the synthetic data from this folder and use the glhmm toolbox to train a classic HMM on the synthetic data that represents EEG or MEG measurements.
Let’s start by loading the essential data for this tutorial:
[6]:
# Get the path to the data folder
current_directory = os.getcwd()
data_directory = os.path.join(current_directory, "data_statistical_testing")

# Load D data
D_trials = np.load(os.path.join(data_directory, "D_trials.npy"))

# Load R data
R_trials = np.load(os.path.join(data_directory, "R_trials.npy")).astype(int)

# Load indices of the trials for each session
idx_trials_session = np.load(os.path.join(data_directory, "idx_trials_session.npy"))

# Load time indices for every trial
idx_trials = np.load(os.path.join(data_directory, "idx_trials.npy"))


print(f"Data dimension of D-session data: {D_trials.shape}")
print(f"Data dimension of R-session data: {R_trials.shape}")
print(f"Data dimension of indices for the number of trials for each session: {idx_trials_session.shape}")
print(f"Data dimension of indices: {idx_trials.shape}")
Data dimension of D-session data: (250, 1421, 16)
Data dimension of R-session data: (1421,)
Data dimension of indices for the number of trials for each session: (10, 2)
Data dimension of indices: (1421, 2)

Look at data

Now we can look at the data structure:

  • D_trials: 3D array of shape (n_timepoints, n_trials, n_features)

  • R_trials: 1D array of shape (n_trials,)

  • idx_trials_session: 2D array of shape (n_sessions, 2)

  • idx_trials: 2D array of shape (n_trials, 2)

D_trials represents the data collected from the subject, structured as an array of shape [250, 1421, 16]. The first dimension indicates that the subject was measured across 250 timepoints per trial. The second dimension, 1421, corresponds to the total number of trials conducted. In this context, 10 distinct sessions were executed, each comprising up to 150 trials, adding up to a total of 1421 trials. Each individual trial involved the measurement of 16 channels in the EEG or MEG scanner.

R_trials contains the categorical values of which stimulus is presented; it contains values of 1 and 2.

idx_trials_session.shape = [10, 2] gives the indices for the sessions conducted, which in this case is 10. The values in each row represent the indices of the first and last trial of a given session. As the differences below show, the number of trials varies across sessions.

np.diff(idx_trials_session)
array([[129],
       [132],
       [150],
       [150],
       [150],
       [150],
       [150],
       [150],
       [111],
       [149]])

Lastly, we have idx_trials.shape = [1421, 2], which marks the trials conducted, in this case 1421. The values in each row represent the start and end indices of every trial. Having these indices to separate the time period of each trial is required when training a time-delay embedded HMM (TDE-HMM).
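
As a quick sanity check on these indices (a small sketch using the arrays loaded above), every row of idx_trials should span exactly 250 timepoints:

# Every trial should cover exactly 250 timepoints
trial_lengths = np.diff(idx_trials, axis=1)
print(np.unique(trial_lengths))  # expected: [250]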

Prepare data for HMM

When you’re getting into training a Hidden Markov Model (HMM), the input data needs to follow a certain setup. The data shape should look like ((number of timepoints * number of trials), number of features). This means you’ve lined up all the trials from different sessions side by side in one long row. The second dimension is the number of features, which could be the number of parcels or channels.

So, in our scenario, we’ve got this data matrix, D_trials, shaped like [250, 1421, 16] (timepoints, trials, channels). Now, when we bring all those trials together, it’s like stacking them up to create a new design matrix, and it ends up with a shape of [355250, 16] (timepoints * trials, channels). Besides that, we also need to update R_trials and idx_trials_session to sync up with the newly concatenated data. To make life easier, we’ve got the function get_concatenate_sessions, which does the heavy lifting for us.
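
For intuition, this stacking is equivalent to a trial-major NumPy reshape (a sketch of the D part only; get_concatenate_sessions additionally expands R and recomputes the indices):

# Manual trial-major concatenation: (time, trial, channel) -> (trial*time, channel)
D_manual = D_trials.transpose(1, 0, 2).reshape(-1, D_trials.shape[2])
# D_manual should match the D_con produced by get_concatenate_sessions below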

Note: it is important to use idx_trials_session for this function, as the concatenation is done on a trial-by-trial basis for every defined session.

[13]:
D_con, R_con, idx_con = statistics.get_concatenate_sessions(D_trials, R_trials, idx_trials_session)
print(f"Data dimension of the concatenated D-data: {D_con.shape}")
print(f"Data dimension of the concatenated R-data: {R_con.shape}")
print(f"Data dimension of the updated time stamp indices: {idx_con.shape}")
Data dimension of the concatenated D-data: (355250, 16)
Data dimension of the concatenated R-data: (355250,)
Data dimension of the updated time stamp indices: (10, 2)

For a quick sanity check, let’s verify whether the concatenation was performed correctly on D_trials. We’ve essentially stacked up every timepoint from each trial sequentially.

To do this, we can compare a slice of our original design matrix, say D_trials[:, 0, :], with the corresponding slice in the concatenated data, D_con[0:250, :].
If the comparison D_trials[:, 0, :] == D_con[0:250, :] holds true, we’re essentially confirming that all timepoints in the first trial align perfectly with the first 250 values in our concatenated data. It’s like double-checking to make sure everything lined up as expected.
[15]:
D_trials[:,0,:]==D_con[0:250,:]
[15]:
array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ...,
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]])

Here, it’s evident that the concatenation process has been executed accurately.
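
Equivalently, the whole elementwise comparison can be collapsed into a single boolean (a small convenience check, assuming exact equality as above):

# Single-boolean version of the elementwise check above
np.array_equal(D_trials[:, 0, :], D_con[0:250, :])  # should return True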

Next up, let’s confirm if the values in idx_con have been appropriately updated. Each row in this matrix should represent the total count of timepoints and trials for each of the 10 sessions.

[16]:
idx_con
[16]:
array([[     0,  32250],
       [ 32250,  65250],
       [ 65250, 102750],
       [102750, 140250],
       [140250, 177750],
       [177750, 215250],
       [215250, 252750],
       [252750, 290250],
       [290250, 318000],
       [318000, 355250]])

Indeed, each session now aligns with the number of datapoints. This means that when we pooled together the timepoints and trials, the count for each session ended up exactly as expected. It’s a reassuring confirmation that our concatenation didn’t miss a beat.
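
As one more quick check (a sketch using the variables above), dividing each session's span in idx_con by the 250 timepoints per trial should reproduce the trial counts we saw in np.diff(idx_trials_session):

# Session spans in timepoints, converted back to trial counts per session
print(np.diff(idx_con, axis=1).ravel() / 250)  # expected: [129. 132. 150. 150. 150. 150. 150. 150. 111. 149.]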

NOTE: It is important to standardise your timeseries and, if necessary, apply other kinds of preprocessing before fitting the model.
This will be done separately for each session/subject as specified in the indices. The data provided here are already close to standardised (so the code below will not do much), but see Prediction tutorial to see the effect on real data.
[17]:
D_con_preproc,_ = preproc.preprocess_data(D_con, idx_con)

2. Load or initialise and train HMM

You can either load the Gamma values from a pretrained model or train your own model. If you prefer the former, load the pretrained Gamma (Gamma_trials_tde.npy) from the data_statistical_testing folder. In this example we have trained a TDE-HMM, incorporating mean and covariance parameters for 6 distinct states. A TDE-HMM estimates the covariance across lagged copies of the signal and compares it at different time points: if the covariances are similar, the time points are assigned to the same state; otherwise, they are assigned to different states.

If you decide to train your own TDE-HMM model, you need to prepare the dataset by adding lags that act like a sliding window. These lags capture temporal dependencies in the data and span from -n to n, passing through zero. The code below shows how to define such lags:

# Create lag array
lag_val = list(range(-7, 8, 1))
print(lag_val)
[-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]

Once the dataset is appropriately prepared, you can proceed with training the TDE-HMM.

[18]:
# Define the number of lags (window size)
lag_val = list(range(-7, 8, 1))
We can now prepare the dataset using the function preproc.build_data_tde, so it is ready for the TDE-HMM.
The inputs are the preprocessed concatenated data (D_con_preproc), the indices that define the running period of each trial (idx_trials), and the lag kernel (lag_val) that will run over each trial.
[23]:
# Prepare the dataset for TDE-HMM
D_con_tde, idx_tde = preproc.build_data_tde(D_con_preproc, idx_trials, lag_val)
[25]:
print(f"Data dimension of the concatenated TDE D-data: {D_con_tde.shape}")
print(f"Data dimension of the updated TDE time stamp indices: {idx_tde.shape}")
Data dimension of the concatenated TDE D-data: (335356, 240)
Data dimension of the updated TDE time stamp indices: (1421, 2)

Upon examination, you’ll notice that the dimension of the concatenated TDE D-data has undergone a transformation from (355250, 16) to (335356, 240).

This transformation is a result of the specified kernel size for the lag, applied individually to each trial. The defined lag kernel ranges from -7 to 7, indicating that we have pooled/excluded 7 data points from both the beginning and end of each trial. In essence, the original trial length of 250 time points has been reduced to 236 (250 - 14).

This adjustment applies for all 1421 trials in our dataset. Initially, the total number of data points in the concatenated data was 355250 (1421 * 250), as indicated at the beginning of this notebook when we loaded the data. However, after processing, D_con_tde now contains 335356 data points, equivalent to 1421 * 236.
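
Both dimensions follow directly from the lag kernel; the quick arithmetic below (using the variables already defined) reproduces the reported shape:

# Rows: each trial loses max(lag) - min(lag) = 14 timepoints
# Columns: 16 channels, each copied at 15 lags
n_lags = len(lag_val)               # 15 lags, from -7 to 7
trim = max(lag_val) - min(lag_val)  # 14 timepoints removed per trial
print((250 - trim) * 1421)          # 335356 rows
print(16 * n_lags)                  # 240 columns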

Checking out idx_tde, you’ll notice that each trial now spans 236 points, matching the 7 data points pooled from both ends of each trial based on the specified lag_val.

[26]:
# Look at idx_tde
idx_tde
[26]:
array([[     0,    236],
       [   236,    472],
       [   472,    708],
       ...,
       [334648, 334884],
       [334884, 335120],
       [335120, 335356]])

Initialise the HMM

If you want to train your own HMM you need to do the following steps. First, we need to initialize the hmm object and set the hyperparameters according to our modeling preferences.

In this case, we choose not to model an interaction between two sets of variables in the HMM states, so we set model_beta='no'. For our estimation, we choose K=6 states. If you wish to model a different number of states, simply adjust the value of K.

Our modeling approach involves representing states as Gaussian distributions with mean and a full covariance matrix. This means that each state is characterized by a mean amplitude and a functional connectivity pattern. To specify this configuration, set covtype='full'. If you prefer not to model the mean, you can include model_mean='no'. Optionally, you can check the hyperparameters to make sure that they correspond to how you want the model to be set up.

[27]:
# Create an instance of the glhmm class
K = 6 # number of states
hmm = glhmm.glhmm(model_beta='no', K=K, covtype='full')
print(hmm.hyperparameters)
{'K': 6, 'covtype': 'full', 'model_mean': 'state', 'model_beta': 'no', 'dirichlet_diag': 10, 'connectivity': None, 'Pstructure': array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]]), 'Pistructure': array([ True,  True,  True,  True,  True,  True])}

Train an HMM

Now, let’s proceed to train the TDE-HMM using the data (D_con_tde) and time indices (idx_tde).

Since in this case we are not modeling an interaction between two sets of timeseries but opting for a “classic” HMM, we set X=None. For training, Y should represent the timeseries from which we aim to estimate states (D_con_tde), and indices should contain the beginning and end indices of each trial (idx_tde).

If you want to use the same Gamma values as used in this tutorial without training the HMM, you can just load Gamma_tde using the code below:

current_directory = os.getcwd()
data_directory = os.path.join(current_directory, "data_statistical_testing")

# Load Gamma data
Gamma_tde = np.load(os.path.join(data_directory, "Gamma_trials_tde.npy"))
print(f"Data dimension of Gamma: {Gamma_tde.shape}")
[28]:
Gamma_tde,Xi,FE = hmm.train(X=None, Y=D_con_tde, indices=idx_tde)
Init repetition 1 free energy = 119264226.50762889
Init repetition 2 free energy = 119259168.97863053
Init repetition 3 free energy = 119257550.35557146
Init repetition 4 free energy = 119256521.38384846
Init repetition 5 free energy = 119271238.60120319
Best repetition: 4
Cycle 1 free energy = 119253203.09003511
Cycle 2 free energy = 119245210.24317718
Cycle 3, free energy = 119241854.33440413, relative change = 0.29570720193210026
Cycle 4, free energy = 119239426.05418883, relative change = 0.1762556360009167
Cycle 5, free energy = 119237531.72870162, relative change = 0.12087817049838594
Cycle 6, free energy = 119235904.07036808, relative change = 0.09408962848005521
Cycle 7, free energy = 119234575.11804754, relative change = 0.07134176073661225
Cycle 8, free energy = 119233489.23140731, relative change = 0.05508239968251576
Cycle 9, free energy = 119232576.87392563, relative change = 0.044232906163953985
Cycle 10, free energy = 119231804.17373762, relative change = 0.0361093140077085
Cycle 11, free energy = 119231134.52726391, relative change = 0.03034390959884523
Cycle 12, free energy = 119230519.87423247, relative change = 0.027097261551735526
Cycle 13, free energy = 119229955.57935247, relative change = 0.024273346411410088
Cycle 14, free energy = 119229443.16802765, relative change = 0.021566203990617278
Cycle 15, free energy = 119228964.52721101, relative change = 0.019747079070543096
Cycle 16, free energy = 119228503.53525163, relative change = 0.018663978497559092
Cycle 17, free energy = 119228069.05843864, relative change = 0.01728639559160872
Cycle 18, free energy = 119227671.83659898, relative change = 0.015558258456053548
Cycle 19, free energy = 119227319.46544246, relative change = 0.013613671271286822
Cycle 20, free energy = 119226993.80970526, relative change = 0.012425207144486392
Cycle 21, free energy = 119226680.07497308, relative change = 0.011828773291688225
Cycle 22, free energy = 119226368.01619634, relative change = 0.011628765346842746
Cycle 23, free energy = 119226078.5384506, relative change = 0.010672167052777983
Cycle 24, free energy = 119225785.61695561, relative change = 0.010683752442874707
Cycle 25, free energy = 119225516.74816173, relative change = 0.00971124300583444
Cycle 26, free energy = 119225278.73176758, relative change = 0.008523611961747111
Cycle 27, free energy = 119225056.93811518, relative change = 0.007880070179089574
Cycle 28, free energy = 119224843.83558838, relative change = 0.00751439101472628
Cycle 29, free energy = 119224637.62302731, relative change = 0.007218945904668896
Cycle 30, free energy = 119224450.96477953, relative change = 0.006491980892700244
Cycle 31, free energy = 119224258.23670292, relative change = 0.006658457529395798
Cycle 32, free energy = 119224067.36971983, relative change = 0.006550961535204878
Cycle 33, free energy = 119223886.99350812, relative change = 0.0061528045369760595
Cycle 34, free energy = 119223710.27984613, relative change = 0.00599175395126114
Cycle 35, free energy = 119223541.67490324, relative change = 0.005684318908536486
Cycle 36, free energy = 119223381.4774265, relative change = 0.005371858284363589
Cycle 37, free energy = 119223235.28902942, relative change = 0.00487818232148375
Cycle 38, free energy = 119223090.09272525, relative change = 0.004821715443275324
Cycle 39, free energy = 119222947.1667023, relative change = 0.004723902205021832
Cycle 40, free energy = 119222810.94752523, relative change = 0.0044820524588882645
Cycle 41, free energy = 119222671.333722, relative change = 0.004572740650943276
Cycle 42, free energy = 119222534.22222911, relative change = 0.004470706051310571
Cycle 43, free energy = 119222399.62240721, relative change = 0.0043696321310740215
Cycle 44, free energy = 119222259.80683613, relative change = 0.004518446545750393
Cycle 45, free energy = 119222120.45363739, relative change = 0.004483313350701833
Cycle 46, free energy = 119221984.73287804, relative change = 0.0043474664174107565
Cycle 47, free energy = 119221851.95204403, relative change = 0.004235279563278975
Cycle 48, free energy = 119221721.96828572, relative change = 0.004128943032440616
Cycle 49, free energy = 119221601.01146865, relative change = 0.003827495613025197
Cycle 50, free energy = 119221474.98727335, relative change = 0.00397200539360458
Cycle 51, free energy = 119221343.05676597, relative change = 0.004140940665764819
Cycle 52, free energy = 119221218.25587843, relative change = 0.0039018769623330833
Cycle 53, free energy = 119221102.32888918, relative change = 0.003611347055925424
Cycle 54, free energy = 119220986.94857867, relative change = 0.0035814441236432975
Cycle 55, free energy = 119220877.35949978, relative change = 0.003390150108704536
Cycle 56, free energy = 119220777.55981235, relative change = 0.0030778120433082607
Cycle 57, free energy = 119220671.2965957, relative change = 0.003266442007180269
Cycle 58, free energy = 119220577.42033805, relative change = 0.0028773741202792746
Cycle 59, free energy = 119220488.9181425, relative change = 0.0027053167001294376
Cycle 60, free energy = 119220404.41093114, relative change = 0.0025765431311874044
Cycle 61, free energy = 119220318.85549068, relative change = 0.0026017160393406987
Cycle 62, free energy = 119220239.07337594, relative change = 0.002420278923199473
Cycle 63, free energy = 119220154.93548843, relative change = 0.0025459178783746777
Cycle 64, free energy = 119220065.43711965, relative change = 0.002700805908405876
Cycle 65, free energy = 119219981.45515451, relative change = 0.0025279299300391784
Cycle 66, free energy = 119219902.00526185, relative change = 0.002385804942871341
Cycle 67, free energy = 119219816.29023233, relative change = 0.0025673328989033275
Cycle 68, free energy = 119219735.30982743, relative change = 0.002419652704661439
Cycle 69, free energy = 119219663.35695413, relative change = 0.002145302502281485
Cycle 70, free energy = 119219591.4740997, relative change = 0.002138631316327993
Cycle 71, free energy = 119219522.34538575, relative change = 0.002052469880644904
Cycle 72, free energy = 119219457.1521474, relative change = 0.0019318840260031782
Cycle 73, free energy = 119219393.19331424, relative change = 0.0018917192704269194
Cycle 74, free energy = 119219328.71732959, relative change = 0.0019033853470403867
Cycle 75, free energy = 119219251.4185293, relative change = 0.002276730330534853
Cycle 76, free energy = 119219171.80551215, relative change = 0.0023394067627096004
Cycle 77, free energy = 119219095.58795358, relative change = 0.0022346273960658332
Cycle 78, free energy = 119219022.12949583, relative change = 0.0021491045480902292
Cycle 79, free energy = 119218956.90003103, relative change = 0.0019047218038384254
Cycle 80, free energy = 119218890.664493, relative change = 0.001930365953124811
Cycle 81, free energy = 119218821.30192783, relative change = 0.0020174216926524243
Cycle 82, free energy = 119218746.32255158, relative change = 0.002176042087880125
Cycle 83, free energy = 119218668.16309215, relative change = 0.0022632003697617004
Cycle 84, free energy = 119218595.07122192, relative change = 0.0021119923283662045
Cycle 85, free energy = 119218527.850746, relative change = 0.001938572805558547
Cycle 86, free energy = 119218469.65362822, relative change = 0.0016755358470383341
Cycle 87, free energy = 119218411.62792337, relative change = 0.0016678144959651976
Cycle 88, free energy = 119218355.37918171, relative change = 0.0016141301761018617
Cycle 89, free energy = 119218304.68007606, relative change = 0.0014527626247063986
Cycle 90, free energy = 119218254.65085329, relative change = 0.0014315152247929561
Cycle 91, free energy = 119218206.49696681, relative change = 0.0013759592651815917
Cycle 92, free energy = 119218164.15929917, relative change = 0.0012083036426848088
Cycle 93, free energy = 119218124.63199206, relative change = 0.0011268256734526516
Cycle 94, free energy = 119218085.70574164, relative change = 0.0011084609858115764
Cycle 95, free energy = 119218044.59818137, relative change = 0.0011692071562110506
Cycle 96, free energy = 119217999.08090115, relative change = 0.0012929572892614212
Cycle 97, free energy = 119217953.29597104, relative change = 0.0012988708537347165
Cycle 98, free energy = 119217908.55694534, relative change = 0.0012675908074295272
Cycle 99, free energy = 119217869.32626413, relative change = 0.0011102887726603858
Cycle 100, free energy = 119217825.95355925, relative change = 0.0012260094851951626
Finished training in 12262.91s : active states = 6

Now, checking out Gamma_tde, you’ll see it has the same number of points as the concatenated TDE dataset (335356). Each column in Gamma stands for one of the six states.

[30]:
Gamma_tde.shape
[30]:
(335356, 6)
Reconstruct the shape of Gamma

Now, let’s reshape Gamma to match the original dataset by padding the data. We’ll use the auxiliary.padGamma function, which takes Gamma, T, and options as inputs:

  • Gamma: the Gamma we obtained from the TDE-HMM.

  • T: the desired number of data points for each trial, which needs to go back from 236 to 250. We can obtain T using the auxiliary.get_T function, which requires the original indices idx_trials as input.

  • options: a dictionary whose key can be either “embeddedlags” or “order”. Since we’re using lags in this case, we set the key to “embeddedlags” and the value to the variable lag_val.
[31]:
# Get T
T = auxiliary.get_T(idx_trials)
# Define options
options = {'embeddedlags': lag_val}
Gamma = auxiliary.padGamma(Gamma_tde, T, options=options)

Look at Gamma

[42]:
#Look at shape for Gamma and D_con
Gamma.shape, D_con.shape
[42]:
((355250, 6), (355250, 16))

Now, you’ll notice that the number of data points in both Gamma and D_con are back in sync.

Taking a closer look at the padding for the initial trial in state 1, you can observe that padding has been applied to the first 7 data points and the last 7 data points.

[33]:
# As we can see the padding works, since the first and last 7 numbers
# have been added to the timepoints of every trial
Gamma[:10,0].round(3),Gamma[240:250,0].round(3)
[33]:
(array([0.229, 0.229, 0.229, 0.229, 0.229, 0.229, 0.229, 0.   , 0.   ,
        0.   ]),
 array([0.607, 0.775, 0.783, 0.229, 0.229, 0.229, 0.229, 0.229, 0.229,
        0.229]))

Data restructuring

Now that we have trained our HMM and obtained our Gamma values, we need to restructure the data back to the original data structure. In this case we are not doing HMM-aggregated statistics; instead we will perform the statistical testing per time point. We achieve this by applying the function reconstruct_concatenated_design, which takes a concatenated 2D matrix and converts it into a 3D matrix. So, it will convert Gamma, shaped like [355250, 6], back to the original format of time points and trials, shaped like [250, 1421, 6] (timepoints, trials, states).

[34]:
# Reconstruct the Gamma matrix (here the "channel" dimension holds the 6 states)
n_timepoints, n_trials, n_channels = D_trials.shape[0], D_trials.shape[1], Gamma.shape[1]
Gamma_reconstruct = statistics.reconstruct_concatenated_design(Gamma, n_timepoints=n_timepoints, n_trials=n_trials, n_channels=n_channels)
Gamma_reconstruct.shape
[34]:
(250, 1421, 6)

As a sanity check we will see if Gamma_reconstruct is actually structured correctly by comparing it with Gamma.

To do this, we can compare a slice of our 3D-matrix, like Gamma_reconstruct[:, 0, :], with the corresponding slice in the concatenated 2D-data, Gamma[0:250, :].
If the comparison Gamma_reconstruct[:, 0, :] == Gamma[0:250, :] holds true, we’re essentially confirming that all timepoints in the first trial align perfectly with the first 250 values in our concatenated data.
[35]:
Gamma_reconstruct[:, 0, :] == Gamma[0:250, :]
[35]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       ...,
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]])

What we can see here is that all the values are True, so we know the restructuring was performed correctly.

3. Across-trials within session testing

As we move on to the next part of this tutorial, let’s dive into how we can use the test_across_trials_within_session function. This function helps us find associations between HMM state occurrences (D) and behavioral variables or individual traits (R) using permutation testing.

Permutation testing
Permutation testing does not assume any particular data distribution; instead, the procedure shuffles the data to create a null distribution. This null distribution lets us test our hypotheses without making strong assumptions about the data, and it becomes our benchmark for the big question: is there any real difference or relationship between the variables we're interested in? A toy code sketch of this shuffling follows the figure below.

Figure 5C in Vidaurre et al. 2023: A 9 x 4 matrix representing permutation testing across trials. Each number corresponds to a trial within a session, and permutations are performed within sessions.
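
To make the within-session shuffling concrete, here is a minimal NumPy sketch of the idea (a toy illustration, not the toolbox implementation): labels are permuted only inside their own session, and the observed statistic is compared against the resulting null distribution.

import numpy as np

rng = np.random.default_rng(0)

def perm_pval_within_sessions(D, R, idx_sessions, n_perm=1000):
    # D, R: 1D arrays over trials; idx_sessions: rows of (start, end) trial indices.
    # Trials are shuffled only within their own session, preserving session structure.
    observed = np.corrcoef(D, R)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        R_perm = R.copy()
        for start, end in idx_sessions:
            R_perm[start:end] = rng.permutation(R_perm[start:end])
        null[i] = np.corrcoef(D, R_perm)[0, 1]
    # Two-sided p-value, with +1 so that p is never exactly zero
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)

For example, perm_pval_within_sessions(D_feature, R_trials, idx_trials_session) would test a single hypothetical per-trial feature D_feature against the stimulus labels; the toolbox functions used below do this per timepoint and support regression as well as correlation.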

Hypothesis

  • Null Hypothesis (H0): No significant relationship exists between the independent variables and the dependent variable.

  • Alternative Hypothesis (H1): There is a significant relationship between the independent variables and the dependent variable.

Across-trials within sessions testing - Regression

In regression analysis, we try to explain the relationship between the predictor variables (D) and the response variable (R). Our goal is to identify the factors that contribute to changes in our signal over time. The permutation test for explained variance assesses the statistical significance of relationships between state time courses (D) and behavioral measurements (R). A significant result indicates that certain patterns within the state time courses (Gamma_reconstruct) contribute significantly to explaining why the behavioral measurements vary across trials. A non-significant result suggests that the state time courses may not play a role in accounting for the variability of the behavioral measurements.

Run the test_across_trials_within_session function:
To set the wheels in motion for the test_across_trials_within_session function, input the variables Gamma_reconstruct (D) and R_trials (R). Additionally, you can account for potential confounding variables by regressing them out as part of the permutation testing. Initiating regression-based permutation testing involves setting method="regression". For an in-depth comprehension of the function, look at the documentation.
[36]:
# Set the parameters for across-trials within session testing
method = "regression"
Nperm = 1000 # Number of permutations (default = 1000)
test_statistics_option = True

# Perform across-trials testing
result_regression = statistics.test_across_trials_within_session(Gamma_reconstruct, R_trials, idx_trials_session, method=method,
                                                                  Nperm=Nperm, test_statistics_option=test_statistics_option)
performing permutation testing per timepoint
100%|██████████| 250/250 [05:56<00:00,  1.43s/it]

We can now examine the resulting result_regression variable.

[43]:
result_regression
[43]:
{'pval': array([0.82617383, 0.83616384, 0.81518482, 0.82817183, 0.83516484,
        0.82717283, 0.83016983, 0.42557443, 0.49150849, 0.57042957,
        0.64135864, 0.4955045 , 0.20779221, 0.07792208, 0.07892108,
        0.06393606, 0.06393606, 0.05094905, 0.06593407, 0.02797203,
        0.03496503, 0.04895105, 0.08991009, 0.18681319, 0.13086913,
        0.03996004, 0.00999001, 0.004995  , 0.000999  , 0.000999  ,
        0.001998  , 0.001998  , 0.000999  , 0.001998  , 0.000999  ,
        0.000999  , 0.000999  , 0.000999  , 0.000999  , 0.000999  ,
        0.000999  , 0.000999  , 0.000999  , 0.000999  , 0.000999  ,
        0.000999  , 0.000999  , 0.000999  , 0.000999  , 0.000999  ,
        0.000999  , 0.000999  , 0.003996  , 0.00799201, 0.04495504,
        0.16183816, 0.22877123, 0.41258741, 0.38061938, 0.28871129,
        0.11888112, 0.17482517, 0.33566434, 0.4025974 , 0.47752248,
        0.43156843, 0.51148851, 0.41558442, 0.27972028, 0.09390609,
        0.03696304, 0.03896104, 0.05494505, 0.13486513, 0.52247752,
        0.91908092, 0.94305694, 0.82417582, 0.46353646, 0.26473526,
        0.30669331, 0.15984016, 0.16083916, 0.18981019, 0.15284715,
        0.22677323, 0.31268731, 0.31968032, 0.32267732, 0.4975025 ,
        0.5024975 , 0.49450549, 0.44755245, 0.34765235, 0.26073926,
        0.15384615, 0.13686314, 0.15284715, 0.05894106, 0.05094905,
        0.07892108, 0.14785215, 0.24875125, 0.47352647, 0.60739261,
        0.56843157, 0.5024975 , 0.46153846, 0.52147852, 0.42557443,
        0.31068931, 0.15284715, 0.07692308, 0.15684316, 0.15084915,
        0.11588412, 0.07092907, 0.06493506, 0.1008991 , 0.09090909,
        0.08991009, 0.19080919, 0.26473526, 0.35964036, 0.44555445,
        0.56343656, 0.43356643, 0.35664336, 0.15684316, 0.06193806,
        0.08791209, 0.08291708, 0.20979021, 0.49150849, 0.3986014 ,
        0.38161838, 0.33366633, 0.16983017, 0.09090909, 0.17682318,
        0.33666334, 0.50849151, 0.63236763, 0.54545455, 0.38361638,
        0.57442557, 0.86713287, 0.95404595, 0.85314685, 0.8011988 ,
        0.8001998 , 0.75424575, 0.75224775, 0.89410589, 0.82217782,
        0.64835165, 0.71528472, 0.82817183, 0.75324675, 0.82017982,
        0.82617383, 0.65534466, 0.63836164, 0.53246753, 0.2967033 ,
        0.28971029, 0.19180819, 0.1038961 , 0.14485514, 0.17882118,
        0.3026973 , 0.2987013 , 0.25574426, 0.42057942, 0.66733267,
        0.58441558, 0.41458541, 0.35664336, 0.33366633, 0.31968032,
        0.37862138, 0.51648352, 0.3016983 , 0.17182817, 0.23176823,
        0.3006993 , 0.31968032, 0.37262737, 0.4015984 , 0.52547453,
        0.46953047, 0.43556444, 0.46353646, 0.42757243, 0.47352647,
        0.61238761, 0.48151848, 0.38061938, 0.36463536, 0.45254745,
        0.4045954 , 0.52347652, 0.69030969, 0.47352647, 0.27072927,
        0.13886114, 0.03096903, 0.12987013, 0.31868132, 0.37462537,
        0.47452547, 0.38361638, 0.59440559, 0.72127872, 0.6983017 ,
        0.91508492, 0.83516484, 0.76023976, 0.45454545, 0.35664336,
        0.44055944, 0.55244755, 0.8021978 , 0.78721279, 0.79320679,
        0.73526474, 0.73226773, 0.64235764, 0.6953047 , 0.78721279,
        0.8031968 , 0.62337662, 0.54045954, 0.62037962, 0.88011988,
        0.9040959 , 0.84815185, 0.73526474, 0.8031968 , 0.8041958 ,
        0.81618382, 0.83016983, 0.85414585, 0.82917083, 0.83516484,
        0.83216783, 0.83516484, 0.83516484, 0.83616384, 0.81918082]),
 'base_statistics': array([0.00152557, 0.00152557, 0.00152557, 0.00152557, 0.00152557,
        0.00152557, 0.00152557, 0.00355335, 0.00335919, 0.00275029,
        0.00239868, 0.00299742, 0.00484498, 0.00664448, 0.00710931,
        0.0075206 , 0.00753915, 0.00761362, 0.00802744, 0.00909734,
        0.00929879, 0.00840498, 0.00688991, 0.0055566 , 0.00584881,
        0.00795057, 0.01015599, 0.01215342, 0.01295658, 0.01503012,
        0.01671473, 0.01732149, 0.01628994, 0.01533038, 0.01670374,
        0.01840546, 0.01909428, 0.02135494, 0.02396775, 0.02392465,
        0.02299056, 0.02205882, 0.01948206, 0.02028647, 0.02002757,
        0.02062544, 0.02311874, 0.02391866, 0.02629927, 0.0250046 ,
        0.0220388 , 0.01479333, 0.01242723, 0.00985529, 0.00799354,
        0.00573933, 0.0049909 , 0.00365704, 0.00377916, 0.00418488,
        0.00548839, 0.00490076, 0.00364511, 0.00354309, 0.00298708,
        0.0034286 , 0.00274963, 0.00311732, 0.00419286, 0.00640852,
        0.00733213, 0.00751996, 0.00715208, 0.005567  , 0.00266   ,
        0.00088286, 0.00072269, 0.00147211, 0.0031954 , 0.00447261,
        0.00414334, 0.00546242, 0.00558262, 0.00559047, 0.00596309,
        0.00507322, 0.00447394, 0.00427879, 0.00435773, 0.00326265,
        0.00310605, 0.00306627, 0.00342902, 0.00375874, 0.0046028 ,
        0.00527866, 0.00590675, 0.00539823, 0.00718405, 0.0078504 ,
        0.00695972, 0.0057358 , 0.00460667, 0.0033314 , 0.00241346,
        0.00261789, 0.00319234, 0.00338133, 0.00305567, 0.00359237,
        0.00435891, 0.00555919, 0.0069127 , 0.0059154 , 0.0059709 ,
        0.0063108 , 0.00724478, 0.00736088, 0.00621053, 0.00610742,
        0.006065  , 0.00494495, 0.00469717, 0.00393354, 0.00344847,
        0.00278131, 0.00364359, 0.00416027, 0.00605273, 0.00739671,
        0.00688738, 0.00680314, 0.00541966, 0.00334846, 0.00363692,
        0.00373487, 0.00386879, 0.00565754, 0.00712014, 0.00565005,
        0.00424801, 0.00312165, 0.0025666 , 0.00283141, 0.0038998 ,
        0.00277266, 0.00126029, 0.00078598, 0.00146858, 0.00164875,
        0.00161085, 0.00167066, 0.00169014, 0.00108937, 0.00154487,
        0.0022622 , 0.00199099, 0.00151932, 0.00184527, 0.00164931,
        0.00159995, 0.00247406, 0.00236926, 0.00306686, 0.0043187 ,
        0.00418697, 0.00498404, 0.0060107 , 0.00560621, 0.00530214,
        0.00417067, 0.00425688, 0.00424244, 0.00355153, 0.00224559,
        0.0026958 , 0.00345149, 0.00368147, 0.00387748, 0.00415365,
        0.00364888, 0.00287468, 0.00440553, 0.00551662, 0.00473598,
        0.00407955, 0.00404108, 0.00384604, 0.00359676, 0.00289264,
        0.00323725, 0.00338039, 0.0031771 , 0.00349182, 0.00330748,
        0.00273129, 0.00334223, 0.00399106, 0.00396614, 0.00353684,
        0.00389192, 0.0029765 , 0.00218807, 0.00312441, 0.00434908,
        0.00606284, 0.00871735, 0.00615877, 0.00423073, 0.00369278,
        0.00315194, 0.0033374 , 0.00247575, 0.00172842, 0.00190332,
        0.0009971 , 0.00133368, 0.00179085, 0.00318215, 0.0035914 ,
        0.00295294, 0.00258281, 0.0016815 , 0.00168344, 0.00157581,
        0.0018573 , 0.00184346, 0.00229882, 0.00202635, 0.00180091,
        0.00174787, 0.00272123, 0.00301945, 0.00246625, 0.00130246,
        0.0012035 , 0.0014933 , 0.00208307, 0.00159815, 0.00165499,
        0.00174984, 0.00153009, 0.00154382, 0.00152557, 0.00152557,
        0.00152557, 0.00152557, 0.00152557, 0.00152557, 0.00152557]),
 'test_statistics': array([[0.00152557, 0.00245233, 0.00281151, ..., 0.00265233, 0.01235024,
         0.00221838],
        [0.00152557, 0.00403095, 0.00454296, ..., 0.00253583, 0.0036282 ,
         0.0032053 ],
        [0.00152557, 0.00293381, 0.00512761, ..., 0.00566676, 0.00264688,
         0.00206329],
        ...,
        [0.00152557, 0.0034129 , 0.0056052 , ..., 0.0025086 , 0.00244584,
         0.0012976 ],
        [0.00152557, 0.0025109 , 0.00080733, ..., 0.00677906, 0.00176209,
         0.0074087 ],
        [0.00152557, 0.00750238, 0.00275865, ..., 0.00252591, 0.01071246,
         0.00226115]]),
 'test_type': 'test_across_trials_within_session',
 'method': 'regression',
 'test_combination': False,
 'max_correction': False,
 'performed_tests': {'t_test_cols': [], 'f_test_cols': []},
 'Nperm': 1000}

What we can see here is that result_regression is a dictionary containing the outcomes of a statistical analysis conducted using the specified method and test_type.

Let us break it down (a small usage sketch follows this list):

  • pval: This array houses the p-values generated from the permutation test.

  • base_statistics: Stores the base statistics of the tests

  • test_statistics: Stores the test statistics associated with the permutation test, i.e. the permutation distribution used to calculate the p-values. This full distribution is only exported when we set test_statistics_option=True; by default (test_statistics_option=False) only the base (unpermuted) statistics are returned.

  • test_type: Indicates the type of permutation test performed. In this case, it is test_across_trials_within_session.

  • method: Specifies the analytical method employed, 'regression', meaning the analysis is carried out using regression-based permutation testing.

  • test_combination: Specifies whether test combination (calculating geometric means of p-values via permutation testing) has been performed.

  • max_correction: Boolean value that indicates whether max correction has been applied during permutation testing.

  • performed_tests: A dictionary that marks the columns in the test_statistics or p-value matrix (the q dimension) where t-tests or F-tests have been performed.

  • Nperm: The number of permutations that have been performed.
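
As a quick usage example (a sketch over the dictionary above), the uncorrected p-value array can be thresholded directly to list candidate timepoints before any multiple-comparison correction:

# Timepoints with uncorrected p-values below 0.05
sig_timepoints = np.where(result_regression["pval"] < 0.05)[0]
print(sig_timepoints)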

Visualization of results
Now that we have performed our test, we can visualize the p-value array.
We will use the function plot_p_values_over_time from the graphics module.

Note: Elements marked in red indicate a p-value below 0.05, signifying statistical significance.

[44]:
# The alpha score we set for the p-values
alpha = 0.05
# Plot p-values
graphics.plot_p_values_over_time(result_regression["pval"], title_text ="P-values over time",figsize=(10, 3), xlabel="Time points", alpha=alpha)
../_images/notebooks_Testing_across_trials_within_session_49_0.png
Multiple Comparison
To guard against Type I errors, we can apply the Benjamini-Hochberg procedure to control the False Discovery Rate.
[45]:
pval_corrected, rejected_corrected =statistics.pval_correction(result_regression["pval"], method='fdr_bh')
# Plot p-values
graphics.plot_p_values_over_time(pval_corrected, title_text ="Corrected p-values over time",figsize=(10, 3), xlabel="Time points")
../_images/notebooks_Testing_across_trials_within_session_51_0.png
Cluster based permutation testing
To provide stricter control over Type I errors, we can also apply cluster-based permutation testing to control the Family-Wise Error Rate when conducting multiple comparisons. To use this p-value correction, set test_statistics_option=True when performing the permutation testing, as the test statistics are an input to the function (pval_cluster_based_correction).
[46]:
pval_cluster =statistics.pval_cluster_based_correction(result_regression["test_statistics"],result_regression["pval"])
# Plot p-values
graphics.plot_p_values_over_time(pval_cluster, title_text ="Cluster based correction of p-values over time",figsize=(10, 3), xlabel="Time points")
../_images/notebooks_Testing_across_trials_within_session_53_0.png
Visualize average probabilities and differences
We can now check whether the results from result_regression["pval"] correspond to the differences for each state over time between the two conditions. This is done using the function plot_condition_difference.
[51]:
# Detect the intervals where there is a significant difference; these will be highlighted
alpha = 0.05
intervals =statistics.detect_significant_intervals(pval_corrected, alpha)
# Plot the average probability for each state over time for two conditions and their difference.
graphics.plot_condition_difference(Gamma_reconstruct, R_trials, figsize=(12,3),vertical_lines=intervals , highlight_boxes=True, title="Average Probability and Differences (fdr_bh correction) ")
../_images/notebooks_Testing_across_trials_within_session_55_0.png
[55]:
# Detect the intervals where there is a significant difference; these will be highlighted
alpha = 0.05
intervals =statistics.detect_significant_intervals(pval_cluster, alpha)
# Plot the average probability for each state over time for two conditions and their difference.
graphics.plot_condition_difference(Gamma_reconstruct, R_trials, figsize=(12,3),vertical_lines=intervals , highlight_boxes=True, title="Average Probability and Differences (cluster correction) ")

../_images/notebooks_Testing_across_trials_within_session_56_0.png

Save the results

[ ]:
io.save_statistics(result_regression, file_name="result_regression")

Conclusion - Regression

Our example involved presenting participants with two distinct stimuli across multiple trials spanning 10 different sessions. Permutation testing was performed across all trials within each session while keeping the session order the same. The analysis generated the variable result_regression[“pval”], containing 250 p-values, one for each time point.

After correcting for multiple comparisons using Benjamini-Hochberg, significant differences between state time courses (D) and stimuli (R) were found mainly in the early and mid segments of the signal (specifically, timepoints 7 to 83, 89 to 106 and 117 to 119).

The consistent occurrence of significant effects in the early segments of the recordings for every trial suggests a robust and reliable neural reaction at the onset of the stimuli. On the other hand, the decreasing significance towards the end of the signal may indicate a diminishing neural response to the presented stimuli over time. This temporal pattern in the results implies an initial heightened attention or processing of the stimuli, followed by a reduction in neural responsiveness as the trial progresses. The findings thus provide insights into the temporal dynamics of neural responses, indicating specific time windows of heightened sensitivity.

Across-trials within sessions testing - Correlation

In correlation analysis, our goal is to find connections between the predictor variables (D) and the response variable (R). Instead of explaining the variations in signal values over time, as in regression, our interest lies in the strength of the linear relationship between the state time courses (Gamma_reconstruct) and behavioral measurements, such as our stimuli. If the result is significant, it means that certain patterns in the state time courses (Gamma_reconstruct) contribute to explaining the variability in behavioral measurements across trials. If the result is non-significant, the observed relationship might just be down to chance, meaning that our state time courses might not explain the variability in behavioral measurements.

Executing the test_across_trials_within_session function:
To initiate the test_across_trials_within_session function using correlation, input the variables Gamma_reconstruct (D) and R_trials (R). Additionally, you can address potential confounding variables by incorporating them into the permutation testing.
Quick reminder: for correlation-based permutation testing, go with method="univariate" to get the correlation coefficients and p-values.
[57]:
# Set the parameters for across-trials within session testing
method = "univariate"
Nperm = 1000 # Number of permutations (default = 1000)
# Perform across-trials testing
result_univariate = statistics.test_across_trials_within_session(Gamma_reconstruct, R_trials, idx_trials_session, method=method, Nperm=Nperm, test_statistics_option=True)
performing permutation testing per timepoint
100%|██████████| 250/250 [02:36<00:00,  1.59it/s]
Visualization of results
Now that we have performed our test, we can visualize the p-value matrix.
We will use the function plot_p_value_matrix from the graphics module.

Note: Elements marked in red indicate a p-value below 0.05, signifying statistical significance.

[58]:
# Plot p-values
alpha = 0.05 # threshold for p-values
graphics.plot_p_value_matrix(result_univariate["pval"].T, title_text ="Heatmap of p-values",figsize=(8, 5), xlabel="Time points", ylabel="HMM States", annot=False, alpha=alpha)
../_images/notebooks_Testing_across_trials_within_session_63_0.png
Multiple Comparison
To guard against Type I errors, we can apply the Benjamini-Hochberg procedure to control the False Discovery Rate.
[59]:
alpha = 0.05 # threshold for p-values
pval_corrected, rejected_corrected = statistics.pval_correction(result_univariate["pval"], method='fdr_bh', alpha=alpha)
# Plot p-values
graphics.plot_p_value_matrix(pval_corrected.T, title_text ="Heatmap of corrected p-values",figsize=(8, 5), xlabel="Time points", ylabel="HMM States", annot=False, alpha = alpha)
../_images/notebooks_Testing_across_trials_within_session_65_0.png
Cluster based permutation testing
To provide stricter control over Type I errors, we can also apply cluster-based permutation testing to control the Family-Wise Error Rate when conducting multiple comparisons.
[60]:
pval_cluster =statistics.pval_cluster_based_correction(result_univariate["test_statistics"],result_univariate["pval"])
# Plot p-values
graphics.plot_p_value_matrix(pval_cluster.T, title_text ="Heatmap of cluster corrected p-values",figsize=(8, 5), xlabel="Time points", ylabel="HMM States", annot=False, alpha = alpha)
../_images/notebooks_Testing_across_trials_within_session_67_0.png

Plot Correlation Coefficients

[61]:
# Plot correlation coefficients
# Correlations between the stimuli and each state as a function of time
graphics.plot_correlation_matrix(result_univariate["base_statistics"].T, result_univariate["performed_tests"], title_text ="Heatmap of Correlation Coefficients", figsize=(8, 5), xlabel="Time points", ylabel="HMM States", annot=False)
../_images/notebooks_Testing_across_trials_within_session_69_0.png

Conclusion - Correlation

In this example, the permutation testing analysis using correlation aims to study the relationship between state time courses and two different stimuli across multiple experimental trials. The resulting matrix, result_univariate["pval"], is structured as (time points x states), and it shows the trial-specific variations in the correlation between the state time courses at every timepoint and the stimuli. Essentially, this matrix offers information about time-dependent and state-specific correlation patterns.

When correcting the p-values using Benjamini-Hochberg, we found no significant p-values. However, when correcting with cluster-based permutation testing, we do find significant effects. A significant p-value for a specific HMM state and timepoint means that the correlation between the state time course at that timepoint and the stimuli varies significantly across experimental trials. This suggests that we can pinpoint instances when changes in stimuli impact the state time courses, and thus highlight moments when stimuli induce observable effects on the state dynamics.

Looking at result_univariate["pval"], we observe time windows that show a significant difference in the correlation between different HMM states and the stimuli. This can be interpreted as distinctive periods when specific neural states exhibit heightened responsiveness or susceptibility to the presented stimuli.