Standard Gaussian Hidden Markov Model

This notebook covers the fundamental procedures for training and examining a Gaussian Hidden Markov Model (HMM). The model is designed for analyzing a single set of time series data, such as neuroimaging or electrophysiological recordings. If you need to model interactions between two sets of time series, you can use the Gaussian-Linear HMM provided in the toolbox.

Outline

  1. Background

  2. Example: Modelling time-varying amplitude and functional connectivity in fMRI recordings

Background

The HMM is a generative probabilistic model that assumes that an observed timeseries (e.g., neuroimaging or electrophysiological recordings) was generated by a sequence of hidden “states”. In the Gaussian HMM, we model states as Gaussian distributions: we assume that the observation \(Y_t\) at time point t was generated by a Gaussian distribution with mean \(\mu^k\) and covariance \(\Sigma^k\) when state k is active, i.e.:

\[Y_t\sim N(\mu^k,\Sigma^k)\]

Additionally, the HMM estimates the probabilities \(\theta\) of transitioning between each pair of states, i.e., the probability that the currently active state at time point t is k given that the state at the previous timepoint t-1 was l:

\[P(s_t=k|s_{t-1}=l) = \theta_{k,l}\]

And the probability \(\pi\) that a segment of the timeseries starts with state k:

\[P(s_0=k)=\pi_k\]

When we fit the model to the observations, we aim to estimate the posterior distributions of these parameters (\(\mu\), \(\Sigma\), \(\theta\), and \(\pi\)) using variational inference.

We define the posterior estimates as

\[\gamma_{t,k}:=P(s_t=k|Y)\]
\[\xi_{t,k,l}:=P(s_t=k,s_{t-1}=l|Y)\]

where \(\gamma\) are the probabilities of state k being active at time point t given the observed timeseries (the state time courses), and \(\xi\) are the joint probabilities of states at consecutive timepoints. Instead of the state time courses, we can also use the Viterbi path, a discrete representation of which state is active at each time point.
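
To make the generative model concrete, here is a minimal sketch that samples a state sequence and observations from the model described above (toy values for K, \(\pi\), \(\theta\), \(\mu\), and \(\Sigma\), chosen purely for illustration; this is not toolbox code):

[ ]:
import numpy as np

rng = np.random.default_rng(0)
K, T, q = 3, 200, 2 # number of states, timepoints, features (toy values)

pi = np.full(K, 1 / K) # initial state probabilities
theta = np.full((K, K), 0.05) # transition probabilities...
np.fill_diagonal(theta, 0.90) # ...with 'sticky' self-transitions; each row sums to 1
mu = rng.normal(size=(K, q)) # state means
Sigma = np.stack([np.eye(q)] * K) # state covariances (identity for simplicity)

s = np.zeros(T, dtype=int)
Y = np.zeros((T, q))
s[0] = rng.choice(K, p=pi) # P(s_0 = k) = pi_k
Y[0] = rng.multivariate_normal(mu[s[0]], Sigma[s[0]])
for t in range(1, T):
    s[t] = rng.choice(K, p=theta[s[t - 1]]) # row l of theta holds P(s_t = k | s_{t-1} = l)
    Y[t] = rng.multivariate_normal(mu[s[t]], Sigma[s[t]]) # Y_t ~ N(mu^k, Sigma^k)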

A common application for the standard Gaussian HMM is the estimation of time-varying amplitude and functional connectivity in fMRI recordings (e.g., Vidaurre et al., 2017).

Example: Modelling time-varying amplitude and functional connectivity in fMRI recordings

We will now go through an example illustrating how to fit and inspect a standard Gaussian HMM. The example uses simulated data that can be found in the example_data folder. The data were generated to resemble fMRI timeseries, and our goal is to estimate time-varying amplitude and functional connectivity (FC) for a group of subjects. Imagine that the data were recorded from 20 different subjects. Each subject has been recorded for 1,000 timepoints and their timeseries were extracted in a parcellation with 50 brain regions. The data Y here thus has dimensions ((20 subjects * 1000 timepoints), 50 brain regions). We use the indices array to specify where in the timeseries each of the 20 subjects’ session starts and ends.

Preparation

If you don't have the GLHMM package installed, run the following command in your terminal:

pip install glhmm

Import libraries
Let’s start by importing the required libraries and modules.
[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from glhmm import glhmm, preproc, utils, graphics

Load data

The standard HMM requires only one input: a timeseries Y. When running the model on a concatenated timeseries, e.g., a group of subjects or several scanning sessions, you also need to provide the indices indicating where each subject/session in the concatenated timeseries Y starts and ends.

Input data for the glhmm should be in numpy format. Other data types, such as .csv, can be converted to numpy using e.g. pandas, as shown below. Alternatively, the io module provides useful functions to load input data in the required format, e.g., from existing .mat-files. If you need to create indices from session lengths (as used in the HMM-MAR toolbox) you can use the auxiliary.make_indices_from_T function.

Synthetic data for this example are provided in the glhmm/docs/notebooks/example_data folder. The file data.csv contains synthetic timeseries. The data should have the shape ((no subjects/sessions * no timepoints), no features), meaning that all subjects and/or sessions have been concatenated along the first dimension. The second dimension is the number of features, e.g., the number of parcels or channels. The file T.csv specifies the indices in the concatenated timeseries corresponding to the beginning and end of individual subjects/sessions in the shape (no subjects, 2). In this case, we have generated timeseries for 20 subjects and 50 features. Each subject has 1,000 timepoints. The timeseries has the shape (20000, 50) and the indices have the shape (20, 2).
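
For example, the start and end indices for 20 sessions of 1,000 timepoints each can be built directly with numpy. This is a small sketch of the expected (no subjects, 2) layout; the auxiliary.make_indices_from_T function mentioned above builds the same array from a vector of session lengths:

[ ]:
import numpy as np

session_lengths = np.full(20, 1000) # 20 sessions of 1,000 timepoints each
ends = np.cumsum(session_lengths)
starts = ends - session_lengths
indices = np.stack([starts, ends], axis=1) # shape (20, 2): [start, end) per session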

[2]:
data = pd.read_csv('./example_data/data.csv', header=None).to_numpy()
T_t = pd.read_csv('./example_data/T.csv', header=None).to_numpy()
NOTE: It is important to standardise your timeseries and, if necessary, apply other kinds of preprocessing before fitting the model. This is done separately for each session/subject, as specified in the indices. The data provided here are already close to standardised (so the code below will not do much), but see the Prediction tutorial for the effect on real data.
[3]:
data,_ = preproc.preprocess_data(data, T_t)
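
For reference, per-session standardisation amounts to z-scoring each session separately, which can be written directly in numpy (a sketch illustrating the idea on a copy of the data, not the toolbox implementation):

[ ]:
data_z = data.copy()
for start, end in T_t: # z-score each session separately, using its [start, end) indices
    segment = data_z[start:end]
    data_z[start:end] = (segment - segment.mean(axis=0)) / segment.std(axis=0)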

Initialise and train an HMM

We first initialise the glhmm object and specify the hyperparameters. For the standard Gaussian HMM, since we do not want to model an interaction between two sets of variables, we set model_beta='no'. The number of states is specified with the K parameter; here we estimate K=4 states. If you want to model a different number of states, change K to a different value. In this example, we model states as Gaussian distributions with a mean and a full covariance matrix, so that each state is described by a mean amplitude and a functional connectivity pattern. To do this, specify covtype='full'; a state-specific mean is modelled by default. If you do not want to model the mean, add model_mean='no'.

[4]:
hmm = glhmm.glhmm(model_beta='no', K=4, covtype='full')

Optionally, you can check the hyperparameters (including the ones set by default) to make sure that they correspond to how you want the model to be set up.

[5]:
print(hmm.hyperparameters)
{'K': 4, 'covtype': 'full', 'model_mean': 'state', 'model_beta': 'no', 'dirichlet_diag': 10, 'connectivity': None, 'Pstructure': array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]]), 'Pistructure': array([ True,  True,  True,  True])}
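
Pstructure and Pistructure are boolean masks indicating which transitions and which initial states are allowed; by default, everything is allowed. As a purely hypothetical example (it is an assumption that the constructor accepts these as keyword arguments; check the glhmm documentation), a model that forbids direct transitions from state 1 to state 3 might be set up as:

[ ]:
# Hypothetical usage: passing Pstructure to the constructor is an assumption
Pstructure = np.ones((4, 4), dtype=bool)
Pstructure[0, 2] = False # disallow direct transitions from state 1 to state 3
hmm_restricted = glhmm.glhmm(model_beta='no', K=4, covtype='full', Pstructure=Pstructure)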

We then train the HMM using the data and indices loaded above. Since we here do not model an interaction between two sets of timeseries but run a standard HMM, we set X=None. Y should be the timeseries in which we want to estimate states (here called data), and indices should be the beginning and end indices of each subject (here called T_t). During training, the output shows the progress of the model fit at each iteration.

[ ]:
hmm.train(X=None, Y=data, indices=T_t)

Optionally, you can also return Gamma (the state probabilities at each timepoint), Xi (the joint probabilities of states at consecutive timepoints conditioned on the data), and FE (the free energy at each iteration) at this step, but it is also possible to retrieve them later using the decode function (for Gamma and Xi) and the get_fe function (for the free energy).

[7]:
Gamma,Xi,FE = hmm.train(X=None, Y=data, indices=T_t)
Init repetition 1 free energy = 1379857.9802932388
Init repetition 2 free energy = 1380121.1907757511
Init repetition 3 free energy = 1380740.3796546296
Init repetition 4 free energy = 1379410.9678162064
Init repetition 5 free energy = 1379222.1917814433
Best repetition: 5
Cycle 1 free energy = 1379084.5428614754
Cycle 2 free energy = 1378500.6654551388
Cycle 3, free energy = 1378161.873566189, relative change = 0.3671866948217571
Cycle 4, free energy = 1377838.6755315338, relative change = 0.2594160926191503
Cycle 5, free energy = 1377555.8178547365, relative change = 0.1850284881521681
[... output truncated: the free energy continues to decrease monotonically through cycles 6-98 ...]
Cycle 99, free energy = 1374936.4834540377, relative change = 0.0005064753104749447
Cycle 100, free energy = 1374935.2212486046, relative change = 0.00030419561337092436
Finished training in 51.18s : active states = 4

Inspect model

We can then inspect some interesting aspects of the model. We start by checking what the states look like by interrogating the state means and the state covariances. We will then look at the dynamics, i.e., how states transition between each other (transition probabilities) and how the state sequence develops over time (the Viterbi path). Finally, we will have a look at some summary metrics that can be useful to describe the overall patterns or to relate the model to behaviour using statistical testing or machine learning/out-of-sample prediction.

> We here show some simple result plots. For more plotting options, see the Graphics module

State means: Time-varying amplitude patterns

The state means can be interpreted as time-varying patterns of amplitude (relative to the baseline). They can be retrieved from the model using the get_means() function, or get_mean(k) to retrieve only the mean for state k:

[8]:
K = hmm.hyperparameters["K"] # the number of states
q = data.shape[1] # the number of parcels/channels
state_means = hmm.get_means() # the state means in the shape (no. features, no. states)

We can then plot these amplitude patterns. This will show the states on the x-axis, each parcel/brain region/channel on the y-axis, and the mean activation of each parcel in each state as the color intensity.

[69]:
cmap = "coolwarm"
plt.imshow(state_means, cmap=cmap, interpolation="none")
plt.colorbar(label='Activation Level') # Label for color bar
plt.title("State mean activation")
plt.xticks(np.arange(K), np.arange(1,K+1))
plt.gca().set_xlabel('State')
plt.gca().set_ylabel('Brain region')
plt.tight_layout()  # Adjust layout for better spacing
plt.show()
[Figure: state mean activation heatmap (brain regions × states)]

State covariances: Time-varying functional connectivity

The state covariances represent the time-varying functional connectivity patterns that we have estimated in the fMRI recordings. They can be obtained from the model using the get_covariance_matrix function:

[10]:
state_FC = np.zeros(shape=(q, q, K))
for k in range(K):
    state_FC[:,:,k] = hmm.get_covariance_matrix(k=k) # the state covariance matrices in the shape (no. features, no. features, no. states)

We can then plot the covariance (i.e., functional connectivity) of each state. These are square matrices showing the brain region by brain region functional connectivity patterns:

[72]:
for k in range(K):
    plt.subplot(2,2,k+1)
    plt.imshow(state_FC[:,:,k],cmap=cmap)
    plt.xlabel('Brain region')
    plt.ylabel('Brain region')
    plt.colorbar()
    plt.title("State covariance\nstate #%s" % (k+1))
plt.subplots_adjust(hspace=0.7, wspace=0.8)
plt.show()
[Figure: state covariance matrices for states 1-4]

Transition probabilities

The transition probabilities indicate the temporal order in the state sequence, i.e., the probability of transitioning from any one state to any other state. They are contained in hmm.P with a shape of [K, K]:

[12]:
TP = hmm.P.copy() # the transition probability matrix
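
Each row of the transition probability matrix is a probability distribution over the next state (rows index the state transitioned from, columns the state transitioned to), so every row should sum to 1. A quick numpy sanity check:

[ ]:
assert np.allclose(TP.sum(axis=1), 1.0) # each row is a distribution over the next state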

We can then plot the transition probability matrix. Note that self-transitions (i.e., staying in the same state) are considerably more likely in a timeseries with temporal structure, so the matrix should show a strong diagonal. For comparison, we also show the transition probabilities excluding self-transitions:

[78]:
# Plot Transition Probabilities
plt.figure(figsize=(7, 4))

# Plot 1: Original Transition Probabilities
plt.subplot(1, 2, 1)
plt.imshow(TP, cmap=cmap, interpolation='nearest')
plt.title('Transition Probabilities')
plt.xlabel('To State')
plt.ylabel('From State')
plt.colorbar(fraction=0.046, pad=0.04)

# Plot 2: Transition Probabilities without Self-Transitions
TP_noself = TP - np.diag(np.diag(TP))  # Remove self-transitions
TP_noself2 = TP_noself / TP_noself.sum(axis=1, keepdims=True)  # Normalize probabilities
plt.subplot(1, 2, 2)
plt.imshow(TP_noself2, cmap=cmap, interpolation='nearest')
plt.title('Transition Probabilities\nwithout Self-Transitions')
plt.xlabel('To State')
plt.ylabel('From State')
plt.colorbar(fraction=0.046, pad=0.04)

plt.tight_layout()  # Adjust layout for better spacing
plt.show()

[Figure: transition probabilities, with and without self-transitions]

Viterbi path

The Viterbi path is a discrete representation of the state timecourse, indicating which state is active at each timepoint. We may be able to see whether some states tend to occur more for certain subjects, or are related to a stimulus that occurs at specific timepoints. The Viterbi path can also be informative to understand whether the HMM is “mixing”, i.e., whether states occur across subjects, or whether the model assigns the entire session of one subject to a single state (see Ahrends et al., 2022). You can retrieve the Viterbi path using the decode function and setting viterbi=True:

[14]:
vpath = hmm.decode(X=None, Y=data, indices=T_t, viterbi=True)

And plot the Viterbi path (see also graphics module):

[23]:
graphics.plot_vpath(vpath, title="Viterbi path")
[Figure: Viterbi path, all subjects]

We can also visualize the Viterbi path for the first subject (timepoints 0-1000):

[24]:
num_subject = 0
graphics.plot_vpath(vpath[T_t[num_subject,0]:T_t[num_subject,1],:], title="Viterbi path")
[Figure: Viterbi path, subject 1]

Plotting the Viterbi path indicates that the states show fast dynamics across all subjects (concatenated along the y-axis).

Summary metrics

Once we have estimated the model parameters, we can compute some summary metrics to describe the patterns we see more broadly. These summary metrics can be useful to relate patterns in the timeseries to behaviour, using e.g., statistical testing (see Statistical testing tutorial) or machine learning (see Prediction tutorial). The module utils provides useful functions to obtain these summary metrics.

The fractional occupancy (FO) indicates the fraction of time in each session that is occupied by each state. For instance, if one state was active for the entire duration of the session, this state’s FO would be 1 (100%) and all others would be 0. If, on the other hand, all states were present for an equal number of timepoints in total, the FO of every state would be 1/K, where K is the number of states. This can be informative to understand whether one state is more present in a certain group of subjects or experimental condition, or to interrogate mixing (explained above).

You can obtain the fractional occupancies using the get_FO function. The output is an array containing the FO of each subject along the first dimension and each state along the second dimension.

[25]:
FO = utils.get_FO(Gamma, indices=T_t)
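
Since the fractional occupancies of one session are proportions of time, each row of FO should sum to 1 across states. A quick numpy check, plus the group-average FO per state:

[ ]:
assert np.allclose(FO.sum(axis=1), 1.0) # FO of each subject sums to 1 across states
print(FO.mean(axis=0)) # group-average fractional occupancy per state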

And plot the Fractional Occupancies (see also graphics module):

[37]:
graphics.plot_FO(FO, num_ticks=FO.shape[0])
[Figure: fractional occupancy per subject and state]

The switching rate indicates how quickly subjects switch between states (as opposed to staying in the same state):

[19]:
SR = utils.get_switching_rate(Gamma, T_t)
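
For intuition, a rough per-subject approximation of the switching rate can also be read off the Viterbi path as the fraction of timepoints at which the discrete state changes (a sketch, assuming vpath is one-hot encoded over states; utils.get_switching_rate computes the switching rate from the state probabilities Gamma):

[ ]:
states = vpath.argmax(axis=1) # discrete state sequence (assumes vpath is one-hot)
for start, end in T_t:
    segment = states[start:end]
    print(np.mean(segment[1:] != segment[:-1])) # fraction of timepoints with a state switch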

And plot the switching rate (see also graphics module):

[63]:
graphics.plot_switching_rates(SR, num_ticks=SR.shape[0])

[Figure: switching rate per subject and state]

The state lifetimes (also called dwell times) indicate how long a state remains active at a time (summarised as the mean, median, or maximum). This can be informative to understand whether states tend to last for longer or shorter periods, pointing towards slower vs. faster dynamics:

[39]:
LTmean, LTmed, LTmax = utils.get_life_times(vpath, T_t)
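
To illustrate what these lifetimes capture, one can split a subject's discrete state sequence into consecutive runs and average the run lengths per state (a sketch under the same one-hot assumption as above, not the toolbox implementation):

[ ]:
states = vpath[T_t[0, 0]:T_t[0, 1]].argmax(axis=1) # discrete state sequence, subject 1
changes = np.flatnonzero(np.diff(states)) + 1
runs = np.split(states, changes) # consecutive runs of the same state
for k in range(K):
    lengths = [len(run) for run in runs if run[0] == k]
    mean_lt = np.mean(lengths) if lengths else np.nan
    print(f"state {k + 1}: mean lifetime = {mean_lt:.1f} timepoints")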

Now we will plot the mean state lifetimes (see also graphics module):

[67]:
graphics.plot_state_lifetimes(LTmean, num_ticks=LTmean.shape[0], ylabel='Mean lifetime')
[Figure: mean state lifetime per subject and state]