glhmm.preproc

Preprocessing functions for the General/Gaussian Linear Hidden Markov Model (GLHMM).

Author: Diego Vidaurre, 2023.

glhmm.preproc.apply_pca(X, d, whitening=False, exact=True)

Applies PCA to the input data X.

Parameters:

X (array-like of shape (n_samples, n_parcels)): The input data to be transformed.
d (int, float, or array-like of shape (n_parcels, n_components)): If int, the number of components to keep. If float, the fraction of explained variance to keep (a value between 0 and 1). If array-like, the transformation matrix.
whitening (bool, default=False): Whether to whiten the transformed data.
exact (bool, default=True): Whether to use the full SVD solver for PCA.

Returns:

X_transformed (array-like of shape (n_samples, n_components)): The transformed data after applying PCA.
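To make the three interpretations of d concrete, here is a NumPy-only sketch. The name pca_sketch is hypothetical and not part of glhmm. It uses a plain SVD rather than whatever solver apply_pca calls internally, treats a float d as a fraction of explained variance between 0 and 1, and applies a matrix d to the uncentered data; that last point is an assumption about the library's behaviour.

```python
import numpy as np

def pca_sketch(X, d, whitening=False):
    """Hypothetical sketch of PCA with d given as a count, a variance fraction, or a matrix."""
    X = np.asarray(X, dtype=float)
    if isinstance(d, np.ndarray):
        # Assumption: a precomputed (n_parcels, n_components) matrix is applied as-is
        return X @ d
    Xc = X - X.mean(axis=0)  # PCA operates on centred data
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_ratio = s**2 / np.sum(s**2)
    if isinstance(d, float):
        # Smallest number of components whose cumulative explained variance reaches d
        ncomp = int(np.searchsorted(np.cumsum(explained_ratio), d)) + 1
    else:
        ncomp = int(d)
    ncomp = min(ncomp, len(s))  # guard against round-off when d is close to 1
    scores = Xc @ Vt[:ncomp].T
    if whitening:
        # Divide each component by its standard deviation (ddof=1) to get unit variance
        scores = scores / (s[:ncomp] / np.sqrt(X.shape[0] - 1))
    return scores
```

For example, pca_sketch(X, 0.9) keeps as many leading components as are needed to explain at least 90% of the variance, while pca_sketch(X, 3) always keeps three.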

glhmm.preproc.build_data_autoregressive(data, indices, autoregressive_order=1, connectivity=None, center_data=True)

Builds the regressors X and the targets Y for the autoregressive model, together with an adapted indices array and, if one is given, the predefined connectivity matrix converted to the matching format. X and Y are centered by default.

Parameters:

data (array-like of shape (n_samples, n_parcels)): The data timeseries.
indices (array-like of shape (n_sessions, 2)): The start and end indices of each trial/session in the input data.
autoregressive_order (int, optional, default=1): The number of lags to include in the autoregressive model.
connectivity (array-like of shape (n_parcels, n_parcels), optional, default=None): The matrix indicating which regressors should be used for each variable.
center_data (bool, optional, default=True): If True, the data will be centered.

Returns:

X (array-like of shape (n_samples - n_sessions*autoregressive_order, n_parcels*autoregressive_order)): The timeseries of the first set of variables (i.e., the regressors).
Y (array-like of shape (n_samples - n_sessions*autoregressive_order, n_parcels)): The timeseries of the second set of variables (i.e., the variables to predict, or targets).
indices_new (array-like of shape (n_sessions, 2)): The new array of start and end indices for each trial/session.
connectivity_new (array-like of shape (n_parcels*autoregressive_order, n_parcels)): The new connectivity matrix indicating which regressors should be used for each variable.
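The row counts above follow from each session losing its first autoregressive_order samples, which have no complete history. A minimal sketch of that construction is below. The name build_ar_sketch is hypothetical; the column layout [lag 1 | lag 2 | ...] and centering with one global mean per column are assumptions, and the connectivity expansion is omitted.

```python
import numpy as np

def build_ar_sketch(data, indices, order=1, center=True):
    """Hypothetical sketch: lagged regressors X and targets Y, built session by session."""
    data = np.asarray(data, dtype=float)
    X_parts, Y_parts, new_indices = [], [], []
    offset = 0
    for start, end in np.asarray(indices):
        seg = data[start:end]
        T = seg.shape[0]
        # The first `order` samples lack a full history, so they are never targets
        Y_parts.append(seg[order:])
        # Column block k (k = 0..order-1) holds the data lagged by k + 1 samples
        X_parts.append(np.hstack([seg[order - lag:T - lag] for lag in range(1, order + 1)]))
        new_indices.append([offset, offset + T - order])
        offset += T - order
    X = np.vstack(X_parts)
    Y = np.vstack(Y_parts)
    if center:
        # Assumption: one global mean is removed from each column
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
    return X, Y, np.array(new_indices)
```

With two sessions of 5 samples and order=2, this yields 10 - 2*2 = 6 rows, matching the documented shape n_samples - n_sessions*autoregressive_order.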

glhmm.preproc.build_data_partial_connectivity(X, Y, connectivity=None, center_data=True)

Builds X and Y for the partial connectivity model. It regresses out the effects indicated in connectivity and removes the regressors and target variables that are not used. It also returns the connectivity matrix with the matching dimensions.

Parameters:

X (np.ndarray of shape (n_samples, n_parcels)): The timeseries of the first set of variables (i.e., the regressors).
Y (np.ndarray of shape (n_samples, n_parcels)): The timeseries of the second set of variables (i.e., the variables to predict, or targets).
connectivity (np.ndarray of shape (n_parcels, n_parcels), optional, default=None): A binary matrix indicating which regressors affect which targets (i.e., variables to predict).
center_data (bool, default=True): Whether to center the data to zero mean.

Returns:

X_new (np.ndarray of shape (n_samples, n_active_parcels)): The timeseries of the first set of variables (i.e., the regressors) after removing unused predictors and regressing out the effects indicated in connectivity.
Y_new (np.ndarray of shape (n_samples, n_active_parcels)): The timeseries of the second set of variables (i.e., the variables to predict, or targets) after removing unused targets and regressing out the effects indicated in connectivity.
connectivity_new (np.ndarray of shape (n_active_parcels, n_active_parcels)): A binary matrix indicating which regressors affect which targets. It has the same structure as connectivity after removing unused predictors and targets.
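Only the pruning step is sketched below; the regressing-out step depends on conventions this page does not spell out. The sketch assumes that a nonzero connectivity[i, j] means regressor i is used for target j, consistent with the (regressors, targets) layout documented for build_data_autoregressive. The name prune_unused_sketch is hypothetical, and unlike the real function it requires a connectivity matrix.

```python
import numpy as np

def prune_unused_sketch(X, Y, connectivity, center=True):
    """Hypothetical sketch: drop regressors and targets that connectivity never uses."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    C = np.asarray(connectivity)
    used_regressors = C.any(axis=1)  # rows that affect at least one target
    used_targets = C.any(axis=0)     # columns predicted by at least one regressor
    X_new = X[:, used_regressors]
    Y_new = Y[:, used_targets]
    if center:
        X_new = X_new - X_new.mean(axis=0)
        Y_new = Y_new - Y_new.mean(axis=0)
    # Keep only the rows and columns that survived, preserving their structure
    C_new = C[np.ix_(used_regressors, used_targets)]
    return X_new, Y_new, C_new
```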

glhmm.preproc.build_data_tde(data, indices, lags, pca=None, standardise_pc=True)

Builds X for the time-delay embedded (TDE) HMM, together with an adapted indices array.

Parameters:

data (numpy array of shape (n_samples, n_parcels)): The data matrix.
indices (array-like of shape (n_sessions, 2)): The start and end indices of each trial/session in the input data.
lags (list or array-like): The lags to use for the embedding.
pca (None, int, float, or numpy array, default=None): The number of components for PCA, the explained variance for PCA, the precomputed PCA projection matrix, or None to skip PCA.
standardise_pc (bool, default=True): Whether or not to standardise the principal components before returning.

Returns:

X (numpy array of shape (n_samples - n_sessions*rwindow, n_parcels*n_lags)): The delay-embedded timeseries data, where rwindow = max(lags) - min(lags) is the number of samples lost per session. If PCA is applied, the second dimension is the number of retained components instead.
indices_new (numpy array of shape (n_sessions, 2)): The adapted indices for each segment of delay-embedded data.

PCA is optional and is controlled by pca: if pca >= 1, it is the number of components; if pca < 1, it is the fraction of explained variance to retain; if pca is a numpy array, it is a precomputed PCA projection matrix; if pca is None, no PCA is run.
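The embedding step itself (without PCA) can be sketched as follows. The name tde_sketch is hypothetical; the sign convention (a positive lag selects a later sample) and the lag-major column ordering are assumptions rather than documented behaviour. Each session loses rwindow = max(lags) - min(lags) samples, which reproduces the documented row count.

```python
import numpy as np

def tde_sketch(data, indices, lags):
    """Hypothetical sketch of time-delay embedding, session by session, without PCA."""
    data = np.asarray(data, dtype=float)
    lags = np.asarray(lags, dtype=int)
    minlag = int(lags.min())
    rwindow = int(lags.max()) - minlag  # samples lost at the session edges
    X_parts, new_indices = [], []
    offset = 0
    for start, end in np.asarray(indices):
        seg = data[start:end]
        n = seg.shape[0] - rwindow  # rows kept for this session
        # Row r is centred on sample r - minlag; the block for lag l holds seg[r - minlag + l]
        blocks = [seg[l - minlag:l - minlag + n] for l in lags]
        X_parts.append(np.hstack(blocks))
        new_indices.append([offset, offset + n])
        offset += n
    return np.vstack(X_parts), np.array(new_indices)
```

With lags = [-1, 0, 1] and a single channel 0, 1, ..., 9, the first embedded row is [0, 1, 2] and the output has 10 - 2 = 8 rows.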

glhmm.preproc.load_files(files, I=None, do_only_indices=False)

glhmm.preproc.preprocess_data(data, indices, fs=1, standardise=True, filter=None, detrend=False, onpower=False, pca=None, whitening=False, exact_pca=True, downsample=None)

Preprocesses the input data.

Parameters:

data (array-like of shape (n_samples, n_parcels)): The input data to be preprocessed.
indices (array-like of shape (n_sessions, 2)): The start and end indices of each trial/session in the input data.
fs (int or float, default=1): The sampling frequency of the input data.
standardise (bool, default=True): Whether to standardise the input data.
filter (tuple of length 2 or None, default=None): The cutoff frequencies of the filter to apply to the input data, in the same units as fs. If None, no filtering is applied. If a tuple, the first element is the lower cutoff frequency and the second is the upper cutoff frequency.
detrend (bool, default=False): Whether to detrend the input data.
onpower (bool, default=False): Whether to compute the power of the input data using the Hilbert transform.
pca (int, float, or None, default=None): If int, the number of components to keep after applying PCA. If float, the fraction of explained variance to keep after applying PCA (a value between 0 and 1). If None, no PCA is applied.
whitening (bool, default=False): Whether to whiten the data after applying PCA.
exact_pca (bool, default=True): Whether to use the full SVD solver for PCA.
downsample (int, float, or None, default=None): The new sampling frequency after downsampling. If None, no downsampling is applied.

Returns:

data_processed (array-like of shape (n_samples_processed, n_parcels)): The preprocessed input data. If PCA is applied, the second dimension is the number of retained components.
indices_processed (array-like of shape (n_sessions_processed, 2)): The start and end indices of each trial/session in the preprocessed data.
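Two of the options above, detrend and standardise, can be sketched with NumPy alone. This assumes that both are applied separately within each session defined by indices and that detrending is linear; neither point is stated on this page. The name standardise_sessions_sketch is hypothetical, and filtering, the Hilbert transform, PCA and downsampling are left out.

```python
import numpy as np

def standardise_sessions_sketch(data, indices, detrend=False):
    """Hypothetical sketch: per-session linear detrending and z-scoring."""
    data = np.array(data, dtype=float)  # copy, so the caller's array is untouched
    for start, end in np.asarray(indices):
        seg = data[start:end]  # a view: in-place edits below modify `data`
        if detrend:
            t = np.arange(end - start)
            # Least-squares line per channel; coeffs[0] are slopes, coeffs[1] intercepts
            coeffs = np.polyfit(t, seg, deg=1)
            seg -= np.outer(t, coeffs[0]) + coeffs[1]
        seg -= seg.mean(axis=0)
        seg /= seg.std(axis=0)  # a constant channel would divide by zero here
    return data
```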