glhmm.preproc

Preprocessing functions - General/Gaussian Linear Hidden Markov Model @author: Diego Vidaurre 2023

glhmm.preproc.apply_pca(X, d, whitening=False, exact=True)

Applies PCA to the input data X.

Parameters:

X : array-like of shape (n_samples, n_parcels)

The input data to be transformed.

d : int, float, or array-like of shape (n_parcels, n_components)

If int, the number of components to keep. If float, the fraction of explained variance to keep. If array-like of shape (n_parcels, n_components), the precomputed transformation matrix.

whitening : bool, default=False

Whether to whiten the transformed data.

exact : bool, default=True

Whether to use the full SVD solver for PCA.

Returns:

X_transformed : array-like of shape (n_samples, n_components)

The transformed data after applying PCA.
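
The three forms of d can be illustrated with a small NumPy sketch. This is not the glhmm implementation: the function name pca_sketch is hypothetical, and centring the data before applying a precomputed matrix is an assumption.

```python
import numpy as np

def pca_sketch(X, d, whitening=False):
    """Illustrative PCA via SVD (hypothetical helper, not glhmm's code)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each column
    if isinstance(d, np.ndarray):
        # Assumption: d is a precomputed (n_parcels, n_components) matrix
        return Xc @ d
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                 # proportional to component variances
    if isinstance(d, float) and d < 1:
        # Fraction of explained variance: smallest number of components reaching it
        cum = np.cumsum(var) / var.sum()
        n_components = min(int(np.searchsorted(cum, d)) + 1, len(var))
    else:
        n_components = int(d)
    scores = Xc @ Vt[:n_components].T
    if whitening:
        # Rescale each component to unit sample variance
        scores = scores / (s[:n_components] / np.sqrt(X.shape[0] - 1))
    return scores
```

With an integer d the output has exactly d columns; with a float d the number of columns depends on how quickly the explained variance accumulates.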

glhmm.preproc.build_data_autoregressive(data, indices, autoregressive_order=1, connectivity=None, center_data=True)

Builds X and Y for the autoregressive model, as well as an adapted indices array and predefined connectivity matrix in the right format. X and Y are centered by default.

Parameters:

data : array-like of shape (n_samples, n_parcels)

The data timeseries.

indices : array-like of shape (n_sessions, 2)

The start and end indices of each trial/session in the input data.

autoregressive_order : int, optional, default=1

The number of lags to include in the autoregressive model.

connectivity : array-like of shape (n_parcels, n_parcels), optional, default=None

The matrix indicating which regressors should be used for each variable.

center_data : bool, optional, default=True

If True, the data will be centered.

Returns:

X : array-like of shape (n_samples - n_sessions*autoregressive_order, n_parcels*autoregressive_order)

The timeseries of the first set of variables (i.e., the regressors).

Y : array-like of shape (n_samples - n_sessions*autoregressive_order, n_parcels)

The timeseries of the second set of variables (i.e., the variables to predict, or targets).

indices_new : array-like of shape (n_sessions, 2)

The new array of start and end indices for each trial/session.

connectivity_new : array-like of shape (n_parcels*autoregressive_order, n_parcels)

The new connectivity matrix indicating which regressors should be used for each variable.
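
The output shapes follow from each session losing its first autoregressive_order samples, which have no complete history. A minimal NumPy sketch of that construction is given here. It is not the glhmm implementation: build_ar_sketch is a hypothetical name, the end index of each session is assumed to be exclusive, the ordering of the lag blocks is an assumption, and centring and connectivity handling are omitted.

```python
import numpy as np

def build_ar_sketch(data, indices, order=1):
    """Build lagged regressors X and targets Y per session (illustrative only)."""
    data = np.asarray(data, dtype=float)
    X_parts, Y_parts, new_indices = [], [], []
    start_new = 0
    for start, end in indices:            # assumption: end index is exclusive
        seg = data[start:end]
        T = seg.shape[0] - order          # usable samples in this session
        Y_parts.append(seg[order:])       # targets: samples that have a full history
        # Column block k holds the data lagged by k + 1 samples
        lagged = [seg[order - k - 1 : order - k - 1 + T] for k in range(order)]
        X_parts.append(np.hstack(lagged))
        new_indices.append((start_new, start_new + T))
        start_new += T
    return np.vstack(X_parts), np.vstack(Y_parts), np.array(new_indices)
```

Lags are built within each session, so no row of X mixes samples from two different sessions.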

glhmm.preproc.build_data_partial_connectivity(X, Y, connectivity=None, center_data=True)

Builds X and Y for the partial connectivity model: regresses out effects where indicated in connectivity and removes regressors and regressed variables that are not used. It also returns connectivity with the matching dimensions.

Parameters:

X : np.ndarray of shape (n_samples, n_parcels)

The timeseries of the first set of variables (i.e., the regressors).

Y : np.ndarray of shape (n_samples, n_parcels)

The timeseries of the second set of variables (i.e., the variables to predict, or targets).

connectivity : np.ndarray of shape (n_parcels, n_parcels), optional, default=None

A binary matrix indicating which regressors affect which targets (i.e., variables to predict).

center_data : bool, default=True

Whether to center the data to zero mean.

Returns:

X_new : np.ndarray of shape (n_samples, n_active_parcels)

The timeseries of the first set of variables (i.e., the regressors) after removing unused predictors and regressing out the effects indicated in connectivity.

Y_new : np.ndarray of shape (n_samples, n_active_parcels)

The timeseries of the second set of variables (i.e., the variables to predict, or targets) after removing unused targets and regressing out the effects indicated in connectivity.

connectivity_new : np.ndarray of shape (n_active_parcels, n_active_parcels), optional, default=None

A binary matrix indicating which regressors affect which targets. The matrix has the same structure as connectivity after removing unused predictors and targets.
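
The pruning step alone can be sketched in NumPy as follows. This is only a sketch, not the glhmm implementation: prune_by_connectivity is a hypothetical name, the orientation of connectivity (rows as regressors, columns as targets) is an assumption, and the regressing-out step and centring are deliberately left out.

```python
import numpy as np

def prune_by_connectivity(X, Y, connectivity):
    """Drop regressors and targets that take part in no connection (illustrative only)."""
    C = np.asarray(connectivity).astype(bool)
    used_regressors = C.any(axis=1)   # assumption: rows index regressors
    used_targets = C.any(axis=0)      # assumption: columns index targets
    X_new = X[:, used_regressors]
    Y_new = Y[:, used_targets]
    # Keep only the rows and columns of the surviving variables
    C_new = C[np.ix_(used_regressors, used_targets)].astype(int)
    return X_new, Y_new, C_new
```

A regressor whose row is all zeros predicts nothing, and a target whose column is all zeros is predicted by nothing, so both can be dropped without changing the remaining connections.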

glhmm.preproc.build_data_tde(data, indices, lags, pca=None, standardise_pc=True)

Builds X for the temporal delay embedded HMM, as well as an adapted indices array.

Parameters:

data : numpy array of shape (n_samples, n_parcels)

The data matrix.

indices : array-like of shape (n_sessions, 2)

The start and end indices of each trial/session in the input data.

lags : list or array-like

The lags to use for the embedding.

pca : None or int or float or numpy array, default=None

The number of components for PCA, the explained variance for PCA, the precomputed PCA projection matrix, or None to skip PCA.

standardise_pc : bool, default=True

Whether or not to standardise the principal components before returning.

Returns:

X : numpy array of shape (n_samples - n_sessions*rwindow, n_parcels*n_lags)

The delay-embedded timeseries data.

indices_new : numpy array of shape (n_sessions, 2)

The adapted indices for each segment of delay-embedded data.

PCA can be run optionally: if pca >=1, that is the number of components; if pca < 1, that is explained variance; if pca is a numpy array, then it is a precomputed PCA projection matrix; if pca is None, then no PCA is run.
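
The embedding step itself, without the optional PCA, can be sketched in NumPy as follows. This is not the glhmm implementation: embed_sketch is a hypothetical name, the end index of each session is assumed to be exclusive, and the sign convention and column ordering of the lags are assumptions. In this sketch each session loses max(lags) - min(lags) samples when the lags span zero.

```python
import numpy as np

def embed_sketch(data, indices, lags):
    """Time-delay embed each session separately (illustrative only)."""
    data = np.asarray(data, dtype=float)
    lags = np.asarray(lags)
    lo, hi = lags.min(), lags.max()
    X_parts, new_indices = [], []
    start_new = 0
    for start, end in indices:              # assumption: end index is exclusive
        seg = data[start:end]
        n = seg.shape[0]
        # Valid centre samples t satisfy 0 <= t + lag < n for every lag
        t = np.arange(max(0, -lo), n - max(0, hi))
        # One block of n_parcels columns per lag, in the order given
        X_parts.append(np.hstack([seg[t + lag] for lag in lags]))
        new_indices.append((start_new, start_new + len(t)))
        start_new += len(t)
    return np.vstack(X_parts), np.array(new_indices)
```

For example, lags = [-1, 0, 1] places the previous, current, and next sample of every parcel side by side in each row.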

glhmm.preproc.load_files(files, I=None, do_only_indices=False)

glhmm.preproc.preprocess_data(data, indices, fs=1, standardise=True, filter=None, detrend=False, onpower=False, pca=None, whitening=False, exact_pca=True, downsample=None)

Preprocess the input data.

Parameters:

data : array-like of shape (n_samples, n_parcels)

The input data to be preprocessed.

indices : array-like of shape (n_sessions, 2)

The start and end indices of each trial/session in the input data.

fs : int or float, default=1

The sampling frequency of the input data.

standardise : bool, default=True

Whether to standardise the input data.

filter : tuple of length 2 or None, default=None

The low-pass and high-pass thresholds to apply to the input data. If None, no filtering will be applied. If a tuple, the first element is the low-pass threshold and the second is the high-pass threshold.

detrend : bool, default=False

Whether to detrend the input data.

onpower : bool, default=False

Whether to calculate the power of the input data using the Hilbert transform.

pca : int or float or None, default=None

If int, the number of components to keep after applying PCA. If float, the fraction of explained variance to keep after applying PCA. If None, no PCA will be applied.

whitening : bool, default=False

Whether to whiten the input data after applying PCA.

exact_pca : bool, default=True

Whether to use the full SVD solver for PCA.

downsample : int or float or None, default=None

The new sampling frequency of the data after downsampling. If None, no downsampling will be applied.

Returns:

data_processed : array-like of shape (n_samples_processed, n_parcels)

The preprocessed input data.

indices_processed : array-like of shape (n_sessions_processed, 2)

The start and end indices of each trial/session in the preprocessed data.
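
To show how the indices array delimits sessions during preprocessing, here is a minimal NumPy sketch of the standardisation step only. It is not the glhmm implementation: standardise_per_session is a hypothetical name, and it assumes that standardisation is done separately for each session and that the end index is exclusive.

```python
import numpy as np

def standardise_per_session(data, indices):
    """Z-score every channel within each session (illustrative only)."""
    data = np.asarray(data, dtype=float)
    out = data.copy()
    for start, end in indices:          # assumption: end index is exclusive
        seg = data[start:end]
        sd = seg.std(axis=0)
        sd[sd == 0] = 1.0               # constant channels become zero instead of NaN
        out[start:end] = (seg - seg.mean(axis=0)) / sd
    return out
```

Steps that change the number of samples, such as downsampling, also require the indices to be recomputed, which is why the function returns indices_processed alongside the data.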