glhmm.statistics

Permutation testing from Gaussian Linear Hidden Markov Model @author: Nick Y. Larsen 2023

glhmm.statistics.calculate_baseline_difference(vpath_array, R_data, state, pairwise_statistic)[source]

Calculate the difference between the specified statistics of a state and all other states combined.

Parameters:

vpath_array (numpy.ndarray):

The Viterbi path as an array of integer values ranging from 1 to n_states.

R_data (numpy.ndarray):

The dependent-variable associated with each state.

state (numpy.ndarray):

The state for which the difference is calculated.

pairwise_statistic (str):

The chosen statistic to be calculated. Valid options are “mean” or “median”.

Returns:

difference (float)

The calculated difference between the specified state and all other states combined.
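The comparison described above can be sketched in plain NumPy. This is an illustration only, not the library's implementation: the helper name baseline_difference_sketch and the sign convention (selected state minus all other states) are assumptions.

```python
import numpy as np

def baseline_difference_sketch(vpath_array, R_data, state, pairwise_statistic="mean"):
    """Hypothetical helper: statistic of one state minus that of all other states."""
    # Map the documented option names onto NumPy reducers.
    reducer = {"mean": np.mean, "median": np.median}[pairwise_statistic]
    in_state = np.asarray(vpath_array) == state
    R_data = np.asarray(R_data, dtype=float)
    # Assumed sign convention: selected state minus the rest.
    return float(reducer(R_data[in_state]) - reducer(R_data[~in_state]))
```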

glhmm.statistics.calculate_geometric_pval(p_values, test_combination)[source]

Calculate test statistics of z-scores converted from p-values based on the specified combination.

Parameters:

p_values (numpy.ndarray):

Matrix of p-values.

test_combination (str):

Specifies the combination method. Valid options: “True”, “across_columns”, “across_rows”. Default is “True”.

Returns:

result (numpy.ndarray):

Test statistics of z-scores converted from p-values.
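The test_combination options in this module are described in terms of geometric means of p-values. A minimal sketch of that aggregation step is shown here, assuming a plain geometric mean over a chosen axis. The helper name and the axis argument are hypothetical, and the z-score conversion mentioned above is not reproduced.

```python
import numpy as np

def geometric_mean_pval_sketch(p_values, axis=None):
    """Hypothetical helper: geometric mean of p-values over the given axis."""
    p = np.asarray(p_values, dtype=float)
    # Work in log space; clip zeros so the logarithm stays finite.
    log_p = np.log(np.clip(p, np.finfo(float).tiny, 1.0))
    return np.exp(log_p.mean(axis=axis))
```

With axis=None the whole matrix collapses to a single value; axis=0 or axis=1 aggregates along one dimension of the p-value matrix.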

glhmm.statistics.calculate_nan_correlation_matrix(D_data, R_data, test_combination=False, reduce_pval_dims=False)[source]

Calculate the correlation matrix between independent variables (D_data) and dependent variables (R_data), while handling NaN values column by column (along dimension p) without removing entire rows.

Parameters:

D_data (numpy.ndarray):

Input D-matrix for the independent variables.

R_data (numpy.ndarray):

Input R-matrix for the dependent variables.

Returns:

correlation_matrix (numpy.ndarray): Correlation matrix between columns in D_data and R_data.
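To illustrate what column-by-column NaN handling means, the sketch below computes each correlation from only the rows that are valid for that particular pair of columns. It is an assumption-laden illustration rather than the library code; the helper name nan_correlation_sketch is hypothetical.

```python
import numpy as np

def nan_correlation_sketch(D_data, R_data):
    """Hypothetical helper: pairwise Pearson correlations that skip NaNs per column pair."""
    D = np.asarray(D_data, dtype=float)
    R = np.asarray(R_data, dtype=float)
    corr = np.full((D.shape[1], R.shape[1]), np.nan)
    for i in range(D.shape[1]):
        for j in range(R.shape[1]):
            # Keep only rows where both columns are observed.
            valid = ~np.isnan(D[:, i]) & ~np.isnan(R[:, j])
            if valid.sum() > 1:
                corr[i, j] = np.corrcoef(D[valid, i], R[valid, j])[0, 1]
    return corr
```

A row with a missing value in one feature is dropped only for that feature, so the other columns still use it.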

glhmm.statistics.calculate_nan_f_test(D_data, R_column, nan_values=False)[source]
Calculate F-statistics for each feature of D_data against categories in R_column, while handling NaN values column by column without removing entire rows.
  • The function handles NaN values for each feature in D_data without removing entire rows.

  • NaN values are omitted on a feature-wise basis, and the F-statistic is calculated for each feature.

  • The resulting array contains F-statistics corresponding to each feature in D_data.

Parameters:

D_data (numpy.ndarray):

The input matrix of shape (n_samples, n_features).

R_column (numpy.ndarray):

The categorical labels corresponding to each sample in D_data.

Returns:

f_test (numpy.ndarray):

F-statistics for each feature in D_data against the categories in R_column.
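A feature-wise, NaN-aware one-way F-statistic can be sketched as follows. This is a textbook one-way ANOVA written in NumPy under the assumption that the labels themselves contain no NaN values; the helper name nan_f_test_sketch is hypothetical and the code is not taken from the library.

```python
import numpy as np

def nan_f_test_sketch(D_data, R_column):
    """Hypothetical helper: one-way ANOVA F-statistic per feature, skipping NaNs."""
    D = np.asarray(D_data, dtype=float)
    labels = np.asarray(R_column)
    f_stats = np.full(D.shape[1], np.nan)
    for j in range(D.shape[1]):
        valid = ~np.isnan(D[:, j])          # drop NaNs for this feature only
        x, g = D[valid, j], labels[valid]
        groups = [x[g == c] for c in np.unique(g)]
        k, n = len(groups), x.size
        if k < 2 or n <= k:
            continue                        # not enough data for an F-statistic
        grand_mean = x.mean()
        ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in groups)
        ss_within = sum(((s - s.mean()) ** 2).sum() for s in groups)
        f_stats[j] = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stats
```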

glhmm.statistics.calculate_nan_regression(Din, Rin, proj)[source]

Calculate the R-squared values for the regression of each dependent variable in Rin on the independent variables in Din, while handling NaN values column-wise.

Parameters:

Din (numpy.ndarray):

Input D-matrix for the independent variables.

Rin (numpy.ndarray):

Input R-matrix for the dependent variables.

proj (numpy.ndarray):

Projection matrix.

Returns:

R2_test (numpy.ndarray):

Array of R-squared values for each regression.

glhmm.statistics.calculate_nan_regression_f_test(Din, Rin, proj, nan_values=False)[source]

Calculate the F-test values for the regression of each dependent variable in Rin on the independent variables in Din, while handling NaN values column-wise.

Parameters:

Din (numpy.ndarray):

Input D-matrix for the independent variables.

Rin (numpy.ndarray):

Input R-matrix for the dependent variables.

proj (numpy.ndarray):

Projection matrix.

Returns:

R2_test (numpy.ndarray):

Array of F-test values for each regression.

glhmm.statistics.calculate_nan_t_test(D_data, R_column, nan_values=False)[source]
Calculate the t-statistics between paired independent (D_data) and dependent (R_column) variables, while handling NaN values column by column without removing entire rows.
  • The function handles NaN values for each feature in D_data without removing entire rows.

  • NaN values are omitted on a feature-wise basis, and the t-statistic is calculated for each feature.

  • The resulting array contains t-statistics corresponding to each feature in D_data.

Parameters:

D_data (numpy.ndarray):

The input matrix of shape (n_samples, n_features).

R_column (numpy.ndarray):

The binary labels corresponding to each sample in D_data.

Returns:

t_test (numpy.ndarray):

t-statistics for each feature in D_data against the binary categories in R_column.

glhmm.statistics.calculate_statepair_difference(vpath_array, R_data, state_1, state_2, stat)[source]

Calculate the difference between the specified statistics of two states.

Parameters:

vpath_array (numpy.ndarray):

The Viterbi path as an array of integer values ranging from 1 to n_states.

R_data (numpy.ndarray):

The dependent-variable associated with each state.

state_1 (int):

First state for comparison.

state_2 (int):

Second state for comparison.

stat (str):

The chosen statistic to be calculated. Valid options are “mean” or “median”.

Returns:

difference (float):

The calculated difference between the two states.

glhmm.statistics.deconfound_values(D_data, R_data, confounds=None)[source]

Deconfound the variables R_data and D_data for permutation testing.

Parameters:

D_data (numpy.ndarray):

The input data array.

R_data (numpy.ndarray or None):

The second input data array (default: None). If None, assumes we are working across visits, and R_data represents the Viterbi path of a sequence.

confounds (numpy.ndarray or None):

The confounds array (default: None).

Returns:

D_data (numpy.ndarray):

Deconfounded D_data array.

R_data (numpy.ndarray):

Deconfounded R_data array (returns None if R_data is None). If R_data is None, it is assumed that we are working across visits.

glhmm.statistics.detect_significant_intervals(pval, alpha)[source]

Detect intervals of consecutive True values in a boolean array.

Parameters:

pval (numpy.ndarray):

An array of p-values.

alpha (float, optional):

Threshold for significance.

Returns:

list of tuple:

A list of tuples representing the start and end indices (inclusive) of each interval of consecutive True values.

Example: array = [False, False, False, True, True, True, False, False, True, True, False]; detect_intervals(array) returns [(3, 5), (8, 9)]
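The interval detection can be sketched by thresholding the p-values and locating the edges of each run of True values. The helper name and the strict comparison pval < alpha are assumptions made for illustration; the library may use a different rule.

```python
import numpy as np

def significant_intervals_sketch(pval, alpha=0.05):
    """Hypothetical helper: inclusive (start, end) indices of runs with pval < alpha."""
    mask = np.asarray(pval) < alpha            # assumed significance rule
    # Pad with zeros so runs touching either end still produce both edges.
    padded = np.concatenate(([0], mask.astype(int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1) - 1   # make the end index inclusive
    return [(int(s), int(e)) for s, e in zip(starts, ends)]
```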

glhmm.statistics.generate_vpath_1D(vpath)[source]

Convert a 2D array representing a matrix with one non-zero element in each row into a 1D array where each element is the column index of the non-zero element.

Parameters:

vpath (numpy.ndarray):

A 2D array where each row has only one non-zero element, or a 1D array where each element represents a state number.

Returns:

vpath_array (numpy.ndarray):

A 1D array containing the column indices of the non-zero elements. If the input array is already 1D, it returns a copy of the input array.
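A minimal sketch of this conversion is given below. The helper name is hypothetical, and the +1 offset is an assumption made so that the output matches the 1-to-n_states convention used for Viterbi paths elsewhere in this module; drop it if zero-based column indices are wanted.

```python
import numpy as np

def vpath_to_1d_sketch(vpath):
    """Hypothetical helper: collapse a one-hot state matrix to a 1D state sequence."""
    vpath = np.asarray(vpath)
    if vpath.ndim == 1:
        return vpath.copy()                    # already 1D: return a copy
    # Column of the single non-zero entry per row, shifted to 1-based labels (assumed).
    return np.argmax(vpath != 0, axis=1) + 1
```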

glhmm.statistics.get_concatenate_sessions(D_sessions, R_sessions=None, idx_sessions=None)[source]

Converts a 3D matrix into a 2D matrix by concatenating timepoints of every trial session into a new D-matrix.

Parameters:

D_sessions (numpy.ndarray):

D-matrix for each session.

R_sessions (numpy.ndarray):

R-matrix for each trial.

idx_sessions (numpy.ndarray):

Indices representing the start and end of trials for each session.

Returns:

D_con (numpy.ndarray):

Concatenated D-matrix.

R_con (numpy.ndarray):

Concatenated R-matrix.

idx_sessions_con (numpy.ndarray):

Updated indices after concatenation.

glhmm.statistics.get_concatenate_subjects(D_sessions)[source]

Converts a 3D matrix into a 2D matrix by concatenating timepoints of every subject into a new D-matrix.

Parameters:

D_sessions (numpy.ndarray):

D-matrix for each subject.

Returns:

D_con (numpy.ndarray):

Concatenated D-matrix.

glhmm.statistics.get_indices_array(idx_data)[source]

Generates an indices array based on given data indices.

Parameters:

idx_data (numpy.ndarray):

The data indices array.

Returns:

idx_array (numpy.ndarray):

The generated indices array.

glhmm.statistics.get_indices_from_list(data_list, count_timestamps=True)[source]

Generate indices representing the start and end timestamps for each subject or session from a given data list.

Parameters:

data_list (list):

List containing data for each subject or session.

count_timestamps (bool), default=True:

If True, counts timestamps for each element in data_list, otherwise assumes each element in data_list is already a count of timestamps.

Returns:

indices (ndarray):

Array with start and end indices for each subject’s timestamps.
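The start and end indices can be derived from cumulative lengths, as in the sketch below. The helper name is hypothetical, and half-open intervals [start, end) are assumed, in line with the get_indices_timestamp example in this module.

```python
import numpy as np

def indices_from_list_sketch(data_list, count_timestamps=True):
    """Hypothetical helper: [start, end) row indices for each element of data_list."""
    if count_timestamps:
        lengths = np.array([len(item) for item in data_list])
    else:
        lengths = np.asarray(data_list)        # elements are already counts
    ends = np.cumsum(lengths)
    starts = ends - lengths
    return np.column_stack((starts, ends))
```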

glhmm.statistics.get_indices_session(data_label)[source]

Generate session indices in the data based on provided labels. This is done by using ‘data_label’ to define sessions and generates corresponding indices. The resulting ‘idx_data_sessions’ array represents the intervals for each session in the data.

Parameters:

data_label (ndarray):

Array representing the labels for data to be indexed into sessions.

Returns:

idx_data_sessions (ndarray):

The indices of datapoints within each session. It should be a 2D array where each row represents the start and end index for a trial.

Example: get_indices_session(np.array([1, 1, 2, 2, 2, 3, 3, 3, 3])) returns array([[0, 2], [2, 5], [5, 9]])
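One way to reproduce the documented example is to look for positions where the label changes. The helper below is a hypothetical sketch that assumes each session's labels are contiguous.

```python
import numpy as np

def indices_session_sketch(data_label):
    """Hypothetical helper: [start, end) indices for each run of identical labels."""
    data_label = np.asarray(data_label)
    # Positions where a new label begins (assumes each session is contiguous).
    change = np.flatnonzero(np.diff(data_label) != 0) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(data_label)]))
    return np.column_stack((starts, ends))
```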

glhmm.statistics.get_indices_timestamp(n_timestamps, n_subjects)[source]

Generate indices of the timestamps for each subject in the data.

Parameters:

n_timestamps (int):

Number of timestamps.

n_subjects (int):

Number of subjects.

Returns:

indices (ndarray):

Array representing the indices of the timestamps for each subject.

Example: get_indices_timestamp(5, 3) returns array([[ 0, 5], [ 5, 10], [10, 15]])
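The documented example can be reproduced with equally sized blocks, as in this hypothetical sketch (the helper name is not part of the library):

```python
import numpy as np

def indices_timestamp_sketch(n_timestamps, n_subjects):
    """Hypothetical helper: [start, end) indices for equally long subjects."""
    starts = np.arange(n_subjects) * n_timestamps
    return np.column_stack((starts, starts + n_timestamps))
```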

glhmm.statistics.get_indices_update_nan(idx_data, nan_mask)[source]

Update interval indices based on missing values in the data.

Parameters:

idx_data (numpy.ndarray):

Array of shape (n_intervals, 2) representing the start and end indices of each interval.

nan_mask (numpy.ndarray of bool):

Boolean mask indicating the presence of missing values in the data.

Returns:

idx_data_update (numpy.ndarray):

Updated interval indices after accounting for missing values.
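A sketch of the index update: each boundary is shifted left by the number of rows removed before it. The helper name is hypothetical, and the code assumes half-open [start, end) intervals and a mask that is True for rows that were removed.

```python
import numpy as np

def update_indices_sketch(idx_data, nan_mask):
    """Hypothetical helper: shift interval boundaries after dropping masked rows."""
    idx_data = np.asarray(idx_data)
    # removed_before[i] = number of dropped rows with index < i.
    removed_before = np.concatenate(([0], np.cumsum(nan_mask)))
    return idx_data - removed_before[idx_data]
```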

glhmm.statistics.get_input_shape(D_data, R_data, verbose)[source]

Computes the input shape parameters for permutation testing.

Parameters:

D_data (numpy.ndarray):

The input data array.

R_data (numpy.ndarray):

The dependent variable.

verbose (bool):

If True, display progress messages. If False, suppress progress messages.

Returns:

n_T (int):

The number of timepoints.

n_ST (int):

The number of subjects or trials.

n_p (int):

The number of features.

D_data (numpy.ndarray):

The updated input data array.

R_data (numpy.ndarray):

The updated dependent variable.

glhmm.statistics.get_pval(test_statistics, Nperm, method, t, pval, FWER_correction=False, test_combination=False)[source]

Computes p-values and correlation matrix for permutation testing. Reference: https://github.com/OHBA-analysis/HMM-MAR/blob/master/utils/testing/permtest_aux.m

Parameters:

test_statistics (numpy.ndarray):

The permutation array.

pval_perms (numpy.ndarray):

The p-value permutation array.

Nperm (int):

The number of permutations.

method (str):

The method used for permutation testing.

t (int):

The timepoint index.

pval (numpy.ndarray):

The p-value array.

Returns:

pval (numpy.ndarray):

Updated p-value array.
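As a rough illustration of how a p-value is read off a permutation distribution, the sketch below assumes that the first row of the permutation array holds the unpermuted (observed) statistic and that larger values are more extreme. Both conventions, and the helper name, are assumptions rather than confirmed details of get_pval.

```python
import numpy as np

def permutation_pval_sketch(test_statistics):
    """Hypothetical helper: fraction of permutations at least as extreme as the observed one."""
    test_statistics = np.asarray(test_statistics, dtype=float)
    observed = test_statistics[0]              # assumed: row 0 is the unpermuted statistic
    # The observed row counts itself, so the p-value is never exactly zero.
    return (test_statistics >= observed).sum(axis=0) / test_statistics.shape[0]
```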

glhmm.statistics.identify_coloumns_for_t_and_f_tests(R_data, method, identify_categories=True, category_lim=None)[source]

Detect columns in R_data that are categorical. Used to detect the columns for which to perform t-statistics and F-statistics in later analysis.

Parameters:

R_data (numpy.ndarray):

The 3D array containing categorical values.

identify_categories (bool or list or numpy.ndarray, optional), default=True:

If True, automatically identify categorical columns. If list or ndarray, use the provided list of column indices.

method (str, optional), default=”univariate”:

The method to perform the tests. Only “univariate” is currently supported.

category_lim (int or None, optional), default=None:

Maximum allowed number of categories for F-test. Acts as a safety measure for columns with integer values, like age, which may be mistakenly identified as multiple categories.

Returns:

dict

A dictionary containing the columns for t-test (“t_test_cols”) and F-test (“f_test_cols”).

glhmm.statistics.initialize_arrays(R_data, n_p, n_q, n_T, method, Nperm, test_statistics_option, test_combination=False)[source]
glhmm.statistics.initialize_permutation_matrices(method, Nperm, n_p, n_q, D_data, test_combination=False)[source]

Initializes the permutation matrices and projection matrix for permutation testing.

Parameters:

method (str):

The method to use for permutation testing.

Nperm (int):

The number of permutations.

n_p (int):

The number of features.

n_q (int):

The number of predictions.

D_data (numpy.ndarray):

The independent variable.

Returns:

test_statistics (numpy.ndarray):

The permutation array.

pval_perms (numpy.ndarray):

The p-value permutation array.

proj (numpy.ndarray or None):

The projection matrix (None for correlation methods).

glhmm.statistics.permutation_matrix_across_subjects(Nperm, D_t)[source]

Generates a normal permutation matrix with the assumption that each index is independent across subjects.

Parameters:

Nperm (int):

The number of permutations.

D_t (numpy.ndarray):

D-matrix at timepoint ‘t’

Returns:

permutation_matrix (numpy.ndarray):

Permutation matrix of subjects with shape (n_ST, Nperm).
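A permutation matrix of this shape can be sketched as one column of shuffled row indices per permutation. Keeping the first column as the identity ordering (the unpermuted data) is an assumption, as are the helper name and the seed argument.

```python
import numpy as np

def permutation_matrix_sketch(Nperm, n_ST, seed=0):
    """Hypothetical helper: (n_ST, Nperm) matrix of row orderings, one per permutation."""
    rng = np.random.default_rng(seed)
    perm = np.empty((n_ST, Nperm), dtype=int)
    perm[:, 0] = np.arange(n_ST)               # assumed: first column is the original order
    for k in range(1, Nperm):
        perm[:, k] = rng.permutation(n_ST)     # free exchange across subjects
    return perm
```

Column k can then be used to reorder the rows of a data matrix for permutation k.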

glhmm.statistics.permutation_matrix_across_trials_within_session(Nperm, R_t, idx_array, trial_timepoints=None)[source]

Generates permutation matrix of within-session across-trial data based on given indices.

Parameters:

Nperm (int):

The number of permutations.

R_t (numpy.ndarray):

The preprocessed data array.

idx_array (numpy.ndarray):

The indices array.

trial_timepoints (int):

Number of timepoints for each trial (default: None)

Returns:

permutation_matrix (numpy.ndarray):

Permutation matrix of subjects with shape (n_ST, Nperm).
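Restricting exchangeability to within a session can be sketched by shuffling indices separately inside each block. In this illustration block_labels is a hypothetical 1D array giving the session of each trial; how idx_array actually encodes that information is not specified here.

```python
import numpy as np

def within_block_permutation_sketch(block_labels, rng):
    """Hypothetical helper: shuffle indices only among members of the same block."""
    block_labels = np.asarray(block_labels)
    perm = np.arange(len(block_labels))
    for label in np.unique(block_labels):
        members = np.flatnonzero(block_labels == label)
        perm[members] = rng.permutation(members)   # trials never leave their session
    return perm
```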

glhmm.statistics.permutation_matrix_within_subject_across_sessions(Nperm, D_t, idx_array)[source]

Generates permutation matrix of within-subject across-session data based on given indices.

Parameters:

Nperm (int):

The number of permutations.

D_t (numpy.ndarray):

The preprocessed data array.

idx_array (numpy.ndarray):

The indices array.

Returns:

permutation_matrix (numpy.ndarray):

The within-session continuous indices array.

glhmm.statistics.permute_subject_trial_idx(idx_array)[source]

Permutes an array based on unique values while maintaining the structure.

Parameters:

idx_array (numpy.ndarray):

Input array to be permuted.

Returns:

permuted_array (numpy.ndarray):

Permuted matrix based on unique values.

glhmm.statistics.process_family_structure(dict_family, Nperm)[source]

Process a dictionary containing family structure information.

Parameters:

dict_family (dict):

Dictionary containing family structure information, with the following entries:

  • file_location (str): The file location of the family structure data in CSV format.

  • M (numpy.ndarray, optional), default=None: The matrix of attributes, which is not typically required.

  • nP (int): The number of permutations to generate.

  • CMC (bool, optional), default=False: A flag indicating whether to use the Conditional Monte Carlo method (CMC).

  • EE (bool, optional), default=True: A flag indicating whether to assume exchangeable errors, which allows permutation.

Nperm (int):

Number of permutations.

Returns:

dict_mfam (dict):

Modified dictionary with processed values, with the following entries:

  • EB (numpy.ndarray): Block structure representing relationships between subjects.

  • M (numpy.ndarray, optional), default=None: The matrix of attributes, which is not typically required.

  • nP (int): The number of permutations to generate.

  • CMC (bool, optional), default=False: A flag indicating whether to use the Conditional Monte Carlo method (CMC).

  • EE (bool, optional), default=True: A flag indicating whether to assume exchangeable errors, which allows permutation.

glhmm.statistics.pval_cluster_based_correction(test_statistics, pval, alpha=0.05)[source]

Perform cluster-based correction on test statistics using the output from permutation testing. The function corrects p-values by using the test statistics and p-values obtained from permutation testing. It converts the test statistics into z-based statistics, which makes it possible to threshold them and identify cluster sizes. The p-value map from the permutation testing results is then thresholded using the cluster size derived from the z-based statistics.

Parameters:
  • test_statistics (numpy.ndarray) – 2D or 3D array of test statistics. 2D if you have applied permutation testing using “regression”.

  • pval (numpy.ndarray) – 2D or 1D array of p-values obtained from permutation testing. 1D if you have applied permutation testing using “regression”.

  • alpha (float, optional), default=0.05 – Significance level for cluster-based correction.

Returns:

p_values (numpy.ndarray) – Corrected p-values after cluster-based correction.

glhmm.statistics.pval_correction(pval, method='fdr_bh', alpha=0.05, include_nan=True, nan_diagonal=False)[source]

Adjusts p-values for multiple testing.

Parameters:

pval (numpy.ndarray):

numpy array of p-values.

method (str, optional), default=’fdr_bh’:

Method used for multiple-testing correction. Valid options are:

  • bonferroni: one-step correction

  • sidak: one-step correction

  • holm-sidak: step-down method using Sidak adjustments

  • holm: step-down method using Bonferroni adjustments

  • simes-hochberg: step-up method (independent)

  • hommel: closed method based on Simes tests (non-negative)

  • fdr_bh: Benjamini/Hochberg (non-negative)

  • fdr_by: Benjamini/Yekutieli (negative)

  • fdr_tsbh: two-stage FDR correction (non-negative)

  • fdr_tsbky: two-stage FDR correction (non-negative)

alpha (float, optional):

Significance level (default: 0.05).

include_nan, default=True:

Include NaN values during the correction of p-values if True. Exclude NaN values if False.

nan_diagonal, default=False:

Add NaN values to the diagonal if True.

Returns:

pval_corrected (numpy.ndarray):

numpy array of corrected p-values.

significant (numpy.ndarray):

numpy array of boolean values indicating significant p-values.
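To make the default ’fdr_bh’ option concrete, here is a minimal NumPy sketch of the Benjamini/Hochberg step-up adjustment. It is a stand-alone illustration, not the routine that pval_correction calls internally; the helper name is hypothetical, and the NaN handling controlled by include_nan and nan_diagonal is not reproduced.

```python
import numpy as np

def fdr_bh_sketch(pval, alpha=0.05):
    """Hypothetical helper: Benjamini/Hochberg adjusted p-values and rejection mask."""
    shape = np.shape(pval)
    p = np.asarray(pval, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    adjusted = adjusted.reshape(shape)
    return adjusted, adjusted <= alpha
```

The two return values mirror the documented pair: the corrected p-values and a boolean array marking which of them are significant at alpha.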

glhmm.statistics.reconstruct_concatenated_design(D_con, D_sessions=None, n_timepoints=None, n_trials=None, n_channels=None)[source]

Reconstructs the concatenated D-matrix to the original session variables.

Parameters:

D_con (numpy.ndarray):

Concatenated D-matrix.

D_sessions (numpy.ndarray, optional):

Original D-matrix for each session.

n_timepoints (int, optional):

Number of timepoints per trial.

n_trials (int, optional):

Number of trials per session.

n_channels (int, optional):

Number of channels.

Returns:

D_reconstruct (numpy.ndarray):

Reconstructed D-matrix for each session.

glhmm.statistics.remove_nan_values(D_data, R_data, method)[source]

Remove rows with NaN values from input data arrays.

Parameters:
  • D_data (numpy.ndarray) – Input data array containing features.

  • R_data (numpy.ndarray) – Input data array containing response values.

Returns:

  • D_data (numpy.ndarray) – Cleaned feature data (D_data) with NaN values removed.

  • R_data (numpy.ndarray) – Cleaned response data (R_data) with NaN values removed.

  • nan_mask (numpy.ndarray of bool) – Boolean array that marks the positions of NaN values with True and non-NaN values with False.
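Row-wise NaN removal can be sketched as below. Whether the library builds the mask from D_data, R_data, or both depends on the method argument and is not specified here, so combining both arrays is an assumption; the helper name is hypothetical and 2D inputs are assumed.

```python
import numpy as np

def remove_nan_rows_sketch(D_data, R_data):
    """Hypothetical helper: drop rows with a NaN in either 2D array and return the mask."""
    D = np.asarray(D_data, dtype=float)
    R = np.asarray(R_data, dtype=float)
    # True where a row has at least one NaN in D or R (assumed combination rule).
    nan_mask = np.isnan(D).any(axis=1) | np.isnan(R).any(axis=1)
    return D[~nan_mask], R[~nan_mask], nan_mask
```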

glhmm.statistics.surrogate_state_time(perm, viterbi_path, n_states)[source]

Generates surrogate state-time matrix based on a given Viterbi path.

Parameters:

perm (int):

The permutation number.

viterbi_path (numpy.ndarray):

1D array or 2D matrix containing the Viterbi path.

n_states (int):

The number of states

Returns:

viterbi_path_surrogate (numpy.ndarray):

A 1D array representing the surrogate Viterbi path

glhmm.statistics.surrogate_viterbi_path(viterbi_path, n_states)[source]

Generate surrogate Viterbi path based on state-time matrix.

Parameters:

viterbi_path (numpy.ndarray):

1D array or 2D matrix containing the Viterbi path.

n_states (int):

Number of states in the hidden Markov model.

Returns:

viterbi_path_surrogate (numpy.ndarray):

Surrogate Viterbi path as a 1D array representing the state indices. The state values in the array range from 1 to n_states.

glhmm.statistics.test_across_sessions_within_subject(D_data, R_data, idx_data, method='regression', Nperm=0, confounds=None, verbose=True, test_statistics_option=False, FWER_correction=False, identify_categories=False, category_lim=10, test_combination=False)[source]

Perform permutation testing across sessions within the same subject, while keeping the trial order the same. This procedure is particularly valuable for investigating the effects of long-term treatments or monitoring changes in brain responses across sessions over time. Three options are available to customize the statistical analysis to a particular research question:

  • ‘regression’: Perform permutation testing using regression analysis.

  • ‘univariate’: Conduct permutation testing with correlation analysis.

  • ‘cca’: Apply permutation testing using canonical correlation analysis.

Parameters:

D_data (numpy.ndarray):

Input data array, either 2D or 3D. A 2D array has shape (n, p), where n represents the number of subjects or trials (n_ST), and each column represents a feature (e.g., brain region). A 3D array has shape (T, n, p), where the first dimension represents timepoints, the second dimension represents the number of trials, and the third dimension represents features/predictors.

R_data (numpy.ndarray):

The dependent variable, either a 2D or a 3D array. A 2D array has shape (n, q), where n represents the number of trials, and q represents the outcome/dependent variables. A 3D array has shape (T, n, q), where the first dimension represents timepoints, the second dimension represents the number of trials, and the third dimension represents the dependent variables.

idx_data (numpy.ndarray):

The indices for each trial within the session. It should be a 2D array where each row represents the start and end index for a trial.

method (str, optional), default=”regression”:

The statistical method to be used for the permutation test. Valid options are “regression”, “univariate”, or “cca”. Note: “cca” stands for Canonical Correlation Analysis

Nperm (int), default=0:

Number of permutations to perform.

confounds (numpy.ndarray or None, optional), default=None:

The confounding variables to be regressed out from the input data (D_data). If provided, the regression analysis is performed to remove the confounding effects.

verbose (bool, optional), default=True:

If True, display progress messages and prints. If False, suppress messages.

test_statistics_option (bool, optional), default=False:

If True, the function will return the test statistics for each permutation.

FWER_correction (bool, optional), default=False:

Specify whether to perform family-wise error rate (FWER) correction for multiple comparisons using the MaxT method. Note: FWER_correction is not necessary if pval_correction is applied later for multiple comparison p-value correction.

identify_categories (bool or list or numpy.ndarray, optional), default=False:

If True, automatically identify categorical columns. If list or ndarray, use the provided list of column indices.

category_lim (int or None, optional), default=10:

Maximum allowed number of categories for F-test. Acts as a safety measure for columns with integer values, like age, which may be mistakenly identified as multiple categories.

test_combination, default=False:

Calculates geometric means of p-values using permutation testing. Valid options are:

  • True (bool): Return a single geometric mean per time point.

  • “across_rows” (str): Calculate geometric means for each row.

  • “across_columns” (str): Calculate geometric means for each column.

Returns:

result (dict):

A dictionary containing the following keys. Depending on test_statistics_option and method, it can return the p-values, correlation coefficients, and test statistics.

‘pval’: P-values for the test, with shape depending on the method:

  • method==“regression”: (T, p)

  • method==“univariate”: (T, p, q)

  • method==“cca”: (T, 1)

‘test_statistics’: The permutation distribution of the test statistics if test_statistics_option is True, else None. The shape depends on the method:

  • method==“regression”: (T, Nperm, p)

  • method==“univariate”: (T, Nperm, p, q)

  • method==“cca”: (T, Nperm, 1)

‘base_statistics’: Correlation coefficients for the test with shape (T, p, q) if method==“univariate”, else None.

‘test_type’: The type of test, which is the name of the function.

‘method’: The method used for analysis. Valid options are “regression”, “univariate”, “cca”, “one_vs_rest” and “state_pairs” (default: “regression”).

‘max_correction’: Specifies whether FWER correction has been applied using MaxT; either True or False.

‘Nperm’: The number of permutations that have been performed.

glhmm.statistics.test_across_subjects(D_data, R_data, method='regression', Nperm=0, confounds=None, dict_family=None, verbose=True, test_statistics_option=False, FWER_correction=False, identify_categories=False, category_lim=10, test_combination=False)[source]

Perform permutation testing across subjects. Family structure can be taken into account by inputting “dict_family”. Three options are available to customize the statistical analysis to a particular research question:

  • “regression”: Perform permutation testing using regression analysis.

  • “univariate”: Conduct permutation testing with correlation analysis.

  • “cca”: Apply permutation testing using canonical correlation analysis.

Parameters:

D_data (numpy.ndarray):

Input data array, either 2D or 3D. A 2D array is represented as a (n, p) matrix, where n represents the number of subjects, and p represents the number of predictors. A 3D array has shape (T, n, p), where the first dimension represents timepoints, the second dimension represents the number of subjects, and the third dimension represents features. For 3D, permutation testing is performed per timepoint for each subject.

R_data (numpy.ndarray):

The dependent variable can be either a 2D array or a 3D array. For 2D array, it has a shape of (n, q), where n represents the number of subjects, and q represents the outcome of the dependent variable. For 3D array, it has a shape (T, n, q), where the first dimension represents timepoints, the second dimension represents the number of subjects, and the third dimension represents a dependent variable. For 3D, permutation testing is performed per timepoint for each subject.

method (str, optional), default=”regression”:

The statistical method to be used for the permutation test. Valid options are “regression”, “univariate”, or “cca”. Note: “cca” stands for Canonical Correlation Analysis

Nperm (int), default=0:

Number of permutations to perform.

confounds (numpy.ndarray or None, optional), default=None:

The confounding variables to be regressed out from the input data (D_data). If provided, the regression analysis is performed to remove the confounding effects.

dict_family (dict):

Dictionary containing family structure information, with the following entries:

  • file_location (str): The file location of the family structure data in CSV format.

  • M (numpy.ndarray, optional), default=None: The matrix of attributes, which is not typically required.

  • CMC (bool, optional), default=False: A flag indicating whether to use the Conditional Monte Carlo method (CMC).

  • EE (bool, optional), default=True: A flag indicating whether to assume exchangeable errors, which allows permutation.

verbose (bool, optional), default=True:

If True, display progress messages. If False, suppress progress messages.

test_statistics_option (bool, optional), default=False:

If True, the function will return the test statistics for each permutation.

FWER_correction (bool, optional), default=False:

Specify whether to perform family-wise error rate (FWER) correction using the MaxT method. Note: FWER_correction is not necessary if pval_correction is applied later for multiple comparison p-value correction.

identify_categories (bool or list or numpy.ndarray, optional), default=False:

If True, automatically identify categorical columns. If list or ndarray, use the provided list of column indices.

category_lim (int or None, optional), default=10:

Maximum allowed number of categories for F-test. Acts as a safety measure for columns with integer values, like age, which may be mistakenly identified as multiple categories.

test_combination, default=False:

Calculates geometric means of p-values using permutation testing. In the context of p-values from permutation testing, calculating geometric means can be useful for summarizing results across multiple tests to get insights into the overall statistical significance across experimental conditions. Valid options are:

  • True (bool): Return a single geometric mean value.

  • “across_rows” (str): Calculates geometric means aggregated across rows.

  • “across_columns” (str): Calculates geometric means aggregated across columns.

Returns:

result (dict):

A dictionary containing the following keys. Depending on test_statistics_option and method, it can return the p-values, correlation coefficients, and test statistics.

‘pval’: P-values for the test, with shape depending on the method:

  • method==“regression”: (T, p)

  • method==“univariate”: (T, p, q)

  • method==“cca”: (T, 1)

‘test_statistics’: The permutation distribution of the test statistics if test_statistics_option is True, else None. The shape depends on the method:

  • method==“regression”: (T, Nperm, p)

  • method==“univariate”: (T, Nperm, p, q)

  • method==“cca”: (T, Nperm, 1)

‘base_statistics’: Correlation coefficients for the test with shape (T, p, q) if method==“univariate”, else None.

‘test_type’: The type of test, which is the name of the function.

‘method’: The method used for analysis. Valid options are “regression”, “univariate”, “cca”, “one_vs_rest” and “state_pairs”.

‘max_correction’: Specifies whether FWER correction has been applied using MaxT; either True or False.

‘performed_tests’: A dictionary that marks the columns in the test_statistics or p-value matrix (along the q dimension) where t-tests or F-tests have been performed.

‘Nperm’: The number of permutations that have been performed.
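The overall idea of an across-subject permutation test with a regression statistic can be sketched in a few lines of NumPy. This is a simplified illustration and not the glhmm implementation: it assumes 2D inputs, uses the explained variance (R-squared) of each column of R_data as the test statistic, shuffles the rows of D_data freely across subjects, and ignores confounds, family structure and NaN handling. The function name and the seed argument are hypothetical.

```python
import numpy as np

def permutation_test_sketch(D_data, R_data, Nperm=1000, seed=0):
    """Hypothetical helper: permutation p-values for R-squared of R_data regressed on D_data."""
    rng = np.random.default_rng(seed)
    D = D_data - D_data.mean(axis=0)           # centre so no intercept is needed
    R = R_data - R_data.mean(axis=0)
    n = D.shape[0]

    def r_squared(D_perm):
        beta, *_ = np.linalg.lstsq(D_perm, R, rcond=None)
        residual = R - D_perm @ beta
        return 1.0 - (residual ** 2).sum(axis=0) / (R ** 2).sum(axis=0)

    stats = np.empty((Nperm, R.shape[1]))
    stats[0] = r_squared(D)                    # first row: unpermuted data
    for k in range(1, Nperm):
        # Shuffling subjects breaks any true link between D and R.
        stats[k] = r_squared(D[rng.permutation(n)])
    pval = (stats >= stats[0]).sum(axis=0) / Nperm
    return pval, stats
```

Under these assumptions the smallest attainable p-value is 1/Nperm, which is one reason to choose Nperm well above the default of 0 when a test is actually wanted.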

glhmm.statistics.test_across_trials_within_session(D_data, R_data, idx_data, method='regression', Nperm=0, confounds=None, trial_timepoints=None, verbose=True, test_statistics_option=False, FWER_correction=False, identify_categories=False, category_lim=10, test_combination=False)[source]

Perform permutation testing across different trials within a session. An example could be if we want to test if any learning is happening during a session that might speed up times.

Three options are available to customize the statistical analysis to a particular research question:
  • ‘regression’: Perform permutation testing using regression analysis.

  • ‘univariate’: Conduct permutation testing with correlation analysis.

  • ‘cca’: Apply permutation testing using canonical correlation analysis.

Parameters:

D_data (numpy.ndarray):

Input data array, either 2D or 3D. A 2D array has shape (n, p), where n represents the number of trials, and p represents the number of predictors (e.g., brain regions). A 3D array has shape (T, n, p), where the first dimension represents timepoints, the second dimension represents the number of trials, and the third dimension represents features/predictors. In the latter case, permutation testing is performed per timepoint for each subject.

R_data (numpy.ndarray):

The dependent variable, either a 2D or a 3D array. A 2D array has shape (n, q), where n represents the number of trials, and q represents the outcome/dependent variables. A 3D array has shape (T, n, q), where the first dimension represents timepoints, the second dimension represents the number of trials, and the third dimension represents the dependent variables.

idx_data (numpy.ndarray):

The indices for each trial within the session. It should be a 2D array where each row represents the start and end index for a trial.

method (str, optional), default=”regression”:

The statistical method to be used for the permutation test. Valid options are “regression”, “univariate”, or “cca”. Note: “cca” stands for Canonical Correlation Analysis.

Nperm (int), default=0:

Number of permutations to perform.

confounds (numpy.ndarray or None, optional), default=None:

The confounding variables to be regressed out from the input data (D_data). If provided, the regression analysis is performed to remove the confounding effects.

trial_timepoints (int or None, optional), default=None:

Number of timepoints for each trial.

verbose (bool, optional), default=True:

If True, display progress messages. If False, suppress progress messages.

test_statistics_option (bool, optional), default=False:

If True, the function will return the test statistics for each permutation.

FWER_correction (bool, optional), default=False:

Specify whether to perform family-wise error rate (FWER) correction for multiple comparisons using the MaxT method. Note: FWER_correction is not necessary if pval_correction is applied later for multiple comparison p-value correction.

identify_categories (bool, list, or numpy.ndarray, optional), default=False:

If True, automatically identify categorical columns. If a list or ndarray, use the provided column indices.

category_lim (int or None, optional), default=10:

Maximum allowed number of categories for F-test. Acts as a safety measure for columns with integer values, like age, which may be mistakenly identified as multiple categories.

test_combination (bool or str, optional), default=False:

Calculates geometric means of p-values using permutation testing. Valid options are:
  • True (bool): Return a single geometric mean per time point.

  • “across_rows” (str): Calculate geometric means for each row.

  • “across_columns” (str): Calculate geometric means for each column.
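To make the three test_combination options concrete, the following is a minimal NumPy sketch of geometric means taken over a (p, q) matrix of p-values. The helper name geometric_mean_pvals is hypothetical, and the choice of which axis “across_rows” and “across_columns” reduce over is an assumption based on the wording above. The sketch covers only the plain geometric mean, not the permutation-based inference the library performs on top of it.

```python
import numpy as np

def geometric_mean_pvals(pvals, test_combination=True):
    """Hypothetical helper: geometric mean of a (p, q) matrix of p-values."""
    # Averaging in log space avoids underflow when many small p-values are multiplied.
    log_p = np.log(np.asarray(pvals, dtype=float))
    if test_combination is True:
        # One value for the whole matrix.
        return np.exp(log_p.mean())
    if test_combination == "across_rows":
        # Assumption: one value per row, shape (p,).
        return np.exp(log_p.mean(axis=1))
    if test_combination == "across_columns":
        # Assumption: one value per column, shape (q,).
        return np.exp(log_p.mean(axis=0))
    raise ValueError("test_combination must be True, 'across_rows', or 'across_columns'")
```

Working in log space is a design choice of this sketch; it gives the same result as multiplying the p-values and taking the root, but stays numerically stable for large matrices.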

Returns:

result (dict): A dictionary containing the following keys. Depending on test_statistics_option and method, it can return the p-values, correlation coefficients, and test statistics.

‘pval’: P-values for the test with shapes based on the method:

  • method==“regression”: (T, p)

  • method==“univariate”: (T, p, q)

  • method==“cca”: (T, 1)

‘test_statistics’: The permutation distribution of the test statistics if test_statistics_option is True, else None. Shapes based on the method:
  • method==“regression”: (T, Nperm, p)

  • method==“univariate”: (T, Nperm, p, q)

  • method==“cca”: (T, Nperm, 1)

‘base_statistics’: Correlation coefficients for the test with shape (T, p, q) if method==“univariate”, else None.

‘test_type’: The type of test, which is the name of the function.

‘method’: The method used for analysis. Valid options are “regression”, “univariate”, “cca”, “one_vs_rest”, and “state_pairs”.

‘max_correction’: Whether FWER correction has been applied using MaxT; either True or False.

‘Nperm’: The number of permutations that have been performed.
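As an illustration of the idea behind an across-trials permutation test, the sketch below handles the 2D “univariate” case with NumPy alone. It is not the library's implementation: the function names corr_matrix and permute_trials_univariate are hypothetical, and the use of the absolute Pearson correlation as the statistic, the unpermuted data as the first permutation, and the seed argument are all assumptions made for this example.

```python
import numpy as np

def corr_matrix(D, R):
    """Pearson correlations between the columns of D (n, p) and R (n, q)."""
    Dz = (D - D.mean(axis=0)) / D.std(axis=0)
    Rz = (R - R.mean(axis=0)) / R.std(axis=0)
    return Dz.T @ Rz / D.shape[0]

def permute_trials_univariate(D, R, Nperm=1000, FWER_correction=False, seed=0):
    """Hypothetical sketch: shuffle the trial order of D and recompute |r|."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    stats = np.empty((Nperm, D.shape[1], R.shape[1]))
    for perm in range(Nperm):
        # Assumption: permutation 0 keeps the original trial order (observed data).
        order = np.arange(n) if perm == 0 else rng.permutation(n)
        stats[perm] = np.abs(corr_matrix(D[order], R))
    observed = stats[0]
    if FWER_correction:
        # MaxT: compare each observed statistic with the per-permutation maximum.
        null = stats.max(axis=(1, 2))[:, None, None]
    else:
        null = stats
    # Fraction of permutations at least as extreme as the observed statistic.
    pval = (null >= observed).sum(axis=0) / Nperm
    return pval, stats
```

In this sketch pval has shape (p, q) and stats has shape (Nperm, p, q), which mirrors the (T, p, q) and (T, Nperm, p, q) shapes listed above for a single time point. Because the observed statistic is counted as one of the permutations, the smallest attainable p-value is 1/Nperm.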

glhmm.statistics.test_across_visits(input_data, vpath_data, n_states, method='regression', Nperm=0, verbose=True, confounds=None, test_statistics_option=False, pairwise_statistic='mean', FWER_correction=False, category_lim=None, identify_categories=False)[source]
glhmm.statistics.test_statistics_calculations(Din, Rin, perm, test_statistics, proj, method, category_columns=[], test_combination=False)[source]

Calculates the test_statistics array and pval_perms array based on the given data and method.

Parameters:

Din (numpy.ndarray):

The data array.

Rin (numpy.ndarray):

The dependent variable.

perm (int):

The permutation index.

pval_perms (numpy.ndarray):

The p-value permutation array.

test_statistics (numpy.ndarray):

The permutation array.

proj (numpy.ndarray or None):

The projection matrix (None for correlation methods).

method (str):

The method used for permutation testing.

Returns:

test_statistics (numpy.ndarray):

Updated test_statistics array.

pval_perms (numpy.ndarray):

Updated pval_perms array.

glhmm.statistics.validate_condition(condition, error_message)[source]

Validates a given condition and raises a ValueError with the specified error message if the condition is not met.

Parameters:

condition (bool):

The condition to check.

error_message (str):

The error message to raise if the condition is not met.
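The described behaviour can be sketched in a few lines. The function body below is an assumption that follows the description above rather than a copy of the library source, and the D_data shape check is an illustrative use only.

```python
import numpy as np

def validate_condition(condition, error_message):
    """Raise ValueError(error_message) when condition is not met (sketch)."""
    if not condition:
        raise ValueError(error_message)

# Illustrative use: accept only 2D or 3D input arrays.
D_data = np.zeros((20, 4))
validate_condition(D_data.ndim in (2, 3), "D_data must be a 2D or 3D array")
```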

glhmm.statistics.viterbi_path_to_stc(viterbi_path, n_states)[source]

Convert Viterbi path to state-time matrix.

Parameters:

viterbi_path (numpy.ndarray):

1D array or 2D matrix containing the Viterbi path.

n_states (int):

Number of states in the hidden Markov model.

Returns:

stc (numpy.ndarray):

State-time matrix where each row represents a time point and each column represents a state.
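For a 1D Viterbi path, this conversion amounts to one-hot encoding. The sketch below assumes that states are labelled 1 to n_states, as stated for the Viterbi path elsewhere in this module. The name vpath_to_onehot is hypothetical, and the 2D input case is not covered.

```python
import numpy as np

def vpath_to_onehot(viterbi_path, n_states):
    """Hypothetical sketch: one-hot encode a 1D Viterbi path with labels 1..n_states."""
    path = np.asarray(viterbi_path, dtype=int)
    stc = np.zeros((path.shape[0], n_states), dtype=int)
    # Row t gets a 1 in the column of the state active at time t (labels are 1-based).
    stc[np.arange(path.shape[0]), path - 1] = 1
    return stc
```

Each row of the result contains exactly one 1, so summing over columns gives 1 at every time point and summing over rows gives the number of time points spent in each state.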