ComScan.neurocombat module¶
-
class
ComScan.neurocombat.AutoCombat(features, sites_features=None, sites=None, size_min=10, metric='distortion', use_ref_site=False, scaler_clustering=StandardScaler(), discrete_cluster_features=None, continuous_cluster_features=None, features_reduction=None, n_components=2, threshold_missing_sites_features=25, drop_site_columns=False, discrete_combat_covariates=None, continuous_combat_covariates=None, empirical_bayes=True, parametric=True, mean_only=False, return_only_features=False, n_jobs=1, random_state=123, copy=True)[source]¶ Bases:
ComScan.neurocombat.Combat
Harmonize/normalize features using Combat’s parametric empirical Bayes framework.
Combat needs well-defined acquisition sites or scanners to harmonize features. An imaging acquisition site can be difficult to define, for example when two sites use very similar imaging parameters. ComScan can automatically determine the number of sites and assign samples to them by clustering imaging features (e.g. DICOM tags). ComScan can therefore be applied to data not seen during training, because it predicts which of the scanners seen during training best matches each new sample.
- Parameters
features (Target features to be harmonized.) –
sites_features (Target variables used to define the sites (acquisition sites or scanner) by clustering.) –
sites (Target variable for ComScan problems (e.g. acquisition sites or scanner)) – This argument is optional. If it is provided, traditional ComBat is run; otherwise AutoCombat is run. When sites is provided, the arguments sites_features, size_min, metric, scaler_clustering, discrete_cluster_features, continuous_cluster_features, threshold_missing_sites_features and drop_site_columns are unused.
size_min (Constraint of the minimum size of site for clustering.) –
metric ("distortion", "silhouette" or "calinski_harabasz".) – Metric used to choose the optimal number of clusters. Default is distortion.
use_ref_site (Use a reference site for batch adjustment.) – The reference site is the cluster with the minimal inertia, i.e. the smallest within-cluster sum of squares. Default is False.
scaler_clustering (Scaler to use for continuous site features. Must be a scikit-learn scaler.) – Default is StandardScaler().
discrete_cluster_features (sites_features that are categorical and are one-hot encoded (e.g. ManufacturerModelName)) –
continuous_cluster_features (sites_features that are continuous and are scaled (e.g. EchoTime)) –
features_reduction (Method for reduction of the embedded space with n_components. Can be 'pca' or 'umap'.) – Default is None.
n_components (Dimension of the embedded space for features reduction.) – Default is 2.
threshold_missing_sites_features (Threshold, in percent, of acceptable missing values for a sites feature used in clustering.) – A value of 25 means that at least 75% of all samples must have the feature. Default is 25.
drop_site_columns (Drop the site columns found by clustering from the returned data.) – Default is False.
discrete_combat_covariates (Target covariates which are categorical (e.g. male or female)) –
continuous_combat_covariates (Target covariates which are continuous (e.g. age)) –
empirical_bayes (Perform empirical Bayes.) – Default is True.
parametric (Perform parametric adjustments.) – Default is True.
mean_only (Adjust only the mean (no scaling)) – Default is False.
return_only_features (Return only features.) – Default is False.
n_jobs (The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel.) – If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. Default is 1.
random_state (int, RandomState instance or None, optional, default: 123) – If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
copy (Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array)) – Default is True.
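The n_jobs convention described in the parameter list can be sketched as a small helper. This only illustrates the arithmetic stated above; `effective_n_jobs` is a hypothetical name and not part of ComScan, and rejecting n_jobs=0 is an assumption.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Number of workers implied by n_jobs (hypothetical helper, not ComScan API)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption: 0 is rejected because it does not name a worker count.
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs > 0:
        return n_jobs
    # Negative values count back from the number of CPUs:
    # -1 -> all CPUs, -2 -> all CPUs but one, and so on.
    return max(1, n_cpus + 1 + n_jobs)


print(effective_n_jobs(1, n_cpus=8))   # 1
print(effective_n_jobs(-1, n_cpus=8))  # 8
print(effective_n_jobs(-2, n_cpus=8))  # 7
```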
-
cls_¶ - Type
clustering classifier object
-
info_clustering_¶ - Type
Dictionary that stores info of clustering from sites_features with cluster_nb, labels, ref_label, wicss_clusters, best_wicss_cluster
-
cls_feature_reduction_¶ - Type
feature reduction object
-
clustering_data_features_mean_¶ - Type
dict of mean for clustering data (use for imputation)
-
X_hat_¶ - Type
array after fit
-
clustering_data_features_¶ - Type
column features for clustering from train (after encoding + scaling)
-
clustering_data_discrete_features_¶ - Type
column features for clustering after one-hot encoding
-
dict_cls_fitted¶ - Type
dict of columns of fitted cls used for fitted clustering data
Examples
>>> import pandas as pd
>>> from ComScan.neurocombat import AutoCombat
>>> data = pd.DataFrame([{"features_1": 0.97, "site_features_0": 2, "site_features_1": 0},
...                      {"features_1": 1.35, "site_features_0": 1.01, "site_features_1": 1},
...                      {"features_1": 1.43, "site_features_0": 1.09, "site_features_1": 1},
...                      {"features_1": 0.85, "site_features_0": 2.3, "site_features_1": 0}])
>>> auto_combat = AutoCombat(features=["features_1"], sites_features=["site_features_0", "site_features_1"],
...                          continuous_cluster_features=["site_features_0", "site_features_1"], size_min=2)
>>> print(auto_combat.fit(data))
AutoCombat(continuous_cluster_features=['site_features_0', 'site_features_1'], discrete_cluster_features=[], features=['features_1'], sites=['sites'], sites_features=['site_features_0', 'site_features_1'], size_min=2)
Notes
NaN values are not handled.
Warning
Be sure to use the same sites features for fit and transform. By design, no input format is imposed, so neither column names nor slices are checked.
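The threshold_missing_sites_features rule (25 means that 75% of samples must have the feature) can be sketched in plain Python. This is a minimal illustration, not ComScan's implementation: `usable_site_features` is a hypothetical helper, the EchoTime and FlipAngle values are made up, and whether a feature sitting exactly at the threshold is kept is an assumption.

```python
def usable_site_features(rows, site_features, threshold_missing=25):
    """Keep site features whose share of missing values is within the threshold.

    Hypothetical helper, not ComScan API. `rows` is a list of dicts; a value
    of None (or an absent key) counts as missing.
    """
    if not rows:
        return []
    kept = []
    for name in site_features:
        n_missing = sum(1 for row in rows if row.get(name) is None)
        pct_missing = 100.0 * n_missing / len(rows)
        # Assumption: a feature sitting exactly at the threshold is kept.
        if pct_missing <= threshold_missing:
            kept.append(name)
    return kept


rows = [
    {"EchoTime": 30.0, "FlipAngle": 90.0},
    {"EchoTime": 35.0, "FlipAngle": None},
    {"EchoTime": 30.0, "FlipAngle": None},
    {"EchoTime": None, "FlipAngle": 80.0},
]
# EchoTime is missing in 25% of the samples, FlipAngle in 50%.
print(usable_site_features(rows, ["EchoTime", "FlipAngle"]))
```

With the default threshold of 25 only EchoTime survives; raising the threshold to 50 would keep both features.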
-
fit(X, *y)[source]¶ Compute sites and ref_site using clustering, then compute the standardized mean, pooled variance, gamma star and delta star that Combat uses later to adjust data.
- Parameters
X (array-like or DataFrame of shape (n_samples, n_features)) – The data used to find adjustments. Requires the columns needed by ComScan.
*y (y in scikit learn: None) – Ignored.
- Returns
self – Fitted ComScan estimator.
- Return type
object
-
transform(X)[source]¶ Scale features of X according to combat estimator.
- Parameters
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data that will be transformed. Requires the columns needed by Combat.
- Returns
Xt – Transformed data.
- Return type
array-like of shape (n_samples, n_features)
-
class
ComScan.neurocombat.Combat(features, sites, discrete_covariates=None, continuous_covariates=None, ref_site=None, empirical_bayes=True, parametric=True, mean_only=False, return_only_features=False, raise_ref_site=True, copy=True)[source]¶ Bases:
sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Harmonize/normalize features using Combat’s parametric empirical Bayes framework.
- Parameters
features (Target features to be harmonized) –
sites (Target variable for ComScan problems (e.g. acquisition sites or scanner)) –
discrete_covariates (Target covariates which are categorical (e.g. male or female)) –
continuous_covariates (Target covariates which are continuous (e.g. age)) –
ref_site (Site value (acquisition site or scanner) to be used as reference for batch adjustment.) – Default is None.
empirical_bayes (Perform empirical Bayes.) – Default is True.
parametric (Perform parametric adjustments.) – Default is True.
mean_only (Adjust only the mean (no scaling)) – Default is False.
return_only_features (Return only features.) – Default is False.
raise_ref_site (Raise an error when the reference site passed as argument does not exist; otherwise fall back to no reference.) – Default is True.
copy (Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array)) – Default is True.
-
info_dict_fit_¶ Batch info of fitted data: batch_levels, ref_level, n_batch, n_sample, sample_per_batch, batch_info
- Type
dictionary
-
stand_mean_¶ Standardized mean
- Type
array-like
-
var_pooled_¶ Variance pooled
- Type
array-like
-
mod_mean_¶ Mod mean
- Type
array-like
-
gamma_star_¶ Adjustment gamma star
- Type
array-like
-
delta_star_¶ Adjustment delta star
- Type
array-like
-
info_dict_transform_¶ Batch info of transformed data: batch_levels, ref_level, n_batch, n_sample, sample_per_batch, batch_info
- Type
dictionary
Examples
>>> import pandas as pd
>>> from ComScan.neurocombat import Combat
>>> data = pd.DataFrame([{"features_1": 0.97, "features_2": 2, "sites": 0},
...                      {"features_1": 1.35, "features_2": 1.01, "sites": 1},
...                      {"features_1": 1.43, "features_2": 1.09, "sites": 1},
...                      {"features_1": 0.85, "features_2": 2.3, "sites": 0}])
>>> combat = Combat(features=["features_1", "features_2"], sites=["sites"], ref_site=1)
>>> print(combat.fit(data))
Combat(continuous_covariates=[], discrete_covariates=[], features=['features_1', 'features_2'], ref_site=1, sites=['sites'])
>>> print(combat.gamma_star_)
[[-11.85476756 27.30493785]
 [ 0. 0. ]]
>>> print(combat.transform(data))
[[1.40593957 1.01395564 0. ]
 [1.35 1.01 1. ]
 [1.43 1.09 1. ]
 [1.37064296 1.08999992 0. ]]
Notes
NaN values are not handled.
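How the fitted attributes combine can be sketched for a single feature value. This follows the usual ComBat location and scale adjustment and is an illustration only, not ComScan's code: `combat_adjust` is a hypothetical function, covariate effects (mod_mean_) are ignored, and delta_star is assumed to be a variance, so its square root is used for scaling.

```python
import math


def combat_adjust(x, stand_mean, var_pooled, gamma_star, delta_star, mean_only=False):
    """Adjust one feature value from one site (illustrative, not ComScan code)."""
    # Standardize with the pooled statistics estimated at fit time.
    z = (x - stand_mean) / math.sqrt(var_pooled)
    # Remove the additive site effect; remove the multiplicative one
    # unless only the mean is adjusted.
    scale = 1.0 if mean_only else math.sqrt(delta_star)
    z_adjusted = (z - gamma_star) / scale
    # Map back to the original scale.
    return z_adjusted * math.sqrt(var_pooled) + stand_mean


# A reference site has gamma_star = 0 and delta_star = 1, so it is unchanged.
print(combat_adjust(3.0, 1.0, 4.0, gamma_star=0.0, delta_star=1.0))                     # 3.0
print(combat_adjust(3.0, 1.0, 4.0, gamma_star=0.25, delta_star=0.25))                   # 4.0
print(combat_adjust(3.0, 1.0, 4.0, gamma_star=0.25, delta_star=0.25, mean_only=True))   # 2.5
```

The first call mirrors the zero row of gamma_star_ printed in the example above for ref_site=1: values from the reference site pass through unchanged.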
-
fit(X, *y)[source]¶ Compute the standardized mean, pooled variance, gamma star and delta star used later to adjust data.
- Parameters
X (array-like or DataFrame of shape (n_samples, n_features)) – The data used to find adjustments. Requires the columns needed by Combat.
*y (y in scikit learn: None) – Ignored.
- Returns
self – Fitted combat estimator.
- Return type
object
-
load_fit(filepath)[source]¶ Load the fitted model attributes info_dict_fit_, stand_mean_, var_pooled_, gamma_star_, delta_star_.
- Parameters
filepath (str) – file path of the pkl file to load
- Return type
None
-
save_fit(filepath)[source]¶ Save the fitted model attributes info_dict_fit_, stand_mean_, var_pooled_, gamma_star_, delta_star_.
- Parameters
filepath (str) – file path where to save. If it has no .pkl extension, one is added
- Return type
None
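The save and load behaviour described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the library's implementation: `save_attributes` and `load_attributes` are hypothetical helpers, the attribute values are made up, and appending ".pkl" whenever the path does not already end with it is one reading of the documented rule.

```python
import pickle
import tempfile
from pathlib import Path


def save_attributes(attributes, filepath):
    """Pickle a dict of fitted attributes, adding '.pkl' if it is missing."""
    path = Path(filepath)
    if path.suffix != ".pkl":
        # Assumption: any path not ending in .pkl gets the extension appended.
        path = path.with_name(path.name + ".pkl")
    with path.open("wb") as f:
        pickle.dump(attributes, f)
    return path


def load_attributes(filepath):
    """Load a dict of fitted attributes written by save_attributes."""
    with Path(filepath).open("rb") as f:
        return pickle.load(f)


fitted = {"stand_mean_": [1.0, 2.0], "var_pooled_": [0.5, 0.25]}
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_attributes(fitted, Path(tmp) / "combat_fit")
    restored = load_attributes(saved_path)

print(saved_path.name)     # combat_fit.pkl
print(restored == fitted)  # True
```

Pickle files can execute code when loaded, so only load fits from a trusted source.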
-
transform(X)[source]¶ Scale features of X according to combat estimator.
- Parameters
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data that will be transformed. Requires the columns needed by Combat.
- Returns
Xt – Transformed data.
- Return type
array-like of shape (n_samples, n_features)
-
class
ComScan.neurocombat.ImageCombat(image_path, sites_features=None, sites=None, save_path_fit='fit_data', save_path_transform='transform_data', size_min=10, method='silhouette', use_ref_site=False, scaler_clustering=StandardScaler(), discrete_cluster_features=None, continuous_cluster_features=None, features_reduction=None, n_components=2, threshold_missing_sites_features=25, drop_site_columns=True, discrete_combat_covariates=None, continuous_combat_covariates=None, empirical_bayes=True, parametric=True, mean_only=False, random_state=123, flattened_dtype=<class 'numpy.float16'>, output_dtype=<class 'numpy.float32'>, copy=True)[source]¶ Bases:
ComScan.neurocombat.AutoCombat
Harmonize/normalize features using Combat’s parametric empirical Bayes framework directly on images.
ImageCombat harmonizes/normalizes a set of NIfTI images. All images must have the same dimensions and orientation. A common mask is created based on a heuristic proposed by T. Nichols. Images are then vectorized for ComScan. ImageCombat can use either Combat (well-defined sites) or AutoCombat (sites found by clustering).
- Parameters
image_path (Path of the NIfTI files.) –
sites_features (Target variables used to define the sites (acquisition sites or scanner) by clustering.) –
sites (Target variable for ComScan problems (e.g. acquisition sites or scanner)) – This argument is optional. If it is provided, traditional ComBat is run. In this case the arguments sites_features, size_min, method, scaler_clustering, discrete_cluster_features, continuous_cluster_features, threshold_missing_sites_features and drop_site_columns are unused.
size_min (Constraint of the minimum size of site for clustering.) –
method ("silhouette" or "elbow". Method used to choose the optimal number of clusters.) – Default is silhouette.
use_ref_site (Use a reference site for batch adjustment.) – The reference site is the cluster with the minimal inertia, i.e. the smallest within-cluster sum of squares. Default is False.
scaler_clustering (Scaler to use for continuous site features. Must be a scikit-learn scaler.) – Default is StandardScaler().
discrete_cluster_features (sites_features that are categorical and are one-hot encoded (e.g. ManufacturerModelName)) –
continuous_cluster_features (sites_features that are continuous and are scaled (e.g. EchoTime)) –
features_reduction (Method for reduction of the embedded space with n_components. Can be 'pca' or 'umap'.) – Default is None.
n_components (Dimension of the embedded space for features reduction.) – Default is 2.
threshold_missing_sites_features (Threshold, in percent, of acceptable missing values for a sites feature used in clustering.) – A value of 25 means that at least 75% of all samples must have the feature. Default is 25.
drop_site_columns (Drop the site columns found by clustering from the returned data.) – Default is True.
discrete_combat_covariates (Target covariates which are categorical (e.g. male or female)) –
continuous_combat_covariates (Target covariates which are continuous (e.g. age)) –
empirical_bayes (Perform empirical Bayes.) – Default is True.
parametric (Perform parametric adjustments.) – Default is True.
mean_only (Adjust only the mean (no scaling)) – Default is False.
random_state (int, RandomState instance or None, optional, default: 123) – If int, random_state is the seed used by the random number generator; If None, the random number generator is the RandomState instance used by np.random.
copy (Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array)) – Default is True.
-
mask_¶ - Type
array-like of the common brain mask
-
flattened_array_¶ - Type
flattened array of all the training set
Notes
NaN values are not handled.
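The use_ref_site rule (the reference site is the cluster with the minimal inertia) can be sketched in plain Python. This is an illustration under assumptions, not ComScan's implementation: both function names are hypothetical, the points are made up, and the sum of squares is not normalized by cluster size, which the library may or may not do.

```python
def within_cluster_sum_of_squares(points):
    """Sum of squared distances of the points to their centroid."""
    n_dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(n_dims)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(n_dims))


def pick_reference_cluster(points, labels):
    """Return the label of the cluster with the smallest sum of squares, plus all values."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    wcss = {label: within_cluster_sum_of_squares(pts) for label, pts in clusters.items()}
    return min(wcss, key=wcss.get), wcss


# Two clusters of scaled site features; cluster 1 is the tighter one.
points = [(2.0, 0.0), (2.4, 0.0), (1.0, 1.0), (1.1, 1.0)]
labels = [0, 0, 1, 1]
ref_label, wcss = pick_reference_cluster(points, labels)
print(ref_label)  # 1
```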
-
fit(X, *y)[source]¶ Compute sites and ref_site using clustering, then compute the standardized mean, pooled variance, gamma star and delta star that Combat uses later to adjust data.
- Parameters
X (array-like or DataFrame of shape (n_samples, n_features)) – The data used to find adjustments. Requires the columns needed by ComScan.
*y (y in scikit learn: None) – Ignored.
- Returns
self – Fitted ComScan estimator.
- Return type
object