References: PyCaret Official (Home) | PyCaret Guide | PyCaret GitHub: pycaret/pycaret — An open source, low-code machine learning library in Python
Parameter | Type / Default | Description |
---|---|---|
data | {array-like, sparse matrix} | Shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features. |
target | string | Name of the target column, passed as a string. The target variable can be binary or multiclass. For multiclass targets, all estimators are wrapped in a OneVsRest classifier. |
train_size | float, default = 0.7 | Size of the training set. By default, 70% of the data is used for training and validation, and the remaining data is used for the test / hold-out set. |
sampling | bool, default = True | When the sample size exceeds 25,000 samples, pycaret builds a base estimator at various sample sizes from the original dataset. It returns performance plots of AUC, Accuracy, Recall, Precision, Kappa and F1 at the different sample levels to help you determine a suitable sample size for modeling. You are then asked to enter the desired sample size for training and validation in the pycaret environment. If the entered sample_size is less than 1, the remaining dataset (1 - sample) is used to fit the model only when finalize_model() is called. |
sample_estimator | object, default = None | If None, Logistic Regression is used by default. |
categorical_features | string, default = None | If the inferred data types are not correct, categorical_features can be used to overwrite them. If, when running setup, the type of 'column1' is inferred as numeric instead of categorical, this parameter can be used to overwrite the type by passing categorical_features = ['column1']. |
categorical_imputation | string, default = 'constant' | If missing values are found in categorical features, they are imputed with a constant 'not_available' value. The other available option is 'mode', which imputes missing values using the most frequent value in the training dataset. |
ordinal_features | dictionary, default = None | When the data contains ordinal features, they must be encoded differently using the ordinal_features parameter. If the data has a categorical variable with the values 'low', 'medium', 'high' and it is known that low < medium < high, it can be passed as ordinal_features = { 'column_name' : ['low', 'medium', 'high'] }. The list order should be from lowest to highest. |
high_cardinality_features | string, default = None | If the data contains features with high cardinality, they can be compressed into fewer levels by passing them as a list of column names with high cardinality. |
high_cardinality_method | string, default = 'frequency' | When set to 'frequency', the original value of the feature is replaced with its frequency distribution, which quantifies the feature. The other available method is 'clustering', which clusters the statistical attributes of the data and replaces the original value of the feature with a cluster label. |
numeric_features | string, default = None | If the inferred data types are not correct, numeric_features can be used to overwrite them. If, when running setup, the type of 'column1' is inferred as categorical instead of numeric, this parameter can be used to overwrite the type by passing numeric_features = ['column1']. |
numeric_imputation | string, default = 'mean' | If missing values are found in numeric features, they are imputed with the mean value of the feature. The other available option is 'median', which imputes the value using the median of the training dataset. |
date_features | string, default = None | If the data has a DateTime column that is not automatically detected when running setup, this parameter can be used by passing date_features = 'date_column_name'. It can work with multiple date columns. Date columns are not used in modeling; instead, feature extraction is performed and the date columns are dropped from the dataset. If the date column includes a timestamp, time-related features are also extracted. |
ignore_features | string, default = None | If any features should be ignored for modeling, they can be passed to the ignore_features param. The ID and DateTime columns, when inferred, are automatically set to be ignored for modeling purposes. |
normalize | bool, default = False | When set to True, the feature space is transformed using the method defined in the normalize_method parameter. In general, linear algorithms perform better with normalized data, but the results may vary. |
normalize_method | string, default = 'zscore' | Defines the method used for normalization. By default, it is set to 'zscore'. The standard zscore is calculated as z = (x - u) / s. The other available options are listed below. |
'minmax' | | Scales and translates each feature individually so that it is in the range of 0 to 1. |
'maxabs' | | Scales and translates each feature individually so that its maximum absolute value is 1.0. It does not shift/center the data, so it does not destroy any sparsity. |
'robust' | | Scales and translates each feature according to the interquartile range. A robust scaler often gives better results when the dataset contains outliers. |
transformation | bool, default = False | When set to True, a power transformation is applied to make the data more normal / Gaussian-like. This is useful for modeling issues related to heteroscedasticity and other situations where normality is desired. The optimal parameters for stabilizing variance and minimizing skewness are estimated via maximum likelihood. |
transformation_method | string, default = 'yeo-johnson' | Defines the method for transformation. By default, it is set to 'yeo-johnson'. The other available option is the 'quantile' transformation. Both transform the feature set to follow a Gaussian-like or normal distribution. Note that the quantile transformer is non-linear and may distort linear correlations between variables measured on the same scale. |
handle_unknown_categorical | bool, default = True | When set to True, unknown categorical levels in new / unseen data are replaced with the most or least frequent level as learned from the training data. The method is defined by the unknown_categorical_method parameter. |
unknown_categorical_method | string, default = 'least_frequent' | Method used to replace unknown categorical levels in unseen data. It can be set to 'least_frequent' or 'most_frequent'. |
pca | bool, default = False | When set to True, dimensionality reduction is applied to project the data into a lower dimensional space using the method defined in the pca_method parameter. In supervised learning, pca is generally performed when dealing with high feature spaces or when memory is a constraint. Note that not all datasets can be decomposed efficiently using a linear PCA technique, and applying PCA may result in loss of information. It is therefore recommended to run multiple experiments with different pca_method values to evaluate the impact. |
pca_method | string, default = 'linear' | The 'linear' method performs linear dimensionality reduction using Singular Value Decomposition. The other available options are listed below. |
'kernel' | | Dimensionality reduction using the RBF kernel. |
'incremental' | | Replaces 'linear' pca when the dataset to be decomposed is too large to fit in memory. |
pca_components | int/float, default = 0.99 | If pca_components is a float, it is treated as the target percentage of information to retain. If pca_components is an integer, it is treated as the number of features to retain. pca_components must be strictly less than the number of original features in the dataset. |
ignore_low_variance | bool, default = False | When set to True, all categorical features with statistically insignificant variance are removed from the dataset. The variance is calculated using the ratio of unique values to the number of samples, and the ratio of the most common value to the frequency of the second most common value. |
combine_rare_levels | bool, default = False | When set to True, all levels in categorical features below the threshold defined in the rare_level_threshold param are combined into a single level. There must be at least two levels under the threshold for this to take effect. rare_level_threshold represents the percentile distribution of level frequency. Generally, this technique is applied to limit the sparse matrix caused by a large number of levels in categorical features. |
rare_level_threshold | float, default = 0.1 | Percentile distribution below which rare categories are combined. Only takes effect when combine_rare_levels is set to True. |
bin_numeric_features | list, default = None | When a list of numeric features is passed, they are transformed into categorical features using KMeans, where the number of clusters is determined based on the 'sturges' rule. This is only optimal for Gaussian data and underestimates the number of bins for large non-Gaussian datasets. |
remove_outliers | bool, default = False | When set to True, outliers are removed from the training data using PCA linear dimensionality reduction with the Singular Value Decomposition technique. |
outliers_threshold | float, default = 0.05 | The percentage / proportion of outliers in the dataset can be defined using the outliers_threshold parameter. By default, 0.05 is used, which means that 0.025 of the values on each side of the distribution's tails are dropped from the training data. |
remove_multicollinearity | bool, default = False | When set to True, variables with inter-correlations higher than the threshold defined by the multicollinearity_threshold parameter are dropped. When two features are highly correlated with each other, the feature that is less correlated with the target variable is dropped. |
multicollinearity_threshold | float, default = 0.9 | Threshold used for dropping correlated features. Only takes effect when remove_multicollinearity is set to True. |
remove_perfect_collinearity | bool, default = False | When set to True, perfectly collinear features (correlation = 1) are removed from the dataset; when two features are 100% correlated, one of them is dropped from the dataset at random. |
create_clusters | bool, default = False | When set to True, an additional feature is created in which each instance is assigned to a cluster. The number of clusters is determined using a combination of the Calinski-Harabasz and Silhouette criteria. |
cluster_iter | int, default = 20 | Number of iterations used to create a cluster. Each iteration represents a cluster size. Only takes effect when the create_clusters param is set to True. |
polynomial_features | bool, default = False | When set to True, new features are created based on all polynomial combinations that exist within the numeric features of the dataset, up to the degree defined in the polynomial_degree param. |
polynomial_degree | int, default = 2 | Degree of polynomial features. For example, if an input sample is two dimensional and of the form [a, b], the polynomial features with degree = 2 are: [1, a, b, a^2, ab, b^2]. |
trigonometry_features | bool, default = False | When set to True, new features are created based on all trigonometric combinations that exist within the numeric features of the dataset, up to the degree defined in the polynomial_degree param. |
polynomial_threshold | float, default = 0.1 | Polynomial and trigonometric features whose feature importance, based on a combination of Random Forest, AdaBoost and Linear correlation, falls within the defined threshold percentile are kept in the dataset. The remaining features are dropped before further processing. |
group_features | list or list of list, default = None | When the dataset contains features that are related to each other, the group_features param can be used for statistical feature extraction. For example, if the dataset has numeric features that are related to each other ('Col1', 'Col2', 'Col3'), a list containing those column names can be passed under group_features to extract statistical information such as mean, median, mode and standard deviation. |
group_names | list, default = None | When group_features is passed, the group names can be passed into the group_names param as a list of strings. The length of the group_names list must be equal to the length of group_features. If the lengths do not match or names are not passed, new features are named sequentially, such as group_1, group_2, etc. |
feature_selection | bool, default = False | When set to True, a subset of features is selected using a combination of various permutation importance techniques, including Random Forest, Adaboost and Linear correlation with the target variable. The size of the subset depends on the feature_selection_threshold param. This is generally used to constrain the feature space in order to improve modeling efficiency. When polynomial_features and feature_interaction are used, it is highly recommended to define the feature_selection_threshold param with a lower value. |
feature_selection_threshold | float, default = 0.8 | Threshold used for feature selection (including newly created polynomial features). A higher value results in a larger feature space. It is recommended to run multiple trials with different values of feature_selection_threshold, especially when polynomial features and feature interactions are used. Setting a very low value may be efficient but can result in underfitting. |
feature_interaction | bool, default = False | When set to True, new features are created by interacting (a * b) all numeric variables in the dataset, including the polynomial and trigonometric features (if created). This feature is not scalable and may not work as expected on datasets with a large feature space. |
feature_ratio | bool, default = False | When set to True, new features are created by calculating the ratios (a / b) of all numeric variables in the dataset. This feature is not scalable and may not work as expected on datasets with a large feature space. |
interaction_threshold | bool, default = 0.01 | Similar to polynomial_threshold, it is used to compress the sparse matrix of newly created features through interaction. Features whose importance, based on a combination of Random Forest, AdaBoost and Linear correlation, falls within the defined threshold percentile are kept in the dataset. The remaining features are dropped before further processing. |
fix_imbalance | bool, default = False | When the dataset has an unequal distribution of target classes, it can be fixed using the fix_imbalance parameter. When set to True, SMOTE (Synthetic Minority Over-sampling Technique) is applied by default to create synthetic data points for the minority class. |
fix_imbalance_method | obj, default = None | When fix_imbalance is set to True and fix_imbalance_method is None, 'smote' is applied by default to oversample the minority class during cross-validation. This parameter accepts any module from 'imblearn' that supports the 'fit_resample' method. |
data_split_shuffle | bool, default = True | Set to False to prevent rows from being shuffled when splitting the data. |
folds_shuffle | bool, default = False | Set to False to prevent rows from being shuffled when using cross-validation. |
n_jobs | int, default = -1 | Specifies the number of jobs to run in parallel (for functions that support parallel processing). -1 means use all processors. To run all functions on a single processor, set n_jobs to None. |
html | bool, default = True | Set to False to disable the run-time display of the monitor. If you are using an environment that does not support HTML, you need to set it to False. |
session_id | int, default = None | If None, a random seed is generated and returned in the information grid. The unique number is then distributed as a seed to all functions used during the experiment. This can be used for reproducibility of the entire experiment at a later time. |
log_experiment | bool, default = False | When set to True, all metrics and parameters are recorded on the MLFlow server. |
experiment_name | str, default = None | Name of the experiment for logging. When set to None, 'clf' is used by default as the alias for the experiment name. |
log_plots | bool, default = False | When set to True, records a particular plot as a png file in MLflow. The default is set to False. |
log_profile | bool, default = False | If set to True, the data profile will also be recorded in MLflow as an html file. The default is set to False. |
log_data | bool, default = False | When set to True, training and test data will be recorded as csv. |
silent | bool, default = False | If set to True, no data type confirmation is required. All preprocessing is performed assuming an automatically inferred data type. Direct use outside of established pipelines is not recommended. |
verbose | Boolean, default = True | If verbose is set to False, the information grid will not be printed. |
profile | bool, default = False | When set to true, the data profile for exploratory data analysis is displayed in an interactive HTML report. |
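The table above covers the setup() parameters for the classification module; the table below covers the regression module. To show how a few of these parameters fit together in practice, here is a minimal sketch of a classification setup() call. The 'juice' sample dataset, its 'Purchase' target column, and the chosen parameter values are illustrative assumptions rather than anything prescribed by the table.

```python
# Minimal sketch of a classification setup() call (PyCaret 2.x API assumed).
# The 'juice' dataset, its 'Purchase' target, and the parameter values are
# illustrative assumptions only.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data('juice')  # sample dataset shipped with pycaret

clf_env = setup(
    data=data,
    target='Purchase',               # target column, binary or multiclass
    train_size=0.7,                  # 70% for training/validation, 30% hold-out
    normalize=True,                  # scale numeric features
    normalize_method='zscore',       # z = (x - u) / s
    remove_multicollinearity=True,   # drop one of each highly correlated pair
    multicollinearity_threshold=0.9,
    fix_imbalance=True,              # apply SMOTE to the minority class
    session_id=123,                  # fixed seed for reproducibility
    silent=True                      # skip interactive data-type confirmation
)

best = compare_models()  # train and rank the available classifiers
```

compare_models() then trains and ranks candidate classifiers inside the environment created by setup(), so the preprocessing choices above are applied consistently across every model.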
Parameter | Type / Default | Description |
---|---|---|
data | {array-like, sparse matrix} | Shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features. |
target | string | Name of the target column, passed as a string. |
train_size | float, default = 0.7 | Size of the training set. By default, 70% of the data is used for training and validation, and the remaining data is used for the test / hold-out set. |
sampling | bool, default = True | When the sample size exceeds 25,000 samples, pycaret builds a base estimator at various sample sizes from the original dataset. This returns a performance plot of R2 values at the different sample levels to help you determine a suitable sample size for modeling. You are then asked to enter the desired sample size for training and validation in the pycaret environment. If the entered sample_size is less than 1, the remaining dataset (1 - sample) is used to fit the model only when finalize_model() is called. |
sample_estimator | object, default = None | If None, Linear Regression is used by default. |
categorical_features | string, default = None | If the inferred data types are not correct, categorical_features can be used to overwrite them. If, when running setup, the type of 'column1' is inferred as numeric instead of categorical, this parameter can be used to overwrite the type by passing categorical_features = ['column1']. |
categorical_imputation | string, default = 'constant' | If missing values are found in categorical features, they are imputed with a constant 'not_available' value. The other available option is 'mode', which imputes missing values using the most frequent value in the training dataset. |
ordinal_features | dictionary, default = None | When the data contains ordinal features, they must be encoded differently using the ordinal_features parameter. If the data has a categorical variable with the values 'low', 'medium', 'high' and it is known that low < medium < high, it can be passed as ordinal_features = { 'column_name' : ['low', 'medium', 'high'] }. The list order should be from lowest to highest. |
high_cardinality_features | string, default = None | If the data contains features with high cardinality, they can be compressed into fewer levels by passing them as a list of high cardinality column names. Feature compression is performed with the method defined in the high_cardinality_method param. |
high_cardinality_method | string, default = 'frequency' | When set to 'frequency', the original value of the feature is replaced with its frequency distribution, which quantifies the feature. The other available method is 'clustering', which clusters the statistical attributes of the data and replaces the original value of the feature with a cluster label. |
numeric_features | string, default = None | If the inferred data types are not correct, numeric_features can be used to overwrite them. If, when running setup, the type of 'column1' is inferred as categorical instead of numeric, this parameter can be used to overwrite the type by passing numeric_features = ['column1']. |
numeric_imputation | string, default = 'mean' | If missing values are found in numeric features, they are imputed with the mean value of the feature. The other available option is 'median', which imputes the value using the median of the training dataset. |
date_features | string, default = None | If the data has a DateTime column that is not automatically detected when running setup, this parameter can be used by passing date_features = 'date_column_name'. It can work with multiple date columns. Date columns are not used in modeling; instead, feature extraction is performed and the date columns are dropped from the dataset. If the date column includes a timestamp, time-related features are also extracted. |
ignore_features | string, default = None | If any features should be ignored for modeling, they can be passed to the ignore_features param. The ID and DateTime columns, when inferred, are automatically set to be ignored for modeling purposes. |
normalize | bool, default = False | When set to True, the feature space is transformed using the method defined in the normalize_method parameter. In general, linear algorithms perform better with normalized data, but the results may vary. |
normalize_method | string, default = 'zscore' | Defines the method used for normalization. By default, it is set to 'zscore'. The standard zscore is calculated as z = (x - u) / s. The other available options are listed below. |
'minmax' | | Scales and translates each feature individually so that it is in the range of 0 to 1. |
'maxabs' | | Scales and translates each feature individually so that its maximum absolute value is 1.0. It does not shift/center the data, so it does not destroy any sparsity. |
'robust' | | Scales and translates each feature according to the interquartile range. A robust scaler often gives better results when the dataset contains outliers. |
transformation | bool, default = False | When set to True, a power transformation is applied to make the data more normal / Gaussian-like. This is useful for modeling issues related to heteroscedasticity and other situations where normality is desired. The optimal parameters for stabilizing variance and minimizing skewness are estimated via maximum likelihood. |
transformation_method | string, default = 'yeo-johnson' | Defines the method for transformation. By default, it is set to 'yeo-johnson'. The other available option is the 'quantile' transformation. Both transform the feature set to follow a Gaussian-like or normal distribution. Note that the quantile transformer is non-linear and may distort linear correlations between variables measured on the same scale. |
handle_unknown_categorical | bool, default = True | When set to True, unknown categorical levels in new / unseen data are replaced with the most or least frequent level as learned from the training data. The method is defined by the unknown_categorical_method parameter. |
unknown_categorical_method | string, default = 'least_frequent' | Method used to replace unknown categorical levels in unseen data. It can be set to 'least_frequent' or 'most_frequent'. |
pca | bool, default = False | When set to True, dimensionality reduction is applied to project the data into a lower dimensional space using the method defined in the pca_method parameter. In supervised learning, pca is generally performed when dealing with high feature spaces or when memory is a constraint. Note that not all datasets can be decomposed efficiently using a linear PCA technique, and applying PCA may result in loss of information. It is therefore recommended to run multiple experiments with different pca_method values to evaluate the impact. |
pca_method | string, default = 'linear' | The 'linear' method performs linear dimensionality reduction using Singular Value Decomposition. The other available options are listed below. |
'kernel' | | Dimensionality reduction using the RBF kernel. |
'incremental' | | Replaces 'linear' pca when the dataset to be decomposed is too large to fit in memory. |
pca_components | int/float, default = 0.99 | If pca_components is a float, it is treated as the target percentage of information to retain. If pca_components is an integer, it is treated as the number of features to retain. pca_components must be strictly less than the number of original features in the dataset. |
ignore_low_variance | bool, default = False | When set to True, all categorical features with statistically insignificant variance are removed from the dataset. The variance is calculated using the ratio of unique values to the number of samples, and the ratio of the most common value to the frequency of the second most common value. |
combine_rare_levels | bool, default = False | When set to True, all levels in categorical features below the threshold defined in the rare_level_threshold param are combined into a single level. There must be at least two levels under the threshold for this to take effect. rare_level_threshold represents the percentile distribution of level frequency. Generally, this technique is applied to limit the sparse matrix caused by a large number of levels in categorical features. |
rare_level_threshold | float, default = 0.1 | Percentile distribution below which rare categories are combined. Only takes effect when combine_rare_levels is set to True. |
bin_numeric_features | list, default = None | When a list of numeric features is passed, they are transformed into categorical features using KMeans, where the number of clusters is determined based on the 'sturges' rule. This is only optimal for Gaussian data and underestimates the number of bins for large non-Gaussian datasets. |
remove_outliers | bool, default = False | When set to True, outliers are removed from the training data using PCA linear dimensionality reduction with the Singular Value Decomposition technique. |
outliers_threshold | float, default = 0.05 | The percentage / proportion of outliers in the dataset can be defined using the outliers_threshold parameter. By default, 0.05 is used, which means that 0.025 of the values on each side of the distribution's tails are dropped from the training data. |
remove_multicollinearity | bool, default = False | When set to True, variables with inter-correlations higher than the threshold defined by the multicollinearity_threshold parameter are dropped. When two features are highly correlated with each other, the feature that is less correlated with the target variable is dropped. |
multicollinearity_threshold | float, default = 0.9 | Threshold used for dropping correlated features. Only takes effect when remove_multicollinearity is set to True. |
remove_perfect_collinearity | bool, default = False | When set to True, perfectly collinear features (correlation = 1) are removed from the dataset; when two features are 100% correlated, one of them is dropped from the dataset at random. |
create_clusters | bool, default = False | When set to True, an additional feature is created in which each instance is assigned to a cluster. The number of clusters is determined using a combination of the Calinski-Harabasz and Silhouette criteria. |
cluster_iter | int, default = 20 | Number of iterations used to create a cluster. Each iteration represents a cluster size. Only takes effect when the create_clusters param is set to True. |
polynomial_features | bool, default = False | When set to True, new features are created based on all polynomial combinations that exist within the numeric features of the dataset, up to the degree defined in the polynomial_degree param. |
polynomial_degree | int, default = 2 | Degree of polynomial features. For example, if an input sample is two dimensional and of the form [a, b], the polynomial features with degree = 2 are: [1, a, b, a^2, ab, b^2]. |
trigonometry_features | bool, default = False | When set to True, new features are created based on all trigonometric combinations that exist within the numeric features of the dataset, up to the degree defined in the polynomial_degree param. |
polynomial_threshold | float, default = 0.1 | Used to compress the sparse matrix of polynomial and trigonometric features. Polynomial and trigonometric features whose feature importance, based on a combination of Random Forest, AdaBoost and Linear correlation, falls within the defined threshold percentile are kept in the dataset. The remaining features are dropped before further processing. |
group_features | list or list of list, default = None | When the dataset contains features that are related to each other, the group_features param can be used for statistical feature extraction. For example, if the dataset has numeric features that are related to each other ('Col1', 'Col2', 'Col3'), a list containing those column names can be passed under group_features to extract statistical information such as mean, median, mode and standard deviation. |
group_names | list, default = None | When group_features is passed, the group names can be passed into the group_names param as a list of strings. The length of the group_names list must be equal to the length of group_features. If the lengths do not match or names are not passed, new features are named sequentially, such as group_1, group_2, etc. |
feature_selection | bool, default = False | When set to True, a subset of features is selected using a combination of various permutation importance techniques, including Random Forest, Adaboost and Linear correlation with the target variable. The size of the subset depends on the feature_selection_threshold param. This is generally used to constrain the feature space in order to improve modeling efficiency. When polynomial_features and feature_interaction are used, it is highly recommended to define the feature_selection_threshold param with a lower value. |
feature_selection_threshold | float, default = 0.8 | Threshold used for feature selection (including newly created polynomial features). A higher value results in a larger feature space. It is recommended to run multiple trials with different values of feature_selection_threshold, especially when polynomial features and feature interactions are used. Setting a very low value may be efficient but can result in underfitting. |
feature_interaction | bool, default = False | When set to True, new features are created by interacting (a * b) all numeric variables in the dataset, including the polynomial and trigonometric features (if created). This feature is not scalable and may not work as expected on datasets with a large feature space. |
feature_ratio | bool, default = False | When set to True, new features are created by calculating the ratios (a / b) of all numeric variables in the dataset. This feature is not scalable and may not work as expected on datasets with a large feature space. |
interaction_threshold | bool, default = 0.01 | Similar to polynomial_threshold, it is used to compress the sparse matrix of newly created features through interaction. Features whose importance, based on a combination of Random Forest, AdaBoost and Linear correlation, falls within the defined threshold percentile are kept in the dataset. The remaining features are dropped before further processing. |
transform_target | bool, default = False | When set to True, the target variable is transformed as defined in the transform_target_method param. Target transformation is applied separately from feature transformations. |
transform_target_method | string, default = 'box-cox' | The 'box-cox' and 'yeo-johnson' methods are supported. Box-Cox requires the input data to be strictly positive, while Yeo-Johnson supports both positive and negative data. When transform_target_method is 'box-cox' and the target variable contains negative values, the method is internally forced to 'yeo-johnson' to avoid exceptions. |
data_split_shuffle | bool, default = True | Set to False to prevent rows from being shuffled when splitting the data. |
folds_shuffle | bool, default = True | Set to False to prevent rows from being shuffled when using cross-validation. |
n_jobs | int, default = -1 | Specifies the number of jobs to run in parallel (for functions that support parallel processing). -1 means use all processors. To run all functions on a single processor, set n_jobs to None. |
html | bool, default = True | Set to False to disable the run-time display of the monitor. If you are using an environment that does not support HTML, you need to set it to False. |
session_id | int, default = None | If None, a random seed is generated and returned in the information grid. The unique number is then distributed as a seed to all functions used during the experiment. This can be used for reproducibility of the entire experiment at a later time. |
log_experiment | bool, default = False | When set to True, all metrics and parameters are recorded on the MLFlow server. |
experiment_name | str, default = None | Name of the experiment for logging. When set to None, 'reg' is used by default as the alias for the experiment name. |
log_plots | bool, default = False | When set to True, records a particular plot as a png file in MLflow. The default is set to False. |
log_profile | bool, default = False | If set to True, the data profile will also be recorded in MLflow as an html file. The default is set to False. |
log_data | bool, default = False | When set to True, training and test data will be recorded as csv. |
silent | bool, default = False | If set to True, no data type confirmation is required. All preprocessing is performed assuming an automatically inferred data type. Direct use outside of established pipelines is not recommended. |
verbose | Boolean, default = True | If verbose is set to False, the information grid will not be printed. |
profile | bool, default = False | When set to true, the data profile for exploratory data analysis is displayed in an interactive HTML report. |
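The regression module follows the same pattern, with the regression-specific transform_target and transform_target_method parameters added to the call. Below is a comparable sketch; the 'boston' sample dataset, its 'medv' target column, the 'reg_demo' experiment name, and the parameter choices are assumptions for illustration only.

```python
# Minimal sketch of a regression setup() call (PyCaret 2.x API assumed).
# The 'boston' dataset, its 'medv' target, the experiment name, and the
# parameter values are illustrative assumptions only.
from pycaret.datasets import get_data
from pycaret.regression import setup, compare_models

data = get_data('boston')  # sample regression dataset shipped with pycaret

reg_env = setup(
    data=data,
    target='medv',                      # numeric target column
    train_size=0.7,
    transformation=True,                # power transform skewed features
    transform_target=True,              # transform the target separately
    transform_target_method='box-cox',  # falls back to yeo-johnson if target has negatives
    remove_outliers=True,               # drop tail values via PCA/SVD
    outliers_threshold=0.05,            # ~0.025 trimmed from each tail
    log_experiment=True,                # track metrics and params with MLflow
    experiment_name='reg_demo',         # hypothetical experiment name
    session_id=123                      # fixed seed for reproducibility
)

best = compare_models()  # train and rank the available regressors
```

Using log_experiment=True together with experiment_name mirrors the logging parameters in the table above: each run's metrics, parameters, and (optionally) plots, profiles, and data are recorded to the MLflow tracking server.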