I investigated the preprocessing that can be done with PyCaret

I would like to share what I have learned, making corrections and additions as appropriate.

About this document

This document focuses on PyCaret's **preprocessing**. It basically does not touch on modeling or tuning.

I wrote it while actually running PyCaret and reading the original source code. https://github.com/pycaret/pycaret

Implementation assumptions

It is assumed that various libraries are imported as follows.

import pandas as pd
import numpy as np

What is PyCaret

PyCaret is a library that automates data preprocessing and machine learning model training, and lets you build and deploy models with little code. https://pycaret.org/

Installation is a single pip command, so it is very easy.

pip install pycaret

You can refer to this article for an overview and how to implement a series of pipelines. https://qiita.com/tani_AI_Academy/items/62151d7e151024733919

How to execute preprocessing

In PyCaret, you specify the preprocessing you want to run with parameters. PyCaret also asks the user to confirm some of the processing before it runs. The flow is as follows.

Call the data input / preprocessing function

Calling setup() from the module prepared for each task, such as classification or regression, executes the preprocessing.

**The preprocessing you want PyCaret to perform is specified by passing arguments to setup().** Apart from the data itself, the only required argument is "target" (the target variable).

In the following explanation, I use the datasets bundled with PyCaret. You can check the available datasets on the official page. https://pycaret.org/get-data/

The code to load the data and run the preprocessing is as follows. Here, only the "target" argument is specified; the other options are left at their defaults.

from pycaret.datasets import get_data
dataset = get_data("diamond")

from pycaret.regression import *
setup(dataset, target="Price")

Check the inferred type of each variable

When you run setup(), **PyCaret first infers the type (Data Type) of each variable and asks you to check the result before continuing**. If the inferred types are correct, press the Enter key in the input box to continue. If any inferred type is incorrect, type "quit" to interrupt the process.


Variables whose type was inferred incorrectly can be fixed by specifying their types explicitly in setup(). (For details, see the Numeric Features and Categorical Features section below.)

Check the execution summary of preprocessing

When setup() finishes, a summary of the processing is output as a table (data frame).

    Description                   Value
0   session_id                    3104
1   Transform Target              False
2   Transform Target Method       None
3   Original Data                 (6000, 8)
4   Missing Values                False
5   Numeric Features              1
6   Categorical Features          6
7   Ordinal Features              False
8   High Cardinality Features     False
9   High Cardinality Method       None
10  Sampled Data                  (6000, 8)
11  Transformed Train Set         (4199, 28)
12  Transformed Test Set          (1801, 28)
13  Numeric Imputer               mean
14  Categorical Imputer           constant
15  Normalize                     False
16  Normalize Method              None
17  Transformation                False
18  Transformation Method         None
19  PCA                           False
20  PCA Method                    None
21  PCA Components                None
22  Ignore Low Variance           False
23  Combine Rare Levels           False
24  Rare Level Threshold          None
25  Numeric Binning               False
26  Remove Outliers               False
27  Outliers Threshold            None
28  Remove Multicollinearity      False
29  Multicollinearity Threshold   None
30  Clustering                    False
31  Clustering Iteration          None
32  Polynomial Features           False
33  Polynomial Degree             None
34  Trignometry Features          False
35  Polynomial Threshold          None
36  Group Features                False
37  Feature Selection             False
38  Features Selection Threshold  None
39  Feature Interaction           False
40  Feature Ratio                 False
41  Interaction Threshold         None

From this table, you can **check the data size, the number of features, and whether each preprocessing step is enabled**. By default, most options are disabled (False or None).

If you enable an option through an argument of setup(), the corresponding item becomes "True" and is highlighted.

In the following sections, we will explain the contents of various items.

Information about the session


Description Value
0 session_id 3104

This is a runtime identifier of PyCaret, and it appears to be used internally as the seed for random numbers. If not specified, it is chosen randomly.

It can be set with the "session_id" argument of setup(). Specify this value to keep results reproducible across repeated runs. (It plays a role similar to "random_state" in scikit-learn.)

setup(dataset, target="Price", session_id=123)

Information about input data

Original Data

Description Value
3 Original Data (6000, 8)

The size (shape) of the input data is output.

Checking it directly confirms that the size is the same.

dataset.shape

#Execution result
# (6000, 8)

Missing Values

Description Value
4 Missing Values False

Whether the input data contains missing values is output. Since this data has no missing values, "False" is output.

If there are missing values, this item becomes "True".

In that case, **the missing values are filled in during setup()**. How to specify the imputation method is described later.

Numeric Features and Categorical Features

Description Value
5 Numeric Features 1
6 Categorical Features 6

The inferred numbers of numeric (continuous) features and categorical features are output, respectively.

It can be explicitly specified by the arguments "numeric_features" and "categorical_features" of setup ().

setup(dataset, target="Price",
        categorical_features=["Cut", "Color", "Clarity", "Polish", "Symmetry", "Report"], 
        numeric_features=["Carat Weight"])

If the type inference confirmation dialog described above shows a variable whose type was inferred incorrectly, **specify its type explicitly with these arguments**.

Information about data partitioning of train / test

Transformed Train Set, Transformed Test Set

Description Value
11 Transformed Train Set (4199, 28)
12 Transformed Test Set (1801, 28)

The sizes of the train and test data after splitting are output. The split ratio can be specified with the "train_size" argument of setup(); the default is 0.7.
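The (4199, 1801) split is one row off from a naive 70/30 split of 6,000 rows. A plausible explanation, assuming setup() passes test_size = 1 - train_size to scikit-learn's train_test_split (which rounds a float test size up with ceil), is floating-point error: 1 - 0.7 is slightly larger than 0.3 in binary floating point. The standard-library sketch below reproduces that arithmetic and a seeded shuffle; split_sizes and seeded_split are hypothetical helpers, not PyCaret functions.

```python
import math
import random


def split_sizes(n_rows, train_size=0.7):
    """Hypothetical helper: train/test sizes, assuming the test size is
    computed as ceil((1 - train_size) * n_rows)."""
    test_fraction = 1 - train_size  # 0.30000000000000004 for train_size=0.7
    n_test = math.ceil(test_fraction * n_rows)
    return n_rows - n_test, n_test


def seeded_split(n_rows, train_size=0.7, seed=123):
    """Hypothetical helper: shuffle row indices with a fixed seed (the role
    session_id plays) and cut them into train and test parts."""
    n_train, _ = split_sizes(n_rows, train_size)
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


print(split_sizes(6000))  # (4199, 1801)
train_a, test_a = seeded_split(6000, seed=123)
train_b, test_b = seeded_split(6000, seed=123)
print(train_a == train_b)  # True: the same seed gives the same split
```

Fixing the seed is what makes the second call return exactly the same rows, which is the reproducibility that session_id is meant to provide.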

The number of columns differs from the input data because the number of features after preprocessing is shown. (Here, preprocessing increased the number of features from 7 to 28.)
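The increase from 7 to 28 columns is consistent with one-hot encoding of the categorical columns. Using the dummy columns listed in the output later in this article, the count works out as 1 (Carat Weight) + 5 (Cut) + 6 (Color) + 7 (Clarity) + 4 (Polish) + 4 (Symmetry) + 1 (Report_GIA) = 28; the binary Report column appears to be kept as a single indicator. The pandas sketch below applies that rule to a small hypothetical frame. It illustrates the idea and is not PyCaret's Dummify step.

```python
import pandas as pd

# Hypothetical frame: one numeric column, one multi-level categorical
# column, and one binary categorical column.
df = pd.DataFrame({
    "Carat Weight": [0.5, 1.1, 0.9, 1.5],
    "Cut": ["Ideal", "Good", "Fair", "Ideal"],
    "Report": ["GIA", "AGSL", "GIA", "GIA"],
})

# Multi-level column: one indicator column per level.
cut_dummies = pd.get_dummies(df["Cut"], prefix="Cut", dtype=float)
# Binary column: one indicator is enough, because the other level is implied.
report_gia = (df["Report"] == "GIA").astype(float).rename("Report_GIA")

encoded = pd.concat([df[["Carat Weight"]], cut_dummies, report_gia], axis=1)
print(encoded.columns.tolist())
# ['Carat Weight', 'Cut_Fair', 'Cut_Good', 'Cut_Ideal', 'Report_GIA']
```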

Information about data sampling

Sampled Data

Description Value
10 Sampled Data (6000, 8)

When data is sampled in setup(), the number of rows after sampling is output. **PyCaret suggests sampling the data and running the series of operations on the sample when the data has more than 25,000 rows.**

If you run setup() on data with more than 25,000 rows, a sampling confirmation dialog is displayed after the type inference confirmation dialog. To sample, enter the proportion of data to sample in the input box. To use all rows without sampling, leave it blank and press the Enter key.

(Figure: sampling plot for regression tasks)

(Figure: sampling plot for classification tasks)

The plot drawn here gives an indication of how much accuracy degrades due to sampling.

The model used for this plot can be specified by the argument "sample_estimator" of setup (). For example, the code to specify RandomForestRegressor is below.

from sklearn.ensemble import RandomForestRegressor

traffic = get_data("traffic")
setup(traffic, target="traffic_volume", sample_estimator=RandomForestRegressor())

This feature itself can also be turned off with the "sampling" argument of setup() (sampling=False). In that case, no sampling confirmation is shown and processing continues with all data.

(Other) Methods related to data cleaning and feature conversion processing

The remaining items show whether each data cleaning and feature conversion process is executed, and with which method. The next chapter explains the corresponding processes.

Data cleaning and feature conversion process

This chapter describes each process and how to specify it.

setup() returns the preprocessed data

setup() returns the preprocessed data and the preprocessing pipeline. The exact set of return values appears to depend on the type of task.


# Regression (pycaret.regression was imported above)
X, y, X_train, X_test, y_train, y_test, seed, prep_pipe, target_inverse_transformer, experiment__ \
    = setup(dataset, target="Price")


# Classification (stored as "credit" so that "dataset" still holds the diamond data used later)
from pycaret.classification import *

credit = get_data('credit')
X, y, X_train, X_test, y_train, y_test, seed, prep_pipe, experiment__ \
    = setup(credit, target = 'default')

The return values differ slightly between regression and classification. **The preprocessed data is returned in X and y**, so you can inspect the actual processing result.

Whether data preprocessed by PyCaret can be processed further and passed back into PyCaret is currently unknown.

Feature exclusion

You can set the features to be excluded in preprocessing and subsequent modeling.


It is executed by passing the "ignore_features" argument (a list of column names) to setup().


**ID and date (datetime) columns appear to be excluded** from modeling by default. If a date column is not recognized as a date, you can apparently specify it explicitly with the "date_features" argument.

Also, although I am still confirming the exact behavior, when two columns contain exactly the same data, one of them appears to be excluded automatically.
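The duplicate-column behavior above can be sketched in pandas by transposing the frame so that columns become rows and then flagging repeated rows. This is only an illustration of the idea under that assumption; PyCaret's own logic may differ, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame in which column "b" is an exact copy of column "a".
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [3, 1, 2]})

# After transposing, each original column is a row; duplicated() marks
# every row that repeats an earlier one, so the first copy is kept.
duplicated = df.T.duplicated()
deduplicated = df.loc[:, ~duplicated.values]
print(deduplicated.columns.tolist())  # ['a', 'c']
```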

Imputing missing values

Fills in missing values with the specified method.


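It is executed by passing the "numeric_imputation" argument ('mean', 'median', or 'zero') and the "categorical_imputation" argument ('constant' or 'mode') to setup().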


At the moment, the method cannot be specified per column; all numeric columns (and likewise all categorical columns) appear to be processed with the same method.
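The default combination in the summary table (Numeric Imputer "mean", Categorical Imputer "constant", with the pipeline showing the constant "not_available") can be sketched in pandas as follows. The impute function is a hypothetical helper, not PyCaret's Simple_Imputer, and for brevity it computes the statistics on the whole frame; in practice they should come from the training split only.

```python
import numpy as np
import pandas as pd


def impute(df, numeric_strategy="mean", fill_value="not_available"):
    """Hypothetical helper: one method for all numeric columns and one
    constant for all other columns, as described above."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Mean (or median) of the observed values fills the gaps.
            stat = out[col].mean() if numeric_strategy == "mean" else out[col].median()
            out[col] = out[col].fillna(stat)
        else:
            # Categorical gaps become an explicit "missing" level.
            out[col] = out[col].fillna(fill_value)
    return out


df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "color": ["red", None, "blue"]})
result = impute(df)
print(result)
```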

Ordinal encoding

Columns that you define as ordinal data are label-encoded according to their order.


It is executed by passing the "ordinal_features" argument (a dictionary) to setup().

Specify it as follows: ordinal_features = { 'column_name' : ['low', 'medium', 'high'] }

In the value of the dictionary, list the levels in ascending order.
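A minimal pandas sketch of what such a mapping implies: each level is replaced by its position in the ascending list. The column name "size" is hypothetical, and starting the codes at 0 is an assumption about the encoding.

```python
import pandas as pd

# Same shape as the ordinal_features argument: levels in ascending order.
ordinal_features = {"size": ["low", "medium", "high"]}

df = pd.DataFrame({"size": ["medium", "low", "high", "medium"]})
for col, levels in ordinal_features.items():
    # Position in the list becomes the code: low -> 0, medium -> 1, high -> 2.
    df[col] = df[col].map({level: rank for rank, level in enumerate(levels)})
print(df["size"].tolist())  # [1, 0, 2, 1]
```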

Feature normalization

Normalize each feature.


It is executed by setting "normalize=True" in setup() and choosing the method with "normalize_method" ('zscore', 'minmax', 'maxabs', or 'robust').


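You can refer to this article for 'robust' scaling. https://qiita.com/unhurried/items/7a79d2f3574fb1d0cc27

If the dataset contains outliers, 'robust' scaling appears to hold up better, because it uses the median and interquartile range instead of the mean and standard deviation.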

For other scaling, this article will be helpful. https://qiita.com/Umaremin/items/fbbbc6df11f78532932d

In general, linear algorithms tend to be more accurate when normalized, but this is not always the case and may require multiple experiments.
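The difference between 'zscore' and 'robust' scaling on data with an outlier can be seen in a few lines of NumPy. This is a sketch of the two formulas, assuming robust scaling means (x - median) / IQR as in scikit-learn's RobustScaler defaults; it is not PyCaret's code.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100 is an outlier

# z-score: centre by the mean, divide by the standard deviation.
zscore = (x - x.mean()) / x.std()
# Robust scaling: centre by the median, divide by the interquartile range.
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

print(np.round(zscore, 2))  # the outlier squeezes the other four values together
print(robust)               # [-1.  -0.5  0.   0.5 48.5]
```

With the z-score, the four ordinary values end up within about 0.08 of each other, while robust scaling keeps them spread out and leaves the outlier clearly separated.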

Combining rare levels in categorical variables

In a categorical variable, levels whose frequency is below the specified threshold are merged into a single level.


It is executed by setting "combine_rare_levels=True" in setup() and specifying the threshold with "rare_level_threshold".


In general, this technique keeps one-hot encoding from producing a large, sparse matrix when a categorical variable has many levels.
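A small pandas sketch of the merge, assuming the threshold is compared with each level's share of rows. The merged label "rare" is a hypothetical placeholder, not necessarily the name PyCaret uses.

```python
import pandas as pd


def combine_rare_levels(series, threshold=0.1, label="rare"):
    """Hypothetical helper: merge levels whose share of rows is below
    `threshold` into a single level called `label`."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < threshold].index
    # Keep common levels as they are; replace rare ones with the label.
    return series.where(~series.isin(rare), label)


s = pd.Series(["a"] * 6 + ["b"] * 3 + ["c", "d"])  # c and d each cover 1/11 of rows
result = combine_rare_levels(s, threshold=0.1)
print(result.value_counts().to_dict())
```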

Binning of numerical data

Bins the features of numerical data.


It is executed by passing the list of numeric columns to bin with the "bin_numeric_features" argument of setup().


Internally, it appears to run sklearn.preprocessing.KBinsDiscretizer with the one-dimensional k-means strategy.
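The idea behind one-dimensional k-means binning can be sketched in NumPy: cluster the values with Lloyd's algorithm, then put the bin edges halfway between neighbouring centres. The bin count also appears to follow Sturges' rule; for the row counts here (4,199 training rows or 6,000 in total) it gives 14 bins, which matches Carat Weight_0.0 to Carat Weight_13.0 in the output later in this article. Both functions are hypothetical sketches, not PyCaret's or scikit-learn's implementation.

```python
import numpy as np


def sturges_bins(n_rows):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return int(np.ceil(np.log2(n_rows))) + 1


def kmeans_bin_edges(x, n_bins, n_iter=100):
    """Sketch of 1-D k-means binning, similar in spirit to
    KBinsDiscretizer(strategy='kmeans')."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    # Start from the centres of equal-width bins.
    uniform = np.linspace(lo, hi, n_bins + 1)
    centers = (uniform[:-1] + uniform[1:]) / 2
    for _ in range(n_iter):
        # Assign each value to its nearest centre, then move centres to cluster means.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            x[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n_bins)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centers = np.sort(centers)
    # Bin edges are the midpoints between neighbouring centres.
    return np.concatenate(([lo], (centers[:-1] + centers[1:]) / 2, [hi]))


x = np.array([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0, 9.1, 9.2])
edges = kmeans_bin_edges(x, n_bins=3)
bins = np.digitize(x, edges[1:-1])  # bin index 0, 1 or 2 for each value
print(bins)                # [0 0 0 1 1 1 2 2 2]
print(sturges_bins(6000))  # 14
```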

Removal of outliers

Removes outliers from the training data.


It is executed by setting "remove_outliers=True" in setup() and specifying the proportion of outliers with "outliers_threshold".


Internally, it appears to use PCA based on singular value decomposition (SVD).
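One common way to use SVD/PCA for outlier removal is to project the rows onto the top principal components, reconstruct them, and drop the rows with the largest reconstruction error. The NumPy sketch below shows that idea under the assumption that outliers_threshold is the fraction of rows to drop; it is not PyCaret's exact procedure, and remove_outliers_pca is a hypothetical helper.

```python
import numpy as np


def remove_outliers_pca(X, threshold=0.05, n_components=1):
    """Hypothetical helper: drop the rows with the largest PCA
    reconstruction error. `threshold` is the fraction of rows removed."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes of the centred data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project onto the top components and map back to the original space.
    reconstructed = centered @ components.T @ components
    errors = np.linalg.norm(centered - reconstructed, axis=1)
    n_remove = int(np.floor(threshold * len(X)))
    # The n_remove rows with the largest errors are treated as outliers.
    outliers = np.argsort(errors)[len(X) - n_remove:]
    keep = np.setdiff1d(np.arange(len(X)), outliers)
    return X[keep], outliers


# Hypothetical data: 20 points on the line y = 2x plus one point far off it.
line = np.array([[i, 2 * i] for i in range(20)], dtype=float)
X = np.vstack([line, [[0.0, 30.0]]])
X_clean, removed = remove_outliers_pca(X, threshold=0.05)
print(X_clean.shape, removed)  # (20, 2) [20]
```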

Removal of multicollinearity

Removes features that can cause multicollinearity.


It is executed by setting "remove_multicollinearity=True" in setup() and specifying the correlation threshold with "multicollinearity_threshold".


For multicollinearity, this article will be helpful. https://qiita.com/ynakayama/items/36b7c1640e6a02ce2e00
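A simple way to remove features that cause multicollinearity is to drop one feature from every pair whose absolute correlation exceeds the threshold. The pandas sketch below keeps the first column of each such pair; which column PyCaret keeps may follow a different rule, so treat this choice as an assumption.

```python
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Hypothetical helper: greedily drop the later column of every pair
    whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr().abs()
    cols = list(df.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        if a in to_drop:
            continue
        for b in cols[i + 1:]:
            if b not in to_drop and corr.loc[a, b] > threshold:
                to_drop.add(b)
    dropped = sorted(to_drop)
    return df.drop(columns=dropped), dropped


# Hypothetical data: "b" is almost exactly 2 * "a"; "c" is only moderately correlated.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 3, 4, 1, 2],
})
kept, dropped = drop_correlated(df, threshold=0.9)
print(kept.columns.tolist(), dropped)  # ['a', 'c'] ['b']
```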

Using clustering results as features

Clustering is performed using each feature, and the class label of each record is added as a new feature.


It is executed by setting "create_clusters=True" in setup(); the related "cluster_iter" argument corresponds to "Clustering Iteration" in the summary table.


The number of clusters appears to be determined using a combination of Calinski Harabasz and silhouette criteria.

For more information on the Calinski-Harabasz and silhouette criteria, this article will help. https://qiita.com/yasaigirai/items/ec3c3aaaab5bc9b930a2
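For reference, the Calinski-Harabasz criterion is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom; a higher value means better-separated clusters. The NumPy sketch below computes only this criterion for a given labelling. How PyCaret combines it with the silhouette criterion is not reproduced here.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)]."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, k = len(X), len(np.unique(labels))
    overall_mean = X.mean(axis=0)
    between = within = 0.0
    for label in np.unique(labels):
        members = X[labels == label]
        center = members.mean(axis=0)
        # Between: cluster size times squared distance of the centre from the overall mean.
        between += len(members) * np.sum((center - overall_mean) ** 2)
        # Within: squared distances of the members from their own centre.
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))


X = np.array([[0.0], [1.0], [10.0], [11.0]])
print(calinski_harabasz(X, [0, 0, 1, 1]))  # 200.0 (well-separated grouping)
print(calinski_harabasz(X, [0, 1, 0, 1]))  # 0.02 (poor grouping)
```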

Removal of low-variance features

Removes features whose variance is not statistically significant.


It is executed by setting "ignore_low_variance=True" in setup().


The variance here appears to be calculated from the ratio of unique values to the total number of samples. Presumably, the more often the same value repeats in a variable, the lower its variance is considered to be, making it a candidate for exclusion.
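A near-zero-variance check of this kind can be sketched with two ratios: the share of unique values among all samples, and how much more frequent the most common value is than the second most common. Both thresholds below (10% and 20) are assumptions modelled on the common near-zero-variance rule, not values confirmed from PyCaret's source.

```python
import pandas as pd


def is_low_variance(series, unique_ratio_max=0.10, freq_ratio_min=20.0):
    """Hypothetical helper: flag a column whose values are mostly the same.
    The two thresholds are assumptions."""
    counts = series.value_counts()  # sorted from most to least frequent
    unique_ratio = len(counts) / len(series)
    if len(counts) < 2:
        return True  # a constant column carries no information
    freq_ratio = counts.iloc[0] / counts.iloc[1]
    return unique_ratio < unique_ratio_max and freq_ratio > freq_ratio_min


mostly_zero = pd.Series([0] * 98 + [1] * 2)
varied = pd.Series(list(range(10)) * 10)
print(is_low_variance(mostly_zero), is_low_variance(varied))  # True False
```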

Generation of interaction features

Generates interaction features using the specified parameters.


It is executed by setting "polynomial_features=True" in setup() and specifying the degree with "polynomial_degree" (related arguments: "trigonometry_features" and "polynomial_threshold").

For example, if the input is two variables [a, b] and polynomial_degree = 2 is specified, the features [1, a, b, a^2, ab, b^2] are generated.
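The [1, a, b, a^2, ab, b^2] example can be reproduced with a few lines of standard-library Python: every multiset of input positions up to the given degree yields one monomial. This is a sketch of degree-2 polynomial expansion in the same order scikit-learn's PolynomialFeatures uses, not PyCaret's code.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(values, degree=2):
    """Return all monomials of the inputs up to `degree`, bias term first."""
    features = [1.0]
    for d in range(1, degree + 1):
        # Each multiset of d input positions gives one monomial (a*a, a*b, b*b, ...).
        for combo in combinations_with_replacement(range(len(values)), d):
            features.append(prod(values[i] for i in combo))
    return features


a, b = 2.0, 3.0
print(polynomial_features([a, b], degree=2))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```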

In addition, you can enable interaction features ("feature_interaction=True" for products and "feature_ratio=True" for ratios, with "interaction_threshold" as the related threshold). These generate first-order interaction features for all numeric features, including the dummy variables of categorical variables and the features generated by polynomial_features and trigonometry_features.


For polynomial_threshold and interaction_threshold, the indicator compared with the threshold appears to be an importance score based on a combination of several methods, such as Random Forest, AdaBoost, and linear correlation.

trigonometry_features presumably creates features literally from trigonometric functions (sin, cos, tan).

Please note that this function may be inefficient for datasets with a large feature space.

Generation of group features

When you specify groups of related features in the dataset, statistical features are extracted from each group: aggregate values computed across the features in a group are added as new features.



It is executed by passing the "group_features" argument (a list of lists of related column names) and, optionally, "group_names" to setup().

For example (the column names are placeholders):

setup(dataset, target="Price", group_features=[["cal1", "cal2", "cal3"], ["cal4", "cal5"]], group_names=["gr1", "gr2"])
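A pandas sketch of what group features could look like for the example above: row-wise statistics across each group's columns become new columns named after the group. The data is hypothetical, and the particular set of statistics (min, max, mean, median, std) is an assumption, since the exact list PyCaret computes is not reproduced here.

```python
import pandas as pd

# Hypothetical data using the placeholder column names from the example above.
df = pd.DataFrame({
    "cal1": [1.0, 4.0], "cal2": [2.0, 6.0], "cal3": [3.0, 8.0],
    "cal4": [10.0, 0.0], "cal5": [20.0, 0.0],
})
group_features = [["cal1", "cal2", "cal3"], ["cal4", "cal5"]]
group_names = ["gr1", "gr2"]

new_columns = {}
for group, name in zip(group_features, group_names):
    values = df[group]
    # Row-wise statistics across the columns of one group (assumed set).
    new_columns[f"{name}_min"] = values.min(axis=1)
    new_columns[f"{name}_max"] = values.max(axis=1)
    new_columns[f"{name}_mean"] = values.mean(axis=1)
    new_columns[f"{name}_median"] = values.median(axis=1)
    new_columns[f"{name}_std"] = values.std(axis=1)

result = pd.concat([df, pd.DataFrame(new_columns)], axis=1)
print(result[["gr1_mean", "gr1_std", "gr2_mean"]])
```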

Feature selection

Selects features using several evaluation metrics.


It is executed by setting "feature_selection=True" in setup() and specifying the threshold with "feature_selection_threshold".


For feature_selection_threshold, the indicator compared with the threshold appears to be an importance score based on a combination of several methods, such as Random Forest, AdaBoost, and linear correlation.

According to a comment in the original source, when polynomial_features or feature_interaction is used, this parameter should be set to a low value. Presumably, the many features created by those options are meant to be narrowed down to some extent in this step.

Reduction of high cardinality features

For the specified high-cardinality columns, the number of distinct levels is reduced, which lowers the cardinality.


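It is executed by passing the columns with the "high_cardinality_features" argument of setup() and choosing the method with "high_cardinality_method" ('frequency' or 'clustering').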


A quick look at the original source suggests that the 'clustering' method uses k-means.
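The default 'frequency' method presumably replaces each level with how often it occurs, turning a high-cardinality column into a single numeric one. A pandas sketch of frequency encoding with counts follows; whether PyCaret uses raw counts or proportions is an assumption (use value_counts(normalize=True) for proportions), and the city values are hypothetical.

```python
import pandas as pd

city = pd.Series(["Tokyo", "Osaka", "Tokyo", "Nagoya", "Tokyo", "Osaka"])

# Replace every level by the number of rows in which it appears.
encoded = city.map(city.value_counts())
print(encoded.tolist())  # [3, 2, 3, 1, 3, 2]
```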

Feature transformation

Transforms features with the specified method.


It is executed by setting "transformation=True" in setup() and choosing the method with "transformation_method" ('yeo-johnson' or 'quantile').


Both 'yeo-johnson' and 'quantile' appear to transform the data so that it follows a normal distribution more closely.

Checking the original code, 'yeo-johnson' uses sklearn.preprocessing.PowerTransformer and 'quantile' uses sklearn.preprocessing.QuantileTransformer.

In general, bringing features closer to a normal distribution can be useful during modeling. According to a comment in the original source, 'quantile' is non-linear, so note that it may distort linear correlations between variables measured on the same scale.
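The Yeo-Johnson transform itself is a piecewise power transform that, unlike Box-Cox, also handles negative values. The NumPy sketch below applies it for a fixed lambda; PowerTransformer additionally fits lambda by maximum likelihood and, by default, standardises the result, and both steps are omitted here.

```python
import numpy as np


def yeo_johnson(y, lmbda):
    """Yeo-Johnson transform for a fixed lambda."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    # Non-negative values: Box-Cox of (y + 1).
    if lmbda != 0:
        out[pos] = ((y[pos] + 1) ** lmbda - 1) / lmbda
    else:
        out[pos] = np.log1p(y[pos])
    # Negative values: mirrored transform with exponent 2 - lambda.
    if lmbda != 2:
        out[~pos] = -((1 - y[~pos]) ** (2 - lmbda) - 1) / (2 - lmbda)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out


print(yeo_johnson([-2.0, 0.0, 3.0], lmbda=1.0))  # [-2.  0.  3.] (lambda = 1 is the identity)
print(yeo_johnson([np.e - 1], lmbda=0.0))        # [1.]
```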

Target variable transformation

Transforms the target variable with the specified method.


It is executed by setting "transform_target=True" in setup() and choosing the method with "transform_target_method" ('box-cox' or 'yeo-johnson').


Bringing the target variable closer to a normal distribution can be useful during modeling.

The Box-Cox transformation requires all values to be positive, so if the data contains negative values, PyCaret appears to switch to the Yeo-Johnson transformation automatically.
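The Box-Cox restriction and the fallback can be sketched as follows. box_cox applies the transform for a fixed lambda, and choose_target_method is a hypothetical helper for the switching rule; it also treats zero as not applicable, because Box-Cox is undefined there, which is an assumption beyond the "negative values" condition described above.

```python
import numpy as np


def box_cox(y, lmbda):
    """Box-Cox transform for a fixed lambda; defined only for y > 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(y) if lmbda == 0 else (y ** lmbda - 1) / lmbda


def choose_target_method(y, requested="box-cox"):
    """Hypothetical helper: fall back to Yeo-Johnson when Box-Cox
    cannot be applied to the target."""
    if requested == "box-cox" and np.any(np.asarray(y) <= 0):
        return "yeo-johnson"
    return requested


print(box_cox([1.0, np.e], lmbda=0))      # [0. 1.]
print(choose_target_method([3.0, 5.0]))   # box-cox
print(choose_target_method([-1.0, 5.0]))  # yeo-johnson
```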

Dimensionality reduction of features

Dimensionality reduction of features is performed.


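It is executed by setting "pca=True" in setup(), choosing the method with "pca_method" ('linear', 'kernel', or 'incremental'), and setting the number of components with "pca_components".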


Generally, it is carried out for the purpose of removing unimportant features and saving memory and CPU resources.

This process (dimension reduction) seems to be executed at the end of the preprocessing pipeline. (Dimension reduction is performed for the data after other preprocessing is completed.)

These articles will be helpful for principal component analysis. https://qiita.com/shuva/items/9625bc326e2998f1fa27 https://qiita.com/NoriakiOshita/items/460247bb57c22973a5f0

For 'incremental', it appears to use Incremental PCA. According to the scikit-learn documentation, Incremental PCA (IPCA) is preferable to Principal Component Analysis (PCA) when the dataset is too large to fit in memory. IPCA builds a low-dimensional approximation of the input data using an amount of memory that does not depend on the number of input samples. https://scikit-learn.org/stable/auto_examples/decomposition/plot_incremental_pca.html
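For the 'linear' case, the core of PCA can be written with an SVD of the centred data: the right singular vectors are the principal axes, and the squared singular values give each component's share of the variance. This NumPy sketch is an illustration only; the kernel and incremental variants are not shown.

```python
import numpy as np


def pca(X, n_components):
    """Linear PCA via SVD: returns the projected data and the
    explained-variance ratio of every component."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = s ** 2 / np.sum(s ** 2)
    # Keep only the first n_components principal axes.
    return centered @ vt[:n_components].T, explained_ratio


# Hypothetical data: the second column is exactly twice the first,
# so a single component captures all of the variance.
X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
projected, ratio = pca(X, n_components=1)
print(projected.shape, np.round(ratio, 6))
```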

Implementation sample

Generating a large number of features

from pycaret.regression import *
X, y, X_train, X_test, y_train, y_test, seed, prep_pipe, target_inverse_transformer, experiment__ \
    =  setup(dataset, target="Price", session_id=123, 
             bin_numeric_features = ["Carat Weight"],
             create_clusters = True,
             polynomial_features = True,  feature_interaction = True,  feature_ratio = True)

The execution contents (excerpt) output from setup () are as shown in the figure below.

Checking the returned preprocessed data shows that 72 features were generated, as follows.

X.info()

#Output result
# <class 'pandas.core.frame.DataFrame'>
# Int64Index: 6000 entries, 0 to 5999
# Data columns (total 72 columns):
#  #   Column                                            Non-Null Count  Dtype  
# ---  ------                                            --------------  -----  
#  0   Carat Weight_Power2                               6000 non-null   float64
#  1   Cut_Fair                                          6000 non-null   float64
#  2   Cut_Good                                          6000 non-null   float64
#  3   Cut_Ideal                                         6000 non-null   float64
#  4   Cut_Signature-Ideal                               6000 non-null   float64
#  5   Cut_Very Good                                     6000 non-null   float64
#  6   Color_D                                           6000 non-null   float64
#  7   Color_E                                           6000 non-null   float64
#  8   Color_F                                           6000 non-null   float64
#  9   Color_G                                           6000 non-null   float64
#  10  Color_H                                           6000 non-null   float64
#  11  Color_I                                           6000 non-null   float64
#  12  Clarity_FL                                        6000 non-null   float64
#  13  Clarity_IF                                        6000 non-null   float64
#  14  Clarity_SI1                                       6000 non-null   float64
#  15  Clarity_VS1                                       6000 non-null   float64
#  16  Clarity_VS2                                       6000 non-null   float64
#  17  Clarity_VVS1                                      6000 non-null   float64
#  18  Clarity_VVS2                                      6000 non-null   float64
#  19  Polish_EX                                         6000 non-null   float64
#  20  Polish_G                                          6000 non-null   float64
#  21  Polish_ID                                         6000 non-null   float64
#  22  Polish_VG                                         6000 non-null   float64
#  23  Symmetry_EX                                       6000 non-null   float64
#  24  Symmetry_G                                        6000 non-null   float64
#  25  Symmetry_ID                                       6000 non-null   float64
#  26  Symmetry_VG                                       6000 non-null   float64
#  27  Report_GIA                                        6000 non-null   float64
#  28  Carat Weight_0.0                                  6000 non-null   float64
#  29  Carat Weight_1.0                                  6000 non-null   float64
#  30  Carat Weight_10.0                                 6000 non-null   float64
#  31  Carat Weight_11.0                                 6000 non-null   float64
#  32  Carat Weight_12.0                                 6000 non-null   float64
#  33  Carat Weight_13.0                                 6000 non-null   float64
#  34  Carat Weight_2.0                                  6000 non-null   float64
#  35  Carat Weight_3.0                                  6000 non-null   float64
#  36  Carat Weight_4.0                                  6000 non-null   float64
#  37  Carat Weight_5.0                                  6000 non-null   float64
#  38  Carat Weight_6.0                                  6000 non-null   float64
#  39  Carat Weight_7.0                                  6000 non-null   float64
#  40  Carat Weight_8.0                                  6000 non-null   float64
#  41  Carat Weight_9.0                                  6000 non-null   float64
#  42  data_cluster_0                                    6000 non-null   float64
#  43  Polish_EX_multiply_Carat Weight_Power2            6000 non-null   float64
#  44  Symmetry_EX_multiply_Carat Weight_Power2          6000 non-null   float64
#  45  Report_GIA_multiply_Carat Weight_Power2           6000 non-null   float64
#  46  Clarity_VVS2_multiply_Carat Weight_Power2         6000 non-null   float64
#  47  Clarity_IF_multiply_Carat Weight_Power2           6000 non-null   float64
#  48  Clarity_SI1_multiply_Carat Weight_Power2          6000 non-null   float64
#  49  Carat Weight_Power2_multiply_data_cluster_0       6000 non-null   float64
#  50  Symmetry_EX_multiply_data_cluster_0               6000 non-null   float64
#  51  Report_GIA_multiply_data_cluster_0                6000 non-null   float64
#  52  Symmetry_VG_multiply_Carat Weight_Power2          6000 non-null   float64
#  53  Carat Weight_8.0_multiply_Carat Weight_Power2     6000 non-null   float64
#  54  Cut_Signature-Ideal_multiply_Carat Weight_Power2  6000 non-null   float64
#  55  data_cluster_0_multiply_Symmetry_EX               6000 non-null   float64
#  56  Color_E_multiply_Carat Weight_Power2              6000 non-null   float64
#  57  data_cluster_0_multiply_Cut_Ideal                 6000 non-null   float64
#  58  Carat Weight_Power2_multiply_Polish_EX            6000 non-null   float64
#  59  data_cluster_0_multiply_Report_GIA                6000 non-null   float64
#  60  Color_F_multiply_Carat Weight_Power2              6000 non-null   float64
#  61  Carat Weight_Power2_multiply_Carat Weight_8.0     6000 non-null   float64
#  62  Cut_Ideal_multiply_Carat Weight_Power2            6000 non-null   float64
#  63  Color_D_multiply_Carat Weight_Power2              6000 non-null   float64
#  64  data_cluster_0_multiply_Carat Weight_Power2       6000 non-null   float64
#  65  data_cluster_0_multiply_Polish_EX                 6000 non-null   float64
#  66  Color_I_multiply_Carat Weight_Power2              6000 non-null   float64
#  67  Polish_EX_multiply_data_cluster_0                 6000 non-null   float64
#  68  Color_H_multiply_Carat Weight_Power2              6000 non-null   float64
#  69  Carat Weight_Power2_multiply_Report_GIA           6000 non-null   float64
#  70  Clarity_VS2_multiply_Carat Weight_Power2          6000 non-null   float64
#  71  Carat Weight_Power2_multiply_Symmetry_VG          6000 non-null   float64
# dtypes: float64(72)
# memory usage: 3.3 MB

The returned preprocessing pipeline is as follows.

prep_pipe

#Execution result
# Pipeline(memory=None,
#          steps=[('dtypes',
#                  DataTypes_Auto_infer(categorical_features=[],
#                                       display_types=True, features_todrop=[],
#                                       ml_usecase='regression',
#                                       numerical_features=[], target='Price',
#                                       time_features=[])),
#                 ('imputer',
#                  Simple_Imputer(categorical_strategy='not_available',
#                                 numeric_strategy='mean',
#                                 target_variable=None)),
#                 ('new_levels1',
#                  New_Catagorical_Levels_i...
#                 ('dummy', Dummify(target='Price')),
#                 ('fix_perfect', Remove_100(target='Price')),
#                 ('clean_names', Clean_Colum_Names()),
#                 ('feature_select', Empty()), ('fix_multi', Empty()),
#                 ('dfs',
#                  DFS_Classic(interactions=['multiply', 'divide'],
#                              ml_usecase='regression', random_state=123,
#                              subclass='binary', target='Price',
#                              top_features_to_pick_percentage=None)),
#                 ('pca', Empty())],
#          verbose=False)


**PyCaret can perform various data cleaning and feature conversion processes with simple code.** With PyCaret, various preprocessing steps can be described just by specifying parameters, and I felt this would save a significant amount of time. I also think the code becomes cleaner and more uniform, which improves readability and thinking efficiency for the team and for myself.

**Understanding the preprocessing PyCaret offers is also a way to study various techniques.** PyCaret is relatively easy to use even for people who are not good at coding. I think it is a good tool for beginners who have struggled with coding so far, letting them focus on learning the theory while actually running things. (I myself learned many techniques I did not know before while conducting this survey.)

**On the other hand, PyCaret is (at the moment) just a tool for efficiency.** PyCaret only performs cleaning and feature conversion based on the data the user provides; I realized that forming hypotheses, collecting data, and designing features still have to be done manually.
