Notes on setting up an environment for the AutoML library PyCaret

Install pandas-profiling to avoid errors

Computer environment

OS: Ubuntu 18.04 LTS

Creating a virtual environment with Anaconda3

`bash`


$ conda create -n pycaret python=3.6.10

Install PyCaret in the conda virtual environment

Install with pip, following the official manual.

`bash`


$ conda activate pycaret
(pycaret)$ pip install pycaret
(pycaret)$ python -m ipykernel install --user --name pycaret --display-name "display-name-here"
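Before running anything in the notebook, it can help to confirm that the selected kernel really runs inside the new conda environment. The following is a minimal sketch using only the standard library; `describe_interpreter` is a hypothetical helper name, and the expectation that the path contains `envs/pycaret` assumes a default conda layout.

```python
import sys


def describe_interpreter():
    """Return the path and version of the Python running this kernel."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


info = describe_interpreter()
# With a default conda layout the executable path is expected to contain
# "envs/pycaret", and the version should match the one requested in
# `conda create` (3.6.10 in this memo).
print(info["executable"])
print(info["version"])
```

If the printed path points at the base Anaconda installation instead, the notebook is probably using a different kernel than the one registered with `ipykernel install`.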

However, after a recent fresh install, running the following code in Jupyter Notebook started to throw an error.

`python`


from pycaret.datasets import get_data
dataset = get_data('credit', profile=True)

This command downloads a dataset from PyCaret's data repository with get_data. The original tutorial did not pass the argument profile=True; in other words, it ran with the default, profile=False. *In that case, only the first 5 rows of the data are displayed.*

On the other hand, if you pass profile=True, the output is a pandas profiling report. This lets you check the DataFrame's basic statistics and correlation coefficients all at once, without having to `import pandas_profiling` yourself.

However, when I installed with pip install pycaret at different times, profile=True sometimes raised an error, probably because the minor versions of some dependencies differed between installs. For that reason, I now install from a requirements.txt file.
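To see which dependency versions actually drifted between two installs, one option is to compare the `pip freeze` output of each environment. The sketch below is only an illustration: `parse_pins` and `diff_pins` are hypothetical helper names, and the two snapshots are made-up sample data, not a record of the versions that caused the error.

```python
def parse_pins(freeze_text):
    """Map lower-cased package names to versions for `name==version` lines."""
    pins = {}
    for raw_line in freeze_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(old_text, new_text):
    """Return {package: (old_version, new_version)} for every difference.

    A package missing from one snapshot is reported with None on that side.
    """
    old, new = parse_pins(old_text), parse_pins(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Made-up snapshots for illustration only; real input would come from
# running `pip freeze` in each environment.
earlier = "pandas==1.0.3\npandas-profiling==2.3.0\nnumpy==1.18.4\n"
later = "pandas==1.0.3\npandas-profiling==2.8.0\nnumpy==1.18.4\n"
print(diff_pins(earlier, later))  # {'pandas-profiling': ('2.3.0', '2.8.0')}
```

Only packages whose versions differ are reported, which narrows down the candidates worth pinning.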

**Place the requirements.txt file in the directory where you activate the virtual environment, and install from it with pip.**

`bash`


$ conda activate pycaret
(pycaret)$ pip install -r requirements.txt
(pycaret)$ python -m ipykernel install --user --name pycaret --display-name "display-name-here"

Put the following in requirements.txt.

astropy==4.0.1.post1
attrs==19.3.0
awscli==1.18.64
backcall==0.1.0
bleach==3.1.5
blis==0.4.1
boto==2.49.0
boto3==1.13.14
botocore==1.16.14
catalogue==1.0.0
catboost==0.20.2
certifi==2020.4.5.1
chardet==3.0.4
chart-studio==1.1.0
click==7.1.2
colorama==0.4.3
colorlover==0.3.0
combo==0.1.0
confuse==1.1.0
cufflinks==0.17.0
cycler==0.10.0
cymem==2.0.3
datefinder==0.7.0
DateTime==4.3
decorator==4.4.2
defusedxml==0.6.0
docutils==0.15.2
entrypoints==0.3
funcy==1.14
future==0.18.2
gensim==3.8.3
graphviz==0.14
htmlmin==0.1.12
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.3.0
ipython==7.14.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.0
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.15.1
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
kiwisolver==1.2.0
kmodes==0.10.1
lightgbm==2.3.1
llvmlite==0.32.1
MarkupSafe==1.1.1
matplotlib==3.2.1
missingno==0.4.2
mistune==0.8.4
mlxtend==0.17.2
more-itertools==8.3.0
murmurhash==1.0.2
nbconvert==5.6.1
nbformat==5.0.6
nltk==3.5
notebook==6.0.3
numba==0.49.1
numexpr==2.7.1
numpy==1.18.4
packaging==20.4
pandas==1.0.3
pandas-profiling==2.3.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
phik==0.9.12
pickleshare==0.7.5
Pillow==7.1.2
plac==1.1.3
plotly==4.4.1
pluggy==0.13.1
preshed==3.0.2
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
py==1.8.1
pyasn1==0.4.8
pycaret==1.0.0
Pygments==2.6.1
pyLDAvis==2.1.2
pyod==0.7.9
pyparsing==2.4.7
pyrsistent==0.16.0
pytest==5.4.2
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
pyzmq==19.0.1
regex==2020.5.14
requests==2.23.0
retrying==1.3.3
rsa==3.4.2
s3transfer==0.3.3
scikit-learn==0.22
scipy==1.4.1
seaborn==0.10.1
Send2Trash==1.5.0
shap==0.32.1
six==1.14.0
smart-open==2.0.0
spacy==2.2.4
srsly==1.0.2
suod==0.0.4
tbb==2020.0.133
terminado==0.8.3
testpath==0.4.4
textblob==0.15.3
thinc==7.4.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
umap-learn==0.4.3
urllib3==1.25.9
wasabi==0.6.0
wcwidth==0.1.9
webencodings==0.5.1
widgetsnbextension==3.5.1
wordcloud==1.7.0
xgboost==0.90
yellowbrick==1.0.1
zipp==3.1.0
zope.interface==5.1.0
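After installing, it may be worth checking that the environment really matches the pinned versions. The following is a rough sketch, not part of PyCaret: `installed_version` and `find_mismatches` are hypothetical helper names, and the demonstration uses a stand-in lookup with made-up versions so that it runs anywhere.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, pinned in the list above


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(requirements_text, lookup=installed_version):
    """Return {package: (pinned, installed)} where the two versions differ."""
    mismatches = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        # Only exact `name==version` pins are checked.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


# Demonstration with a stand-in lookup and made-up installed versions.
fake_installed = {"pandas": "1.0.3", "pandas-profiling": "2.8.0"}
sample = "pandas==1.0.3\npandas-profiling==2.3.0\n"
print(find_mismatches(sample, lookup=fake_installed.get))
```

In the activated environment, passing the contents of requirements.txt and leaving `lookup` at its default should return an empty dictionary when every pin is satisfied.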

If you want to read the data with pandas and output the pandas profiling report, do the following.

`python`


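import pandas as pd
import pandas_profiling

# Load the CSV into a DataFrame (replace the path with your own file).
df = pd.read_csv('/path/to/data.csv', sep=",", encoding="utf-8")

# Build the profiling report; in a Jupyter notebook it renders inline.
pandas_profiling.ProfileReport(df)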