Building a BigGorilla environment and trying the FlexMatcher sample
- A common recent setup is to use pyenv only to install Anaconda and to manage environments with conda.
- ~~(As of July 12, 2017) Building the environment fails.~~
  - ~~The dependencies of the published conda environment are broken.~~
  - ~~You can work around this by downloading the yml from Anaconda Cloud, deleting the line that specifies urllib, and installing from the local file.~~
  - Addendum: the file has since been updated, and the environment can now be created as described in the official documentation.
- The FlexMatcher sample did not work either.
  - It seems hard to get it running without reading the code.
Read the Flexmatcher code
Environment: Mac OS X 10.11 El Capitan, with Homebrew already installed.
Install Anaconda using pyenv
pyenv was out of date, so I updated it first. Reference: Update the version of python managed by pyenv (Qiita)
Install anaconda
$ pyenv install anaconda3-4.2.0
$ pyenv global anaconda3-4.2.0
Create the environment for BigGorilla. ~~This did not work.~~ Postscript (2017/07/21): it now works. The old record follows.
$ conda env create biggorilla/py3gorilla
Collecting urllib==1.21.1
Downloading urllib-1.21.1.tar.gz (226kB)
100% |████████████████████████████████| 235kB 640kB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/bx/k4yrl_bd3nb0v8pz7fm60t8r0000gp/T/pip-build-58rsg5li/urllib/setup.py", line 191
s.connect((base64.b64decode(rip), 017620))
^
SyntaxError: invalid token
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/bx/k4yrl_bd3nb0v8pz7fm60t8r0000gp/T/pip-build-58rsg5li/urllib/
CondaValueError: Value error: pip returned an error.
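The SyntaxError in this log is a Python 2 versus Python 3 difference rather than a conda problem: 017620 is a Python 2 style octal literal, which Python 3 rejects. A minimal sketch that reproduces this (the helper name `parses_in_python3` is made up for illustration):

```python
def parses_in_python3(literal):
    """Return True if `literal` compiles as an expression on this Python 3."""
    try:
        compile(literal, "<literal>", "eval")
        return True
    except SyntaxError:
        return False

# Python 2 style octal literal, as in the failing setup.py line
old_style_ok = parses_in_python3("017620")
# Python 3 spelling of the same octal number
new_style_ok = parses_in_python3("0o17620")
# Decimal value of the octal digits 17620
port = int("17620", 8)

print(old_style_ok, new_style_ok, port)
```

This suggests the setup.py of that urllib package was written for Python 2, so it cannot be built inside a Python 3 environment regardless of how conda is configured.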
The environment was only partially installed, but I tried to activate it anyway. Running source activate Py3Gorilla kills the shell. If you are using pyenv, you need to call conda's activate script by its full path. References: Note on how to use Conda (Qiita); Python environment construction for those who aim to be a data scientist 2016 (Qiita)
$ conda info -e
# conda environments:
#
Py3Gorilla /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/Py3Gorilla
root * /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0
$ source /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/Py3Gorilla/bin/activate Py3Gorilla
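Building the full path by hand is error-prone, so here is a small sketch that derives it from the conda info -e output. The function names are hypothetical, and the bin/activate layout is an assumption based on the activate command used later in this post. It also assumes paths without spaces.

```python
import posixpath

def parse_conda_envs(text):
    """Parse `conda info -e` output into {env_name: prefix_path}."""
    envs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split()
        # First column is the name, last column is the prefix;
        # an optional "*" in between marks the active environment.
        envs[parts[0]] = parts[-1]
    return envs

def activate_command(envs, name):
    """Build a full-path `source .../bin/activate NAME` command line."""
    script = posixpath.join(envs[name], "bin", "activate")
    return "source {} {}".format(script, name)

sample = """\
# conda environments:
#
Py3Gorilla /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/Py3Gorilla
root * /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0
"""
envs = parse_conda_envs(sample)
command = activate_command(envs, "Py3Gorilla")
print(command)
```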
To check that it works I opened the Jupyter Notebook, but it reported that the Py3Gorilla kernel could not be found.
$ anaconda download biggorilla/hi_gorilla
$ jupyter notebook hi_gorilla.ipynb
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-770f0b5370fe> in <module>()
----> 1 import py_stringmatching as sm
2
3 # This notebook imports a package that most users do not have installed
4 # before using BigGorilla. Running the notebook successfully implies the
5 # successful installation of BigGorilla.
ImportError: No module named 'py_stringmatching'
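When a notebook raises an ImportError like this, it helps to confirm which interpreter the kernel is actually running and whether the module is visible to it. A minimal sketch to paste into a notebook cell (the helper `missing_modules` is a made-up name, not part of BigGorilla):

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Path of the Python interpreter behind this kernel
print(sys.executable)
# py_stringmatching is the module the notebook fails on
print(missing_modules(["py_stringmatching"]))
```

If the printed interpreter path does not point inside the Py3Gorilla environment, the notebook is running on a different Python than the one the packages were installed into.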
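Once conda env create has run, the prefix stays registered even if the installation failed, so creating the environment again stops with "prefix already exists". To remove it, use conda env remove -n followed by the environment name.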
$ conda env create biggorilla/py3gorilla
Using Anaconda API: https://api.anaconda.org
CondaValueError: Value error: prefix already exists: /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/Py3Gorilla
$ conda env remove -n Py3Gorilla
Package plan for package removal in environment /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/Py3Gorilla:
The following packages will be REMOVED:
openssl: 1.0.2l-0
pip: 9.0.1-py36_1
python: 3.6.1-2
readline: 6.2-2
setuptools: 27.2.0-py36_0
sqlite: 3.13.0-0
tk: 8.5.18-0
wheel: 0.29.0-py36_0
xz: 5.2.2-1
zlib: 1.2.8-3
Proceed ([y]/n)? y
Unlinking packages ...
[ COMPLETE ]|###############################################################################| 100%
~~When I tried this on July 12, 2017, this method failed with the following error and nothing was installed. (An older yml seems to be applied, probably because the file updated in June has an odd file name. A future update will probably fix this.)~~
Addendum: The file has been updated, and the environment now installs as described in the official documentation.
$ conda env create biggorilla/py3gorilla
Collecting urllib==1.21.1
Downloading urllib-1.21.1.tar.gz (226kB)
100% |████████████████████████████████| 235kB 640kB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/bx/k4yrl_bd3nb0v8pz7fm60t8r0000gp/T/pip-build-58rsg5li/urllib/setup.py", line 191
s.connect((base64.b64decode(rip), 017620))
^
SyntaxError: invalid token
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/bx/k4yrl_bd3nb0v8pz7fm60t8r0000gp/T/pip-build-58rsg5li/urllib/
CondaValueError: Value error: pip returned an error.
You can install the environment by downloading the yml from Files :: Anaconda Cloud and removing the line that specifies urllib. The newer yml also installs, but it pins an older version of flexmatcher (a downgrade?).
# Remove the half-created environment
$ conda env remove -n Py3Gorilla
# Recreate the environment from the locally modified yml file
$ vim ~/Downloads/Py3Gorilla.yml  # delete the urllib line
$ conda env create --name test --file ~/Downloads/Py3Gorilla.yml
# With pyenv, call the activate script by its full path. Running source activate Py3Gorilla kills the shell.
$ source /Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/test/bin/activate test
# Download the notebook for the operation check and start it
$ anaconda download biggorilla/hi_gorilla
$ jupyter notebook hi_gorilla.ipynb
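The manual edit of the yml file can also be scripted. The sketch below works on plain text and needs no YAML library. The function name `drop_dependency` and the sample content are hypothetical, since the real Py3Gorilla.yml has more entries than shown here.

```python
def drop_dependency(yml_text, package):
    """Remove list entries such as '- urllib==1.21.1' for exactly `package`."""
    kept = []
    for line in yml_text.splitlines():
        entry = line.strip()
        if entry.startswith("- "):
            # Cut the spec at the first version separator to get the bare name
            name = entry[2:].strip()
            for sep in ("=", "<", ">", "!", " "):
                name = name.split(sep)[0]
            if name == package:
                continue  # drop this dependency line
        kept.append(line)
    return "\n".join(kept) + "\n"

# Hypothetical excerpt for illustration only
sample = """\
name: Py3Gorilla
dependencies:
- pip:
  - urllib==1.21.1
  - urllib3==1.21.1
"""
cleaned = drop_dependency(sample, "urllib")
print(cleaned)
```

Matching on the exact package name matters here: a simple substring search for urllib would also delete a urllib3 entry if the file had one.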
Next, I tried the FlexMatcher sample.
Sample code is provided, so I copied it and pasted it into a Jupyter notebook.
It failed with the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-34cd037abc3a> in <module>()
27 mapping_list = [data1_mapping, data2_mapping]
28 fm.create_training_data(schema_list, mapping_list)
---> 29 fm.train()
30
31 # Creating a test schmea
/Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/test/lib/python3.5/site-packages/flexmatcher/flexmatcher.py in train(self)
27 The class considers panda dataframes as databases and their column names as
28 the schema. FlexMatcher learn to do schema matching by training on
---> 29 instances of dataframes and how their columns are matched against the
30 mediated schema.
31
/Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/test/lib/python3.5/site-packages/flexmatcher/flexmatcher.py in <listcomp>(.0)
27 The class considers panda dataframes as databases and their column names as
28 the schema. FlexMatcher learn to do schema matching by training on
---> 29 instances of dataframes and how their columns are matched against the
30 mediated schema.
31
/Users/kkanazaw/.pyenv/versions/anaconda3-4.2.0/envs/test/lib/python3.5/site-packages/flexmatcher/classify.py in predict_training(self, folds)
TypeError: 'float' object cannot be interpreted as an integer
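The traceback does not show which line in classify.py raises the error, so the following is only a general illustration of how this message commonly arises in Python 3, not a diagnosis of FlexMatcher. In Python 3 the / operator always returns a float, and functions such as range() refuse floats. The helper names are made up for the example.

```python
def first_fold_indices(n_rows, folds):
    """Illustrative helper that breaks under Python 3."""
    fold_size = n_rows / folds      # true division: a float in Python 3
    return list(range(fold_size))   # range() requires an int

def first_fold_indices_fixed(n_rows, folds):
    """Same helper using floor division, which keeps an int."""
    fold_size = n_rows // folds
    return list(range(fold_size))

try:
    first_fold_indices(10, 5)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
print(first_fold_indices_fixed(10, 5))
```

Code written for Python 2, where / on two ints yields an int, often fails this way after a move to Python 3, which is one thing to look for when reading the FlexMatcher source.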