I wanted to do some simple scraping with python and decided to use scrapy, but first I had to decide how to set up python itself. Scrapy also did not work properly at first and I got stuck in a few places, so this memo records how I solved it.
There are many tools for managing python packages and environments, and it is hard to know which one to use. I wanted a simple setup with as few tools as possible, so I chose the following environment.
pyenv + anaconda
Use pyenv to manage python versions, and conda, the tool that ships with anaconda, to manage packages and virtual environments. The following article was very helpful in making this choice.
Python environment construction for those who aim to become data scientists 2016
This article covers Mac (OS X) only, since that is the only platform I currently use for development.
The Mac ships with python 2.x as standard, but updating it reportedly causes trouble, so leave the system python as it is and install the python you want to use separately with pyenv. You will also almost certainly end up using various python libraries, so anaconda, which bundles the major libraries in one package, is convenient. If anaconda is too big, there is also miniconda, a minimal version.
It is said that installing anaconda directly on a Mac conflicts with homebrew (see the article above), so it is better to install it through pyenv, which also makes it easy to use python distributions other than anaconda.
First, use homebrew to install pyenv.
$ brew install pyenv
$ echo 'export PYENV_ROOT="${HOME}/.pyenv"' >> ~/.bash_profile
$ echo 'export PATH="${PYENV_ROOT}/bin:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
$ exec $SHELL -l
You can list the packages that can be installed with pyenv by running:
$ pyenv install -l
For anaconda, the latest versions for python2 and python3 at the time of writing are:
anaconda2-4.3.0
anaconda3-4.3.0
Here, install anaconda3-4.3.0.
$ pyenv install anaconda3-4.3.0
It takes a long time to install, so please take a coffee break.
Installing alone does not make the new python (anaconda) the one that is used. To use it system-wide, run:
$ pyenv global anaconda3-4.3.0
You can check the current setting with the following command.
$ pyenv version
anaconda3-4.3.0 (set by /Users/tetsuo/.pyenv/version)
To see every python installed on your Mac and managed by pyenv, run:
$ pyenv versions
system
3.6.0
* anaconda3-4.3.0 (set by /Users/tetsuo/.pyenv/version)
Now start python in this state:
$ python
Python 3.6.0 |Anaconda 4.3.0 (x86_64)| (default, Dec 23 2016, 13:19:00)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
The banner shows that the python3 from the Anaconda distribution is running.
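The same check can be done from inside python. The following is a minimal sketch, assuming that pyenv keeps its interpreters under `<PYENV_ROOT>/versions` (as the paths in the output above suggest) and that `PYENV_ROOT` defaults to `~/.pyenv`. The helper name `is_pyenv_managed` is made up for this illustration and is not part of pyenv.

```python
import os
import sys


def is_pyenv_managed(executable, pyenv_root):
    """Return True if the interpreter path lives under <pyenv_root>/versions."""
    versions_dir = os.path.join(os.path.abspath(pyenv_root), "versions")
    path = os.path.abspath(executable)
    return path.startswith(versions_dir + os.sep)


if __name__ == "__main__":
    # Assumption: PYENV_ROOT is exported as in the setup above, else ~/.pyenv.
    root = os.environ.get("PYENV_ROOT", os.path.expanduser("~/.pyenv"))
    print("executable:", sys.executable)
    print("version:", sys.version.split()[0])
    print("managed by pyenv:", is_pyenv_managed(sys.executable, root))
```

If the last line prints False, the shell is probably still picking up the system python, and the lines added to ~/.bash_profile are worth rechecking.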
You can display the list of libraries installed in Anaconda with the following command.
$ conda list
# packages in environment at /Users/tetsuo/.pyenv/versions/anaconda3-4.3.0:
#
_license 1.1 py36_1
alabaster 0.7.9 py36_0
anaconda 4.3.0 np111py36_0
anaconda-client 1.6.0 py36_0
anaconda-navigator 1.4.3 py36_0
appnope 0.1.0 py36_0
...
Unfortunately scrapy is not included, so install it separately.
Scrapy installation
Scrapy is normally installed with pip, but since I want to manage everything with conda, I install it with conda instead. First, check whether conda provides scrapy.
$ conda search scrapy
Fetching package metadata .........
scrapy 0.16.4 py26_0 defaults
0.16.4 py27_0 defaults
0.24.4 py27_0 defaults
1.0.1 py27_0 defaults
1.0.3 py27_0 defaults
1.1.1 py27_0 defaults
1.1.1 py34_0 defaults
1.1.1 py35_0 defaults
1.1.1 py36_0 defaults
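If you want to pick out the builds for one python version from a listing like this, a small script can do it. This is only a sketch: it assumes the column layout shown above (name on the first row only, then version, build and channel), which may differ in other conda releases. The function name `versions_for_build` is hypothetical.

```python
def versions_for_build(search_output, build_prefix):
    """Return versions whose build string starts with build_prefix (e.g. 'py36')."""
    versions = []
    for line in search_output.splitlines():
        fields = line.split()
        if len(fields) == 4:
            fields = fields[1:]  # first row of a package: drop the name column
        # Skip blank lines and status lines such as "Fetching package metadata".
        if len(fields) != 3 or not fields[0][0].isdigit():
            continue
        version, build, _channel = fields
        if build.startswith(build_prefix) and version not in versions:
            versions.append(version)
    return versions


# Sample copied from the listing above.
SAMPLE = """Fetching package metadata .........
scrapy 0.16.4 py26_0 defaults
0.16.4 py27_0 defaults
0.24.4 py27_0 defaults
1.0.1 py27_0 defaults
1.0.3 py27_0 defaults
1.1.1 py27_0 defaults
1.1.1 py34_0 defaults
1.1.1 py35_0 defaults
1.1.1 py36_0 defaults
"""

if __name__ == "__main__":
    print(versions_for_build(SAMPLE, "py36"))
```

For the sample above, only 1.1.1 has a py36 build, which matches the version that ends up installed later in this memo.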
Scrapy is also available, so you can install it with conda. However, installing the latest scrapy also pulls in the latest version of the twisted library that it uses internally. Possibly because of a bug in scrapy, using a twisted version newer than 16.6 makes scraping stop with the following error:
TypeError: 'float' object is not iterable
Reference: TypeError: 'float' object is not iterable (on Twisted dev + Scrapy dev)
Therefore, specify the twisted version as well when installing scrapy (you can check which twisted versions are available with `conda search twisted`).
$ conda install scrapy twisted=16.6.0
Scrapy is now installed.
$ scrapy version
Scrapy 1.1.1
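To confirm that the twisted pin actually took effect, you can compare the installed version against 16.6.0 from python. This is a rough sketch: the 16.6.0 limit is simply the version pinned in this memo, not a limit I have verified for every scrapy release, and the helper names `parse_version` and `twisted_is_pinned_ok` are made up for this example.

```python
def parse_version(text):
    """Turn '16.6.0' into (16, 6, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


# Assumption: the highest twisted version known to work in this memo.
MAX_TWISTED = (16, 6, 0)


def twisted_is_pinned_ok(version_text):
    """Return True if the given twisted version is not newer than MAX_TWISTED."""
    return parse_version(version_text) <= MAX_TWISTED


if __name__ == "__main__":
    try:
        import twisted
    except ImportError:
        print("twisted is not installed in this environment")
    else:
        ok = twisted_is_pinned_ok(twisted.__version__)
        print("twisted", twisted.__version__, "OK" if ok else "newer than 16.6.0")
```

If it reports a newer version, rerunning the conda install command with the twisted version specified should bring it back down.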
You now have an environment that runs python and scrapy with just pyenv and anaconda. I only use python once in a while, so I cannot remember how to use many tools, and it takes time to recall how the environment was set up. In addition, editor support features such as linters need a system-wide (global) python setting. So I looked for a setup that could be built easily with as few tools as possible, and ended up with one that covers everything from python version management to virtual environments with only two tools, pyenv and conda. The editor's support features now work without any problems.
One caveat: when you use conda's virtual environments, conda's activate command conflicts with pyenv and does not work. This is easy to work around, so I am happy with this environment for now.
Reference: 3 types of workarounds for activate collision problem when pyenv and anaconda coexist