This time, we will build a Python data analysis environment inside a virtual machine. Specifically, the following tools are used.
Name | Description |
---|---|
VirtualBox | Virtual machine execution environment |
Vagrant | Tools for managing virtual machines from the console |
IPython (+notebook) | Python development and execution environment |
Pandas | Library for analysis |
VirtualBox is software that virtualizes x86 hardware (ordinary PCs and servers). Its official name is Oracle VM VirtualBox, and it is currently developed by Oracle.
It is a very useful tool for experimenting without affecting your existing environment.
Vagrant is a tool that makes it easier to manage virtual machines from the console. You can also build a test environment quickly by using boxes created by volunteers.
Adopting it often saves time and effort when building various environments.
IPython is a significant extension of the standard Python interactive interpreter. It adds input completion, parallel processing in cluster environments, command-line shell features, GUI toolkit integration, and more.
It is very useful as an interactive interpreter for ad hoc analysis.
IPython notebook is IPython made available from a web browser. It is convenient for anything graphical, especially plotting graphs.
Everything can run on a single machine, but if you install it on a well-specified server, you can run analyses from low-powered clients and share the results with everyone.
Pandas is a Python data analysis library. It provides data structures that make it easy to work with numerical values and matrices, together with a collection of operations on them.
Behind the scenes, it uses numerical libraries for Python such as NumPy and SciPy. Thanks to that, numerical calculation is fast.
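As a rough illustration of the data structures and summary operations mentioned above, the sketch below builds a small DataFrame and aggregates it. The column names and values are made up for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data: sales records for two shops.
df = pd.DataFrame({
    "shop": ["A", "A", "B", "B"],
    "sales": [100, 150, 80, 120],
})

# Arithmetic applies to the whole column at once.
df["sales_with_tax"] = df["sales"] * 1.08

# Group the rows by shop and sum the sales in each group.
total_by_shop = df.groupby("shop")["sales"].sum()

# Mean of the sales column over all rows.
mean_sales = df["sales"].mean()

print(total_by_shop)
print(mean_sales)
```

Once the environment described below is running, a cell like this can be pasted into the notebook as-is.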
This walkthrough was carried out in the following environment.
Debian 7.6.0 (64bit) was selected as the OS for the virtual environment.
Download and install the VirtualBox package that suits your environment from this page. It supports all major operating systems, including Windows, Mac, and Linux. If you follow the installer's instructions, there should be no problem.
Download and install the Vagrant package that suits your environment from this page. It supports Windows, Mac, and Linux (RedHat and Debian families).
Choose a box file from this page. This time I chose Debian 7.6.0 (64-bit).
https://github.com/jose-lpa/packer-debian_7.6.0/releases/download/1.0/packer_virtualbox-iso_virtualbox.box
Run the following commands.
$ vagrant box add debian-7.6 https://github.com/jose-lpa/packer-debian_7.6.0/releases/download/1.0/packer_virtualbox-iso_virtualbox.box
$ vagrant box list
...
debian-7.6 (virtualbox, 0)
...
$ mkdir -p ~/vagrant/debian7.6 # Create the directory that will hold the virtual machine
$ cd ~/vagrant/debian7.6
$ vagrant init debian-7.6
$ ls
Vagrantfile
Edit the created Vagrantfile as follows.
Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "debian-7.6"
  config.vm.network "private_network", ip: "192.168.20.10"
  config.vm.provider "virtualbox" do |vb|
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end
end
This sets the virtual machine's private IP address to 192.168.20.10 and its memory allocation to 2 GB.
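If you pick a different address, it may be worth checking that it lies in a private range and does not fall inside a network your host already uses. The sketch below does this with Python 3's standard `ipaddress` module, run on the host. The `host_lan` value is a hypothetical example and should be replaced with your own network.

```python
import ipaddress

# Address assigned to the VM in the Vagrantfile.
vm_ip = ipaddress.ip_address(u"192.168.20.10")

# Hypothetical network the host is already attached to (illustration only).
host_lan = ipaddress.ip_network(u"192.168.1.0/24")

# True when the address belongs to a private range such as 192.168.0.0/16.
print(vm_ip.is_private)

# True would mean the VM address overlaps the host's existing network.
print(vm_ip in host_lan)
```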
Start the virtual machine with the following command and connect with SSH.
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'debian-7.6'...
...
$ vagrant ssh
Linux packer-virtualbox-iso-1411922062 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Sep 28 16:43:22 2014 from 10.0.2.2
vagrant@packer-virtualbox-iso-1411922062:~$
You are now logged in to the virtual machine.
To return to the local environment, run $ logout or press Ctrl + D.
To shut down the virtual machine, run:
$ vagrant halt
This time, we will use the system's Python 2.7.
Because this is a dedicated virtual machine, we manage Python packages with pip alone and install them directly into the system Python, without isolating them with virtualenv or similar tools.
Run the following commands to install all the modules required for analysis.
$ sudo apt-get update
$ sudo apt-get upgrade
...
Do you want to continue [Y/n]? Y
...
$ sudo apt-get install -y gcc g++ libpyside-dev python2.7-dev libevent-dev python-all-dev build-essential python-numpy python-scipy python-matplotlib libatlas-dev libatlas3gf-base python-pandas emacs
$ pip install --user --install-option="--prefix=" -U scikit-learn
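To confirm that the installation worked, one option is to try importing each package and print its version. The following is a minimal sketch. `check_modules` is a helper name invented for this example, and the list of import names assumes the packages installed above.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string.

    The value is "unknown" if the module has no __version__ attribute,
    and None if the module cannot be imported.
    """
    result = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            result[name] = None
        else:
            result[name] = getattr(module, "__version__", "unknown")
    return result


if __name__ == "__main__":
    # Import names of the packages installed above (sklearn = scikit-learn).
    wanted = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]
    for name, version in sorted(check_modules(wanted).items()):
        print("%s: %s" % (name, version if version else "NOT INSTALLED"))
```

Any line reporting NOT INSTALLED points to a package whose install step should be rerun.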
Install IPython with the following command.
$ sudo pip install "ipython[all]"
Create a profile, then add the following at the top of its notebook configuration file.
$ ipython profile create nbserver
$ emacs /home/vagrant/.ipython/profile_nbserver/ipython_notebook_config.py
ipython_notebook_config
# Configuration file for ipython-notebook.
c = get_config()
c.IPKernelApp.pylab = 'inline'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 9999
...
Start as a server with the following command.
$ ipython notebook --profile=nbserver &
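If the browser cannot reach the server, it can help to first check from the host whether anything is listening on the configured port. The following is a minimal sketch using only the standard library. `is_port_open` is a helper name invented for this example, and the address and port are the ones set in the Vagrantfile and the notebook configuration above.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout.
        return False
    sock.close()
    return True


# Example call from the host machine:
#   is_port_open("192.168.20.10", 9999)
```

A False result suggests the server is not running, the VM is down, or the private network is not configured as expected.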
When you access http://192.168.20.10:9999/, you will see the following screen.
Now select New -> Python2 in the upper right to bring up the interactive interpreter.
Since this is a throwaway virtual machine there is no real risk here, but on a real server you should set a password. See the following page:
"Start IPython notebook server: set a password to restrict access"
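For background, the notebook password is stored as a salted hash rather than as plain text. The sketch below shows the general idea with the standard library only. The `algorithm:salt:digest` layout mirrors what I understand IPython of this era to use (generated by its own `IPython.lib.passwd` helper and assigned to `c.NotebookApp.password`), but treat both that helper and the exact format as assumptions to check against the documentation for your installed version. The function names here are invented for this example.

```python
import binascii
import hashlib
import os


def make_password_hash(passphrase, salt=None):
    """Return an 'algorithm:salt:digest' string using salted SHA-1."""
    if salt is None:
        # 6 random bytes give 12 hexadecimal characters of salt.
        salt = binascii.hexlify(os.urandom(6)).decode("ascii")
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    digest = hashlib.sha1(data).hexdigest()
    return ":".join(("sha1", salt, digest))


def check_password(hashed, passphrase):
    """Return True if passphrase matches the stored hash string."""
    algorithm, salt, digest = hashed.split(":", 2)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return h.hexdigest() == digest
```

In practice, prefer the helper that ships with your IPython version over a hand-rolled hash. This sketch is only meant to show why the configuration file holds a hash string instead of the password itself.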
sample.py
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(range(100))
Enter the above code and click ▶ to run it.
Now you have a Python analysis environment.