(Added how to install the R-package: http://qiita.com/TomokIshii/items/b43321448ab9fa21dc10#%E8%BF%BD%E8%A8%98r-package-%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB 2016/9/2)
With the Windows 10 Anniversary Update released, Bash on Ubuntu on Windows can now be used. There are already reports that "TensorFlow works too!", but this time I decided to try setting up an environment for XGBoost (Python-package). XGBoost (eXtreme Gradient Boosting) is a library that implements gradient boosting. Although it is a C++ program, it can also be used from Python, R, Julia, and Java.
I remember that when I installed XGBoost on native Windows a while ago, it took a lot of time, starting with installing MinGW. Expecting an improvement, I went into this work planning to write an article along the lines of "This time, installation is so easy!", but I ran into some trouble, so I will describe what happened.
(The environment used this time: Windows 10 ver. 1607, Bash on Ubuntu on Windows (Windows Subsystem for Linux), Python 3.5.2, pyenv, miniconda3-4.0.5, xgboost ver. 0.6.)
If you install Bash on Ubuntu on Windows by following Microsoft's blog article, you get an Ubuntu 14.04 LTS environment, but it contains almost nothing in the way of a development environment. So I first installed the basic tools.
sudo apt-get install git
sudo apt-get install gcc
sudo apt-get install g++
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
So, gcc 4.8.4 is included.
Install make, since it is not included by default.
sudo apt-get install make
Now that gcc, g++, etc. are installed, build XGBoost.
XGBoost build
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
make -j4
In the native Windows 10 environment I tried before, the build took a lot of trouble, but this time it completed in one shot. The xgboost/lib directory now contains libxgboost.a and libxgboost.so.
Since there may still be cases where Python 2 is needed, set up pyenv so that you can switch between Python 2 and 3. (Reference) https://github.com/yyuu/pyenv#installation
You need to add pyenv-related entries to PATH; here I edited .bashrc with vi, which is included from the start.
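For reference, these are the PATH-related lines from the pyenv README that go into .bashrc (adjust PYENV_ROOT if you installed pyenv somewhere else):

```shell
# pyenv setup in ~/.bashrc (standard lines from the pyenv README)
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
# initialize the shims and shell integration
eval "$(pyenv init -)"
```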
Select python to install using pyenv. The list of choices is shown below.
pyenv install -l
Here I hesitated over whether to choose the (full-package) Anaconda line or the (minimum-package) Miniconda line, but this time I chose miniconda3-4.0.5.
pyenv install miniconda3-4.0.5
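After the install finishes, pyenv still has to be told to use the new interpreter; a sketch using `pyenv global` from the pyenv documentation (so that the conda command below is on PATH):

```shell
# switch the active interpreter to the freshly installed Miniconda
pyenv global miniconda3-4.0.5
# verify which interpreter is now active
pyenv version
python --version
```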
After that, install the modules required for numerical calculation.
conda install numpy scipy scikit-learn
conda install ipython
It should have been smooth sailing up to this point, but I stumbled at the end. First, typing the command given in the XGBoost documentation,
cd python-package; sudo python setup.py install
an error occurs with messages saying that various things are missing. python-setuptools is required for this step. (It was in fact written in the XGBoost documentation.)
sudo apt-get install python-setuptools
After this, I went back to the previous step and ran
sudo python setup.py install
again, expecting it to recover, but got a "command not found" error. The cause is a mismatch: the command above runs at the system level (with root authority), while the pyenv environment is maintained at the user level. (By default, pyenv switches between multiple environments by placing shims under $HOME/.pyenv.)
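The mismatch can be seen by comparing how python resolves with and without sudo (the paths described in the comments are typical for a default pyenv setup; the exact output depends on your machine):

```shell
# as the normal user, python resolves to a pyenv shim under $HOME/.pyenv
which python
# under sudo, root's PATH contains no pyenv shims, so a different python
# (or none at all) is found, without any of the conda-installed modules
sudo which python
```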
Therefore, install again at the user level.
python setup.py install
This time the installation completed successfully. (Or so I thought ...) Test xgboost with the distributed sample code predict_first_ntree.py.
$ python predict_first_ntree.py
OMP: Error #100: Fatal system error detected.
OMP: System error #22: Invalid argument
Aborted (core dumped)
The cause of this was unknown (and unexpected). The only clue is the word ** "OMP" **. Searching the net: OMP = OpenMP (Open Multi-Processing). What seems related to this is Intel's numerical library ** MKL ** (Math Kernel Library), which had been installed into Miniconda (as a prerequisite of NumPy and SciPy).
MKL backs numerical libraries such as NumPy and SciPy to improve performance, but it should be noted that it often causes trouble from an environment-maintenance point of view. Previously, in an Ubuntu environment (without virtualization), errors suddenly appeared in the deep learning frameworks Theano and TensorFlow at the same time (I do not know what triggered it), and a hurried investigation showed that MKL was the cause.
This time, it seems MKL is not supported because this is a virtualized Ubuntu environment, so I swapped out the MKL-related libraries. The swap is done by installing nomkl: the MKL libraries are removed and OpenBLAS is installed instead.
conda install nomkl
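To confirm which BLAS backend NumPy ended up linked against, its build configuration can be inspected (a quick check, assuming NumPy is installed in the active environment):

```shell
# print NumPy's build configuration; after "conda install nomkl"
# the BLAS/LAPACK sections should mention openblas rather than mkl
python -c "import numpy; numpy.__config__.show()"
```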
After that, run predict_first_ntree.py, a demonstration of a simple binary classification problem. (The beginning of the distributed code has been modified.)
import os
# check os environment
if os.name == 'nt':     # Windows case ... add mingw lib path
    mingw_path = 'C:\\usr\\mingw-w64\\x86_64-5.4.0-win32-seh-rt_v5-rev0\\mingw64\\bin'
    os.environ['PATH'] = mingw_path + ';' + os.environ['PATH']
if os.name == 'posix':  # Linux case ... nothing to add
    pass

import numpy as np
import xgboost as xgb
# load data
dtrain = xgb.DMatrix('./data/agaricus.txt.train')
dtest = xgb.DMatrix('./data/agaricus.txt.test')
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
watchlist = [(dtest,'eval'), (dtrain,'train')]
num_round = 3
bst = xgb.train(param, dtrain, num_round, watchlist)
print ('start testing prediction from first n trees')
# predict using first 1 tree
label = dtest.get_label()
ypred1 = bst.predict(dtest, ntree_limit=1)
# by default, we predict using all the trees
ypred2 = bst.predict(dtest)
print ('error of ypred1=%f' % (np.sum((ypred1>0.5)!=label) /float(len(label))))
print ('error of ypred2=%f' % (np.sum((ypred2>0.5)!=label) /float(len(label))))
Below are the calculation results.
$ python predict_first_ntree.py
[15:25:08] 6513x127 matrix with 143286 entries loaded from ./data/agaricus.txt.train
[15:25:08] 1611x127 matrix with 35442 entries loaded from ./data/agaricus.txt.test
[0] eval-error:0.042831 train-error:0.046522
[1] eval-error:0.021726 train-error:0.022263
[2] eval-error:0.006207 train-error:0.007063
start testing prediction from first n trees
error of ypred1=0.042831
error of ypred2=0.006207
With that, it now works normally. Information that this environment (Bash on Ubuntu on Windows) does not support MKL had already been posted on Qiita. (I found out later ...)
(Qiita article-Pikkaman V) http://qiita.com/PikkamanV/items/d308927c395d6e687a6a (Source) https://scivision.co/anaconda-python-with-windows-subsystem-for-linux/
I have only just started using the Bash on Ubuntu on ... environment, but I have high expectations for it. The lack of MKL support is unfortunate, but compared with the previous state of Windows plus "that" library, "this" compiler, and "that" tool, it looks like a big improvement.
Also, since XGBoost itself has just been upgraded to ver. 0.6 (skipping ver. 0.5), I would like to keep studying and deepen my understanding of XGBoost.
Around the same time, Docker for Windows was also released, and the programming environment on Windows (though its purpose may differ from that of Bash on ...) is getting more interesting. (Choosing between them may be a headache, though.)
Installing the R-package (added 2016/9/2)
{devtools} is required as a prerequisite package. Furthermore, since the C libraries required by the R package {devtools} are not included in the initial state of Bash on Ubuntu, I had to install a couple of packages with sudo apt-get install.
Once {devtools} is installed, launch the R interpreter and run:
library(devtools)
install('xgboost/R-package')
and it should have been OK. ('xgboost/R-package' is a relative path, so specify it appropriately for your current directory.)
The result of executing the above script is as follows.
> library(devtools); install('R-package')
Installing xgboost
trying URL 'https://cran.rstudio.com/src/contrib/Matrix_1.2-7.1.tar.gz'
Content type 'application/x-gzip' length 1805890 bytes (1.7 MB)
==================================================
downloaded 1.7 MB

sh: 1: /bin/gtar: not found
sh: 1: /bin/gtar: not found
Error in system(cmd, intern = TRUE) : error in running command
In addition: Warning message:
In utils::untar(src, exdir = target, compressed = "gzip") :
  ‘/bin/gtar -xf '/tmp/RtmplJiuv1/Matrix_1.2-7.1.tar.gz' -C '/tmp/RtmplJiuv1/devtools24847e356e71'’ returned error code 127
This is an error saying that /bin/gtar does not exist when extracting Matrix_1.2-7.1.tar.gz. Bash on Ubuntu does have /bin/tar, so I would like that to be used, but the installation script apparently is not written that way. Making a symlink at the root would also work, but searching the net turned up a workaround on Stack Overflow.
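The symlink approach mentioned above would look like this (a sketch; it simply makes the gtar name the script expects point at the tar that Ubuntu ships):

```shell
# make /bin/gtar point to the existing /bin/tar
sudo ln -s /bin/tar /bin/gtar
```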
Error in untar( ) while using R
Sys.setenv(TAR = '/bin/tar')
After setting the above in the R interpreter, I ran install('R-package') again and the installation completed.
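To avoid typing Sys.setenv every session, the TAR setting can be made persistent; one way (my own suggestion, using R's standard ~/.Renviron startup file, whose entries R exports as environment variables) is:

```shell
# R reads ~/.Renviron at startup and sets its entries as environment
# variables, so untar() will pick up TAR=/bin/tar in every session
echo 'TAR=/bin/tar' >> ~/.Renviron
```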