This is a memo about how to access dashDB for Analytics, which is grouped in IBM Watson Data Platform, from Python, which is the mainstream for numerical data processing.
dashDB is an analytics service that combines in-memory analytics with "DB2 BLU" and in-database analytics from PureData for Analytics (Netezza) and can be used as a cloud service. It is available on Bluemix, a cloud platform provided by IBM. dashDB uses the BLU acceleration built into DB2 as its core analytics engine. BLU Acceleration is a completely new SQL processing engine developed based on the research results of IBM's Basic Research Laboratories. BLU acceleration is packed with innovations, as introduced in the features and benefits of DB2 for LUW. The most important point among them is that the tuning that was inherent in the analysis process is no longer necessary. All the user needs to do to take advantage of BLU acceleration is populate the data. High-speed analysis can be performed immediately after data is input. This is an essential element for dashDB to be offered as an analytics service in the cloud. To put it the other way around, it can be said that dashDB was made possible because of the introduction of BLU acceleration, which is an analysis engine that does not require tuning. And dashDB is provided as a service, so you can start using it quickly without any initial investment or infrastructure construction. (1)
Python provides libraries for numerical data processing, high-level scientific computing, machine learning, etc. Maybe Python is more popular as an analytics language than R language. There is enough information to think that it is. (2), (3), (4) So let's see how to use dashDB for Analytics from Python.
Python & dashDB You can use ODBC and JDBC to connect to dashDB from Python. Instructions on the settings and links to download drivers can be found on the Bluemix dashDB admin screen.
The following screenshot shows one of Bluemix's dashDB admin screens, with Connect selected and SSL selected. To connect to dashDB from Python via an ODBC driver, you need to download and install the driver. Reference (8) will help you to install and configure the ODBC driver. However, the DSN name cannot be connected unless the name displayed on the page corresponding to the above screen copy is used. The sample code below works on Ubuntu 14.04, which runs on my MacBook's vagrant environment.
!/usr/bin/env python
-*- coding:utf-8 -*-
ODBC
from ibmdbpy.base import IdaDataBase
idadb = IdaDataBase(dsn="DASHDB", uid="******", pwd="***********")
print "current_schema = ", idadb.current_schema
idadb.close();
ubuntu@ubuntu-xenial:~/dashDB$ ./test1.py
current_schema = DASH9824
Connection closed.
If you download the above-mentioned ODBC driver, the JDBC driver is also included, so you can also set the connection at http://pythonhosted.org/ibmdbpy/start.html#jdbc- on the reference material (5) site. Located in connection, the JDBC driver can also be downloaded from Reference (9). This is also the code that runs on the disguise server Ubuntu14.04 that runs on the MacBook.
!/usr/bin/env python
-*- coding:utf-8 -*-
JDBC
from ibmdbpy.base import IdaDataBase
jdbc = 'jdbc:db2://dashdb-entry-yp-dal09-09.services.dal.bluemix.net:50001/BLUDB:user=dash982;password=**********'
idadb = IdaDataBase(jdbc)
print idadb.current_schema
idadb.close();
However, as of June 1, 2017, this code doesn't work. The cause is that the version of JayDeBeApi has been upgraded. You can refer to this issue at the following URL: RuntimeError: No matching overloads found. at native/common/jp_method.cpp:117 #18 https://github.com/ibmdbanalytics/ibmdbpy/issues/18
The bmdbpy project provides a Python interactive interface for IBM dash DB and IBM DB2 database manipulation algorithms and data manipulation. Accelerate Python parsing by seamlessly pushing operations written in Python to a connected database to perform, thereby benefiting from in-database performance-enhancing features such as columnar storage and parallelism. receive. The ibmdbpy project can be used by Python developers without learning. This is because it mimics the well-known user interface of the Pandas library for manipulating data and the Scikit-learn library for machine learning algorithms. The ibmdbpy project is available in Python releases 2.7-3.4. You can connect to a dashDB or DB2 instance via ODBC or JDBC. The project is still in its infancy and many of its features are still under development. However, some experiments have already demonstrated significant performance benefits when working with medium or large amounts of data, that is, tables with more than a million rows. (Five) The Pandas user interface can be intuitively understood by looking at 10 Minutes to pandas http://pandas.pydata.org/pandas-docs/stable/10min.html.
There is ibmdbpy as a library for operating dashDB from Python, and you can drop it into your own execution environment with pip install. (5) (6) (7)
ubuntu@ubuntu-xenial:~/dashDB$ pip install ibmdbpy
It works as below, but it seems that it is necessary to take a closer look because it is still under development.
ubuntu@ubuntu-xenial:~/dashDB$ python
Python 2.7.12 (default, Feb 15 2017, 05:35:22)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ibmdbpy.base import IdaDataBase
>>> idadb = IdaDataBase(dsn="DASHDB", uid="******", pwd="*******")
>>> idadb.ida_query("SELECT* FROM SAMPLES.CITSTATUS")
䥃 彔 佃 䕄 䥃 彔 䕄 䍓
0 1 Born in the U.S.
1 2 Born in Puerto Rico/Guam/Virgin Islands/Northe...
2 3 Born abroad of American parent(s)
3 4 Naturalized Citizen
4 5 Not a citizen of the U.S.
Ibm_db for Python can be installed with pip install ibm_db. When I verified it, it seemed that there was a problem connecting with the DSN name, so I succeeded in accessing it when I accessed it with the connection character string without using the catalog. It seems that data containing Japanese can be used without problems.
!/usr/bin/env python
import ibm_db
url = "DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-09.services.dal.bluemix.net;PORT=50001;PROTOCOL=TCPIP;UID=******; PWD=*****;SECURITY=ssl"
sql = "SELECT* FROM SAMPLES.CITSTATUS"
conn = ibm_db.connect(url, "", "")
stmt = ibm_db.prepare(conn, sql)
ibm_db.execute(stmt)
data = ibm_db.fetch_tuple(stmt)
while(data):
print data[0], data[1]
data = ibm_db.fetch_tuple(stmt)
ibm_db.free_stmt(stmt)
ibm_db.close(conn)
The execution result is as follows, which is the same as normal ODBC access.
ubuntu@ubuntu-xenial:~/dashDB$ ./test5.py
1 Born in the U.S.
2 Born in Puerto Rico/Guam/Virgin Islands/Northern M
3 Born abroad of American parent(s)
4 Naturalized Citizen
5 Not a citizen of the U.S.
I tried accessing Bluemix's dashDB for analytics with Python. The ibmdbpy project applies a UI compatible with Pandas, and I have high expectations, but since it is in the early stages of development, I would like to keep an eye on future results. Also, the Python version of the legacy ibm_db is stable and can be used smoothly.
There is also a sequel How to write code to access python dash DB on Blumix or local.
(1) Let's touch dashDB https://www.ibm.com/developerworks/jp/data/library/dashdb/dm-dashdb1-bluemix/ (2) Data science with Python http://pythondatascience.plavox.info/ (3) Why Python is chosen for machine learning http://qiita.com/yaju/items/5502115d7e3d06e6bbdd (4) Scipy Lecture Notes http://www.turbare.net/transl/scipy-lecture-notes/index.html (5) ibmdbpy Accelerating Python Analytics by In-Database Processing http://pythonhosted.org/ibmdbpy/index.html (6) Python package index ibmdbpy 0.1.4 A Pandas-like SQL-wrapper for in-database analytics with IBM dashDB/DB2. https://pypi.python.org/pypi/ibmdbpy (7) GitHUB ibmdbanalytics/ibmdbpy https://github.com/ibmdbanalytics/ibmdbpy (8) Data analysis with Bluemix "dashDB": I tried to set ODBC on the Linux server http://qiita.com/azzeten/items/55dc4ad1535bf709403b (9) DB2 JDBC Driver Versions and Downloads http://www-01.ibm.com/support/docview.wss?uid=swg21363866 (10) ibm_db 2.0.7 https://pypi.python.org/pypi/ibm_db/
Recommended Posts