This is a memo on how to read files on object storage from Python code in a Notebook on Data Science Experience (DSX, https://datascience.ibm.com/), one of the core services of Watson Data Platform.
The code below reads data on Bluemix PaaS object storage from a Python program in a DSX Notebook. Be careful not to confuse the two services: the object storage used here is the Object Storage service of Bluemix PaaS, not IBM Cloud Object Storage of the Bluemix infrastructure. The code was written with the following scenario in mind: data sent from a server or IoT device is stored temporarily in Bluemix PaaS object storage, read from Python code on DSX, and then used for scientific computing.
Figure 1: Concept of how this Python code works in a Data Science Experience Notebook
To use it, log in to DSX and select Project -> Default Project (or any project name) -> Add Notebooks to create a notebook. Then copy and paste the code below, replace the object storage credentials with your own, and you are ready to go. How to obtain the credentials is described later.
You can fetch a file from object storage and load it into variables in your Python code with the following call. It is convenient to store the retrieved data in a NumPy array. The first argument is the credentials; note that the user ID and password differ for each container. The second argument is the object (file) name. The container name is taken from the credentials, so it is not passed explicitly here.
result,status,Label,Data = Read_CSV_from_ObjectStorage(credentials_1, filename)
The first return value, result, is True on success and False on failure. The second return value, status, contains the HTTP status code: 200 on success, or an error code in the 400s if authentication fails. The third return value, Label, is a list of the column labels from the header line of the CSV file. The fourth return value, Data, holds the data rows, with every value converted to float.
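As a rough illustration of how the first two return values might be interpreted, here is a minimal sketch. The helper name explain_read_result is hypothetical (it is not part of the original code), and the mapping of 401/403 and 404 to messages assumes standard HTTP semantics rather than anything specific to the object storage service.

```python
def explain_read_result(result, status):
    """Turn the (result, status) pair into a short message (hypothetical helper)."""
    if result:
        return "OK (HTTP %d)" % status
    # Assumption: standard HTTP meanings for these codes
    if status in (401, 403):
        return "Authentication failed (HTTP %d); check the credentials" % status
    if status == 404:
        return "Not found (HTTP %d); check the container and file name" % status
    return "Request failed (HTTP %d)" % status


# Example with made-up values instead of a real request
print(explain_read_result(True, 200))
print(explain_read_result(False, 401))
```

In a Notebook, the same helper could be called with the result and status returned by Read_CSV_from_ObjectStorage before touching Label or Data, which are None on failure.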
The complete reading code follows. Copy it into a DSX Notebook and change the necessary parts before use. The code adapts automatically to the number of columns in the CSV file.
%matplotlib inline
from io import BytesIO
import requests
import numpy as np
import matplotlib.pyplot as plt
import json
# Object storage credentials: replace these with your own (how to obtain them is described later)
credentials_1 = {
    'auth_url':'https://identity.open.softlayer.com',
    'project':'object_storage_bc6cdc85_586e_4581_8a09_8f01f7bdf3ed',
    'project_id':'2a9de4c1d50944a49f1a46dd53394158',
    'region':'dallas',
    'user_id':'********************************',
    'domain_id':'fb119f3e1bc0469dad2b253b317ec7ea',
    'domain_name':'952993',
    'username':'***********************************************',
    'password':"********************",
    'container':'DefaultProjecttakarajpibmcom',
    'tenantId':'undefined',
    'filename':'testdata_for_dsx.csv'
}
# Read from object storage
def Read_CSV_from_ObjectStorage(credentials, fileName):
    """Read a CSV file from Bluemix Object Storage V3.

    Returns (result, status, labels, data): a success flag, the HTTP
    status code, the header labels, and the data rows as floats."""
    # Request an authentication token from the identity service
    url1 = ''.join([credentials['auth_url'], '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': credentials['username'],
                                  'domain': {'id': credentials['domain_id']},
                                  'password': credentials['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    # Exit when an authentication error occurs
    if resp1.status_code != 201:
        return False, resp1.status_code, None, None
    # Find the public object-store endpoint for the region in the credentials
    resp1_body = resp1.json()
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == credentials['region']:
                    #url2 = ''.join([e2['url'],'/', credentials['container'], '/', credentials['filename']])
                    url2 = ''.join([e2['url'], '/', credentials['container'], '/', fileName])
    # Download the object using the token
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'text/csv'}
    resp2 = requests.get(url=url2, headers=headers2)
    if resp2.status_code != 200:
        return False, resp2.status_code, None, None
    # Set in an array
    tempArray = resp2.text.splitlines()  # split into lines
    csvLabel = []  # Labels in the first line of the CSV
    csvFloat = []  # Data part of the CSV, from the second line on
    lineNo = 0     # line count
    for row in tempArray:
        if len(row) > 0:
            c = row.split(",")
            if lineNo == 0:
                csvLabel = c
            else:
                a = []
                for i in range(0, len(c)):
                    a.append(float(c[i]))
                csvFloat.append(a)
            lineNo = lineNo + 1
    return True, resp2.status_code, csvLabel, csvFloat
# Sample main
filename = 'testDataSet.csv'  # Set the object name of the CSV file you want to read
result, status, Label, Data = Read_CSV_from_ObjectStorage(credentials_1, filename)
if result:
    a = np.array(Data)  # 2D NumPy array (shape depends on the number of CSV columns)
    # Graph drawing
    x = a[:, 0]  # Extract the first column
    y = a[:, 1]  # Extract the second column
    plt.plot(x, y)
    plt.show()
else:
    print("ERROR %s" % status)
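The CSV parsing step in the function above can be tried on its own, without access to object storage. The following is a minimal sketch of the same idea (first line as labels, remaining lines as floats); the function name parse_csv_text and the sample text are made up for illustration.

```python
def parse_csv_text(text):
    """Split CSV text into header labels and rows of floats (illustrative helper)."""
    labels = []
    rows = []
    # Skip blank lines, as the reading function does
    lines = [line for line in text.splitlines() if line.strip()]
    for line_no, row in enumerate(lines):
        cells = row.split(",")
        if line_no == 0:
            labels = cells  # header line
        else:
            rows.append([float(c) for c in cells])  # data line
    return labels, rows


# Made-up sample data standing in for a file on object storage
sample_text = "time,value\n0,1.5\n1,2.5\n\n2,4.0\n"
labels, rows = parse_csv_text(sample_text)
first_column = [r[0] for r in rows]
print(labels)
print(rows)
print(first_column)
```

Passing rows to np.array would give the same kind of 2D array that the sample main code plots.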
First, register the CSV file in the DSX object storage. There is a one-to-one correspondence between DSX projects and object storage containers, so a Notebook cannot access the container of another project. Register the CSV file in the container associated with the project you are currently using. Select Project -> Project Name on the menu bar to open the screen that lists Notebooks and Data Assets. Then click + Add Data Assets under Data Assets, and the following panel appears at the right edge. Drag and drop the file into the dashed area labeled Drop file here to upload it. Then check the checkbox in front of the file name. The file should now appear in Data Assets.
Next, create a Notebook, or open the Notebook you are developing in edit mode by clicking the pen icon. Then click the icon to bring up the following display, and click the downward triangle to open a further menu.
Click Insert Credentials at the bottom of this list to insert your credentials into the Notebook. Edit them as needed, and you are ready.
Running this code reads the data and displays a graph of the data in the CSV file.
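Before running the reading code, it may help to confirm that the inserted credentials contain the keys the function actually uses. The sketch below is an assumption-laden illustration: the helper names missing_credential_keys and build_auth_payload are hypothetical, the list of required keys is taken from what the reading function accesses, and the example values are placeholders, not real credentials.

```python
import json

# Keys that the reading function looks up in the credentials dictionary
REQUIRED_KEYS = ("username", "password", "domain_id", "container")


def missing_credential_keys(credentials):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [k for k in REQUIRED_KEYS if not credentials.get(k)]


def build_auth_payload(credentials):
    """Build the JSON body of the password-based token request."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": credentials["username"],
                        "domain": {"id": credentials["domain_id"]},
                        "password": credentials["password"],
                    }
                },
            }
        }
    }


# Placeholder values for illustration only
example_credentials = {
    "username": "example_user",
    "password": "example_password",
    "domain_id": "example_domain",
    "container": "ExampleContainer",
}
print(missing_credential_keys(example_credentials))
print(json.dumps(build_auth_payload(example_credentials), indent=2))
```

An empty list from missing_credential_keys suggests the dictionary is at least structurally complete; it does not verify that the values are valid.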
This code was adapted from the JSON-reading code in the article "Working with Object Storage in Data Science Experience - Python Edition" (https://datascience.ibm.com/blog/working-with-object-storage-in-data-science-experience-python-edition/).