In an analysis project of Cloud Pak for Data (CP4D), a Notebook or a Data Refinery Flow can be turned into a Job and executed in batch. What I want to do this time comes down to two things: running a Job with environment variables set from the CP4D screen, and starting a Job via the API while passing environment variables to it.
Strictly speaking, for a Job the expression "start it with environment variables set" seems more accurate than "pass arguments at runtime". I presume the environment variables are handled as an OpenShift ConfigMap, probably because the Job is launched internally as an OpenShift pod.
Let's actually start a Job via the API, give it environment variables at that time, and pass them through to the processing logic.
Create a Notebook and turn it into a Job. The environment variables handled this time are MYENV1, MYENV2, and MYENV3; their values are put into a pandas DataFrame and written out as CSV to the data assets of the analysis project. Of course, these environment variables are not defined by default, so set default values with the default argument of os.getenv.
import os
myenv1 = os.getenv('MYENV1', default='no MYENV1')
myenv2 = os.getenv('MYENV2', default='no MYENV2')
myenv3 = os.getenv('MYENV3', default='no MYENV3')
print(myenv1)
print(myenv2)
print(myenv3)
# -output-
# no MYENV1
# no MYENV2
# no MYENV3
Next, put these three values into a pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'myenv1' : [myenv1], 'myenv2' : [myenv2], 'myenv3' : [myenv3]})
df
# -output-
# myenv1 myenv2 myenv3
# 0 no MYENV1 no MYENV2 no MYENV3
Then export it as a data asset of the analysis project, adding a time stamp to the file name. Saving data to the data assets of an analysis project is described in [this article](https://qiita.com/ttsuzuku/items/eac3e4bedc020da93bc1#%E3%83%87%E3%83%BC%E3%82%BF%E8%B3%87%E7%94%A3%E3%81%B8%E3%81%AE%E3%83%87%E3%83%BC%E3%82%BF%E3%81%AE%E4%BF%9D%E5%AD%98-%E5%88%86%E6%9E%90%E3%83%97%E3%83%AD%E3%82%B8%E3%82%A7%E3%82%AF%E3%83%88).
from project_lib import Project
project = Project.access()
import datetime
now = datetime.datetime.now(datetime.timezone(datetime.timedelta(hours=9))).strftime('%Y%m%d_%H%M%S')
project.save_data("jov_env_test_"+now+".csv", df.to_csv(),overwrite=True)
From the Notebook menu, select File > Save Versions to save a version; this is required when creating a Job. Then click the Job button at the top right of the Notebook screen and select Create Job. Give the Job a name and click Create.
Let's run the created Job from the CP4D screen. First, just click the "Run Job" button and execute it without defining any environment variables.
The run is fine once it finishes and the status becomes "Completed".
Looking at the data assets of the analysis project, a CSV file has been generated,
and clicking the file name to see the preview shows that the default values set in the Notebook were stored.
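As a side check, the saved CSV can also be read back programmatically with project_lib. A minimal sketch, run in the Notebook on CP4D; the file name here is only an example of one generated above and needs to be replaced with the actual name:

```python
import pandas as pd
from project_lib import Project

project = Project.access()

# Read back the CSV that was written to the project's data assets
# (replace the file name with the one actually generated)
buf = project.get_file("jov_env_test_20200531_222018.csv")
df_check = pd.read_csv(buf)
print(df_check)
```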
Next, set the environment variables and run the Job again. Click "Edit" next to "Environment Variables" on the Job screen and set the following three lines.
MYENV1=1
MYENV2=hoge
MYENV3=10.5
The settings look like this.
After submitting the settings, run the Job again. The contents of the resulting CSV file look like this. Since these are environment variables, even numerical values are treated as strings.
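If the Notebook needs these values as numbers, they have to be converted back from strings explicitly. A minimal sketch, using the variable names from the Notebook above:

```python
# Environment variables always arrive as strings, so convert explicitly when numbers are needed
try:
    myenv1_num = int(myenv1)     # e.g. "1"    -> 1
    myenv3_num = float(myenv3)   # e.g. "10.5" -> 10.5
except ValueError:
    # The default "no MYENVx" strings cannot be converted
    myenv1_num = myenv3_num = None
print(myenv1_num, myenv3_num)
```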
Next, use Python requests to kick the created Job via the API. Run the following code from a Python environment outside CP4D.
To get a token, perform Basic authentication with a user name and password and obtain an accessToken. For authentication, there is an [example of running it with curl in the CP4D v2.5 product manual](https://www.ibm.com/support/knowledgecenter/ja/SSQNUZ_2.5.0/wsj/analyze-data/ml-authentication-local.html).
url = "https://cp4d.hostname.com"
uid = "username"
pw = "password"
import requests
#Authentication
response = requests.get(url+"/v1/preauth/validateAuth", auth=(uid,pw), verify=False).json()
token = response['accessToken']
The verify=False option in requests skips certificate verification; it is a workaround for when CP4D uses a self-signed certificate.
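Note that each request with verify=False also prints an InsecureRequestWarning; if that is distracting, it can be suppressed with urllib3 (a small optional snippet):

```python
import urllib3

# Optional: silence the InsecureRequestWarning emitted for every verify=False request
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```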
Next, get the Job list of the analysis project. As preparation, find out the ID of the analysis project to be used on CP4D in advance. Display and check the environment variable PROJECT_ID in a Notebook within the analysis project.
Checking the project ID (run in a Notebook on CP4D):
import os
os.environ['PROJECT_ID']
# -output-
# 'f3110316-687e-450a-8f17-57296c907973'
Set the project ID found above and get the Job list via the API. The API used here is the Watson Data API; the API reference is Jobs / Get list of jobs under a project.
project_id = 'f3110316-687e-450a-8f17-57296c907973'
headers = {
'Authorization': 'Bearer ' + token,
'Content-Type': 'application/json'
}
# Job list
response = requests.get(url+"/v2/jobs?project_id="+project_id, headers=headers, verify=False).json()
response
# -output-
#{'total_rows': 1,
# 'results': [{'metadata': {'name': 'job_env_test',
# 'description': '',
# 'asset_id': 'b05d1214-d684-4bd8-b1fa-cc05a8ccee81',
# 'owner_id': '1000331001',
# 'version': 0},
# 'entity': {'job': {'asset_ref': '6e0b450e-2f9e-4605-88bf-d8a5e2bda4a3',
# 'asset_ref_type': 'notebook',
# 'configuration': {'env_id': 'jupconda36-f3110316-687e-450a-8f17-57296c907973',
# 'env_type': 'notebook',
# 'env_variables': ['MYENV1=1', 'MYENV2=hoge', 'MYENV3=10.5']},
# 'last_run_initiator': '1000331001',
# 'last_run_time': '2020-05-31T22:20:18Z',
# 'last_run_status': 'Completed',
# 'last_run_status_timestamp': 1590963640135,
# 'schedule': '',
# 'last_run_id': 'ebd1c2f1-f7e7-40cc-bb45-5e12f4635a14'}}}]}
The above asset_id is the ID of Job "job_env_test". Store it in a variable.
job_id = "b05d1214-d684-4bd8-b1fa-cc05a8ccee81"
Run the above Job via the API. The API reference is Job Runs / Start a run for a job. At run time you need to pass a job_run value as JSON, and the runtime environment variables are included in it.
jobrunpost = {
"job_run": {
"configuration" : {
"env_variables" : ["MYENV1=100","MYENV2=runbyapi","MYENV3=100.0"]
}
}
}
Pass the above job_run as JSON and run the Job. The run ID is stored in 'asset_id' of the response's 'metadata'.
# Run job
response = requests.post(url+"/v2/jobs/"+job_id+"/runs?project_id="+project_id, headers=headers, json=jobrunpost, verify=False).json()
# Job run id
job_run_id = response['metadata']['asset_id']
job_run_id
# -output-
# 'cedec57a-f9a7-45e9-9412-d7b87a04036a'
After starting the run, check its status. The API reference is Job Runs / Get a specific run of a job.
# Job run status
response = requests.get(url+"/v2/jobs/"+job_id+"/runs/"+job_run_id+"?project_id="+project_id, headers=headers, verify=False).json()
response['entity']['job_run']['state']
# -output-
# 'Starting'
If you run this requests.get several times, the state changes from 'Starting' to 'Running' to 'Completed'. When it becomes 'Completed', the run has finished.
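Instead of re-running the GET by hand, the state can also be polled in a small loop until the run finishes. A minimal sketch, reusing the url, headers, project_id, job_id, and job_run_id variables defined above:

```python
import time

# Poll the run state until the Job is no longer starting or running
while True:
    response = requests.get(
        url + "/v2/jobs/" + job_id + "/runs/" + job_run_id + "?project_id=" + project_id,
        headers=headers, verify=False).json()
    state = response['entity']['job_run']['state']
    print(state)
    if state not in ('Starting', 'Running'):
        break
    time.sleep(10)
```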
Return to the CP4D screen and check the contents of the CSV file generated in the data assets of the analysis project.
It confirms that the environment variables specified in job_run are properly reflected in the result data.
(Bonus) Double-byte characters can also be used in the values of the job_run environment variables.
job_run containing double-byte characters:
jobrunpost = {
"job_run": {
"configuration" : {
"env_variables" : ["MYENV1=AIUEO","MYENV2=a-I-U-E-O","MYENV3=Aio"]
}
}
}
Execution result:
After that, you can do whatever you like with the environment variable values (strings) received in the Job's Notebook.
(Reference material) https://github.ibm.com/GREGORM/CPDv3DeployML/blob/master/NotebookJob.ipynb This repository contains useful samples of Notebooks that can be used with CP4D.