Create and run embulk config in Jupyter

I think there are occasional cases where you want to dynamically generate many embulk config files (hereafter "embulk config").

For details on embulk itself, please refer to the following pages: http://qiita.com/hiroysato/items/397f36c4838a0a93e352 http://qiita.com/hiroysato/items/da45e52fb79c39547f69

Being able to generate and run embulk config files from Jupyter makes it convenient to proceed by trial and error, and I think it also makes creating embulk configs more efficient.

Create embulk config

    f=open('[file name]','w')
    setting = '''in:\n\
  type: gcs\n\
  bucket: xxxx\n\
  path_prefix: aaa/bbb/ccc_\n\
  auth_method: private_key\n\
  service_account_email: {{ env.SERVICE_ACCOUNT_EMAIL }}\n\
  p12_keyfile: ../key/{{ env.P12_FILENAME }}\n\
  application_name: zzz\n\
  tasks: 1\n\
  parser:\n\
    charset: UTF-8\n\
    newline: LF\n\
    header_line: true\n\
    type: csv \n\
    delimiter: \',\' \n\
    quote: \'\"\' \n\
    columns: \n\
    - {name: name, type: string}\n\
    - {name: title, type: string}\n\
    - {name: words, type: string}\n\
\n\
out: \n\
  type: file \n\
  path_prefix: tmp \n\
  file_ext: txt \n\
  formatter: \n\
    type: csv \n\
    charset: UTF-8 \n\
    delimiter: \'\\\' \n\
    header_line: false \n\
    newline: LF'''

    f.write(setting)
    f.close()

There is nothing special going on here: the code simply writes the contents of the embulk config to a file. I think the output embulk config is easier to read if you put "\n" at the line breaks.
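As a minimal sketch of the same step, the file can also be written inside a `with` block, which closes it even if the write fails. The helper name `write_config`, the file name `sample.yml.liquid`, and the shortened config text are made up for illustration and are not a complete embulk config. A triple-quoted string keeps the line breaks of the source as they are, so this sketch leaves out the explicit "\n".

```python
import os
import tempfile


# Hypothetical helper (not part of embulk): write config text to a file.
def write_config(path, setting):
    # The with block closes the file even if write() raises.
    with open(path, 'w') as f:
        f.write(setting)
    return path


# Shortened stand-in for the config above; the values are placeholders.
setting = '''in:
  type: gcs
  bucket: xxxx
out:
  type: file
  path_prefix: tmp
'''

# Write into a temporary directory so the sketch leaves nothing behind
# in the notebook's working directory.
config_path = write_config(
    os.path.join(tempfile.mkdtemp(), 'sample.yml.liquid'), setting)

# Read the file back to check what was actually written.
with open(config_path) as f:
    print(f.read())
```

Reading the file back right after writing it is a quick way to check the indentation before handing the file to embulk.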

embulk run

    import os
    os.system('embulk run [file name]')

Run it, paying attention to the path of the config file and of the embulk command.
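One way to take the guesswork out of the path is to check it before running anything. The following is only a sketch: the helpers `embulk_run_command` and `run_embulk` are hypothetical names, and it assumes the `embulk` command is expected to be on `PATH`. It uses `subprocess.run` instead of `os.system` so that a failed run raises an error instead of passing silently.

```python
import os
import shutil
import subprocess


# Hypothetical helper: build the argument list for "embulk run".
def embulk_run_command(config_path, embulk='embulk'):
    # An absolute path does not depend on the notebook's working directory.
    return [embulk, 'run', os.path.abspath(config_path)]


# Hypothetical helper: check both paths first, then run embulk.
def run_embulk(config_path):
    # shutil.which returns None when the command is not found on PATH.
    if shutil.which('embulk') is None:
        raise FileNotFoundError('embulk was not found on PATH')
    if not os.path.isfile(config_path):
        raise FileNotFoundError('config not found: ' + config_path)
    # check=True raises CalledProcessError if embulk exits with an error.
    return subprocess.run(embulk_run_command(config_path), check=True)


# Only builds and shows the command; call run_embulk() to actually run it.
print(embulk_run_command('sample.yml.liquid'))
```

Passing the arguments as a list also avoids shell quoting problems when a file name contains spaces.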

Use case

This is convenient when you have many tables to migrate, or when you want a separate file for each type of data. Once you can create many embulk configs dynamically with for statements and the like, writing each embulk config by hand starts to feel tedious.

Example

Generate and run one file for each combination of two categories, each ranging from 1 to 5 (25 files in total).

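import os

for a in [1, 2, 3, 4, 5]:
  for b in [1, 2, 3, 4, 5]:
    # One config file per (a, b) combination, e.g. 1-1_xxx.yml.liquid
    filename = str(a) + '-' + str(b) + '_xxx.yml.liquid'
    f = open(filename, 'w')
    setting = '''in:\n\
      [embulk setting]
    '''
    f.write(setting)
    f.close()
    # Run embulk with the file that was just written
    os.system('embulk run ' + filename)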
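The loop above can also be sketched with `itertools.product` and `string.Template`, here only generating the files without running embulk. The function name `generate_configs` and the shortened template are assumptions for illustration. `string.Template` is used because it only reacts to `$`, so Liquid placeholders such as `{{ env.SERVICE_ACCOUNT_EMAIL }}` are written to the file unchanged.

```python
import itertools
import os
import string
import tempfile

# Shortened, hypothetical template; ${a} and ${b} are filled in per file.
TEMPLATE = string.Template('''in:
  type: gcs
  bucket: xxxx
  path_prefix: aaa/bbb/${a}-${b}_
  service_account_email: {{ env.SERVICE_ACCOUNT_EMAIL }}
''')


# Hypothetical helper: write one config per (a, b) pair, return the paths.
def generate_configs(out_dir, categories=range(1, 6)):
    paths = []
    # product(..., repeat=2) yields every (a, b) pair, like the nested loops.
    for a, b in itertools.product(categories, repeat=2):
        path = os.path.join(out_dir, '{}-{}_xxx.yml.liquid'.format(a, b))
        with open(path, 'w') as f:
            f.write(TEMPLATE.substitute(a=a, b=b))
        paths.append(path)
    return paths


# Generate into a temporary directory and show how many files were made.
paths = generate_configs(tempfile.mkdtemp())
print(len(paths))
```

Returning the list of paths keeps generation and execution separate, so each file can be inspected before it is passed to `embulk run`.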

In closing

It becomes even more convenient if you keep embulk's in, out, filter, and other sections as separate strings and generate the embulk config by combining them.
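A minimal sketch of that idea, assuming each section is kept as its own string. The function name `build_config` and the section contents are placeholders for illustration, not complete embulk settings.

```python
# Hypothetical section strings; the values are placeholders.
IN_GCS = '''in:
  type: gcs
  bucket: xxxx
'''

FILTER_RENAME = '''filters:
  - type: rename
    columns:
      name: full_name
'''

OUT_FILE = '''out:
  type: file
  path_prefix: tmp
  file_ext: txt
'''


# Hypothetical helper: join the sections into one config text.
def build_config(in_section, out_section, filters_section=''):
    # Sections are joined in the order in, filters, out; empty ones are skipped.
    sections = [in_section, filters_section, out_section]
    # Make sure every section ends with a newline before joining.
    return ''.join(s if s.endswith('\n') else s + '\n' for s in sections if s)


setting = build_config(IN_GCS, OUT_FILE, FILTER_RENAME)
print(setting)
```

With the sections split up like this, the same `in` string can be reused with several different `out` strings inside a for loop.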

Jupyter files can also be batch-executed with runipy, so it was easy to run the finished process periodically once the trial and error was over.

Everything described here can of course be done without Jupyter. I wrote it up because I recently generated and ran embulk configs from Jupyter, and it was convenient to handle the other related processing in Jupyter as well.