[RUBY] Learn Digdag from Digdag Official Documentation-Scheduling workflow

Target

This post is a translation of the Scheduling workflow page of the official Digdag documentation, plus some additions of my own. The final goal is to build a batch job in Rails using Digdag's Ruby support. http://docs.digdag.io/scheduling_workflow.html

Table of contents

Getting started
Architecture
Concepts
Workflow definition
Scheduling workflow
Operators
Command reference
Language API - Ruby
Change the setting value for each environment with Digdag (RubyOnRails)
Batch implementation in RubyOnRails environment using Digdag

Scheduling workflow

Setting up a schedule

To run a workflow periodically, add the `schedule:` option at the top level of the workflow definition.

Create a simple workflow. To run it at 9:00 every day in Japan time, add the schedule option as follows.

workflows.dig


timezone: Asia/Tokyo

schedule:
  daily>: 09:00:00

+step1:
  sh>: echo start schedule ${session_time}

Start the Digdag server and push the workflows project to the Digdag server.

$ digdag server --memory
$ digdag push workflows

In the workflow details, the 9:00 run has been added to the sessions and its Status is Success.

You can also check the details of the actual attempt by clicking Success in Sessions.

Options

|Syntax|Description|Example|
|:-----------------|:------------------|:-----------------|
| hourly>: MM:SS         |Run every hour at 30 minutes past the hour| hourly>: 30:00 |
| daily>: HH:MM:SS       |Run every day at 7:00| daily>: 07:00:00|
| weekly>: DDD,HH:MM:SS  |Run every week on Sunday at 9:00| weekly>: Sun,09:00:00|
| monthly>: D,HH:MM:SS   |Run at 9:00 on the first day of every month| monthly>: 1,09:00:00|
| minutes_interval>: M   |Run every 30 minutes| minutes_interval>: 30|
| cron>: CRON      |Use cron format for complex scheduling| `cron>: 42 4 1 * *`|


The `digdag check` command shows when the first scheduled session will run.


#### **`check`**
```console
$ digdag check
2020-07-12 09:23:10 +0900: Digdag v0.9.41
  System default timezone: Asia/Tokyo

  Definitions (1 workflows):
    workflows (2 tasks)

  Parameters:
    {}

  Schedules (1 entries):
    workflows:
      daily>: "09:00:00"
      first session time: 2020-07-13 00:00:00 +0900
      first scheduled to run at: 2020-07-13 09:00:00 +0900 (in 23h 36m 47s)
```

Caution

When you use hourly, daily, weekly, or monthly, the session time may differ from the actual execution time.
The session time is set to 00:00:00 of the execution date (for hourly, to 00 minutes 00 seconds of the execution hour).

Schedule examples when the system time is 2019-02-24 14:20:10 +0900:

|schedule|first session time|first scheduled to run at|
|:-----------------|:------------------|:-----------------|
| hourly>: 32:32 | 2019-02-24 14:00:00 +0900 | 2019-02-24 14:32:32 +0900 |
| daily>: 10:32:32 | 2019-02-25 00:00:00 +0900 | 2019-02-25 10:32:32 +0900 |
| weekly>: 2,10:32:32 | 2019-02-26 00:00:00 +0900 | 2019-02-26 10:32:32 +0900 |
| monthly>: 2,10:32:32 | 2019-03-02 00:00:00 +0900 | 2019-03-02 10:32:32 +0900 |
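To make the relationship between session time and execution time concrete, here is a minimal Ruby sketch that reproduces the hourly and daily rows above. It is my own simplified model, not Digdag's implementation: the names `offset_seconds` and `first_schedule` are made up for this post, only `hourly>` and `daily>` are covered, and it assumes a fixed UTC offset with no daylight saving time (true for Asia/Tokyo).

```ruby
# Simplified model of the session-time rule described above.
# Not Digdag's implementation; only hourly> and daily> are covered.

# Parse "HH:MM:SS" or "MM:SS" into a number of seconds.
def offset_seconds(spec)
  spec.split(":").map(&:to_i).reduce(0) { |sum, part| sum * 60 + part }
end

# Return [session_time, run_time] for the first run strictly after `now`.
def first_schedule(kind, spec, now)
  period = { hourly: 3600, daily: 86_400 }.fetch(kind)
  offset = offset_seconds(spec)
  # Start of the current hour or day, in the same UTC offset as `now`.
  base = if kind == :hourly
           Time.new(now.year, now.month, now.day, now.hour, 0, 0, now.utc_offset)
         else
           Time.new(now.year, now.month, now.day, 0, 0, 0, now.utc_offset)
         end
  # Move to the next period until the run time is in the future.
  base += period while base + offset <= now
  [base, base + offset]
end

now = Time.new(2019, 2, 24, 14, 20, 10, "+09:00")
[[:hourly, "32:32"], [:daily, "10:32:32"]].each do |kind, spec|
  session, run = first_schedule(kind, spec, now)
  puts "#{kind}>: #{spec}  session: #{session}  run: #{run}"
end
```

The session time is simply the start of the hour or day that contains the run time, which is why `${session_time}` in a daily workflow shows midnight rather than the time the task actually started.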

Running scheduler

The `digdag scheduler` command starts the scheduler.

$ digdag scheduler

If you change the workflow definition, the scheduler automatically reloads the digdag.dig file, so you do not need to restart it.

I changed the time from 9:00 to 12:00 and tried running `digdag scheduler`, but I got an error. I suspected a bug in Digdag and found a related issue. Since the scheduler does not seem to be actively maintained, I recommend using `digdag server` rather than `digdag scheduler`.

workflows.dig


timezone: Asia/Tokyo

schedule:
  daily>: 12:00:00

+step1:
  sh>: echo start schedule ${session_time}

$ digdag scheduler
1) Error injecting constructor, java.lang.IllegalArgumentException: Configured authenticatorClass not found: io.digdag.standards.auth.jwt.JwtAuthenticator
  at io.digdag.server.ServerModule$AuthenticatorProvider.<init>(ServerModule.java:176)
  while locating io.digdag.server.ServerModule$AuthenticatorProvider
  while locating io.digdag.spi.Authenticator
    for the 1st parameter of io.digdag.server.AuthRequestFilter.<init>(AuthRequestFilter.java:28)
  while locating io.digdag.server.AuthRequestFilter
  while locating io.digdag.server.AuthRequestFilter annotated with @com.google.inject.internal.UniqueAnnotations$Internal(value=3)
1) Error injecting constructor, java.lang.IllegalArgumentException: Configured authenticatorClass not found: io.digdag.standards.auth.jwt.JwtAuthenticator
  at io.digdag.server.ServerModule$AuthenticatorProvider.<init>(ServerModule.java:176)
  while locating io.digdag.server.ServerModule$AuthenticatorProvider
  while locating io.digdag.spi.Authenticator
    for the 1st parameter of io.digdag.server.AuthRequestFilter.<init>(AuthRequestFilter.java:28)
  while locating io.digdag.server.AuthRequestFilter
  while locating io.digdag.server.AuthRequestFilter annotated with @com.google.inject.internal.UniqueAnnotations$Internal(value=3)

Checking scheduling status

You can manage schedules using the client-mode commands. I plan to cover the client-mode commands in a later post. If you want to read ahead, see the link below. http://docs.digdag.io/command_reference.html#client-mode-commands

The `digdag scheduler` command listens on `http://127.0.0.1:65432` by default. For security reasons it accepts connections only from 127.0.0.1 (localhost), so the port is not opened to the public network. To change the listen address, use the `--bind ADDRESS` option.

Setting an alert if a workflow doesn’t finish within expected time

The official documentation has no example for this, so I will check the behavior with a simple workflow. It shows a warning if the workflow takes more than 1 second to run.

workflows.dig


timezone: Asia/Tokyo

schedule:
  daily>: 11:45:00
sla:
  duration: 00:00:01
  +notice:
    sh>: echo alert! so slow

+step1:
  sh>: ./your_script.sh

The script calls `sleep 5` to delay for 5 seconds.

your_script.sh


#!/bin/bash
sleep 5
echo start script

To see the task logs, add the `--task-log` option when starting the server.

$ digdag server --memory --task-log ./task_log

The workflow took 5 seconds against the 1-second duration, so the alert message appears in the log.
The Status is still Success.


Options

The `sla:` parameter supports the `fail: BOOLEAN` and `alert: BOOLEAN` options.

fail: If set to true, the workflow fails when the duration is exceeded.
alert: If set to true, an alert is sent using Digdag's notification mechanism.

In the example above the Status stayed Success even after the duration had passed, so I assume that `fail` is not enabled by default.

If you add `fail: true`, you can see that the Status becomes Failure when the execution time exceeds 1 second.




#### **`add_fail_option`**
```yaml
timezone: Asia/Tokyo

schedule:
  daily>: 12:03:00
sla:
  duration: 00:00:01
  fail: true
  +notice:
    sh>: echo alert! so slow

+step1:
  sh>: ./your_script.sh
```

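The two results I observed (Success without `fail`, Failure with `fail: true`) can be summarized in a small Ruby sketch. This is only a model of the behavior seen in this post, not Digdag's code: `duration_seconds`, `sla_outcome`, and the `fail_workflow` keyword are names I made up, the check is done against the total run time for simplicity, and the default of `false` is my assumption from the first experiment.

```ruby
# Simplified model of the sla: duration check observed above.
# Illustration only; this is not how Digdag implements it.

# Convert an "HH:MM:SS" duration such as "00:00:01" into seconds.
def duration_seconds(text)
  hours, minutes, seconds = text.split(":").map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end

# Decide what happens to a workflow that ran for `elapsed` seconds.
# `fail_workflow` mirrors the fail: option (assumed to default to false).
def sla_outcome(duration:, elapsed:, fail_workflow: false)
  exceeded = elapsed > duration_seconds(duration)
  {
    notice_ran: exceeded, # whether the +notice task under sla: runs
    status: exceeded && fail_workflow ? "Failure" : "Success"
  }
end

# your_script.sh sleeps for 5 seconds against a 1-second duration.
p sla_outcome(duration: "00:00:01", elapsed: 5)
p sla_outcome(duration: "00:00:01", elapsed: 5, fail_workflow: true)
```

In both cases the `+notice` task runs; only the final Status changes.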

Skipping a next workflow session

Workflows that run frequently, such as sessions every 30 or 60 minutes, can sometimes take longer than the interval between sessions. The duration of a workflow can vary for several reasons, for example when the amount of data to process increases.

For example, suppose you have a workflow that runs every hour and normally takes only 30 minutes. During the holiday season, site usage increases significantly and the workflow now has so much data to process that it takes 1 hour and 30 minutes. Meanwhile, the session for the next hour starts running. Both run at the same time, which puts additional strain on the available resources.

In this case, it is best to skip the next hour's session and let a subsequent session process the 2 hours of data. Digdag provides the following for this:

The `skip_on_overtime: true | false` schedule option controls whether a scheduled session is skipped when another session is already running.

Scheduled workflow sessions have a `last_executed_session_time` variable that contains the session time of the previously executed session. It is usually the same as `last_session_time`, but the value differs if `skip_on_overtime: true` is set or if the session is the first execution.

Define a workflow that runs every minute.


#### **`workflows.dig`**
```yaml
timezone: Asia/Tokyo

schedule:
  minutes_interval>: 1
  skip_on_overtime: true

+step1:
  sh>: ./your_script.sh
```

your_script.sh


#!/bin/bash
sleep 120
echo start script

The script is a sample that takes 2 minutes to finish even though the workflow is scheduled to run every minute.

Because `skip_on_overtime: true` is set, the sessions that would start while the script is still running are skipped, even though the schedule is every minute. The next session actually runs only after the first execution has completed.


 ![スクリーンショット 2020-07-12 12.33.17.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/22feddc3-65bf-9142-6f10-a69e107b0780.png)
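The following Ruby sketch simulates this behavior so that the skipped sessions and the two variables are easier to see. It is a simplified model, not Digdag's scheduler: `simulate` is a made-up function, times are seconds counted from the first session, the first run is modeled with `nil` for `last_executed_session_time`, and I assume each run takes 125 seconds (the 120-second sleep plus some overhead).

```ruby
# Simplified simulation of skip_on_overtime: true. Not Digdag's code.
# Times are seconds from the first session; `run_seconds` is how long
# each run is assumed to take.
def simulate(interval:, run_seconds:, slots:)
  busy_until = nil     # end time of the run currently in progress
  last_executed = nil  # session time of the last run that actually started
  (0...slots).map do |i|
    session = i * interval
    if busy_until && session < busy_until
      # A previous run is still in progress, so this session is skipped.
      { session: session, action: :skipped }
    else
      entry = { session: session, action: :run,
                last_session_time: session - interval,
                last_executed_session_time: last_executed }
      busy_until = session + run_seconds
      last_executed = session
      entry
    end
  end
end

# 1-minute schedule; runs assumed to take 125 s (sleep 120 plus overhead).
simulate(interval: 60, run_seconds: 125, slots: 7).each { |row| p row }
```

In this model the sessions at 60 and 120 seconds are skipped, and the run at 180 seconds sees `last_session_time` as 120 but `last_executed_session_time` as 0. That gap is what lets a workflow work out how much data the skipped sessions left behind.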

 **Skipping backfill**

The `skip_delayed_by` option lets the backfill skip creating sessions that are delayed by more than the specified time.

When Digdag restarts, it automatically creates the scheduled sessions that were missed, starting from the one after `last_session_time`.

schedule:
  minutes_interval>: 1
  skip_delayed_by: 3m

+setup:
  sh>: echo ${session_time}


 When the schedule is stopped
 ![スクリーンショット 2020-07-12 13.16.51.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/d11954b1-14e2-a4a9-f907-5d9163894e15.png)

 When the schedule resumes (restarts at 17 minutes)
 ![スクリーンショット 2020-07-12 13.17.38.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/50df8228-91e5-e455-70c9-74320978be10.png)
 The sessions for minutes 14, 15, and 16, which fall within the 3 minutes before the restart time, are executed, and sessions that are delayed by more than 3 minutes are skipped.
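As a rough check of this result, the selection can be written as a few lines of Ruby. This is my own simplified model, not Digdag's implementation: `backfill` is a made-up function, the times are hypothetical (a restart at exactly 13:17:00 with sessions 13:11 to 13:16 missed), and treating a delay of exactly 3 minutes as "still run" is an assumption chosen to match what I observed.

```ruby
# Simplified model of skip_delayed_by. Not Digdag's implementation.
# A missed session is run only when its delay at restart time is at most
# `skip_delayed_by` seconds (this boundary is an assumption).
# Returns [sessions_to_run, sessions_to_skip].
def backfill(missed_sessions, restarted_at:, skip_delayed_by:)
  missed_sessions.partition do |session_time|
    restarted_at - session_time <= skip_delayed_by
  end
end

# Hypothetical times: sessions 13:11..13:16 were missed and the
# scheduler restarted at exactly 13:17:00.
restart = Time.new(2020, 7, 12, 13, 17, 0, "+09:00")
missed  = (11..16).map { |minute| Time.new(2020, 7, 12, 13, minute, 0, "+09:00") }

run, skipped = backfill(missed, restarted_at: restart, skip_delayed_by: 3 * 60)
puts "run:     #{run.map { |t| t.strftime('%H:%M') }.join(', ')}"
puts "skipped: #{skipped.map { |t| t.strftime('%H:%M') }.join(', ')}"
```

Without `skip_delayed_by`, all six missed sessions in this example would be created after the restart.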


