This is a translation of the "Scheduling workflow" page of the official Digdag documentation, plus some notes of my own (+α). The final goal is to build a batch job in Rails using Digdag's Ruby support. http://docs.digdag.io/scheduling_workflow.html
Scheduling workflow
Setting up a schedule
To run a workflow on a regular basis, add the `schedule:` option at the top level of the workflow definition.
Create a simple workflow. To run it at 9:00 every day, Japan time, add a schedule option as follows.
workflows.dig
timezone: Asia/Tokyo
schedule:
  daily>: 09:00:00
+step1:
  sh>: echo start schedule ${session_time}
Start the Digdag server and push the workflows project to the Digdag server.
$ digdag server --memory
$ digdag push workflows
In the workflow details, the 9:00 run appears in Sessions with a Status of Success.
You can also check the details of the actual attempt by clicking Success in Sessions.
Options
|Syntax|Description|Example|
|:-----------------|:------------------|:-----------------|
| hourly>: MM:SS |Run at 30 minutes past every hour| hourly>: 30:00 |
| daily>: HH:MM:SS |Run at 7:00 every day| daily>: 07:00:00|
| weekly>: DDD,HH:MM:SS |Run at 9:00 every Sunday| weekly>: Sun,09:00:00|
| monthly>: D,HH:MM:SS |Run at 9am on the first day of every month| monthly>: 1,09:00:00|
| minutes_interval>: M |Run this job every 30 minutes| minutes_interval>: 30|
| cron>: CRON |Use cron format for complex scheduling| cron>: 42 4 1 * *|
The `digdag check` command shows when the first scheduled run will start.
#### **`check`**
```console
$ digdag check
2020-07-12 09:23:10 +0900: Digdag v0.9.41
  System default timezone: Asia/Tokyo
  Definitions (1 workflows):
    workflows (2 tasks)
  Parameters:
    {}
  Schedules (1 entries):
    workflows:
      daily>: "09:00:00"
      first session time: 2020-07-13 00:00:00 +0900
      first scheduled to run at: 2020-07-13 09:00:00 +0900 (in 23h 36m 47s)
```
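The output above says the first run is tomorrow because today's 09:00 had already passed when the command ran at 09:23. Here is a minimal Python sketch of that reasoning. It is only an illustration, not Digdag's implementation: the function name `next_daily_run` is hypothetical, and treating a run time exactly equal to "now" as already past is my assumption.

```python
from datetime import datetime, time, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Asia/Tokyo is a fixed +09:00 offset


def next_daily_run(now: datetime, at: str) -> datetime:
    """Return the next time a `daily>: HH:MM:SS` schedule would fire (hypothetical helper)."""
    run_time = time.fromisoformat(at)  # "09:00:00" -> time(9, 0, 0)
    candidate = datetime.combine(now.date(), run_time, tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's slot has already passed, so move to tomorrow (assumed boundary rule).
        candidate += timedelta(days=1)
    return candidate


now = datetime(2020, 7, 12, 9, 23, 10, tzinfo=JST)
print(next_daily_run(now, "09:00:00").isoformat())
```

With the timestamp from the `digdag check` output, this gives 2020-07-13 09:00:00 +0900, which matches "first scheduled to run at".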
Caution
When using hourly, daily, weekly, or monthly, the session time may differ from the actual execution time.
The session time is set to 00:00:00 on the actual execution date (for hourly, only the minutes and seconds are set to 00:00).
Schedule examples (system time: 2019-02-24 14:20:10 +0900). The time-of-day part of each session time is reset to "00:00:00", except for hourly, which keeps the hour.
schedule | first session time | first scheduled to run at |
---|---|---|
hourly>: 32:32 | 2019-02-24 14:00:00 +0900 | 2019-02-24 14:32:32 +0900 |
daily>: 10:32:32 | 2019-02-25 00:00:00 +0900 | 2019-02-25 10:32:32 +0900 |
weekly>: 2,10:32:32 | 2019-02-26 00:00:00 +0900 | 2019-02-26 10:32:32 +0900 |
monthly>: 2,10:32:32 | 2019-03-02 00:00:00 +0900 | 2019-03-02 10:32:32 +0900 |
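The rule in the table above can be written as a small truncation function. This is a sketch that only reproduces the rows of the table. The name `session_time` is hypothetical, and `minutes_interval>` and `cron>` are deliberately left out because the caution above does not cover them.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Asia/Tokyo is a fixed +09:00 offset


def session_time(schedule_kind: str, run_at: datetime) -> datetime:
    """Derive the session time from the scheduled run time (hypothetical helper)."""
    if schedule_kind == "hourly":
        # hourly: keep the hour, reset minutes and seconds
        return run_at.replace(minute=0, second=0, microsecond=0)
    if schedule_kind in ("daily", "weekly", "monthly"):
        # daily / weekly / monthly: reset to 00:00:00 of the execution date
        return run_at.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported schedule kind: {schedule_kind}")


run_at = datetime(2019, 2, 24, 14, 32, 32, tzinfo=JST)
print(session_time("hourly", run_at).isoformat())
```

This is why `${session_time}` in a daily workflow prints midnight rather than the time the task actually started.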
Running scheduler
`digdag scheduler` starts the scheduler.
$ digdag scheduler
If you change the workflow definition, the scheduler automatically reloads the digdag.dig file, so you do not need to restart it.
I changed the time from 9:00 to 12:00 and tried running `digdag scheduler`, but I got an error.
I thought it might be a bug in Digdag and found a related issue.
Since the scheduler does not seem to be actively maintained, I recommend using `digdag server` rather than `digdag scheduler`.
workflows.dig
timezone: Asia/Tokyo
schedule:
  daily>: 12:00:00
+step1:
  sh>: echo start schedule ${session_time}
$ digdag scheduler
1) Error injecting constructor, java.lang.IllegalArgumentException: Configured authenticatorClass not found: io.digdag.standards.auth.jwt.JwtAuthenticator
at io.digdag.server.ServerModule$AuthenticatorProvider.<init>(ServerModule.java:176)
while locating io.digdag.server.ServerModule$AuthenticatorProvider
while locating io.digdag.spi.Authenticator
for the 1st parameter of io.digdag.server.AuthRequestFilter.<init>(AuthRequestFilter.java:28)
while locating io.digdag.server.AuthRequestFilter
while locating io.digdag.server.AuthRequestFilter annotated with @com.google.inject.internal.UniqueAnnotations$Internal(value=3)
Checking scheduling status
You can manage your schedules using the client-mode commands. I plan to write up the client-mode commands in a later post. If you want to read ahead, see the link below:
http://docs.digdag.io/command_reference.html#client-mode-commands
The scheduler command listens on `http://127.0.0.1:65432` by default. Only connections from 127.0.0.1 (localhost) are accepted. For security reasons, the port is not opened to the public network. Use the `--bind ADDRESS` option to change the listen address.
Setting an alert if a workflow doesn’t finish within expected time
Since I could not find an example, I will check the behavior with some simple code: show a warning if the workflow takes more than 1 second to run.
workflows.dig
timezone: Asia/Tokyo
schedule:
  daily>: 11:45:00
sla:
  duration: 00:00:01
  +notice:
    sh>: echo alert! so slow
+step1:
  sh>: ./your_script.sh
The script sleeps for 5 seconds before printing a message:
your_script.sh
#!/bin/bash
sleep 5
echo start script
If you want to see the task logs, add the `--task-log` option at startup.
$ digdag server --memory --task-log ./task_log
Because the workflow, whose time limit was set to 1 second, took 5 seconds, the alert message appears in the log.
The Status is still Success.
Options
The `sla:` parameter supports the `fail: BOOLEAN` and `alert: BOOLEAN` options.
fail: If set to true, the workflow fails.
alert: If set to true, an alert is sent using Digdag's notification mechanism.
In the example above, the Status was Success even after the time limit had passed, so I think it ran with the default settings, with neither option enabled.
If you add `fail: true`, you can see that the Status becomes Failure when the execution time exceeds 1 second.
#### **`add_fail_option`**
```yaml
timezone: Asia/Tokyo
schedule:
  daily>: 12:03:00
sla:
  duration: 00:00:01
  fail: true
  +notice:
    sh>: echo alert! so slow
+step1:
  sh>: ./your_script.sh
```
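To summarize the two runs above, here is a toy Python model of what I observed. It is not Digdag's implementation: `parse_duration` and `sla_outcome` are hypothetical names, and the model assumes the workflow's own tasks succeed and that `fail` defaults to false.

```python
from datetime import timedelta
from typing import Tuple


def parse_duration(text: str) -> timedelta:
    """Parse an HH:MM:SS string such as the `duration: 00:00:01` value."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def sla_outcome(duration: str, elapsed_seconds: float, fail: bool = False) -> Tuple[bool, str]:
    """Return (notice_ran, status) for one run, mirroring the observations above."""
    exceeded = timedelta(seconds=elapsed_seconds) > parse_duration(duration)
    # The +notice task runs whenever the limit is exceeded;
    # the Status only becomes Failure when `fail: true` is also set.
    status = "Failure" if (exceeded and fail) else "Success"
    return exceeded, status


print(sla_outcome("00:00:01", 5))             # first example: notice shown, Success
print(sla_outcome("00:00:01", 5, fail=True))  # with fail: true
```

The first call corresponds to the workflow without `fail:`, the second to `add_fail_option`.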
Skipping a next workflow session
Workflows sometimes run longer than the interval between sessions, for example when sessions are scheduled every 30 or 60 minutes. The running time of a workflow can vary for several reasons, for example when the amount of data it normally processes increases.
For example, suppose you have a workflow that runs every hour and typically takes only 30 minutes. During the holiday season, however, site usage increases significantly, and the workflow now has so much data to process that it takes 1 hour and 30 minutes. While it is still running, the session for the next hour starts. Both run at the same time, which puts additional strain on the available resources.
In this case, it's best to skip the next hourly session and let a subsequent session process the 2 hours of data. To make this possible, Digdag added the following:
A `skip_on_overtime: true | false` schedule option, which controls whether a scheduled session is skipped if another session is already running.
Scheduled workflow sessions have a `last_executed_session_time` variable that contains the session time of the previously executed session. It is usually the same as `last_session_time`, but the value differs when `skip_on_overtime: true` is set or when the session is the first run.
Define a workflow that runs every minute:
#### **`workflows.dig`**
```yaml
timezone: Asia/Tokyo
schedule:
  minutes_interval>: 1
  skip_on_overtime: true
+step1:
  sh>: ./your_script.sh
```
your_script.sh
#!/bin/bash
sleep 120
echo start script
The script in this sample takes 2 minutes to run even though the workflow is scheduled every minute.
With `skip_on_overtime: true`, even though the schedule is set to every minute, the next run actually starts 1 minute after the first run completes.
![Screenshot 2020-07-12 12.33.17.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/22feddc3-65bf-9142-6f10-a69e107b0780.png)
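The skipping behavior can be modeled with a few lines of Python. This is a simplified simulation under my own assumptions, not Digdag's scheduler: `simulate_skip_on_overtime` is a hypothetical name, every run takes the same fixed time, and a session is skipped whenever its start time falls before the end of the run in progress.

```python
from typing import List, Tuple


def simulate_skip_on_overtime(interval: int, duration: int, horizon: int) -> List[Tuple[int, str]]:
    """Return (start, action) pairs for sessions at 0, interval, 2*interval, ... below horizon.

    All arguments use the same time unit (minutes or seconds).
    """
    busy_until = 0
    timeline = []
    for start in range(0, horizon, interval):
        if start < busy_until:
            # A previous session is still running, so this one is skipped.
            timeline.append((start, "skipped"))
        else:
            timeline.append((start, "run"))
            busy_until = start + duration
    return timeline


# Holiday example from the text: hourly schedule, each run takes 90 minutes.
for start, action in simulate_skip_on_overtime(interval=60, duration=90, horizon=240):
    print(start, action)
```

In the holiday example, the session at minute 60 is skipped and the one at minute 120 runs. For the one-minute workflow above, assuming the 120-second script needs slightly more than 2 minutes in total (say 125 seconds, an assumed overhead), `simulate_skip_on_overtime(60, 125, 420)` runs at 0, 180, and 360 seconds, which is consistent with the next run starting about 1 minute after the previous one finishes.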
**Skipping backfill**
With the `skip_delayed_by` option, backfill skips creating sessions that are delayed by more than the specified amount of time.
When Digdag restarts, scheduled sessions are created automatically up to the session following `last_session_time`.
schedule:
  minutes_interval>: 1
  skip_delayed_by: 3m

+setup:
  sh>: echo ${session_time}
When the schedule is stopped
![Screenshot 2020-07-12 13.16.51.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/d11954b1-14e2-a4a9-f907-5d9163894e15.png)
When the schedule resumes (restarted at minute 17)
![Screenshot 2020-07-12 13.17.38.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/50df8228-91e5-e455-70c9-74320978be10.png)
The sessions for minutes 14, 15, and 16, which fall within the 3 minutes before the restart, are executed, and sessions delayed by more than 3 minutes are skipped.
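The selection of sessions after a restart can be sketched in Python as follows. This is an illustration of the result I observed, not Digdag's code: `sessions_to_backfill` is a hypothetical name, the timestamps are made up to match "restart at minute 17", and counting a session delayed by exactly 3 minutes as still in range is an assumption chosen to match the screenshots.

```python
from datetime import datetime, timedelta
from typing import List


def sessions_to_backfill(last_session: datetime, restart_at: datetime,
                         interval: timedelta, skip_delayed_by: timedelta) -> List[datetime]:
    """List the missed session times that are still created after a restart."""
    kept = []
    session = last_session + interval
    while session < restart_at:
        # Keep only sessions whose delay is within skip_delayed_by (inclusive, assumed).
        if restart_at - session <= skip_delayed_by:
            kept.append(session)
        session += interval
    return kept


last = datetime(2020, 7, 12, 13, 10)     # last session before the stop (illustrative)
restart = datetime(2020, 7, 12, 13, 17)  # restart at minute 17
kept = sessions_to_backfill(last, restart, timedelta(minutes=1), timedelta(minutes=3))
print([s.strftime("%H:%M") for s in kept])  # ['13:14', '13:15', '13:16']
```

Sessions 13:11 through 13:13 are delayed by more than 3 minutes at restart time, so they are dropped.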