[Ruby] Learn Digdag from the official Digdag documentation - Concepts


## Target

A translation of the Concepts page of the official Digdag documentation, plus some extras. The final goal is to implement a batch in Rails using Digdag's Ruby API. http://docs.digdag.io/concepts.html

## Table of contents

  1. Getting started
  2. Architecture
  3. Concepts
  4. Workflow definition
  5. Scheduling workflow
  6. Operators
  7. Command reference
  8. Language API - Ruby
  9. Change the setting value for each environment with Digdag (RubyOnRails)
  10. Batch implementation in a RubyOnRails environment using Digdag

## Projects and revisions

Workflows are packaged together with the other files they use. The files can be anything: SQL scripts, Python/Ruby/shell scripts, configuration files, and so on. This set of workflow definitions is called a project.

When a project is uploaded to the Digdag server, the server inserts the new version and keeps the old ones. A version of a project is called a revision. When you run a workflow, Digdag uses the latest revision by default. However, you can also use older revisions for the following purposes:

  1. To check the definition of a past workflow execution
  2. To run the workflow with a previous revision and reproduce the same results as before
  3. To revert to an older revision to resolve problems with the latest version

A project can have multiple workflows. However, if your new workflow is not related to any other workflow, you need to create a new project. The reason is that uploading a new revision updates all workflows in the project together.
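As a sketch of how revisions accumulate (the project name is hypothetical; this transcript assumes a running Digdag server), pushing the same project twice keeps both versions:

```
$ digdag push my_project        # uploads revision 1
$ vi my_workflow.dig            # edit a workflow in the project
$ digdag push my_project        # uploads revision 2; revision 1 is kept on the server
```

Because a push replaces every workflow in the project at once, unrelated workflows kept in the same project would all be updated by this second push.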

## Sessions and attempts

A session is a plan to execute a workflow, which should complete successfully. An attempt is an actual execution of a session. When you re-run a workflow that failed, one session will have multiple attempts. In short, a session represents an execution plan, and an attempt is one execution of that plan.

The reason for separating sessions and attempts is that executions can fail. When you list sessions, the expected status is that all sessions are green. If you find a failed session, check its attempts and debug the problem from the logs. After uploading a new revision that fixes the problem, you can start a new attempt. Through sessions, you can easily confirm that all planned executions finished successfully.

![Screenshot 2020-07-09 21.32.04.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108475/b338df27-8e53-8cec-e30b-39794c6a5fd5.png)

## Scheduled execution and session_time

Each session has a timestamp called `session_time`. It represents the time the workflow is scheduled to run.

`session_time` is unique within a workflow's history. If you submit two sessions with the same `session_time`, the later request is rejected. This prevents accidentally submitting a duplicate of a session that was already submitted. If you need to run a workflow for the same time again, you should retry the past session instead of submitting a new session.
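A sketch of what this looks like with the CLI (the project, workflow name, and attempt ID are hypothetical; this assumes a running Digdag server):

```
$ digdag start my_project my_workflow --session 2020-07-10
$ digdag start my_project my_workflow --session 2020-07-10   # rejected: session already exists
$ digdag retry 123 --latest-revision --all                   # new attempt of the past session
```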

## Task
When an attempt of a session starts, the workflow is converted into a set of tasks. Tasks have dependencies on each other. Digdag understands these dependencies and executes the tasks in order.
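A minimal sketch of this ordering (task names and commands are made up): sibling tasks run sequentially by default, and the `_parallel` option makes a group's children run concurrently:

```yaml
+prepare:
  sh>: echo prepare
+load:
  _parallel: true        # +a and +b run concurrently
  +a:
    sh>: echo load a
  +b:
    sh>: echo load b
+finish:
  sh>: echo finish       # runs only after all of +load's children finish
```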

## Export and store parameters

A task can use three kinds of parameters:

  1. local: parameters set directly on the task
  2. export: parameters exported by parent tasks
  3. store: parameters stored by previous tasks

These parameters are merged into one object when the task is executed. The local parameters have the highest priority.


The export and store parameters overwrite each other, so parameters set by later tasks have higher priority.
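As an illustrative sketch in plain Ruby (this is not Digdag's API; the parameter names are made up), the merge order can be modeled with `Hash#merge`, where later hashes win:

```ruby
# Model of Digdag's parameter merge priority: store < export < local.
# Hash#merge gives priority to keys of the hash passed as the argument.
def merge_params(store:, export:, local:)
  store.merge(export).merge(local)
end

merged = merge_params(
  store:  { "table" => "users",  "limit" => 10 },
  export: { "limit" => 50 },        # overrides store's "limit"
  local:  { "table" => "events" }   # local wins over everything
)
# merged => { "table" => "events", "limit" => 50 }
```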

The export parameter is used by a parent task to pass values to its children.


The store parameter is used by a task to pass values to all following tasks, including children.

The effect of export parameters is limited compared to store parameters. This allows workflows to be modularized. For example, suppose a workflow uses several scripts to process data. You may want to set some parameters to control the behavior of those scripts, while keeping other scripts unaffected by them (for example, the data-loading part should not be affected by changes in the data-processing part). In this case, you can put the scripts under a single parent task and let the parent task export the parameters.
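A minimal sketch of this pattern (the task and variable names are made up): `_export` on the parent makes `${table}` visible only to that task and its children:

```yaml
+process:
  _export:
    table: users         # visible to +process and its children only
  +load:
    sh>: echo load ${table}
  +transform:
    sh>: echo transform ${table}
+report:
  sh>: echo report       # ${table} is not visible here
```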

Store parameters are visible to all subsequent tasks, but not to previous tasks. For example, suppose you run a workflow and then retry it. In this case, parameters stored by a task are not visible to earlier tasks, even if those tasks completed successfully in the last run.

Store parameters are not global variables. If two tasks run in parallel, each uses its own store parameters. This makes the workflow behave consistently regardless of when it actually executes. For example, if another task depends on the two parallel tasks, the parameters stored by the last task are used, in the order of task submission.

## Operators and plugins

Operators are the executors of tasks. Operators are set with `sh>`, `pg>`, etc. in the workflow definition.

When the task runs, Digdag picks one operator, merges all the parameters (local, export, and store parameters) and passes the merged parameters to the operator.

An operator can be considered a package for a common workload. With operators, you can do more with fewer scripts.

Operators are designed as plugins (although this is not fully implemented yet). You install operators to simplify your workflows, and you create operators to reuse them in other workflows. Digdag itself is a simple platform for running many operators.
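As a sketch (the file names and the database name are hypothetical), each task names its operator before the `>`:

```yaml
+load:
  sh>: ./scripts/load.sh       # sh> operator runs a shell command
+summarize:
  pg>: queries/summary.sql     # pg> operator runs a SQL file on PostgreSQL
  database: analytics
```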

## Dynamic task generation and _check/_error tasks

Digdag transforms a workflow into a set of tasks with dependencies. The graph of these tasks is called a DAG (Directed Acyclic Graph). A DAG is good for executing tasks in dependency order, from the most depended-upon task through to the end.

However, a DAG cannot represent loops. Expressing an IF branch is not easy either.

But loops and branches are useful. To solve this problem, Digdag dynamically appends tasks to the running DAG. In the following example, Digdag generates three tasks that represent a loop: +example^sub+loop-0, +example^sub+loop-1, and +example^sub+loop-2 (`^sub` is appended to the names of dynamically generated tasks):

```yaml
+example:
  loop>: 3
  _do:
    echo>: this is ${i}th loop
```

Execution result:

```
2020-07-10 20:48:11 +0900 [INFO](0017@[0:default]+mydag+example): loop>: 3
2020-07-10 20:48:12 +0900 [INFO](0017@[0:default]+mydag+example^sub+loop-0): echo>: this is 0th loop
this is 0th loop
2020-07-10 20:48:12 +0900 [INFO](0017@[0:default]+mydag+example^sub+loop-1): echo>: this is 1th loop
this is 1th loop
2020-07-10 20:48:12 +0900 [INFO](0017@[0:default]+mydag+example^sub+loop-2): echo>: this is 2th loop
this is 2th loop
```

The `_check` and `_error` options use dynamic task generation. Digdag uses these parameters to execute another task only when a task succeeds or fails.

The `_check` task is generated after a task completes successfully. This is especially useful when you want to verify the task's results before starting the next task.

The `_error` task is generated after a task fails. This is useful for notifying external systems of the failure.



The following example prints a message when the task succeeds, and another message when it fails.


```yaml
+example:
  sh>: echo start
  _check:
    +succeed:
      echo>: success
  _error:
    +failed:
      echo>: fail
```
Execution result (success):

```
2020-07-10 21:05:33 +0900 [INFO](0017@[0:default]+mydag+example): sh>: echo start
start
2020-07-10 21:05:33 +0900 [INFO](0017@[0:default]+mydag+example^check+succeed): echo>: success
success
```

To generate an error, delete your_script.sh so that the task fails.

Execution result (error):

```
2020-07-10 20:56:49 +0900 [INFO](0017@[0:default]+mydag+example^error+failed): echo>: fail
fail
2020-07-10 20:56:49 +0900 [INFO](0017@[0:default]+mydag^failure-alert): type: notify
error:
```

## Task naming and resuming

Each task in an attempt has a unique name. When you retry an attempt, these names are used to match tasks with the previous attempt.

Child tasks are prefixed with the name of their parent task. The workflow name is also prefixed as the root task. In the following example, the task names are +my_workflow+load+from_mysql+tables, +my_workflow+load+from_postgres, and +my_workflow+dump.

my_workflow.dig

```yaml
+load:
  +from_mysql:
    +tables:
      sh>: echo tables
  +from_postgres:
    sh>: echo from_postgres
+dump:
  sh>: echo dump
```

Result:

```
2020-07-10 21:12:12 +0900 [INFO](0017@[0:default]+my_workflow+load+from_mysql+tables): sh>: echo tables
tables
2020-07-10 21:12:13 +0900 [INFO](0017@[0:default]+my_workflow+load+from_postgres): sh>: echo from_postgres
from_postgres
2020-07-10 21:12:13 +0900 [INFO](0017@[0:default]+my_workflow+dump): sh>: echo dump
dump
```

## Workspace

The workspace is the directory where tasks run. Digdag extracts the files from the project archive into this directory, changes the current directory there, and runs the task (note: in local-mode execution, the current working directory is used as the workspace, and Digdag never creates a workspace).

Operators are not allowed to access the parent directories of the workspace. This is because the Digdag server may run in a shared environment. A project must be self-contained so that it does not depend on the external environment. Script operators (e.g. the sh> operator) are an exception; it is recommended to run scripts with the docker: option.
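A sketch of that recommendation (the script name and container image are hypothetical); the `docker:` option runs the task's command inside a container, isolating it from the host:

```yaml
+process:
  sh>: ./scripts/process.sh
  docker:
    image: ubuntu:20.04    # the command runs inside this container image
```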