[RUBY] Learn Digdag from Digdag Official Documentation - Concepts

Target

A translation of the Concepts page of the official Digdag documentation, plus some additions. The final goal is to build a batch job in Rails using Digdag's Ruby API. http://docs.digdag.io/concepts.html

Table of contents

Getting started
Architecture
Concepts
Workflow definition
Scheduling workflow
Operators
Command reference
Language API - Ruby
Change the setting value for each environment with Digdag (RubyOnRails)
Batch implementation in RubyOnRails environment using Digdag

Projects and revisions

A workflow is packaged together with the other files it uses. These files can be `SQL scripts, Python / Ruby / shell scripts, configuration files`, and so on. This set of workflow definitions is called a project.

When a project is uploaded to the Digdag server, the server inserts the new version and keeps the old ones. A version of a project is called a revision. By default, Digdag uses the latest revision when you run a workflow. However, you can use older revisions for the following purposes:

1. To check the definition of a past workflow execution.
2. To run a workflow with a previous revision and reproduce the same results as before.
3. To revert to an older revision to work around a problem in the latest version.

A project can contain multiple workflows. However, if a new workflow is unrelated to the existing ones, you should create a new project, because uploading a new revision updates all workflows in the project together.
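As a concrete illustration, a new revision is created each time you push a project to the server. A minimal sketch using hypothetical project and revision names:

# Upload the project; the server stores it as a new revision
digdag push myproject --revision v2

# Run a workflow on the server (the latest revision is used by default)
digdag start myproject my_workflow --session now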

Sessions and attempts

A session is a plan to execute a workflow that should complete successfully. An attempt is an actual execution of a session. If you rerun a failed workflow, the session has multiple attempts. In short, a session is an execution plan, and an attempt is one execution of that plan.

The reason sessions and attempts are separated is that execution can fail. When you list sessions, the expected state is that all of them are green. If you find a failed session, check its attempts and debug the problem from the logs. You can start a new attempt after uploading a new revision that fixes the problem. Sessions make it easy to confirm that everything planned has executed successfully.
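After pushing a revision that fixes the problem, you can start a new attempt of the failed session from the CLI. A minimal sketch (the attempt id 123 is hypothetical; check the actual id with digdag attempts):

# Start a new attempt of attempt 123, re-running all tasks
# with the latest revision
digdag retry 123 --latest-revision --all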

Scheduled execution and session_time

A session has a timestamp called session_time. It represents the time for which the workflow runs (for example, the day a daily workflow processes), which is not necessarily the actual start time.

session_time is unique within a workflow's history. If you submit two sessions with the same session_time, the later request is rejected. This prevents a session for the same time from being accidentally submitted twice. If you need to run a workflow for the same time again, retry the previous session instead of submitting a new one.
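session_time is also available as a built-in variable inside workflow definitions. A minimal sketch (the schedule and timezone are illustrative):

timezone: Asia/Tokyo

schedule:
  daily>: 07:00:00

+show_time:
  # ${session_time} expands to the session's timestamp, e.g. 2020-07-10T07:00:00+09:00
  echo>: session_time is ${session_time}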

Task

When an attempt of a session starts, the workflow is transformed into a set of tasks. Tasks depend on one another; Digdag understands these dependencies and executes the tasks in order.
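For example, a minimal sketch of two tasks defined at the same level; they run sequentially, so +second starts only after +first completes:

+first:
  sh>: echo first
+second:
  # sibling tasks run in order; +second starts after +first completes
  # (set _parallel: true on a parent task to run children concurrently)
  sh>: echo second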

Export and store parameters

1. local: parameters set directly on the task
2. export: parameters exported from a parent task
3. store: parameters stored by a previous task

These parameters are merged into a single object when the task is executed. The local parameters have the highest priority. Export and store parameters overwrite each other, so parameters set by later tasks have higher priority.

Export parameters are used by a parent task to pass values to its children. Store parameters are used by a task to pass values to all subsequent tasks, including its children.
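For example, a minimal sketch of an export parameter (the parameter name my_param is illustrative). `_export` on a parent task makes the value visible to that task and all of its children:

+load:
  _export:
    my_param: hello
  +child:
    # my_param is visible here because the parent task exported it
    echo>: my_param is ${my_param}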

The impact of an export parameter is limited compared to a store parameter, which allows workflows to be modularized. For example, a workflow may use several scripts to process data. You can set parameters on some scripts to control their behavior, while other scripts should be unaffected by those parameters (for example, the data loading part should not be affected by changes in the data processing part). In this case, you can put the scripts under a single parent task and have the parent task export the parameters.

Store parameters are visible to all subsequent tasks, but not to earlier tasks. For example, suppose you run a workflow and then retry it. In this case, the parameters stored by a task are not visible to tasks that ran before it, even if the storing task completed successfully in the previous run.

Store parameters are not global variables. If two tasks run in parallel, they use different store parameters. This ensures that the workflow behaves consistently regardless of the actual execution timing. For example, if a task depends on two parallel tasks, the parameters stored by those tasks are applied in task submission order.
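Since the final goal of this series is the Ruby API, here is a minimal sketch of a store parameter, assuming the Digdag.env.store method described on the Language API - Ruby page (the file, class, and parameter names are illustrative).

my_workflow.dig

+produce:
  rb>: MyWorkflow.produce
  require: 'tasks/my_workflow'

+consume:
  # my_value is visible here because the previous task stored it
  echo>: my_value is ${my_value}

tasks/my_workflow.rb

class MyWorkflow
  def produce
    # store makes my_value visible to all following tasks
    Digdag.env.store(my_value: 42)
  end
end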

Operators and plugins

An operator is the executor of a task. Operators are specified in the workflow definition as `sh>`, `pg>`, and so on. When a task is executed, Digdag selects one operator, merges all the parameters (local, export, and store parameters), and passes the merged parameters to the operator.

You can think of an operator as a package for a typical workload. Operators let you accomplish more than plain scripts.

Operators are designed as plugins (although this is not yet fully implemented). You install operators to simplify your workflows, and you create operators so they can be reused in other workflows. Digdag itself is a simple platform that runs many operators.
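As a small illustration, the `sh>` operator below receives the merged parameters, so ${table} resolves to the exported value (the parameter name table is illustrative):

+process:
  _export:
    table: users
  sh>: echo processing ${table}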

Dynamic task generation and _check/_error tasks

Digdag transforms a workflow into a set of dependent tasks. The graph of these tasks is called a DAG, or Directed Acyclic Graph (https://en.wikipedia.org/wiki/Directed_acyclic_graph). A DAG is well suited to running tasks in order, starting from the tasks that others depend on and proceeding to the end.

However, a DAG cannot represent a loop. Representing an if branch is not easy either.

But loops and branches are useful. To solve this problem, Digdag dynamically adds tasks to the running DAG. In the example below, Digdag generates three tasks that represent a loop: +example^sub+loop-0, +example^sub+loop-1, and +example^sub+loop-2 (the names of dynamically generated tasks include `^sub`):


+example:
  loop>: 3
  _do:
    echo>: this is ${i}th loop

Execution result


2020-07-10 20:48:11 +0900 [INFO](0017@[0:default]+mydag+example): loop>: 3
2020-07-10 20:48:12 +0900 [INFO](0017@[0:default]+mydag+example^sub+loop-0): echo>: this is 0th loop
this is 0th loop
2020-07-10 20:48:12 +0900 [INFO](0017@[0:default]+mydag+example^sub+loop-1): echo>: this is 1th loop
this is 1th loop
2020-07-10 20:48:12 +0900 [INFO](0017@[0:default]+mydag+example^sub+loop-2): echo>: this is 2th loop
this is 2th loop

The `_check` and `_error` options use dynamic task generation. Digdag uses these parameters to run another task only when the task succeeds or fails.

A `_check` task is generated after the task completes successfully. This is useful when you want to validate the result of a task before starting the next one.

An `_error` task is generated after the task fails. This is useful for notifying external systems of the failure.

The following example prints a message when the task succeeds, and another message when it fails.



+example:
  sh>: echo start
  _check:
    +succeed:
      echo>: success
  _error:
    +failed:
      echo>: fail

Execution result (success)


2020-07-10 21:05:33 +0900 [INFO](0017@[0:default]+mydag+example): sh>: echo start
start
2020-07-10 21:05:33 +0900 [INFO](0017@[0:default]+mydag+example^check+succeed): echo>: success
success

To produce the error case, the `sh>` command was changed to run your_script.sh after the script had been removed, so the task fails.

Execution result (error)


2020-07-10 20:56:49 +0900 [INFO](0017@[0:default]+mydag+example^error+failed): echo>: fail
fail
2020-07-10 20:56:49 +0900 [INFO](0017@[0:default]+mydag^failure-alert): type: notify
error: 

Task naming and resuming

Each task in an attempt has a unique name. When you retry an attempt, this name is used to match tasks with those of the previous attempt.

Child tasks are prefixed with the name of their parent task, and the workflow name is prefixed as the root task. In the following example, the task names are +my_workflow+load+from_mysql+tables, +my_workflow+load+from_postgres, and +my_workflow+dump.

my_workflow.dig


+load:
  +from_mysql:
    +tables:
      sh>: echo tables
  +from_postgres:
    sh>: echo from_postgres
+dump:
  sh>: echo dump

Execution result


2020-07-10 21:12:12 +0900 [INFO](0017@[0:default]+my_workflow+load+from_mysql+tables): sh>: echo tables
tables
2020-07-10 21:12:13 +0900 [INFO](0017@[0:default]+my_workflow+load+from_postgres): sh>: echo from_postgres
from_postgres
2020-07-10 21:12:13 +0900 [INFO](0017@[0:default]+my_workflow+dump): sh>: echo dump
dump

Workspace

A workspace is the directory where tasks are executed. Digdag extracts the files of the project archive into this directory, changes the current directory there, and runs the task. (Note: local mode assumes that the current working directory is the workspace, and never creates one.)

Plugins are not allowed to access the parent directory of the workspace, because the Digdag server may run in a shared environment. A project should be self-contained so that it does not depend on the external environment. Script operators (for example the `sh>` operator) are an exception; it is recommended to run scripts using the `docker:` option.
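A minimal sketch of the `docker:` option (the image name is illustrative). Exporting docker.image makes the commands run inside a container, so the task does not depend on the host environment:

_export:
  docker:
    image: ubuntu:20.04

+task:
  # this command runs inside the ubuntu:20.04 container
  sh>: echo "hello from the workspace"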

Recommended Posts

Learn Digdag from Digdag Official Documentation - Concepts
Learn Digdag from Digdag Official Documentation - Workflow definition
Learn Digdag from Digdag Official Documentation - Architecture
Learn Digdag from Digdag Official Documentation - Getting started
Learn Digdag from Digdag Official Documentation - Scheduling workflow
Learn Digdag from Digdag Official Documentation - Language API - Ruby