Christmas is near, so let's talk about Airflow's web server.
The star of this post, the webserver, is one of Airflow's modules. There are other modules as well; for more information, Astronomer's article is easy to understand.
The webserver handles requests from the management UI (shown below), some CLI commands, the [API](https://airflow.apache.org/docs/stable/api.html), and so on.
Internally it is built on Flask + Gunicorn, and the endpoints called from the UI are defined [here](https://github.com/apache/airflow/blob/1.10.2/airflow/www/views.py).
(The figure is from the official Airflow page.)
The webserver not only accepts requests, but also **reads DAG files on a regular basis**.
As a result, if **loading** the DAG files takes a long time, something like a warning is sometimes displayed in the UI.
It is easy to confuse **a DAG that takes a long time to load** with **a DAGRun that takes a long time to execute**, but they are different things, and the former is the problem this time.
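To give an example, this is a **DAG that takes a long time to load**:

```python
from time import sleep
from airflow.operators.dummy_operator import DummyOperator
sleep(10000000)  # module top level: runs every time the DAG file is loaded
start = DummyOperator(task_id='start')
```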
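And this is a **DAG whose DAGRun takes a long time to run**:

```python
from time import sleep
from airflow.operators.python_operator import PythonOperator

def hoge():
    sleep(1000000)  # runs only when the task executes, not when the file is loaded

slow_task = PythonOperator(
    # `i` is presumably a loop index from the surrounding DAG definition (not shown)
    task_id='query_' + str(i),
    python_callable=hoge,
)
```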
Loading can be slow if there are a large number of tasks, or if the DAG file accesses external resources **outside of a task**.
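To make the load time vs. run time difference concrete, here is a minimal sketch that does not use Airflow at all. It treats two strings as stand-ins for DAG files and times how long it takes just to execute each file's top level, which is roughly what loading a DAG file involves. The names `SLOW_DAG_SOURCE`, `FAST_DAG_SOURCE`, and `load_seconds` are made up for this illustration, and the 0.2 second sleep stands in for an external call.

```python
import time

# Hypothetical "DAG file" that does slow work at module top level.
SLOW_DAG_SOURCE = """
import time
time.sleep(0.2)  # stands in for an API call or query made while loading

def task():
    return "done"
"""

# Same work, but moved inside the function a task would call.
FAST_DAG_SOURCE = """
import time

def task():
    time.sleep(0.2)  # deferred until the task actually runs
    return "done"
"""


def load_seconds(source):
    """Execute the source's top level and return (elapsed seconds, namespace)."""
    namespace = {}
    started = time.perf_counter()
    exec(compile(source, "<dag_file>", "exec"), namespace)
    return time.perf_counter() - started, namespace


slow_load, _ = load_seconds(SLOW_DAG_SOURCE)
fast_load, fast_namespace = load_seconds(FAST_DAG_SOURCE)

print("slow file load: {:.3f}s".format(slow_load))
print("fast file load: {:.3f}s".format(fast_load))
```

The second file is just as slow when `task()` is finally called, but that cost is paid at run time and not on every load.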
For those interested in the detailed flow:
I tried this on Cloud Composer (Airflow 1.10.2) with a DAG consisting only of BigQueryOperator tasks:
If only the Graph View or Tree View is slow, changing `default_dag_run_display_number` should help.
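As a rough sketch of where that setting lives: `default_dag_run_display_number` belongs to the `[webserver]` section of `airflow.cfg`, and Airflow also reads overrides from environment variables named `AIRFLOW__{SECTION}__{KEY}`. The snippet below only parses a config fragment and builds that variable name. The value `5` is an arbitrary example, not a recommendation, and `env_var_name` is a helper made up for this illustration.

```python
import configparser

# Fragment of airflow.cfg; 25 is the default Airflow ships for this option.
AIRFLOW_CFG_FRAGMENT = """
[webserver]
default_dag_run_display_number = 25
"""

config = configparser.ConfigParser()
config.read_string(AIRFLOW_CFG_FRAGMENT)

# Hypothetical smaller value so fewer DAG runs are rendered per page.
config["webserver"]["default_dag_run_display_number"] = "5"


def env_var_name(section, key):
    """Build the AIRFLOW__{SECTION}__{KEY} name used for config overrides."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


override_name = env_var_name("webserver", "default_dag_run_display_number")
override_value = config["webserver"]["default_dag_run_display_number"]
print("{}={}".format(override_name, override_value))
```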
Several improvements have been proposed for this DAG loading problem.
Cloud Composer implements [an option to make DAG loading asynchronous on the webserver](https://cloud.google.com/composer/docs/how-to/accessing/airflow-web-interface#asynchronous-load). It has also been [ported](https://issues.apache.org/jira/browse/AIRFLOW-4924) to Airflow 1.10.4.
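The general idea behind asynchronous loading can be sketched as follows. This is not Cloud Composer's or Airflow's actual implementation, just a minimal illustration under my own assumptions: a background thread does the slow loading, while request handlers read whatever was loaded last. `AsyncDagCache` and `slow_loader` are hypothetical names.

```python
import threading
import time


class AsyncDagCache:
    """Serves the last loaded DAGs while a background thread reloads them."""

    def __init__(self, loader):
        self._loader = loader  # callable that (slowly) parses DAG files
        self._lock = threading.Lock()
        self._dags = {}  # last successfully loaded result

    def refresh_in_background(self):
        thread = threading.Thread(target=self._refresh, daemon=True)
        thread.start()
        return thread

    def _refresh(self):
        dags = self._loader()  # the slow part runs outside the lock
        with self._lock:
            self._dags = dags

    def get_dags(self):
        # Request handlers call this; it never waits for the loader.
        with self._lock:
            return dict(self._dags)


def slow_loader():
    time.sleep(0.2)  # stands in for parsing slow DAG files
    return {"example_dag": ["start"]}


cache = AsyncDagCache(slow_loader)
worker = cache.refresh_in_background()
before = cache.get_dags()  # returns immediately, nothing loaded yet
worker.join()
after = cache.get_dags()  # now holds the loaded DAGs
```

The trade-off is that the UI may briefly show stale or empty data, but a slow DAG file no longer blocks a request.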
It is still a draft, but [AIP-24 DAG Persistence in DB using JSON for Airflow Webserver and (optional) Scheduler](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler?focusedCommentId=123898950) proposes a more significant change, offered as an option. (Apparently there is also an argument that it is not good for the webserver to hold state in the first place.)
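Going only by the AIP's title, the idea is to persist DAGs in the database as JSON so the webserver can read them from there. The sketch below is my own toy version of that idea using `sqlite3` and `json`, not the schema or format the proposal defines. The table name `serialized_dag_demo`, the JSON layout, and which process writes the rows are all assumptions.

```python
import json
import sqlite3


def serialize_dag(dag_id, dependencies):
    """Turn a DAG's structure (task -> downstream tasks) into a JSON string."""
    return json.dumps({"dag_id": dag_id, "tasks": dependencies}, sort_keys=True)


def load_dag(connection, dag_id):
    """Rebuild the DAG structure from the database without running any DAG file."""
    row = connection.execute(
        "SELECT data FROM serialized_dag_demo WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE serialized_dag_demo (dag_id TEXT PRIMARY KEY, data TEXT)"
)

# Writer side (assumed): parse the DAG file once, then persist its structure.
payload = serialize_dag("example_dag", {"start": ["slow_task"], "slow_task": []})
conn.execute(
    "INSERT INTO serialized_dag_demo VALUES (?, ?)", ("example_dag", payload)
)
conn.commit()

# Webserver side (assumed): read the structure back from the database.
restored = load_dag(conn, "example_dag")
print(restored["tasks"])
```

With something like this, a slow DAG file would only slow down the process that writes the JSON, not every webserver worker.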
A note about the Cloud Composer webserver:
By the way, Astronomer.io lets you change the vCPU / memory size.