Testing DAGs in Airflow

Photo by Fotis Fotopoulos on Unsplash

This article is a continuation of my learning in Airflow. You can access my previous writing here. After I successfully implemented some DAGs locally, I decided to run tests on them.


Testing is one of the key aspects of developing any kind of software, since we all make mistakes. Human error can cause defects or failures in the systems we design. The main objectives of testing are:

  • Discovering defects or bugs before delivery to the client.
  • Ensuring a reliable and easy-to-use system.
  • Increasing the overall quality of the system.

This article focuses on unit testing. Unit testing is a type of software testing in which individual units or components of the software are tested in isolation.

In the context of Airflow, it is definitely better to test our DAG architecture and its corresponding functionality before deploying to the production environment, to ensure its quality and that no components or parameters are left behind.

Validity Test

One idea is to test all of our DAGs’ validity. Some points that may need to be checked are:

  • Are all of the DAGs loaded correctly?
  • Is the total number of DAGs the same as in our design?
  • Does each DAG have the specific parameters it should?

To execute the test, I used the unittest library. Initially, we need to gather all the data related to our DAGs using the DagBag class provided by Airflow. Then we write test cases that check whether each condition meets our specification.

To run the test, we enter the command python3 -m unittest test-file-name.py in the terminal.

Below is the result of running the unittest. We can see that the test has failed, since one DAG does not have an email entry in its default arguments.

Validity test results

DAG Parameter Test

We can also inspect a specific DAG further to see if it meets our requirements. Some points that I test are:

  • Whether the specific DAG loads properly
  • The number of tasks in the DAG
  • The list of tasks it contains

The concept and procedure for this test are much the same as for the validity test above. Below are the code and the output of the test in the terminal.

Here I tested a DAG called ‘postgres_operator_dags’, consisting of the 4 tasks I made before.

Snippet of the result in the terminal

We can see that the DAG is successfully tested and passes all three test cases.

Upstream/Downstream Testing

Another idea is to test the upstream or downstream dependencies of a task. We can access task dependency information through the downstream_task_ids and upstream_task_ids attributes that Airflow provides on each task object.

Code snippet of testing downstream dependencies

In the code snippet above, I accessed the list of downstream dependencies and checked whether the number of tasks matches the config file.




