Testing DAGs in Airflow


This article continues my notes on learning Airflow. You can find my previous article here. After I successfully implemented some DAGs locally, I decided to write tests for them.

Overview

Testing is a key part of developing any kind of software, because we all make mistakes. Human error can cause defects or failures in the systems we design. The main objectives of testing include:

  • Discover defects or bugs before the system is delivered to the client.
  • Ensure a reliable and easy-to-use system.
  • Increase the overall quality of the system.

This article focuses on unit testing. Unit testing is a type of software testing in which we test the functionality of individual units or components in isolation.

In the context of Airflow, it is better to test a DAG's structure and functionality before deploying it to production, so that we can confirm its quality and make sure no components or parameters have been left out.

Validity Test

One idea is to test all of our DAGs’ validity. Some points that may need to be checked are:

  • Are all of the DAGs loaded without import errors?
  • Is the total number of DAGs the same as in our design?
  • Does each DAG have the parameters it should, such as an email in its default arguments?

To write the tests, I used Python's unittest library. First, we load all of our DAGs with the DagBag class provided by Airflow. Then we write test cases that check whether the loaded DAGs meet our specification.

To run the tests, enter the command python3 -m unittest test-file-name.py in the terminal.

Below is the result of running the tests. We can see that one test has failed because one DAG doesn't have an email in its default arguments.

Validity test results

DAG Parameter Test

We can also inspect a specific DAG further to see if it meets our requirements. The points I test are:

  • The specific DAG is properly loaded
  • The number of tasks in the DAG
  • The list of tasks it contains

The concept and procedure are much the same as in the validity test. Below are the code and the output of the test in the terminal.

Here I tested a DAG called ‘postgres_operator_dags’, which I built earlier and which consists of 4 tasks.

Snippet of the result in the terminal

We can see that the DAG passes all three test cases.

Upstream/Downstream Testing

Another idea is to test the upstream or downstream dependencies of a task. We can access a task's dependencies through the downstream_task_ids or upstream_task_ids attribute of the task objects in a DAG.

Code snippet of testing downstream dependencies

In the code snippet above, I access the list of downstream dependencies and check whether the number of tasks matches the config file.

Fauzan Ragitya
