This is going to be a quick post on Airflow. We noticed that in one of our environments, the Airflow scheduler was picking up old task instances that had already succeeded (whether marked as success or completed successfully). You can verify this is actually your issue by SSHing into your Airflow workers and running:
ps -ef | grep airflow
Then check the DAG run IDs in the output: most of them will belong to old runs.
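A quick way to spot the stale ones (assuming your workers launch tasks with the classic "airflow run ..." command) is to include each process's elapsed time, since the command line also carries the DAG ID and execution date:
# etime shows how long each task process has been running;
# the cmd column includes the DAG ID and execution date
ps -eo pid,etime,cmd | grep '[a]irflow run'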
This happens when Celery’s backend (Redis, in our case) holds old or duplicate keys for task runs, so the solution is to clear the Celery queue. Here are the steps to do it when Celery runs on Redis:
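If you want a rough idea of how much is sitting in the broker before you flush anything, you can check the queue directly from redis-cli. This is just a quick sanity check, and it assumes the default Airflow Celery queue name ("default") and Redis database 0; adjust both if your setup differs:
# number of messages waiting in the Celery queue
redis-cli -n 0 LLEN default
# task state keys, present if Redis is also used as the result backend
redis-cli -n 0 KEYS 'celery-task-meta-*' | wc -l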
1- Stop Airflow Scheduler:
sudo initctl status airflow-scheduler
sudo initctl stop airflow-scheduler
2- Stop webserver:
sudo initctl status airflow-webserver
sudo initctl stop airflow-webserver
3- Stop Celery Flower:
cd /var/lib/airflow/bin
./airflow.sh flower status
./airflow.sh flower stop
4- Stop workers:
cd /var/lib/airflow/bin
./airflow.sh worker status
./airflow.sh worker stop
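Before touching Redis, it doesn't hurt to confirm that no Airflow processes are left on the scheduler, webserver, and worker hosts (this is just a sanity check, not an official step):
# should return nothing if everything is stopped
ps -ef | grep '[a]irflow'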
Now SSH into the server where Redis is running, type “redis-cli”, and press Enter to get into the Redis CLI. Follow the steps below to flush the Redis DB (a scripted alternative is shown after the list):
- INFO keyspace — list keyspaces; you should get only one result back
- SELECT 0 — select the database
- CONFIG GET dir — get the database file location so you can take a backup
- Copy the file “xxxx.db” from that location to your home directory as a backup
- FLUSHDB — flush the database
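If you prefer to script this instead of typing into the interactive CLI, the same sequence can be run as one-off redis-cli commands. The backup path below is only an example; use whatever location CONFIG GET dir returned and the actual dump file name:
redis-cli INFO keyspace
redis-cli CONFIG GET dir
cp /var/lib/redis/xxxx.db ~/redis-backup.db   # example path; copy the real dump file reported above
redis-cli -n 0 FLUSHDB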
Now you can start all Airflow services:
1- Scheduler commands
sudo initctl start airflow-scheduler
sudo initctl status airflow-scheduler
2- Webserver commands
sudo initctl start airflow-webserver
sudo initctl status airflow-webserver
3- Flower commands
cd /var/lib/airflow/prd/bin
nohup ./airflow.sh flower start &
./airflow.sh flower status
4- Worker commands
cd /var/lib/airflow/prd/bin
nohup ./airflow.sh worker start &
./airflow.sh worker status
Go back to the Airflow UI and validate that all DAGs are starting and completing successfully.
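As a final check on the workers themselves, you can repeat the command from the top of the post and confirm that the run IDs in the output now belong to current runs:
ps -ef | grep '[a]irflow run'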
And live happily ever after! 🙂