🛠️ Setting Up Airflow with Celery, RabbitMQ, and PostgreSQL — Solving Real-World Integration Issues
Abdellah Hallou


Publish Date: Jun 6

Many data engineers struggle to integrate various tools like Airflow, Celery, RabbitMQ, and PostgreSQL into a cohesive architecture. I’ve been there. While online tutorials helped me configure each tool individually, they rarely covered the practical integration challenges that inevitably arise.

That’s exactly why I wrote this article: to consolidate all the resources I used, fill in the missing pieces, and provide a working setup with common errors resolved — based on real experience.


Let’s get our hands dirty. But don’t worry — baby steps first 🍼

⚠️ Before We Start: A Quick Overview of the Stack

I won’t dive deep into each tool’s internal workings, just quick and simple definitions to get you on board:

  • Airflow: A task scheduler for defining and executing workflows. By default, it uses the SequentialExecutor and SQLite.
  • PostgreSQL: Recommended for production-grade Airflow setups instead of SQLite.
  • Celery: A distributed task queue that handles asynchronous job execution. Required if you’re using CeleryExecutor in Airflow.
  • RabbitMQ: A message broker used by Celery. Redis is another popular choice, but we’ll be using RabbitMQ here.

🚀 Step-by-Step Setup

1. 🔧 Install Airflow and Essential Libraries

pip3 install "apache-airflow[gcp,sentry,statsd]"

(Quote the extras so shells like zsh don’t choke on the square brackets.)


2. 🗄️ Initialize Airflow Database

Navigate to the airflow directory and initialize the airflow database:

cd airflow
airflow db init

After initialization, you'll notice new files and directories in the airflow folder. Create a dags folder where all future DAGs will be stored and accessed by Airflow components:

mkdir dags

3. 👤 Create an Airflow Admin User

airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@domain.com

Verify the user was created successfully:

airflow users list

Expected output:

id | username | email                 | first_name      | last_name      | roles
===+==========+=======================+=================+================+======
 1 | admin    | your_email@domain.com | your_first_name | your_last_name | Admin

🎉 You’re now an admin. Feels good, doesn’t it?

4. ▶️ Start Scheduler and Webserver (in separate terminals)

Start scheduler (Terminal 1):

airflow scheduler

In a new terminal (Terminal 2), activate the virtual environment and start the webserver:

source airflow_env/bin/activate
cd airflow
airflow webserver

Once both services are initialized, open your browser and navigate to http://localhost:8080/.

Port 8080 is the default port for Airflow. If it's occupied, modify the port number in the airflow.cfg file.

If you see a jungle of random DAGs — yeah, Airflow comes with prebuilt DAGs that we don’t need. Feel free to explore the UI and experiment with these examples to understand how Airflow works.


🧹 Clean Up Example DAGs (Optional)

If you prefer a clean slate without the example DAGs, disable them by setting load_examples = False under [core] in the airflow.cfg file, or by exporting the equivalent environment variable:

AIRFLOW__CORE__LOAD_EXAMPLES=False

Creating Your First DAG

To create your first custom DAG, follow the detailed instructions at:

👉 How to Create First DAG in Airflow – GeeksForGeeks

And yes, once created, you’ll see it in the dashboard!
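For reference, a bare-bones DAG file looks roughly like this. This is a minimal sketch assuming Airflow 2.x; the hello_dag and say_hello names are placeholders, not from the guide above. Save it in the dags folder created earlier:

```python
# A hypothetical minimal DAG: one task that echoes a greeting once a day.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",              # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,                   # don't backfill past runs
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow!'",
    )
```

The scheduler scans the dags folder periodically, so the new DAG should appear in the UI within a minute or so.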

🧨 Now Airflow Works. You Happy??

Haha… thought that was it? Nah. The real fun begins now. Time to swap out the baby-defaults and give Airflow a proper backend and executor.

🗃️ From SQLite to PostgreSQL: No More Toy Databases.

SQLite got us off the ground, but we’re building an actual system now, so we’re moving to PostgreSQL + LocalExecutor.

1. Install PostgreSQL

sudo apt install postgresql


2. Configure PostgreSQL for Airflow

Access the PostgreSQL shell and create the necessary database and user:

sudo -u postgres psql


Inside the prompt:

CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;


3. 🗂️ Update Airflow’s Config File

Update your airflow.cfg file with the following changes:

  1. Change the executor:

    executor = LocalExecutor
    
  2. Update the database connection (the sql_alchemy_conn option lives under [database] on Airflow 2.3+, under [core] on older versions):

    sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db
    

If you created a new PostgreSQL user for Airflow, the default search_path should work without changes. However, if you're using an existing user with a custom search_path, update it:

ALTER USER airflow_user SET search_path = public;
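A typo in the connection string is a common source of cryptic db init failures, so it's worth sanity-checking before a restart cycle. A small stdlib sketch (credentials mirror the user and database created above; adjust to yours):

```python
# Check that the SQLAlchemy connection URI is well-formed before it goes
# into airflow.cfg.
from urllib.parse import urlsplit

conn = "postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db"

parts = urlsplit(conn)
assert parts.scheme == "postgresql+psycopg2"
assert parts.username == "airflow_user"
assert parts.hostname == "localhost"
assert parts.port == 5432
assert parts.path == "/airflow_db"
print("connection URI looks well-formed")
```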

📜 If Required: Update pg_hba.conf (Because Sometimes It Just Doesn’t Work)

If needed, add the Airflow user to PostgreSQL's access control list:

  1. Open the pg_hba.conf file (typically in the PostgreSQL data directory)
  2. Locate the “Local connections” or “IPv4 local connections” section (depending on the version of Postgres you are using)
  3. Add the following line:
host    all             airflow_user             127.0.0.1/32            md5


Then restart PostgreSQL:

sudo service postgresql restart


4. Re-initialize Airflow DB

Initialize your Airflow database with the new PostgreSQL backend:

airflow db init

Takes a few seconds, so… stretch, hydrate, or scream internally.

🛡️ Optional: Fix Permissions (Because PostgreSQL Likes Drama)

If Airflow starts whining about permissions, start by checking your PostgreSQL users:

SELECT usename FROM pg_user;

To grant privileges, you can either:

🔒 Keep it minimal:

GRANT CREATE, USAGE ON SCHEMA public TO airflow_user;


💥 Or go nuclear:

ALTER ROLE airflow_user WITH SUPERUSER;

(Use superuser only if you're lazy, and never in production.)

👨‍💻 Recreate Admin User (Now for PostgreSQL Setup)

airflow users create \
  --username admin \
  --firstname FirstName \
  --lastname LastName \
  --role Admin \
  --email dummy@xyz.com \
  --password admin
# (The Fancy Way)

Boom. You’re the boss now.

✅ Run Airflow Again (Like a Legend)

In terminal 1:

airflow webserver

In terminal 2:

airflow scheduler


Now open http://localhost:8080 and enjoy the view 😌

Now Airflow is running with PostgreSQL and ready for Celery integration. Happy? Haha, now the real fun begins…

In the previous section, you mastered the Local Executor and achieved parallel task execution in Airflow. Now it's time to level up with the Celery Executor to build truly production-ready pipelines. But first, we need to set up our supporting cast: Celery, Flower (a web-based monitoring UI for Celery), and RabbitMQ as our message broker.

Deep breath - Here we go!


🐰 Setting Up RabbitMQ — Your Message Broker

RabbitMQ is like the post office of your pipeline — receiving, queueing, and delivering messages between your Airflow components. Built with Erlang, it’s designed to handle large-scale message traffic with resilience.
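The post-office pattern can be sketched with Python's stdlib queue (this is not RabbitMQ itself, just an illustration of how a broker decouples producers from consumers):

```python
# Toy broker: the "scheduler" enqueues task messages, the "worker"
# dequeues them; neither side talks to the other directly.
import queue
import threading

broker: queue.Queue = queue.Queue()
delivered: list = []

def scheduler() -> None:
    for i in range(3):
        broker.put(f"task-{i}")         # publish a message

def worker() -> None:
    for _ in range(3):
        delivered.append(broker.get())  # consume a message
        broker.task_done()

t1 = threading.Thread(target=scheduler)
t2 = threading.Thread(target=worker)
t1.start(); t2.start()
t1.join(); t2.join()
print(delivered)  # ['task-0', 'task-1', 'task-2']
```

In the real setup, the scheduler's role is played by Airflow's CeleryExecutor and the worker's role by airflow celery worker, with RabbitMQ persisting the queue in between.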

📦 1. Install Erlang

  • First, we need Erlang (because apparently rabbits speak Erlang):
sudo apt-get install erlang
  • Perfect time to grab a cup of mint tea, while the installation does its magic... ☕.

🐇 2. Install RabbitMQ Server

sudo apt-get install rabbitmq-server
  • Celebrate with a second cup, this time atay b chiba.

▶️ 3. Start the RabbitMQ Server

  • Let’s boot it up and verify it’s alive:
sudo service rabbitmq-server start

service --status-all
  • You should see rabbitmq-server running happily as a service. If it's not running, give it another gentle nudge!

🌐 4. Enable RabbitMQ Management Dashboard

  • RabbitMQ comes with a sleek UI. Let’s unlock it:
sudo rabbitmq-plugins enable rabbitmq_management
  • RabbitMQ's management dashboard listens on port 15672, while the broker itself uses 5672 (memorize both numbers, you'll be seeing them a lot!)
  • Navigate to http://localhost:15672/ for the dashboard
  • Default credentials: username "guest" with password "guest" (security level: toddler-proof)
  • Voilà! A dashboard to spy on your queues.

🔐 5. Create a Custom User and Virtual Host

Let’s create a user and a dedicated vhost for Airflow:

sudo rabbitmqctl add_user admin admin
sudo rabbitmqctl add_vhost airflowvhost
  • Make our "admin" user the boss:
sudo rabbitmqctl set_user_tags admin administrator
  • Grant permissions:
sudo rabbitmqctl set_permissions -p airflowvhost admin ".*" ".*" ".*"
  • The three ".*" patterns grant configure, write, and read permissions on everything in the vhost.
  • Now you can log in with the "admin" credentials and feel like you actually know what you're doing!

🌱 Installing Celery & Flower

⚙️ 1. Install Celery and Flower

  • Celery is your task executor, and Flower is your Celery UI garden 🌸
  • Install both (quoting the extras for your shell):
sudo pip3 install "apache-airflow[celery]"
  • This will also pull in Celery and its dependencies. If airflow celery flower later complains about a missing package, install Flower separately with pip3 install flower. (Time for more Moroccan tea?)

🌼 2. Launch the Flower Dashboard

  • Run Flower with:
airflow celery flower
  • Default port: 5555 (another number for your collection!)
  • Visit http://localhost:5555/ to see Flower in action
  • If you see a clean interface, congratulations! If not... well, that's what the troubleshooting section is for 😅

🛠️ Airflow Configuration — Connect the Dots

  • Now for the moment of truth: let's tell Airflow about its new friends. Open your airflow.cfg file and update these parameters:
[core]
executor = CeleryExecutor

[celery]
# By default, RabbitMQ will listen on port 5672 on all available interfaces
broker_url = amqp://admin:admin@localhost:5672/airflowvhost
# Port 5432 is dedicated to PostgreSQL connections
result_backend = db+postgresql://airflow_user:airflow_pass@localhost:5432/airflow_db
broker_connection_retry_on_startup = True
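Since airflow.cfg is plain INI, a quick configparser round-trip on a fragment like the one above (values mirror this walkthrough) confirms the keys sit in the sections Airflow will read them from. One gotcha this catches: configparser does not strip inline # comments after a value, so keep comments on their own lines:

```python
# Parse an airflow.cfg-style fragment and confirm the Celery settings
# land under the sections Airflow reads them from.
import configparser

fragment = """
[core]
executor = CeleryExecutor

[celery]
broker_url = amqp://admin:admin@localhost:5672/airflowvhost
result_backend = db+postgresql://airflow_user:airflow_pass@localhost:5432/airflow_db
broker_connection_retry_on_startup = True
"""

cfg = configparser.ConfigParser()
cfg.read_string(fragment)

print(cfg.get("core", "executor"))  # CeleryExecutor
print(cfg.getboolean("celery", "broker_connection_retry_on_startup"))  # True
```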

💡 To avoid tasks being revoked by RabbitMQ's idle consumer timeout, raise the timeout. The reliable place is the RabbitMQ config (/etc/rabbitmq/rabbitmq.conf):

consumer_timeout = 31622400000

Some setups also mirror it in airflow.cfg:

[celery_broker_transport_options]
consumer_timeout = 31622400000
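For context, that magic number is in milliseconds; a quick conversion shows it amounts to 366 days, i.e. the consumer effectively never times out:

```python
# consumer_timeout is expressed in milliseconds; convert to days.
timeout_ms = 31_622_400_000
days = timeout_ms / 1000 / 60 / 60 / 24
print(days)  # 366.0
```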

🔄 Restart Services

Make sure all gears are turning:

  • Restart PostgreSQL:
sudo service postgresql restart
  • Start RabbitMQ:
sudo service rabbitmq-server start
  • Initialize the Airflow DB:
airflow db init

✅ Final Checklist: System Up & Running

Time to see if all your hard work pays off! You'll need four terminals (yes, four - you're basically a command line DJ now 🎧):

Component          | Command               | URL
-------------------+-----------------------+-----------------------
Webserver          | airflow webserver     | http://localhost:8080
Scheduler          | airflow scheduler     | -
Celery Worker      | airflow celery worker | -
Flower UI          | airflow celery flower | http://localhost:5555
RabbitMQ Dashboard | (already running)     | http://localhost:15672
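To run through the checklist without alt-tabbing across four terminals, here's a small stdlib sketch that probes each default port (hostnames and ports are the defaults assumed throughout this article):

```python
# Probe each service's default TCP port and report UP/DOWN.
import socket

def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

services = [
    ("webserver", 8080),
    ("flower", 5555),
    ("rabbitmq-ui", 15672),
    ("postgres", 5432),
]
for name, port in services:
    print(f"{name:12} {'UP' if is_open('localhost', port) else 'DOWN'}")
```

A port showing UP only proves something is listening there, so treat it as a first-pass check before opening the dashboards.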

Don’t forget to clean up old DAGs from the UI to let the scheduler re-scan everything afresh.

And that’s it.

📝 Bonus Tip

Always keep an eye on the celery.log file; it's your best friend when things go sideways (and they occasionally will, because that's just how distributed systems roll 🤷‍♂️).

Reference:

https://habr.com/ru/companies/neoflex/articles/736292/

https://medium.com/accredian/setting-up-apache-airflow-in-ubuntu-324cfcee1427

https://documentation.ubuntu.com/server/how-to/databases/install-postgresql/index.html

https://medium.com/plumbersofdatascience/from-sqlite-to-postgresql-a-lighthearted-guide-to-upgrading-your-airflow-database-7b9c8de961b5

https://medium.com/coding-blocks/creating-user-database-and-adding-access-on-postgresql-8bfcd2f4a91e

https://stackoverflow.com/questions/69828547/precondition-failed-delivery-acknowledge-timeout-on-celery-rabbitmq-with-geve

https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html

https://stackoverflow.com/questions/74390647/postgres-airflow-db-permission-denied-for-schema-public

https://stackoverflow.com/questions/10757431/postgres-upgrade-a-user-to-be-a-superuser
