Many data engineers struggle to integrate various tools like Airflow, Celery, RabbitMQ, and PostgreSQL into a cohesive architecture. I’ve been there. While online tutorials helped me configure each tool individually, they rarely covered the practical integration challenges that inevitably arise.
That’s exactly why I wrote this article: to consolidate all the resources I used, fill in the missing pieces, and provide a working setup with common errors resolved — based on real experience.
Let’s get our hands dirty. But don’t worry — baby steps first 🍼
⚠️ Before We Start: A Quick Overview of the Stack
I won’t dive deep into each tool’s internal workings, just quick and simple definitions to get you on board:
- Airflow: A task scheduler for defining and executing workflows. By default, it uses the SequentialExecutor and SQLite.
- PostgreSQL: Recommended for production-grade Airflow setups instead of SQLite.
- Celery: A distributed task queue that handles asynchronous job execution. Required if you’re using CeleryExecutor in Airflow.
- RabbitMQ: A message broker used by Celery. Redis is another popular choice (and the one most Airflow tutorials reach for), but we’ll be using RabbitMQ here.
🚀 Step-by-Step Setup
1. 🔧 Install Airflow and Essential Libraries
pip3 install "apache-airflow[gcp,sentry,statsd]"
(Quote the package spec so your shell doesn’t try to interpret the square brackets.)
2. 🗄️ Initialize Airflow Database
Navigate to the airflow directory and initialize the airflow database:
cd airflow
airflow db init
After initialization, you'll notice new files and directories in the airflow folder. Create a dags folder where all future DAGs will be stored and accessed by Airflow components:
mkdir dags
3. 👤 Create an Airflow Admin User
airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@domain.com
Verify the user was created successfully:
airflow users list
Expected output:
id | username | email | first_name | last_name | roles
===+==========+=======================+=================+================+======
1 | admin | your_email@domain.com | your_first_name | your_last_name | Admin
🎉 You’re now an admin. Feels good, doesn’t it?
4. ▶️ Start Scheduler and Webserver (in separate terminals)
Start scheduler (Terminal 1):
airflow scheduler
In a new terminal (Terminal 2), activate your virtual environment (assumed here to be named airflow_env) and start the webserver:
source airflow_env/bin/activate
cd airflow
airflow webserver
Once both services are initialized, open your browser and navigate to http://localhost:8080/. Port 8080 is the default port for the Airflow webserver. If it's occupied, modify the port number in the airflow.cfg file.
If you see a jungle of random DAGs — yeah, Airflow comes with prebuilt DAGs that we don’t need. Feel free to explore the UI and experiment with these examples to understand how Airflow works.
🧹 Clean Up Example DAGs (Optional)
If you prefer a clean slate without example DAGs, you can disable them by setting load_examples = False in the [core] section of airflow.cfg, or equivalently via an environment variable:
AIRFLOW__CORE__LOAD_EXAMPLES=False
Creating Your First DAG
To create your first custom DAG, follow the detailed instructions at:
👉 How to Create First DAG in Airflow – GeeksForGeeks
And yes, once created, you’ll see it in the dashboard!
🧨 Now Airflow Works. You Happy??
Haha… thought that was it? Nah. The real fun begins now. Time to swap out the baby-defaults and give Airflow a proper backend and executor.
🗃️ From SQLite to PostgreSQL: No More Toy Databases.
SQLite got us off the ground, but we’re building an actual system now, so we’re moving to PostgreSQL + LocalExecutor.
1. Install PostgreSQL
sudo apt install postgresql
2. Configure PostgreSQL for Airflow
Access the PostgreSQL shell and create the necessary database and user:
sudo -u postgres psql
Inside the prompt:
CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
3. 🗂️ Update Airflow’s Config File
Update your airflow.cfg file with the following changes:
- Change the executor:
executor = LocalExecutor
- Update the database connection (in recent Airflow versions sql_alchemy_conn lives under the [database] section; in older ones it sits under [core]):
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db
If you created a new PostgreSQL user for Airflow, the default search_path should work without changes. However, if you're using an existing user with a custom search_path, update it:
ALTER USER airflow_user SET search_path = public;
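If that connection string looks cryptic: it’s a standard SQLAlchemy URL, and you can pull it apart with nothing but the Python standard library. A quick sanity check like this catches typos before Airflow does:

```python
# Break the sql_alchemy_conn URL into its pieces (stdlib only).
from urllib.parse import urlsplit

conn = "postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db"
parts = urlsplit(conn)

print(parts.scheme)                # dialect+driver: postgresql+psycopg2
print(parts.username)              # airflow_user -> must match the CREATE USER above
print(parts.password)              # airflow_pass
print(parts.hostname, parts.port)  # localhost 5432 -> PostgreSQL's default port
print(parts.path.lstrip("/"))      # airflow_db -> must match the CREATE DATABASE above
```

Every piece must match what you created in the PostgreSQL shell, or db init will greet you with an authentication error.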
📜 If Required: Update pg_hba.conf
(Because Sometimes It Just Doesn’t Work)
If needed, add the Airflow user to PostgreSQL's access control list:
- Open the pg_hba.conf file (typically in the PostgreSQL data directory)
- Locate the “Local connections” or “IPv4 local connections” section (depending on your Postgres version)
- Add the following line:
host all airflow_user 127.0.0.1/32 md5
Then restart PostgreSQL:
sudo service postgresql restart
4. Re-initialize Airflow DB
Initialize your Airflow database with the new PostgreSQL backend:
airflow db init
Takes a few seconds, so… stretch, hydrate, or scream internally.
🛡️ Optional: Fix Permissions (Because PostgreSQL Likes Drama)
If Airflow starts whining about permissions, start by checking your PostgreSQL users:
SELECT usename FROM pg_user;
To grant privileges, you can either:
🔒 Keep it minimal (since PostgreSQL 15, regular users no longer get CREATE on the public schema by default, so this grant is often exactly what’s missing):
GRANT CREATE, USAGE ON SCHEMA public TO airflow_user;
💥 Or go nuclear:
ALTER ROLE airflow_user WITH SUPERUSER;
(Use superuser only if you're feeling lazy; it works, but it's overkill and a security risk.)
👨‍💻 Recreate Admin User (Now for the PostgreSQL Setup)
airflow users create \
--username admin \
--firstname FirstName \
--lastname LastName \
--role Admin \
--email dummy@xyz.com \
--password admin
# (The Fancy Way)
Boom. You’re the boss now.
✅ Run Airflow Again (Like a Legend)
In terminal 1:
airflow webserver
In terminal 2:
airflow scheduler
Now open http://localhost:8080 and enjoy the view 😌
Now Airflow is running with PostgreSQL and ready for Celery integration. Happy? Haha, now the real fun begins…
In the previous section, you mastered the LocalExecutor and achieved parallel task execution in Airflow. Now it's time to level up with the CeleryExecutor to build truly production-ready pipelines. But first, we need to set up our supporting cast: Celery, Flower (a web-based monitoring UI for Celery), and RabbitMQ as our message broker.
Deep breath - Here we go!
🐰 Setting Up RabbitMQ — Your Message Broker
RabbitMQ is like the post office of your pipeline — receiving, queueing, and delivering messages between your Airflow components. Built with Erlang, it’s designed to handle large-scale message traffic with resilience.
📦 1. Install Erlang
- First, we need Erlang (because apparently rabbits speak Erlang):
sudo apt-get install erlang
- Perfect time to grab a cup of mint tea while the installation does its magic... ☕
🐇 2. Install RabbitMQ Server
sudo apt-get install rabbitmq-server
- Celebrate with a second cup, this time atay b chiba (Moroccan tea brewed with chiba, i.e. wormwood).
▶️ 3. Start the RabbitMQ Server
- Let’s boot it up and verify it’s alive:
sudo service rabbitmq-server start
service --status-all
- You should see rabbitmq-server running happily as a service. If it's not running, give it another gentle nudge!
🌐 4. Enable RabbitMQ Management Dashboard
- RabbitMQ comes with a sleek UI. Let’s unlock it:
sudo rabbitmq-plugins enable rabbitmq_management
- The management dashboard listens on port 15672, while RabbitMQ itself accepts AMQP connections on 5672 (memorize both numbers, you'll be seeing them a lot!)
- Navigate to http://localhost:15672/ for the dashboard
- Default credentials: username "guest" with password "guest" (security level: toddler-proof; note that the guest account only works when connecting from localhost)
- Voilà! A dashboard to spy on your queues.
🔐 5. Create a Custom User and Virtual Host
Let’s create a user and a dedicated vhost for Airflow:
sudo rabbitmqctl add_user admin admin
sudo rabbitmqctl add_vhost airflowvhost
- Make our "admin" user the boss:
sudo rabbitmqctl set_user_tags admin administrator
- Grant permissions:
sudo rabbitmqctl set_permissions -p airflowvhost admin ".*" ".*" ".*"
- Now you can log in with the "admin" credentials and feel like you actually know what you're doing!
🌱 Installing Celery & Flower
⚙️ 1. Install Celery and Flower
- Celery is your task executor, and Flower is your Celery UI garden 🌸
- Install both:
sudo pip3 install "apache-airflow[celery]"
- The quotes keep your shell from interpreting the brackets. This will also pull in required dependencies, including Celery itself; if the airflow celery flower command later complains that Flower is missing, a separate pip3 install flower fixes it. (Time for more Moroccan tea?)
🌼 2. Launch the Flower Dashboard
- Run Flower with:
airflow celery flower
- Default port: 5555 (another number for your collection!)
- Visit http://localhost:5555/ to see Flower in action
- If you see a clean interface, congratulations! If not... well, that's what the troubleshooting section is for 😅
🛠️ Airflow Configuration — Connect the Dots
- Now for the moment of truth: let's tell Airflow about its new friends. Open your airflow.cfg file and update these parameters:
executor = CeleryExecutor # in the [core] section
[celery]
broker_url = amqp://admin:admin@localhost:5672/airflowvhost # By default, RabbitMQ listens for AMQP connections on port 5672 on all available interfaces
result_backend = db+postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db # Port 5432 is PostgreSQL; note these are the database credentials from earlier, not the RabbitMQ ones (this key was named celery_result_backend in Airflow 1.x)
broker_connection_retry_on_startup = True
💡 To avoid task revocation due to idle timeouts, raise the consumer timeout either in the RabbitMQ config (/etc/rabbitmq/rabbitmq.conf):
consumer_timeout = 31622400000
or directly in airflow.cfg:
[celery_broker_transport_options]
consumer_timeout = 31622400000
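That magic number is milliseconds, and a quick back-of-the-envelope check shows it amounts to one full leap year of idle time, i.e. effectively "never time out":

```python
# consumer_timeout is expressed in milliseconds.
timeout_ms = 31_622_400_000

seconds = timeout_ms / 1000      # 31,622,400 seconds
days = seconds / (60 * 60 * 24)  # seconds per day
print(days)  # 366.0 -> one leap year before RabbitMQ gives up on an idle consumer
```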
🔄 Restart Services
Make sure all gears are turning:
- Restart PostgreSQL:
sudo service postgresql restart
- Start RabbitMQ:
sudo service rabbitmq-server start
- Initialize the Airflow DB:
airflow db init
✅ Final Checklist: System Up & Running
Time to see if all your hard work pays off! You'll need four terminals (yes, four - you're basically a command line DJ now 🎧):
| Component | Command | URL |
|---|---|---|
| Webserver | airflow webserver | localhost:8080 |
| Scheduler | airflow scheduler | – |
| Celery Worker | airflow celery worker | – |
| Flower UI | airflow celery flower | localhost:5555 |
| RabbitMQ Dashboard | – | localhost:15672 |
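If one of those URLs refuses to load, a tiny stdlib port probe tells you which service never came up. The hosts and ports below are the defaults used throughout this article; adjust them if you changed anything:

```python
# Probe the default ports from this setup to see which services are listening.
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

services = {
    "Airflow webserver": 8080,
    "Flower UI": 5555,
    "RabbitMQ dashboard": 15672,
    "RabbitMQ (AMQP)": 5672,
    "PostgreSQL": 5432,
}

for name, port in services.items():
    status = "up" if port_open("localhost", port) else "DOWN"
    print(f"{name:20} :{port:<6} {status}")
```

Anything marked DOWN is the terminal you forgot about (we've all been there).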
Don’t forget to clean up old DAGs from the UI to let the scheduler re-scan everything afresh.
And that’s it.
📝 Bonus Tip
Always keep an eye on the celery.log file... it's your best friend when things go sideways (and they occasionally will, because that's just how distributed systems roll 🤷‍♂️).
References:
- https://habr.com/ru/companies/neoflex/articles/736292/
- https://medium.com/accredian/setting-up-apache-airflow-in-ubuntu-324cfcee1427
- https://documentation.ubuntu.com/server/how-to/databases/install-postgresql/index.html
- https://medium.com/coding-blocks/creating-user-database-and-adding-access-on-postgresql-8bfcd2f4a91e
- https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html
- https://stackoverflow.com/questions/74390647/postgres-airflow-db-permission-denied-for-schema-public
- https://stackoverflow.com/questions/10757431/postgres-upgrade-a-user-to-be-a-superuser