Airflow ETL Setup Guide (Debian/Linux)

This guide provides a condensed list of commands to set up a stable Apache Airflow environment on a Debian-based system (Proxmox LXC, VM, or Azure Linux Container).

Note: On Linux, the standard psycopg2-binary driver is stable and does not require the workarounds used on macOS.

On the Ansible control node

Generate an SSH key (if needed) and copy it to the container:

ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id ansible_admin@<container_ip_address>

Edit your hosts file (usually /etc/ansible/hosts or a local inventory.yml):

airflow_servers:
  hosts:
    airflow_container:
      ansible_host: <container_ip_address>
      ansible_user: ansible_admin
      ansible_python_interpreter: /usr/bin/python3

On the Airflow host

1. System Dependencies

Install the Python environment tools, the PostgreSQL server, and the libpq development headers.

sudo apt update && sudo apt install -y python3-venv python3-pip libpq-dev postgresql postgresql-contrib
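A quick sanity check that the toolchain landed (pg_config ships with libpq-dev, and the venv module confirms python3-venv):

```shell
# Both commands should succeed and print version info.
pg_config --version
python3 -m venv --help > /dev/null && echo "venv OK"
```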

2. Database Setup

Create the metadata database and a mock source database for testing.

# Enter Postgres as the superuser
sudo -u postgres psql <<EOF
CREATE USER airflow_admin WITH PASSWORD 'airflow';
CREATE DATABASE airflow_db OWNER airflow_admin;
CREATE DATABASE mock_database OWNER airflow_admin;
ALTER ROLE airflow_admin CONNECTION LIMIT -1;
EOF

3. Python Virtual Environment

Create a clean environment and install Airflow with the Postgres provider and the standard driver.

python3 -m venv ~/airflow_venv
source ~/airflow_venv/bin/activate

# Install Airflow, Postgres Provider, and Psycopg2 Driver
pip install "apache-airflow[postgres]==2.10.0" psycopg2-binary \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.0/constraints-3.12.txt"

4. Environment Configuration

Define the Airflow home directory and the metadata connection string. Add these lines to ~/.bashrc to make them permanent.

export AIRFLOW_HOME=~/airflow
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_admin:airflow@localhost:5432/airflow_db"

# Initialize the metadata database
airflow db migrate

5. Create Admin User

# Prompts for a password unless --password is supplied
airflow users create \
    --username admin \
    --firstname Saif \
    --lastname Uddin \
    --role Admin \
    --email admin@example.com

6. Populate Mock Source Data

PGPASSWORD=airflow psql -d mock_database -h localhost -U airflow_admin <<EOF
CREATE TABLE IF NOT EXISTS source_orders (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    order_total NUMERIC,
    order_date DATE
);

INSERT INTO source_orders (order_id, customer_name, order_total, order_date) VALUES
(1, 'Alice Smith', 150.00, '2026-04-10'),
(2, 'Bob Jones', 85.50, '2026-04-11'),
(3, 'Charlie Brown', 210.25, '2026-04-12')
ON CONFLICT DO NOTHING;
EOF

7. Running Airflow

Open two terminal sessions (or use tmux).

# Session 1: The UI
airflow webserver --port 8080

# Session 2: The Orchestrator
airflow scheduler

8. Airflow UI Connections

Log in at http://<server_ip>:8080 and navigate to Admin > Connections. Add: