Beginner Tutorial: Starting DolphinScheduler with External PostgreSQL and Zookeeper

Publish Date: Jul 31

This article walks you step by step through starting Apache DolphinScheduler with an external PostgreSQL database and an external Zookeeper service. Whether you're a beginner or an experienced developer, you can follow these steps to complete the installation and configuration in a Linux/Unix environment. Beyond the standard installation steps, it also shares cluster deployment tips to help you scale out.

If you run into database connection, Zookeeper connection, or service startup problems, the troubleshooting section at the end covers how to diagnose and resolve them.

System Requirements

  • Operating System: Linux/Unix (CentOS 7+ or Ubuntu 16.04+ recommended)
  • Java Environment: JDK 1.8+
  • Database: PostgreSQL 9.6+
  • Distributed Coordination Service: Zookeeper 3.4.6+
  • Memory: At least 4GB recommended
  • Disk Space: At least 10GB recommended

Preparations

  1. Install and Configure PostgreSQL
# Install PostgreSQL (CentOS example)
sudo yum install -y postgresql-server postgresql-contrib

# Initialize the database
sudo postgresql-setup initdb

# Start the service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Create DolphinScheduler database and user
sudo -u postgres psql -c "CREATE USER dolphinscheduler WITH PASSWORD 'yourpassword';"
sudo -u postgres psql -c "CREATE DATABASE dolphinscheduler OWNER dolphinscheduler;"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE dolphinscheduler TO dolphinscheduler;"

# Modify pg_hba.conf
sudo vi /var/lib/pgsql/data/pg_hba.conf
# Add or modify the following line (0.0.0.0/0 accepts any IPv4 client; narrow it to your own subnet in production):
host    all             all             0.0.0.0/0               md5

# Modify postgresql.conf
sudo vi /var/lib/pgsql/data/postgresql.conf
# Change listen_addresses to:
listen_addresses = '*'

# Restart PostgreSQL
sudo systemctl restart postgresql
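
Before moving on, it can help to confirm that the new role can log in over TCP from the machine that will run DolphinScheduler. The following is a minimal sketch rather than part of the original steps: the helper names pg_uri and check_pg are hypothetical, and the host name and password are the same placeholders used above.

```shell
# Build a libpq connection URI: pg_uri <host> <port> <database> <user>
# (Illustrative helper; not part of PostgreSQL or DolphinScheduler.)
pg_uri() {
    printf 'postgresql://%s@%s:%s/%s\n' "$4" "$1" "$2" "$3"
}

# Run a trivial query over TCP and return non-zero if the login fails.
# psql reads the password from the standard PGPASSWORD environment variable.
check_pg() {
    psql "$(pg_uri "$1" "$2" "$3" "$4")" -At -c 'SELECT 1' >/dev/null
}

# Example (placeholders from the steps above):
#   PGPASSWORD=yourpassword check_pg your-postgresql-server 5432 dolphinscheduler dolphinscheduler \
#       && echo "PostgreSQL login OK"
```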
  2. Install and Configure Zookeeper
# Download Zookeeper (superseded releases move to archive.apache.org/dist/zookeeper/)
wget https://downloads.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -xzf apache-zookeeper-3.7.1-bin.tar.gz
mv apache-zookeeper-3.7.1-bin /opt/zookeeper

# Configure Zookeeper
cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
# Set data directory and server configuration (if clustered)
dataDir=/opt/zookeeper/data
# No need to change server settings for standalone mode

# Create data directory (-p avoids an error if it already exists)
mkdir -p /opt/zookeeper/data

# Start Zookeeper
/opt/zookeeper/bin/zkServer.sh start
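
To confirm that Zookeeper is actually serving requests, one option is its "srvr" four-letter command. The sketch below assumes nc is installed and that "srvr" is permitted by 4lw.commands.whitelist; the function names zk_mode and parse_zk_mode are hypothetical. Running /opt/zookeeper/bin/zkServer.sh status on the server itself is an alternative.

```shell
# Extract the value of the "Mode:" line from "srvr" output read on stdin.
parse_zk_mode() {
    awk -F': ' '$1 == "Mode" { print $2 }'
}

# Print the server mode (standalone, leader, or follower):
#   zk_mode <host> [port]
# The -w 2 timeout flag is an assumption about your nc variant.
zk_mode() {
    printf 'srvr' | nc -w 2 "$1" "${2:-2181}" | parse_zk_mode
}

# Example:
#   zk_mode localhost 2181
```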

Install and Configure DolphinScheduler 3.1.9

  1. Download and Extract
# Superseded releases move to archive.apache.org/dist/dolphinscheduler/
wget https://downloads.apache.org/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz
tar -xzf apache-dolphinscheduler-3.1.9-bin.tar.gz
mv apache-dolphinscheduler-3.1.9-bin /opt/dolphinscheduler
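  2. Modify Configuration Files

Edit common.properties. The path below follows this tutorial's layout; in the 3.1.x binary package each service has its own conf/ directory, and the database and registry settings can also be supplied through bin/env/dolphinscheduler_env.sh, so adjust the path to match your extracted package.
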
vi /opt/dolphinscheduler/conf/common.properties

Make the following changes:

# Database config
spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://your-postgresql-server:5432/dolphinscheduler
spring.datasource.username=dolphinscheduler
spring.datasource.password=yourpassword

# Zookeeper config
registry.plugin.name=zookeeper
registry.plugin.type=zookeeper
registry.servers=your-zookeeper-server:2181
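
If you need to apply the same changes on several machines, editing by hand gets error-prone. The following is a small sketch of an idempotent "set key=value" helper for a properties file. The name set_prop is hypothetical, and it assumes keys are written without spaces around "=" as in the snippet above.

```shell
# Set key=value in a properties file: set_prop <file> <key> <value>
# Replaces the line if the key already exists, otherwise appends it.
# The key and value are passed through the environment so awk does not
# interpret backslashes in them.
set_prop() {
    file=$1
    tmp=$(mktemp) || return 1
    PROP_KEY=$2 PROP_VALUE=$3 awk -F= '
        BEGIN { k = ENVIRON["PROP_KEY"]; v = ENVIRON["PROP_VALUE"] }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (path as used in this tutorial):
#   set_prop /opt/dolphinscheduler/conf/common.properties registry.servers your-zookeeper-server:2181
```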

Optional: Modify environment variables

vi /opt/dolphinscheduler/conf/env/dolphinscheduler_env.sh

Add or update Java environment variables:

export JAVA_HOME=/usr/java/jdk1.8.0_291
export PATH=$JAVA_HOME/bin:$PATH
  3. Initialize the Database
# If this script is missing from your package, note that the 3.1.x binary package ships the schema initializer as tools/bin/upgrade-schema.sh
/opt/dolphinscheduler/script/create-dolphinscheduler.sh
  4. Start Services

Start Master Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start master-server

Start Worker Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start worker-server

Start API Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start api-server

Start Alert Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start alert-server
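
The four start commands above differ only in the service name, so they can be wrapped in a loop. This is a convenience sketch, not an official script: the function name ds_all and the DS_DAEMON variable are hypothetical, and the daemon path is the one used in this tutorial.

```shell
# Path to the daemon script from the steps above (override DS_DAEMON if yours differs).
DS_DAEMON=${DS_DAEMON:-/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh}

# Run one daemon action for each of the four services, in the order used
# above, stopping at the first failure: ds_all <start|stop|status>
ds_all() {
    action=$1
    for service in master-server worker-server api-server alert-server; do
        echo "${action}: ${service}"
        "$DS_DAEMON" "$action" "$service" || return 1
    done
}

# Example:
#   ds_all start
#   ds_all status
```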

Verify Installation

  1. Check process status:
ps -ef | grep dolphinscheduler
  2. Access the Web UI:
  • Default port: 12345
  • Access URL: http://your-server-ip:12345/dolphinscheduler
  • Default username/password: admin/dolphinscheduler123
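
The API server can take a little while to come up after the daemon starts, so a first request to the Web UI may fail even when nothing is wrong. The sketch below is a generic retry helper; the name wait_for is hypothetical, and the curl invocation in the example is only an assumption about how you might probe the URL given above.

```shell
# Retry a command until it succeeds:
#   wait_for <attempts> <delay-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        # Sleep only if another attempt is coming.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: poll the Web UI URL for up to about a minute.
#   wait_for 30 2 curl -fsS -o /dev/null http://your-server-ip:12345/dolphinscheduler \
#       && echo "Web UI is answering"
```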

Cluster Deployment Guide

Cluster Mode Deployment Steps

If you want to deploy in cluster mode, follow these steps:

  1. Deploy Worker Servers on Multiple Nodes

Node Requirements

  • Deploy Worker Servers on at least 3 nodes (an odd count matters for the Zookeeper ensemble's majority quorum, not for workers)
  • Each node must have the same package version
  • Recommended server specs:

    • CPU: 4 cores or more
    • Memory: 8GB or more
    • Disk: 100GB+ (adjust based on data volume)

Example Deployment Plan

  • Node 1 (Primary): Master Server + Worker Server

    • IP: 192.168.1.101
    • Role: Master + Worker
  • Node 2 (Worker): Worker Server

    • IP: 192.168.1.102
    • Role: Worker
  • Node 3 (Worker): Worker Server

    • IP: 192.168.1.103
    • Role: Worker

Installation Notes

  1. Run the same installation script on all nodes

  2. Ensure the installation paths are consistent across nodes

  3. Verify network connectivity between nodes (use ping/telnet)

  4. Configure registry.servers

Detailed Configuration Steps

  1. Edit common.properties on all nodes
  • File path: /opt/dolphinscheduler/conf/common.properties
    1. Set registry.servers to your Zookeeper cluster addresses
    2. Example format:
registry.servers=zk1:2181,zk2:2181,zk3:2181
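
When the Zookeeper host list lives in an inventory or a variable, the registry.servers value can be generated instead of typed. A minimal sketch, with the hypothetical helper name join_zk_servers:

```shell
# Join host names into a registry.servers value:
#   join_zk_servers <port> <host> [host...]
join_zk_servers() {
    port=$1
    shift
    result=""
    for host in "$@"; do
        # Add a comma separator only when result is already non-empty.
        result="${result:+$result,}${host}:${port}"
    done
    printf '%s\n' "$result"
}

# Example:
#   echo "registry.servers=$(join_zk_servers 2181 zk1 zk2 zk3)"
```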

Configuration Verification

  1. Use zkCli.sh to verify Zookeeper config
./zkCli.sh -server zk1:2181
  2. Check node registration (run inside the zkCli.sh session; /dolphinscheduler is the default registry namespace):
ls /dolphinscheduler/nodes

  2. Time Synchronization Configuration

Detailed Time Sync Plan
All nodes must maintain time sync (within 1-second drift). Recommended steps:

NTP Setup

  1. Install NTP:
yum install -y ntp
  2. Sync with NTP server (Aliyun example):
ntpdate ntp.aliyun.com
  3. Set auto-sync:
# Enable at startup
systemctl enable ntpd
# Start service
systemctl start ntpd
  4. Verify sync:
ntpq -p
date

Alternative Time Sync Option
If no external NTP server is reachable, set up an internal time server:

  1. Designate one server as the time source
  2. Sync all other nodes with that server
  3. Example config:
ntpdate 192.168.1.100

Time Sync Notes

  • If ntpd is not running on a node, a crontab job can provide periodic sync instead (ntpdate cannot bind the NTP port while ntpd is running):
*/5 * * * * /usr/sbin/ntpdate ntp.aliyun.com >/dev/null 2>&1
  • For systems sensitive to time (e.g., finance), maintain <100ms drift
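
To check the 1-second drift target between nodes, you can compare epoch timestamps. The sketch below assumes passwordless SSH between nodes and a date command that supports +%s. The function names drift_seconds and check_drift are hypothetical, and SSH round-trip time is included in the measured value, so treat the result as an upper bound.

```shell
# Absolute difference between two epoch timestamps, in whole seconds.
drift_seconds() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# Compare the local clock with a remote node's clock: check_drift <host>
check_drift() {
    remote=$(ssh "$1" date +%s) || return 1
    drift_seconds "$(date +%s)" "$remote"
}

# Example (IP addresses from the deployment plan above):
#   for node in 192.168.1.102 192.168.1.103; do
#       echo "$node drift: $(check_drift "$node")s"
#   done
```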

Common Troubleshooting

Database Connection Issues

  1. PostgreSQL Remote Access Config
  • Check pg_hba.conf file and ensure it includes:
host    all             all             0.0.0.0/0               md5
  • Restart PostgreSQL after changes
  2. Credential Verification
  • Test connection with psql:
psql -h [host] -U [username] -d [database]
  • Ensure password is correct
  3. Firewall Check
  • Check if port 5432 is open:
firewall-cmd --list-all
  • Open the port if needed:
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload

Zookeeper Connection Issues

  1. Basic Connection Test
  • Use telnet:
telnet your-zookeeper-server 2181
  • Should show: "Connected to your-zookeeper-server"
  2. Log Check
  • View Zookeeper logs (the path depends on how Zookeeper was installed; a tarball install under /opt/zookeeper typically writes to /opt/zookeeper/logs/):
tail -f /var/log/zookeeper/zookeeper.log
  • Common issues:

    • Insufficient disk space
    • Low memory allocation
    • Improper cluster config

Service Startup Issues

  1. Log Analysis
  • Check the API server log first:
tail -n 100 /opt/dolphinscheduler/logs/dolphinscheduler-api.log
  • Check other component logs:
/opt/dolphinscheduler/logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-api-server.log
├── dolphinscheduler-master-server.log
└── dolphinscheduler-worker-server.log
  2. Java Environment Check
  • Verify Java version:
java -version
  • Requirement: JDK 1.8+
  • Check JAVA_HOME:
echo $JAVA_HOME
  • Check memory settings (jmap -heap works on JDK 8; newer JDKs use jhsdb jmap --heap --pid <pid>):
jmap -heap <pid>
  3. Port Conflict Check
  • Check port usage:
netstat -tunlp | grep [port]
  • Default ports:

    • Master Server: 5678
    • Worker Server: 1234
    • API Server: 12345
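
To check all of the default ports in one pass, the listening-socket listing can be filtered with a small helper. This is a sketch: the function name port_listed is hypothetical, and it assumes the listing comes from ss -tln or netstat -tln, where local addresses end in ":<port>".

```shell
# Read a listening-socket listing on stdin and succeed if any field ends
# in ":<port>": port_listed <port>
# The leading colon keeps 2345 from matching 12345.
port_listed() {
    awk -v p=":$1" '
        {
            for (i = 1; i <= NF; i++)
                if (substr($i, length($i) - length(p) + 1) == p) found = 1
        }
        END { exit !found }
    '
}

# Example: report which default ports already have a listener.
#   for port in 5678 1234 12345; do
#       if ss -tln | port_listed "$port"; then
#           echo "port $port is in use"
#       fi
#   done
```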
