Introduction

ActiveMQ master-slave provides high availability: one broker is active (the master) while the others wait on standby (the slaves). When failover does not work properly, clients cannot reach the new master, or a split-brain occurs in which more than one broker claims master status.

Symptoms

Failover not triggered:

```bash
$ tail -f /var/log/activemq/activemq.log
[ERROR] Master broker failed but slave did not take over
[WARN] Slave broker waiting for lock, but master unreachable
```

Split-brain:

```bash
[WARN] Multiple brokers claiming master status
[ERROR] Network partition detected: broker-1 and broker-2 both master
```

Client connection failure:

```bash
[ERROR] Failover transport failed to connect to any broker
[WARN] Client blocked waiting for broker to become available
javax.jms.JMSException: Failover timeout elapsed
```

Common Causes

  1. Shared storage unavailable - NFS, SAN, or database not accessible
  2. Network partition - a network issue isolates the master from the slaves
  3. Lock not acquired - the slave cannot get the database/file lock
  4. Failover delay too long - the slave takes too long to detect the failure
  5. Client failover URL wrong - clients are not configured for failover
  6. Split-brain protection disabled - no fencing mechanism

Step-by-Step Fix

Step 1: Check Master/Slave Status

```bash
# Check broker role via JMX (Jolokia): the Slave attribute is false on the master
curl -u admin:admin http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/Slave

# List the network connector MBeans
curl -u admin:admin "http://localhost:8161/api/jolokia/search/org.apache.activemq:type=Broker,brokerName=localhost,connector=networkConnectors,*"

# Check all brokers in the cluster (brokerName=* matches whatever each broker is named)
for broker in broker1 broker2 broker3; do
  curl -s --max-time 5 -u admin:admin "http://${broker}:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=*/Slave"
  echo
done

# View broker logs (-E so that | means "or")
grep -iE "master|slave|lock" /var/log/activemq/activemq.log | tail -50
```
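Jolokia wraps each attribute in a JSON envelope such as `{"request":{...},"value":false,"status":200}`, so comparing the raw `curl` output to `true` or `false` never matches. Below is a minimal sketch that pulls out the `value` field without `jq` and maps it to a role. `jolokia_value` and `broker_role` are hypothetical helper names. The sketch assumes a single-MBean read (wildcard reads return a nested object and need a real JSON parser).

```bash
#!/usr/bin/env bash
# Print the "value" field of a single-attribute Jolokia read response.
# Handles scalar values only (true/false/numbers).
jolokia_value() {
  printf '%s\n' "$1" | grep -o '"value":[^,}]*' | head -n 1 | cut -d: -f2
}

# Map the broker's Slave attribute to a role name; anything else
# (empty output, HTTP error page) is reported as unknown.
broker_role() {
  case "$(jolokia_value "$1")" in
    false) echo master ;;
    true)  echo slave ;;
    *)     echo unknown ;;
  esac
}

# Usage against a live broker (host and brokerName are placeholders):
# broker_role "$(curl -s --max-time 5 -u admin:admin \
#   'http://broker1:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/Slave')"
```

Treating unparseable output as `unknown` rather than `slave` keeps an unreachable broker from being mistaken for a healthy standby.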

Step 2: Check Shared Storage

```bash
# For shared filesystem (KahaDB on NFS/SAN): check mount status
mount | grep activemq-shared

# Test write access
touch /var/lib/activemq-shared/test && rm /var/lib/activemq-shared/test

# Check the NFS exports are visible
showmount -e nfs-server

# For JDBC master-slave: check database connectivity
mysql -h db-server -u activemq -p -e "SELECT 1;"

# Inspect the lock table; the TIME column (epoch milliseconds) is updated when
# the lock is renewed, so it should keep advancing while the master holds it
mysql -h db-server -u activemq -p -e "SELECT * FROM activemq.ACTIVEMQ_LOCK;"
```
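A plain `touch` on a hung NFS mount can block indefinitely, which turns a health check into another stuck process. The following sketch bounds the write probe with coreutils `timeout`. `check_store_writable`, the probe file name, and the 5-second limit are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Probe a shared store directory with a bounded write.
# Arguments: directory, optional time limit in seconds (default 5).
check_store_writable() {
  local dir=$1 limit=${2:-5}
  # Run touch+rm in a child shell so timeout can kill it if the mount hangs
  if timeout "$limit" sh -c 'touch "$1/.activemq-probe" && rm -f "$1/.activemq-probe"' _ "$dir" 2>/dev/null; then
    echo "writable"
  else
    echo "not writable"
    return 1
  fi
}

# Example: check_store_writable /var/lib/activemq-shared
```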

Step 3: Configure Lock Acquisition

```xml
<!-- For shared filesystem master-slave -->
<persistenceAdapter>
  <kahaDB directory="/var/lib/activemq-shared/kahadb" lockKeepAlivePeriod="5000">
    <locker>
      <shared-file-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </kahaDB>
</persistenceAdapter>

<!-- For JDBC master-slave -->
<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#mysql-ds" lockKeepAlivePeriod="5000">
    <locker>
      <database-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>

<!-- Key parameters -->
<!-- lockKeepAlivePeriod: how often the master renews (re-checks) its lock -->
<!-- lockAcquireSleepInterval: how often a slave retries while the lock is held -->
```
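The two timers interact. The following sketch encodes one rule of thumb: the master should renew its lock more often than a waiting slave retries, so `lockKeepAlivePeriod` should be below `lockAcquireSleepInterval`. A keep-alive of 0 disables the periodic check. `check_lock_timing` is a hypothetical helper, and both values are in milliseconds.

```bash
#!/usr/bin/env bash
# Compare the two lock timers (milliseconds) and warn on risky combinations.
check_lock_timing() {
  local keepalive=$1 acquire=$2
  if [ "$keepalive" -le 0 ]; then
    # No keep-alive: the master never re-checks that it still holds the lock
    echo "WARN: lockKeepAlivePeriod disabled"
    return 1
  elif [ "$keepalive" -ge "$acquire" ]; then
    echo "WARN: lockKeepAlivePeriod ($keepalive) >= lockAcquireSleepInterval ($acquire)"
    return 1
  fi
  echo "OK: renews every ${keepalive} ms, slave retries every ${acquire} ms"
}
```

For example, `check_lock_timing 5000 10000` passes for the values used in the configuration above.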

Step 4: Configure Network Failover

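```xml
<!-- For network connector failover -->
<networkConnectors>
  <networkConnector name="cluster"
      uri="static:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)"
      duplex="true"
      decreaseNetworkConsumerPriority="true"
      networkTTL="3"
      conduitSubscriptions="true"
      dynamicOnly="true"/>
</networkConnectors>

<!-- Key network connector settings -->
<!-- duplex: one connection carries messages in both directions -->
<!-- networkTTL: maximum number of broker hops a message or subscription can cross -->
<!-- dynamicOnly: only forward when consumers exist -->
<!-- Network connectors link independent brokers; they do not elect a master.
     To bridge to a master/slave pair, list both members with the masterslave: scheme,
     e.g. uri="masterslave:(tcp://broker1:61616,tcp://broker2:61616)" -->
```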
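When the same connector is templated onto every broker, each broker's `static:` list usually leaves out its own address. Below is a minimal sketch that builds that list from one host inventory. `build_static_uri` is a hypothetical helper, and the host names and port 61616 are placeholders.

```bash
#!/usr/bin/env bash
# Build a static: network connector URI listing every peer except the local broker.
# Arguments: local host name, then all broker host names.
build_static_uri() {
  local self=$1; shift
  local peers=() host
  for host in "$@"; do
    if [ "$host" = "$self" ]; then
      continue          # skip the broker this config is for
    fi
    peers+=("tcp://${host}:61616")
  done
  local IFS=,           # join array elements with commas
  echo "static:(${peers[*]})"
}
```

For example, `build_static_uri broker1 broker1 broker2 broker3` yields the `uri` value for broker1's config.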

Step 5: Configure Client Failover URL

```java
// javax.jms on ActiveMQ 5.x; ActiveMQ 6.x uses jakarta.jms.ConnectionFactory
import javax.jms.ConnectionFactory;
import org.apache.activemq.ActiveMQConnectionFactory;

// Configure failover transport in clients
ConnectionFactory basicFactory = new ActiveMQConnectionFactory(
    "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)");

// Recommended failover options
ConnectionFactory tunedFactory = new ActiveMQConnectionFactory(
    "failover:(tcp://broker1:61616,tcp://broker2:61616)?"
        + "randomize=false"               // try brokers in the listed order
        + "&priorityBackup=true"          // prefer broker1 when it is available
        + "&timeout=5000"                 // fail a send blocked by a disconnect after 5 s
        + "&maxReconnectAttempts=10"      // give up after 10 reconnect attempts
        + "&initialReconnectDelay=1000"   // wait 1 s before the first retry
        + "&maxReconnectDelay=30000");    // cap the backoff delay at 30 s

// For priority backup (master preferred)
String priorityUrl = "failover:(tcp://master:61616,tcp://slave:61616)"
    + "?priorityBackup=true&priorityURIs=tcp://master:61616";
```
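With `maxReconnectAttempts=10`, it helps to know roughly how long a client keeps retrying before it gives up. The sketch below approximates the delay schedule. It assumes exponential backoff that doubles the delay after each failed attempt, starting at `initialReconnectDelay` and capped at `maxReconnectDelay`. Real timings also include connect time. `reconnect_schedule` and `total_wait` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Print the approximate delay (ms) before each reconnect attempt.
# Arguments: initialReconnectDelay, maxReconnectDelay, maxReconnectAttempts.
reconnect_schedule() {
  local delay=$1 max=$2 attempts=$3 i
  local out=()
  for ((i = 0; i < attempts; i++)); do
    out+=("$delay")
    delay=$((delay * 2))                    # assumed backoff multiplier of 2
    if ((delay > max)); then delay=$max; fi  # cap at maxReconnectDelay
  done
  echo "${out[*]}"
}

# Sum the schedule: total time spent waiting before the client gives up.
total_wait() {
  local sum=0 d
  for d in $(reconnect_schedule "$1" "$2" "$3"); do
    sum=$((sum + d))
  done
  echo "$sum"
}
```

With the recommended options, `total_wait 1000 30000 10` gives 181000 ms: 1 + 2 + 4 + 8 + 16 s, then five capped 30 s waits. That is about three minutes of retrying.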

Step 6: Test Failover Detection

```bash
# Simulate master failure: stop the master broker
systemctl stop activemq@master

# Watch slave logs
tail -f /var/log/activemq/slave.log | grep -iE "master|lock|takeover"

# Expected:
# [INFO] Lost connection to master
# [INFO] Attempting to acquire lock
# [INFO] Acquired lock, becoming master

# Check the slave became master: "value" should now be false (no longer a slave)
curl -u admin:admin http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave

# Test client reconnection: clients using the failover: URL should reconnect to the new master automatically
```
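Instead of a fixed `sleep`, the takeover check can poll until the slave reports master. This also measures how long failover took. Below is a minimal sketch. `wait_for_master` is a hypothetical helper, and the probe is any command that prints the `Slave` attribute value. The curl wiring in the comment is an assumption.

```bash
#!/usr/bin/env bash
# Poll a probe command once per second until it prints "false" (the Slave
# attribute of a broker that has become master) or the timeout expires.
# Arguments: timeout in seconds, then the probe command and its arguments.
wait_for_master() {
  local timeout=$1 waited=0
  shift
  while [ "$waited" -lt "$timeout" ]; do
    if [ "$("$@" 2>/dev/null)" = "false" ]; then
      echo "master after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no takeover within ${timeout}s"
  return 1
}

# Example probe (host and brokerName are placeholders):
# probe() { curl -s --max-time 2 -u admin:admin \
#   'http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave' \
#   | grep -o '"value":[a-z]*' | cut -d: -f2; }
# wait_for_master 60 probe
```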

Step 7: Configure Split-Brain Prevention

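```xml
<!-- Split-brain protection comes from the storage lock: with lockKeepAlivePeriod set,
     the master periodically re-checks its lock and stops acting as master if it lost it -->
<broker brokerName="broker1" useShutdownHook="true">
  <persistenceAdapter>
    <!-- checksum/corruption checks guard journal integrity on recovery -->
    <kahaDB directory="/var/lib/activemq-shared/kahadb"
            lockKeepAlivePeriod="5000"
            checksumJournalFiles="true"
            checkForCorruptJournalFiles="true"/>
  </persistenceAdapter>
</broker>

<!-- For JDBC, use the lease-based locker: the lease duration comes from
     lockAcquireSleepInterval, so keep lockKeepAlivePeriod below it.
     Every broker sharing the database needs a unique brokerName. -->
<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#mysql-ds" lockKeepAlivePeriod="5000">
    <locker>
      <lease-database-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
```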

Step 8: Handle Network Partition

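```bash
# If a network partition is suspected, check connectivity (-c so ping exits)
ping -c 3 master-broker
ping -c 3 slave-broker

# Check network routes
traceroute master-broker

# Check firewall rules for the OpenWire port
iptables -L -n | grep 61616

# If a partition is confirmed:
# 1. Stop every broker to prevent split-brain (run on each broker host)
systemctl stop activemq

# 2. Verify the network is restored
ping -c 3 master-broker && ping -c 3 slave-broker

# 3. Clear a stale lease (JDBC lease locker only; the default database locker's
#    row lock is released when the holding connection closes). Do not delete
#    the row: the lockers expect the ID=1 row to exist.
mysql -h db-server -u activemq -p -e "UPDATE activemq.ACTIVEMQ_LOCK SET BROKER_NAME=NULL, TIME=NULL WHERE ID=1;"

# 4. Start the designated master first
systemctl start activemq@master

# Wait for lock acquisition (press Ctrl+C once the line appears)
tail -f /var/log/activemq/master.log | grep "Acquired lock"

# 5. Start the slaves
systemctl start activemq@slave
```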
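Before step 4, a small preflight can confirm that every broker host answers. This avoids starting the designated master into a network that is still partitioned. Below is a minimal sketch. `all_reachable` is a hypothetical helper, the host names are the placeholders used above, and the `PROBE` override is an illustrative hook (for example for a TCP check instead of ping).

```bash
#!/usr/bin/env bash
# Report each host as reachable or not; return non-zero if any host fails.
# PROBE defaults to one ping with a 2-second wait (word-split on purpose).
all_reachable() {
  local probe=${PROBE:-ping -c 1 -W 2} host failed=0
  for host in "$@"; do
    if $probe "$host" >/dev/null 2>&1; then
      echo "reachable: $host"
    else
      echo "UNREACHABLE: $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example: all_reachable master-broker slave-broker && systemctl start activemq@master
```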

Step 9: Monitor Failover Health

```bash
# Check master/slave status periodically (Slave=false marks the master)
curl -u admin:admin "http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=*/Slave"

# For JDBC, monitor lock renewals: TIME in ACTIVEMQ_LOCK should keep advancing
mysql -h db-server -u activemq -p -e "SELECT * FROM activemq.ACTIVEMQ_LOCK;"

# Alert if lock renewals stop (indicates a storage or database connection issue)

# Check network connector and bridge status
curl -u admin:admin "http://localhost:8161/api/jolokia/search/org.apache.activemq:type=Broker,brokerName=localhost,connector=networkConnectors,networkConnectorName=cluster,*"

# Monitor client connections
curl -u admin:admin http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/CurrentConnectionsCount
```
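To alert when lock renewals stop, you can compare two reads of the `TIME` column from `ACTIVEMQ_LOCK`, taken more than one keep-alive period apart. The following sketch does that comparison. `lock_renewing` and `read_lock_time` are hypothetical names, and `DB_PASS` is an assumed environment variable. `NULL` or empty values (for example, a lock row never written) are reported as unknown rather than healthy.

```bash
#!/usr/bin/env bash
# Compare two samples of ACTIVEMQ_LOCK.TIME (epoch milliseconds).
# Exit 0 if the value advanced, 1 if it did not, 2 if a sample is unusable.
lock_renewing() {
  local first=$1 second=$2
  case "$first" in ''|*[!0-9]*) echo "UNKNOWN: no numeric TIME value"; return 2 ;; esac
  case "$second" in ''|*[!0-9]*) echo "UNKNOWN: no numeric TIME value"; return 2 ;; esac
  if [ "$second" -gt "$first" ]; then
    echo "OK: TIME advanced by $((second - first)) ms"
  else
    echo "ALERT: TIME did not advance"
    return 1
  fi
}

# Hypothetical wiring (host, user, and DB_PASS are placeholders):
# read_lock_time() { mysql -N -h db-server -u activemq -p"$DB_PASS" \
#   -e "SELECT TIME FROM activemq.ACTIVEMQ_LOCK WHERE ID=1;"; }
# a=$(read_lock_time); sleep 15; b=$(read_lock_time); lock_renewing "$a" "$b"
```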

Step 10: Configure Automatic Restart

```bash
# Configure systemd to restart the broker automatically
cat << 'EOF' > /etc/systemd/system/activemq.service
[Unit]
Description=Apache ActiveMQ
After=network.target

[Service]
Type=forking
User=activemq
ExecStart=/opt/activemq/bin/activemq start
ExecStop=/opt/activemq/bin/activemq stop
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Apply changes
systemctl daemon-reload
systemctl enable activemq

# The broker now restarts after a failure; a restarted former master rejoins as a slave
```

Failover Configuration Settings

| Setting | Default | Recommended |
|---|---|---|
| lockKeepAlivePeriod | 30000 ms | 5000-10000 ms |
| lockAcquireSleepInterval | 10000 ms | 5000-10000 ms |
| failover timeout | None | 5000 ms |
| maxReconnectAttempts | -1 (infinite) | 10-100 |
| initialReconnectDelay | 10 ms | 1000 ms |

Verification

```bash
# After configuration changes, restart all brokers in sequence
# Start the master first
systemctl restart activemq@master
sleep 10

# Then the slaves
systemctl restart activemq@slave

# Verify master status: "value" should be false
curl -u admin:admin http://master:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=master/Slave

# Verify slave status: "value" should be true
curl -u admin:admin http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave

# Test failover
systemctl stop activemq@master
sleep 15
curl -u admin:admin http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave
# "value" should now be false

# Test client connectivity through the failover URL
/opt/activemq/bin/activemq producer --brokerUrl "failover:(tcp://master:61616,tcp://slave:61616)" --destination queue://test.queue --messageCount 1 --message "test"
```

Prevention

To keep ActiveMQ master/slave failover issues from recurring, implement these proactive measures:

1. Monitor Failover Status

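```yaml
# activemq_master_count is not built into ActiveMQ; this rule assumes your own exporter publishes it
groups:
- name: activemq-ha
  rules:
  - alert: ActiveMQMasterSlaveSplit
    expr: |
      activemq_master_count != 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "ActiveMQ does not have exactly one master (split-brain or no master)"
```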
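The alert needs something to publish `activemq_master_count`. One possible route, assumed here rather than prescribed, is node_exporter's textfile collector, fed by a cron job. The sketch below formats the metric from the `Slave` values collected from each broker. `master_count_metric` is a hypothetical helper, and the collector directory in the comment is a placeholder.

```bash
#!/usr/bin/env bash
# Emit activemq_master_count in Prometheus text format.
# Arguments: the Slave attribute value from each broker ("false" means master);
# empty values (unreachable brokers) are not counted as masters.
master_count_metric() {
  local count=0 v
  for v in "$@"; do
    if [ "$v" = "false" ]; then
      count=$((count + 1))
    fi
  done
  printf '# HELP activemq_master_count Brokers currently reporting master status.\n'
  printf '# TYPE activemq_master_count gauge\n'
  printf 'activemq_master_count %d\n' "$count"
}

# Hypothetical cron wiring (probe is a command printing each broker's Slave value):
# out=/var/lib/node_exporter/textfile/activemq.prom
# master_count_metric "$(probe broker1)" "$(probe broker2)" > "$out.$$" && mv "$out.$$" "$out"
```

Writing to a temporary file and renaming it avoids the collector reading a half-written file.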

2. Use Shared Storage for KahaDB

```xml
<!-- activemq.xml - shared storage for KahaDB (the LevelDB store is deprecated) -->
<persistenceAdapter>
  <kahaDB directory="/shared-storage/activemq/kahadb"/>
</persistenceAdapter>

<!-- Mount the same shared storage on all brokers -->
<!-- /shared-storage should be NFSv4 or another shared disk with reliable locking -->
```

3. Test Failover Regularly

```bash
# Monthly failover test script
cat << 'EOF' > /usr/local/bin/test_activemq_failover.sh
#!/bin/bash
# Assumes both brokers use brokerName=localhost and that activemq1 is a
# systemd unit on the host running this script.
JOLOKIA_PATH='api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/Slave'

# Print the Slave attribute ("true"/"false") from Jolokia's JSON response
is_slave() {
  curl -s --max-time 5 -u admin:admin "http://$1:8161/$JOLOKIA_PATH" \
    | grep -o '"value":[a-z]*' | cut -d: -f2
}

echo "Testing ActiveMQ failover..."
if [ "$(is_slave activemq1)" = "false" ]; then
  echo "Master on activemq1"
  systemctl stop activemq1      # stop the master
  sleep 10
  if [ "$(is_slave activemq2)" = "false" ]; then
    echo "PASS: Failover successful"
  else
    echo "FAIL: Slave did not become master"
  fi
  systemctl start activemq1     # restore the original broker
else
  echo "activemq1 is not master; nothing tested"
fi
EOF

chmod +x /usr/local/bin/test_activemq_failover.sh
```

Best Practices Checklist

  • [ ] Monitor master/slave status
  • [ ] Use shared storage for persistence
  • [ ] Test failover monthly
  • [ ] Configure proper network timeouts
  • [ ] Document failover procedures
  • [ ] Monitor client reconnection

<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "Fix ActiveMQ Master Slave Failover", "description": "Troubleshoot ActiveMQ master-slave failover issues. Configure shared storage, network connectivity, and failover detection.", "url": "https://www.fixwikihub.com/fix-activemq-master-slave-failover", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2026-04-03T19:21:58.635Z", "dateModified": "2026-04-03T19:21:58.635Z" } </script>