Introduction
ActiveMQ master-slave provides high availability: one broker is active (the master) while the others stand by (slaves). When failover does not work properly, clients cannot connect to the new master, or a split-brain scenario occurs in which multiple brokers claim master status.
Symptoms
Failover not triggered:
```bash
$ tail -f /var/log/activemq/activemq.log
[ERROR] Master broker failed but slave did not take over
[WARN] Slave broker waiting for lock, but master unreachable
```
Split-brain:
[WARN] Multiple brokers claiming master status
[ERROR] Network partition detected: broker-1 and broker-2 both master
Client connection failure:
[ERROR] Failover transport failed to connect to any broker
[WARN] Client blocked waiting for broker to become available
javax.jms.JMSException: Failover timeout elapsed
Common Causes
1. Shared storage unavailable: NFS, SAN, or database is not accessible
2. Network partition: a network issue isolates the master from the slaves
3. Lock not acquired: the slave cannot get the database or file lock
4. Failover delay too long: the slave takes too long to detect the failure
5. Client failover URL wrong: clients are not configured for failover
6. Split-brain protection disabled: no fencing mechanism is in place
Step-by-Step Fix
Step 1: Check Master/Slave Status
```bash
# Check broker role via Jolokia (JMX over HTTP)
# The Broker MBean's Slave attribute is false on the master and true on a slave
curl -u admin:admin "http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/Slave"

# Check network connectors
curl -u admin:admin "http://localhost:8161/api/jolokia/list/org.apache.activemq:type=Broker,brokerName=localhost,connector=networkConnectors"

# Check all brokers in cluster
for broker in broker1 broker2 broker3; do
  curl -u admin:admin "http://${broker}:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=${broker}/Slave"
done

# View broker logs (-E is required for the | alternation to work)
grep -iE "master|slave|lock" /var/log/activemq/activemq.log | tail -50
```
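Jolokia wraps every attribute read in a JSON envelope, so a script has to pull out the `value` field before it can compare anything. The sketch below shows one way to do that with `sed` alone. The function names `jolokia_value` and `broker_role` are hypothetical helpers, not part of ActiveMQ, and the sketch assumes the Broker MBean's `Slave` attribute and Jolokia's compact JSON output.

```bash
# Hypothetical helpers (not part of ActiveMQ): extract the "value" field from a
# Jolokia read response and map the Broker MBean's Slave attribute to a role.

# Prints the raw value of the "value" field (true, false, or a number).
jolokia_value() {
  # $1: JSON text returned by Jolokia, e.g. {"request":{...},"value":false,"status":200}
  printf '%s' "$1" | sed -n 's/.*"value":\([a-z0-9.]*\).*/\1/p'
}

# Prints "master", "slave", or "unknown" for a Slave-attribute response.
broker_role() {
  case "$(jolokia_value "$1")" in
    false) echo "master" ;;
    true)  echo "slave" ;;
    *)     echo "unknown" ;;
  esac
}

# Usage (assumes the broker names used in this article):
# broker_role "$(curl -s -u admin:admin \
#   "http://broker1:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=broker1/Slave")"
```

An unreachable broker or an error response yields `unknown` rather than a false `slave`, which keeps the three states distinguishable in later checks.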
Step 2: Check Shared Storage
```bash
# For shared filesystem (KahaDB on NFS/SAN)
# Check mount status
mount | grep activemq-shared

# Test write access
touch /var/lib/activemq-shared/test && rm /var/lib/activemq-shared/test

# Check NFS connectivity
showmount -e nfs-server

# For JDBC master-slave
# Check database connectivity
mysql -h db-server -u activemq -p -e "SELECT 1;"

# Check lock table
mysql -h db-server -u activemq -p -e "SELECT * FROM activemq.ACTIVEMQ_LOCK;"

# Check if lock is held
# TIME column shows when lock was acquired
```
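A plain `touch` can hang indefinitely on a stalled NFS mount, which makes the write test above awkward to use in a health check. The following is a minimal sketch of a bounded write probe. The function name `store_is_writable` and the probe file name are assumptions made for illustration, and it relies on the GNU coreutils `timeout` command being installed.

```bash
# Hypothetical helper: returns 0 if a probe file can be created and removed in
# the store directory within the time limit, non-zero otherwise.
store_is_writable() {
  local dir="$1"           # shared store directory to probe
  local limit_s="${2:-10}" # give up after this many seconds (default 10)
  # Run the probe in a child shell so timeout can kill it if the mount hangs.
  timeout "$limit_s" sh -c 'probe="$1/.ha-probe.$$"; : > "$probe" && rm -f "$probe"' _ "$dir"
}

# Usage (path taken from this article):
# store_is_writable /var/lib/activemq-shared 5 || echo "shared store not writable"
```

An exit status of 124 from `timeout` indicates the probe hung rather than failed outright, which usually points at the mount rather than at permissions.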
Step 3: Configure Lock Acquisition
```xml
<!-- For shared filesystem master-slave -->
<persistenceAdapter>
  <kahaDB directory="/var/lib/activemq-shared/kahadb" lockKeepAlivePeriod="5000">
    <locker>
      <shared-file-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </kahaDB>
</persistenceAdapter>

<!-- For JDBC master-slave -->
<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#mysql-ds" lockKeepAlivePeriod="5000">
    <locker>
      <database-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>

<!-- Key parameters -->
<!-- lockKeepAlivePeriod: how often the master checks or renews its lock (ms) -->
<!-- lockAcquireSleepInterval: how often a slave retries while the lock is held (ms) -->
```
Step 4: Configure Network Failover
```xml
<!-- For network connector failover -->
<networkConnectors>
  <networkConnector name="cluster"
                    uri="static:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)"
                    duplex="true"
                    decreaseNetworkConsumerPriority="true"
                    networkTTL="3"
                    conduitSubscriptions="true"
                    dynamicOnly="true"/>
</networkConnectors>

<!-- Key network connector settings -->
<!-- duplex: Both send and receive messages -->
<!-- networkTTL: Number of brokers to propagate -->
<!-- dynamicOnly: Only forward when consumers exist -->
```
Step 5: Configure Client Failover URL
```java
import javax.jms.ConnectionFactory;
import org.apache.activemq.ActiveMQConnectionFactory;

// Configure failover transport in clients
ConnectionFactory factory = new ActiveMQConnectionFactory(
    "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)"
);

// Recommended failover options
ConnectionFactory tunedFactory = new ActiveMQConnectionFactory(
    "failover:(tcp://broker1:61616,tcp://broker2:61616)?" +
    "randomize=false&" +             // Try brokers in order
    "priorityBackup=true&" +         // Prefer broker1
    "timeout=5000&" +                // Fail sends after 5 s while disconnected
    "maxReconnectAttempts=10&" +     // Retry limit
    "initialReconnectDelay=1000&" +  // First retry delay
    "maxReconnectDelay=30000"        // Max retry delay
);

// For priority backup (master preferred)
String priorityBackupUrl =
    "failover:(tcp://master:61616,tcp://slave:61616)?priorityBackup=true&priorityURIs=tcp://master:61616";
```
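A malformed broker list is one of the common causes named earlier, so it can help to generate the failover URL from a host list instead of typing it by hand. The sketch below does that in shell so the same string can be fed to deployment templates or command-line tools. The function name `failover_url` is a hypothetical helper, and the hosts and options are the ones used in this step.

```bash
# Hypothetical helper: build a failover: URL from a port, an option string,
# and one or more broker host names.
failover_url() {
  local port="$1" # broker transport port, e.g. 61616
  local opts="$2" # query options without the leading "?", may be empty
  shift 2
  local uris=""
  local host
  for host in "$@"; do
    # Append each broker as tcp://host:port, comma-separated.
    uris="${uris:+$uris,}tcp://${host}:${port}"
  done
  printf 'failover:(%s)%s\n' "$uris" "${opts:+?$opts}"
}

# Usage:
# failover_url 61616 "randomize=false&timeout=5000" broker1 broker2
```

Keeping the host list in one place means the client URL and the broker inventory cannot drift apart unnoticed.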
Step 6: Test Failover Detection
```bash
# Simulate master failure
# Stop master broker
systemctl stop activemq@master

# Watch slave logs (-E is required for the | alternation to work)
tail -f /var/log/activemq/slave.log | grep -iE "master|lock|takeover"

# Expected:
# [INFO] Lost connection to master
# [INFO] Attempting to acquire lock
# [INFO] Acquired lock, becoming master

# Check slave becomes master
curl -u admin:admin "http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave"
# "value" should be false once the slave has taken over

# Test client reconnection
# Client should reconnect to new master automatically
```
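Checking the slave once after a fixed pause says little about how long takeover actually took. A minimal polling sketch follows: it retries a check command until it succeeds or a deadline passes, and prints the elapsed seconds. The names `wait_until` and `slave_is_now_master` are hypothetical, and the check assumes the Broker MBean's `Slave` attribute and the host and broker name `slave` used in this step.

```bash
# Hypothetical helper: run a check command every interval_s seconds until it
# succeeds or deadline_s seconds have passed. Prints elapsed seconds on success.
wait_until() {
  local deadline_s="$1"
  local interval_s="$2"
  shift 2
  local start
  start="$(date +%s)"
  while [ $(( $(date +%s) - start )) -lt "$deadline_s" ]; do
    if "$@"; then
      echo $(( $(date +%s) - start ))
      return 0
    fi
    sleep "$interval_s"
  done
  return 1
}

# Check for this article's setup: succeeds once the standby reports Slave=false.
slave_is_now_master() {
  curl -s -u admin:admin \
    "http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave" \
    | grep -q '"value":false'
}

# Usage:
# wait_until 60 2 slave_is_now_master && echo "takeover ok" || echo "no takeover within 60s"
```

Recording the printed takeover time across test runs gives a baseline against which a regression in lock settings becomes visible.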
Step 7: Configure Split-Brain Prevention
```xml
<!-- Enable split-brain detection -->
<broker brokerName="broker1"
        shutdownOnSlaveStop="true"
        shutdownOnMasterFailure="true"
        useShutdownHook="true">

  <persistenceAdapter>
    <kahaDB directory="/var/lib/activemq-shared/kahadb"
            checkForCorruptedJournalFiles="true"
            ignoreCorruptedJournalFiles="false"/>
  </persistenceAdapter>
</broker>

<!-- shutdownOnSlaveStop: Master shuts down if slave stops -->
<!-- shutdownOnMasterFailure: Slave shuts down if it can't reach master -->

<!-- For JDBC, use the lease-based locker for an explicit timeout -->
<!-- The lease length follows lockAcquireSleepInterval; keep it larger than lockKeepAlivePeriod -->
<locker>
  <lease-database-locker lockAcquireSleepInterval="60000"/>
</locker>
```
Step 8: Handle Network Partition
```bash
# If network partition detected
# Check network connectivity (-c 3 so that ping exits)
ping -c 3 master-broker
ping -c 3 slave-broker

# Check network routes
traceroute master-broker

# Check firewall rules
iptables -L -n | grep 61616

# If partition confirmed:
# 1. Stop all brokers to prevent split-brain (run on every broker host)
systemctl stop activemq

# 2. Verify network restored
ping -c 3 master-broker && ping -c 3 slave-broker

# 3. Clear lock (if JDBC)
mysql -u activemq -p -e "DELETE FROM activemq.ACTIVEMQ_LOCK;"

# 4. Start designated master first
systemctl start activemq@master

# Wait for lock acquisition
tail -f /var/log/activemq/master.log | grep "Acquired lock"

# 5. Start slaves
systemctl start activemq@slave
```
Step 9: Monitor Failover Health
```bash
# Set up monitoring for failover status
# Check master/slave status periodically
curl -u admin:admin "http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=*/MasterBroker"

# Monitor lock renewals
grep -c "Renewed lock" /var/log/activemq/activemq.log

# Alert if lock renewals stop
# (Indicates connection issue)

# Check network connector status
curl -u admin:admin "http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,connector=networkConnectors,networkConnectorName=cluster/ConnectedBrokers"

# Monitor client connections
curl -u admin:admin "http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,connector=clientConnectors/ClientCount"
```
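The per-broker checks become an alertable signal once they are reduced to a single question: how many brokers currently claim to be master? The sketch below makes that reduction explicit. It takes one value per broker, where `false` means "not a slave" (that is, master), following the convention of the Broker MBean's `Slave` attribute. The function names `count_masters` and `ha_state` are hypothetical.

```bash
# Hypothetical helper: count how many of the given Slave-attribute values are
# "false", i.e. how many brokers report themselves as master.
count_masters() {
  local masters=0
  local value
  for value in "$@"; do
    if [ "$value" = "false" ]; then
      masters=$((masters + 1))
    fi
  done
  echo "$masters"
}

# Prints "ok" for exactly one master, "no-master" for none, "split-brain" otherwise.
ha_state() {
  case "$(count_masters "$@")" in
    0) echo "no-master" ;;
    1) echo "ok" ;;
    *) echo "split-brain" ;;
  esac
}

# Usage: pass one value per broker, collected from the Jolokia reads.
# ha_state false true true
```

Values such as an empty string from an unreachable broker count as "not master", so a fully partitioned view reports `no-master` instead of a false `ok`.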
Step 10: Configure Automatic Restart
```bash
# Configure systemd for automatic restart
# Write the unit file to /etc/systemd/system/activemq.service
cat << 'EOF' > /etc/systemd/system/activemq.service
[Unit]
Description=Apache ActiveMQ
After=network.target

[Service]
Type=forking
User=activemq
ExecStart=/opt/activemq/bin/activemq start
ExecStop=/opt/activemq/bin/activemq stop
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Apply changes
systemctl daemon-reload
systemctl enable activemq

# This ensures the broker restarts after a failure
```
Failover Configuration Settings
| Setting | Default | Recommended |
|---|---|---|
| lockKeepAlivePeriod | 30000ms | 5000-10000ms |
| lockAcquireSleepInterval | 10000ms | 5000-10000ms |
| failover timeout | None | 5000ms |
| maxReconnectAttempts | -1 (infinite) | 10-100 |
| initialReconnectDelay | 10ms | 1000ms |
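The reconnect settings in the table interact: with exponential backoff, the delay grows after each failed attempt until it reaches `maxReconnectDelay`. The sketch below approximates the resulting schedule so the total time a client keeps retrying can be estimated before choosing values. It assumes the failover transport's default backoff multiplier of 2 with exponential backoff enabled, and the function name `reconnect_schedule` is hypothetical.

```bash
# Hypothetical helper: print the approximate wait before each reconnect attempt
# and the total, assuming the delay doubles after every attempt up to a cap.
reconnect_schedule() {
  local delay="$1"    # initialReconnectDelay in ms
  local max="$2"      # maxReconnectDelay in ms
  local attempts="$3" # maxReconnectAttempts
  local total=0
  local i=1
  while [ "$i" -le "$attempts" ]; do
    echo "attempt $i: wait ${delay}ms"
    total=$((total + delay))
    delay=$((delay * 2))
    if [ "$delay" -gt "$max" ]; then
      delay="$max"
    fi
    i=$((i + 1))
  done
  echo "total: ${total}ms"
}

# Usage with the values from Step 5:
# reconnect_schedule 1000 30000 10
```

With an initial delay of 1000 ms, a cap of 30000 ms, and 10 attempts, the waits add up to 181000 ms, so a client gives up roughly three minutes after losing its broker.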
Verification
```bash
# After configuration changes
# Restart all brokers in sequence
# Start master first
systemctl restart activemq@master
sleep 10

# Start slaves
systemctl restart activemq@slave

# Verify master status (Slave attribute; Jolokia returns JSON)
curl -u admin:admin "http://master:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=master/Slave"
# "value" should be false

# Verify slave status
curl -u admin:admin "http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave"
# "value" should be true

# Test failover
systemctl stop activemq@master
sleep 15
curl -u admin:admin "http://slave:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=slave/Slave"
# "value" should now be false

# Test client connectivity
activemq-admin send --queue test.queue --message "test" --brokerUrl "failover:(tcp://master:61616,tcp://slave:61616)"
```
Prevention
To keep ActiveMQ master-slave failover issues from recurring, put these measures in place:
1. Monitor Failover Status
groups:
  - name: activemq-ha
    rules:
      - alert: ActiveMQMasterSlaveSplit
        expr: |
          activemq_master_count != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ActiveMQ master/slave split-brain detected"
2. Use Shared Storage for LevelDB
```xml
<!-- activemq.xml - Shared storage for LevelDB -->
<persistenceAdapter>
  <levelDB directory="/shared-storage/activemq/leveldb"/>
</persistenceAdapter>

<!-- Mount shared storage on all brokers -->
<!-- /shared-storage should be NFS or shared disk -->
```
3. Test Failover Regularly
```bash
# Monthly failover test script
cat << 'EOF' > /usr/local/bin/test_activemq_failover.sh
#!/bin/bash
echo "Testing ActiveMQ failover..."

# Jolokia returns JSON, so match on the "value" field instead of comparing the
# whole response. A broker is master when its Slave attribute is false.
is_master() {
  curl -s -u admin:admin "http://$1:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/Slave" \
    | grep -q '"value":false'
}

# Find current master
if is_master activemq1; then
  echo "Master on activemq1"
  # Stop master
  systemctl stop activemq1
  sleep 10
  # Check if slave took over
  if is_master activemq2; then
    echo "PASS: Failover successful"
  else
    echo "FAIL: Slave did not become master"
  fi
  # Restore original master
  systemctl start activemq1
fi
EOF

chmod +x /usr/local/bin/test_activemq_failover.sh
```
Best Practices Checklist
- [ ] Monitor master/slave status
- [ ] Use shared storage for persistence
- [ ] Test failover monthly
- [ ] Configure proper network timeouts
- [ ] Document failover procedures
- [ ] Monitor client reconnection
Related Issues
- [Fix ActiveMQ JDBC Lock Expired](/articles/fix-activemq-jdbc-lock-expired)
- [Fix ActiveMQ Kaha PIndex Corrupted](/articles/fix-activemq-kaha-pindex-corrupted)
- [Fix Network Partition Detected](/articles/fix-network-partition-detected)
Related Articles
- [Fix ActiveMQ Broker Down Issue in Messaging](fix-activemq-broker-down)
- [Fix ActiveMQ Destination Full](fix-activemq-destination-full)
- [Fix ActiveMQ JDBC Lock Expired](fix-activemq-jdbc-lock-expired)
- [Fix ActiveMQ Kaha PIndex Corrupted](fix-activemq-kaha-pindex-corrupted)
- [Fix ActiveMQ Slow Consumer](fix-activemq-slow-consumer)