Introduction

A Consul snapshot backup usually fails for one of three reasons: a Raft problem (most often no leader), insufficient ACL permissions, or a storage problem such as a full disk. Snapshots are critical for disaster recovery and cluster restoration.

Symptoms

Snapshot save failure:

```bash
$ consul snapshot save backup.snap

Error: failed to save snapshot: raft: no leader
```

Permission denied:

```bash
$ consul snapshot save backup.snap

Error: failed to save snapshot: Permission denied
```

Write error:

```bash
$ consul snapshot save /backup/consul.snap

Error: failed to save snapshot: write error: no space left on device
```

Common Causes

  1. No Raft leader - A leader is required to take a snapshot
  2. ACL restrictions - The token lacks permission to take snapshots
  3. Disk space - No room for the snapshot file
  4. Quorum issues - Too few servers alive for consensus
  5. Large state - The KV store is too large to snapshot before the request times out
  6. Network issues - The leader cannot be reached
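For scripted triage, the error strings from the Symptoms section can be mapped to the causes in this list. The following is a minimal sketch: `classify_snapshot_error` is a hypothetical helper, the first three patterns come from the messages shown above, and the timeout patterns are an assumption about how a slow or unreachable leader might surface.

```bash
# Map a `consul snapshot save` error message to a likely cause from the list above.
classify_snapshot_error() {
  case "$1" in
    *"no leader"*)               echo "no-raft-leader" ;;
    *"Permission denied"*)       echo "acl-restrictions" ;;
    *"no space left on device"*) echo "disk-space" ;;
    # Assumption: timeouts point at large state or network trouble.
    *"timeout"*|*"timed out"*)   echo "large-state-or-network" ;;
    *)                           echo "unknown" ;;
  esac
}
```

A backup wrapper could capture stderr from the save command, pass it to this function, and branch to the matching step below.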

Step-by-Step Fix

Step 1: Check Leader Status

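```bash
# Check for a leader
consul operator raft list-peers

# Output:
# Node      ID   Address         State   Voter
# consul-1  xxx  10.0.0.1:8300   leader  true

# If there is no leader, check member status:
consul members

# Check server count
consul members | grep server | wc -l

# A majority of servers must be alive for leader election
# 3 servers: need 2 alive
# 5 servers: need 3 alive

# If servers are down, restore quorum first.
# Remove the failed server from the Raft configuration
# (this needs a leader; if quorum is already lost, follow the
# outage recovery procedure with peers.json instead):
consul operator raft remove-peer -id=<failed-id>

# There is no "raft add-peer" subcommand. A replacement server is
# added to Raft automatically once it joins the cluster:
consul join <new-server>

# Retry the snapshot after a leader is elected
consul snapshot save backup.snap
```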
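The majority rule behind "3 servers: need 2 alive" is floor(n/2) + 1. A small sketch of that arithmetic follows; the function names are illustrative and not part of the Consul CLI.

```bash
# Smallest number of servers that forms a Raft majority: floor(n/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority still remains.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Exit 0 when enough servers are alive to elect a leader.
# $1 = servers in the Raft configuration, $2 = servers currently alive
has_quorum() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}
```

For example, `has_quorum 5 2` fails because a five-server cluster needs three servers alive, which is why a snapshot cannot be taken until a third server returns.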

Step 2: Check ACL Permissions

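```bash
# Check whether ACLs are enabled
consul acl policy list

# Check token permissions (-id takes the token's accessor ID)
consul acl token read -id=<your-token>

# Required policy for snapshots:
# A snapshot contains every ACL token secret, so the token needs
# acl = "write" (management-level access). In policy.hcl:
# acl = "write"

# Create a snapshot policy:
consul acl policy create -name=snapshot-policy -rules='acl = "write"'

# Create a token with the policy:
consul acl token create -policy-name=snapshot-policy -description="Backup token"

# Use the token for snapshots (flags go before the file name):
consul snapshot save -token=<backup-token> backup.snap

# Or export the token so it stays off the command line:
export CONSUL_HTTP_TOKEN=<backup-token>
consul snapshot save backup.snap

# Avoid setting a management token as acl.tokens.default in consul.hcl:
# every request that carries no token would inherit its privileges.
```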

Step 3: Check Disk Space

```bash
# Check disk space at the snapshot location
df -h /backup

# Consul snapshots can be large. They include:
# - KV store data
# - Service catalog
# - ACL policies and tokens
# - Prepared queries
# - Sessions

# Check Consul data size:
du -sh /opt/consul/data

# Estimate snapshot size:
consul kv get -recurse | wc -l
# Each KV entry contributes to snapshot size

# Create a backup directory with enough space:
mkdir -p /backup/consul
df -h /backup/consul

# Save to a mounted remote volume. snapshot save has no -remote flag;
# the destination path is its argument:
consul snapshot save /backup/consul/snapshot.snap

# Or stream to a remote server through the HTTP API
# (snapshot save writes to a file, not to stdout):
curl -fsS http://localhost:8500/v1/snapshot | ssh backup-server "cat > /backup/consul.snap"
```
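The disk check can be turned into a pre-flight test that runs before each save. This is a sketch under stated assumptions: `available_kb` and `enough_space` are hypothetical helpers, and the 20 percent default margin is an arbitrary choice rather than a Consul recommendation.

```bash
# Free space, in KiB, on the filesystem that holds directory $1.
# `df -Pk` uses the POSIX output format: field 4 of line 2 is "Available".
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Exit 0 when the available space covers the expected size plus a margin.
# $1 = available KiB, $2 = expected snapshot size in KiB,
# $3 = safety margin in percent (default 20)
enough_space() {
  local margin="${3:-20}"
  [ "$1" -ge $(( $2 + $2 * margin / 100 )) ]
}
```

One possible use is `enough_space "$(available_kb /backup/consul)" "$(du -sk /opt/consul/data | cut -f1)"`, which treats the size of the data directory as a rough upper bound for the snapshot.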

Step 4: Check Server Health

```bash
# Check that all servers are alive
consul members

# Output:
# Node      Status
# consul-1  alive
# consul-2  alive
# consul-3  alive

# Check server load
ssh consul-1 "top -b -n 1 | head"

# Check the Consul process on each server
ssh consul-1 "ps aux | grep consul"

# Check Raft state
consul operator raft list-peers

# If a server is overloaded, the leader may not respond.
# Check that the leader is responsive:
curl http://<leader>:8500/v1/status/leader

# Restart the overloaded server:
ssh consul-2 "systemctl restart consul"
```

Step 5: Increase Snapshot Timeout

```bash
# Large KV stores can take longer to snapshot than the client waits

# The documented flags for consul snapshot save do not include -timeout,
# so the wait cannot be tuned on the CLI:
consul snapshot save backup.snap

# To control the wait, call the HTTP API with a client-side timeout:
curl -X GET "http://localhost:8500/v1/snapshot" \
  --max-time 120 \
  --output backup.snap

# For very large clusters, allow more time:
curl -X GET "http://localhost:8500/v1/snapshot" \
  --max-time 300 \
  --output backup.snap

# Verify snapshot size:
ls -lh backup.snap
```
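When a save only fails intermittently, for example while the leader is busy, a bounded retry with a growing delay is one way to ride it out. The `retry` function below is a generic sketch, not a Consul feature, and the attempt count and delay are assumptions to tune.

```bash
# Run a command up to $1 times. Sleep $2 seconds after the first failure
# and double the delay after each further failure.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while true; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
}
```

For instance, `retry 3 10 curl -fsS --max-time 300 --output backup.snap http://localhost:8500/v1/snapshot` makes at most three attempts, waiting 10 and then 20 seconds between them.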

Step 6: Verify Snapshot Integrity

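```bash
# After a successful snapshot, verify that it is valid

# Check snapshot file size
ls -lh backup.snap

# An empty file means the snapshot failed:
test -s backup.snap && echo "non-empty" || echo "EMPTY"

# Inspect snapshot metadata (index, term, size):
consul snapshot inspect backup.snap

# Test the restore on a test cluster, never on production:
consul snapshot restore backup.snap

# Output:
# Restored snapshot

# Verify that data was restored:
consul kv get -recurse

# Check that ACLs were restored:
consul acl policy list

# Check that catalog nodes were restored
# (consul members shows gossip state, which is not part of a snapshot):
consul catalog nodes

# Record a checksum so corruption after copying can be detected.
# Two different snapshots never share a checksum, so comparing
# against the previous backup proves nothing.
sha256sum backup.snap
```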
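A checksum is most useful when it is stored next to the snapshot and checked again after the file has been copied. The helpers below sketch that pattern with GNU `sha256sum`; the function names and the `.sha256` sidecar file are assumptions for illustration.

```bash
# Write "<sha256>  <name>" into <file>.sha256 next to the snapshot.
write_checksum() {
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Exit 0 only if the snapshot is non-empty and matches its recorded checksum.
verify_checksum() {
  [ -s "$1" ] || return 1
  ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}
```

Storing the bare file name rather than the full path in the sidecar lets the pair be verified from any directory on the backup server after both files have been copied.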

Step 7: Automate Snapshot Backups

```bash
# Create backup script
cat << 'EOF' > /usr/local/bin/consul-backup.sh
#!/bin/bash

BACKUP_DIR="/backup/consul"
DATE=$(date +%Y%m%d-%H%M%S)
SNAPSHOT_FILE="${BACKUP_DIR}/consul-${DATE}.snap"
# Pass the token through the environment so it does not appear in ps output.
# CONSUL_MANAGEMENT_TOKEN must be set in the environment cron runs with.
export CONSUL_HTTP_TOKEN="${CONSUL_MANAGEMENT_TOKEN}"

# Check that a leader exists
LEADER=$(consul operator raft list-peers 2>/dev/null | grep leader)
if [ -z "$LEADER" ]; then
  echo "ERROR: No leader, cannot create snapshot"
  exit 1
fi

# Create snapshot
if consul snapshot save "${SNAPSHOT_FILE}"; then
  echo "Snapshot saved: ${SNAPSHOT_FILE}"

  # Verify snapshot
  SIZE=$(stat -c%s "${SNAPSHOT_FILE}")
  if [ "$SIZE" -gt 0 ]; then
    echo "Snapshot valid: ${SIZE} bytes"

    # Remove old backups (keep 7 days)
    find "${BACKUP_DIR}" -name "*.snap" -mtime +7 -delete

    # Copy to remote storage
    scp "${SNAPSHOT_FILE}" backup-server:/backup/consul/
  else
    echo "ERROR: Snapshot is empty"
    rm "${SNAPSHOT_FILE}"
    exit 1
  fi
else
  echo "ERROR: Snapshot failed"
  exit 1
fi
EOF

chmod +x /usr/local/bin/consul-backup.sh

# Schedule daily backup:
cat << 'EOF' > /etc/cron.d/consul-backup
0 2 * * * root /usr/local/bin/consul-backup.sh >> /var/log/consul-backup.log 2>&1
EOF
```

Step 8: Handle Snapshot Restore

```bash
# Restoring into a rebuilt cluster.
# consul snapshot restore talks to a running cluster that has a leader,
# so the servers must be started before the restore, not after.
# If the existing cluster is healthy, skip straight to the restore command.

# Stop all Consul servers first
systemctl stop consul

# On each server, remove existing data
rm -rf /opt/consul/data/*

# Start the first server as bootstrap:
consul agent -bootstrap-expect=1 -server \
  -data-dir=/opt/consul/data \
  -bind=10.0.0.1

# Wait for leader election
consul operator raft list-peers

# Restore the snapshot through the running server:
consul snapshot restore backup.snap

# Output:
# Restored snapshot

# Start the other servers and join them:
consul agent -server -retry-join=10.0.0.1 \
  -data-dir=/opt/consul/data \
  -bind=10.0.0.2

# Verify the restore:
consul kv get -recurse
consul members
consul acl policy list
```
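The "wait for leader election" step can be scripted as a polling loop instead of re-running the command by hand. The sketch below assumes the local agent listens on `localhost:8500` and that `/v1/status/leader` returns an empty JSON string (`""`) while no leader exists; `wait_until` and `has_leader` are hypothetical helper names.

```bash
# Poll a command every $2 seconds until it succeeds or $1 seconds have passed.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Probe: exit 0 when the local agent reports a non-empty leader address.
has_leader() {
  local leader
  leader="$(curl -fsS --max-time 5 http://localhost:8500/v1/status/leader 2>/dev/null)" || return 1
  [ -n "$leader" ] && [ "$leader" != '""' ]
}
```

With these in place, `wait_until 60 2 has_leader && consul snapshot restore backup.snap` only attempts the restore once a leader has been elected, and gives up after about a minute otherwise.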

Step 9: Check Network Connectivity

```bash
# Find the leader. The endpoint returns a quoted address with the
# Raft port, for example "10.0.0.1:8300", so strip the quotes first.
LEADER=$(curl -s http://localhost:8500/v1/status/leader | tr -d '"')
LEADER_IP=${LEADER%%:*}
echo "Leader: $LEADER"

# Check network to leader
ping -c 3 "$LEADER_IP"

# Check port 8500 (HTTP API)
nc -zv "$LEADER_IP" 8500

# Check port 8300 (Raft)
nc -zv "$LEADER_IP" 8300

# Check firewall
iptables -L -n | grep 8500
iptables -L -n | grep 8300

# Allow API port for snapshot:
iptables -I INPUT -p tcp --dport 8500 -j ACCEPT

# Test snapshot via the HTTP API port (8500, not the Raft port in $LEADER):
curl -X GET "http://${LEADER_IP}:8500/v1/snapshot" --output test.snap

# Verify file
ls -lh test.snap
```
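The leader endpoint returns a JSON string, quotes included, that carries the Raft RPC port rather than the HTTP API port. A small parsing helper keeps that detail in one place; the function names are illustrative, and 8500 is assumed to be the HTTP port unless another one is passed.

```bash
# Strip the JSON quotes and the port from a /v1/status/leader response.
# Input: "10.0.0.1:8300" (quotes included)  Output: 10.0.0.1
leader_host() {
  local addr
  addr="$(printf '%s' "$1" | tr -d '"')"
  printf '%s\n' "${addr%:*}"
}

# Build the HTTP API base URL for the leader.
# $1 = raw leader response, $2 = HTTP port (default 8500)
leader_api_url() {
  printf 'http://%s:%s\n' "$(leader_host "$1")" "${2:-8500}"
}
```

A script can then request `"$(leader_api_url "$RAW")/v1/snapshot"`, where `RAW` holds the unmodified response, without accidentally targeting port 8300.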

Step 10: Monitor Backup Health

```bash
# Create backup monitoring script
cat << 'EOF' > /usr/local/bin/check-consul-backup.sh
#!/bin/bash

BACKUP_DIR="/backup/consul"

# Check that a backup exists
LAST_BACKUP=$(ls -t "${BACKUP_DIR}"/*.snap 2>/dev/null | head -1)

if [ -z "$LAST_BACKUP" ]; then
  echo "WARNING: No backup found"
  exit 1
fi

# Check backup age
BACKUP_AGE=$(( ($(date +%s) - $(stat -c%Y "$LAST_BACKUP")) / 3600 ))
if [ "$BACKUP_AGE" -gt 24 ]; then
  echo "WARNING: Last backup is ${BACKUP_AGE} hours old"
fi

# Check backup size
BACKUP_SIZE=$(stat -c%s "$LAST_BACKUP")
if [ "$BACKUP_SIZE" -lt 100 ]; then
  echo "ERROR: Backup too small: ${BACKUP_SIZE} bytes"
  exit 1
fi

echo "OK: Latest backup ${LAST_BACKUP}, size ${BACKUP_SIZE}, age ${BACKUP_AGE}h"
EOF

chmod +x /usr/local/bin/check-consul-backup.sh

# Prometheus alert for backup age (YAML, for a Prometheus rules file).
# consul_backup_age_hours is a custom metric, not one Consul exports.
#
# - alert: ConsulBackupMissing
#   expr: consul_backup_age_hours > 24
#   for: 1h
#   labels:
#     severity: warning
#   annotations:
#     summary: "Consul snapshot backup missing or old"
```
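The alert rule references `consul_backup_age_hours`, so something has to publish that metric. One option, sketched here as an assumption rather than the article's own setup, is to print the value in the Prometheus text format and let the node_exporter textfile collector pick it up.

```bash
# Whole hours between two Unix timestamps ($1 = now, $2 = file mtime).
age_hours() {
  echo $(( ($1 - $2) / 3600 ))
}

# Print the gauge in Prometheus text exposition format.
backup_age_metric() {
  printf '# HELP consul_backup_age_hours Age of the newest Consul snapshot in hours.\n'
  printf '# TYPE consul_backup_age_hours gauge\n'
  printf 'consul_backup_age_hours %s\n' "$1"
}
```

The monitoring script could end with `backup_age_metric "$(age_hours "$(date +%s)" "$(stat -c%Y "$LAST_BACKUP")")"` redirected into a `.prom` file inside whatever directory the textfile collector is configured to read; that directory is site-specific.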

Consul Snapshot Backup Checklist

| Check | Command | Expected |
|---|---|---|
| Leader exists | raft list-peers | Has leader |
| ACL permissions | acl token read | Policy grants acl = "write" |
| Disk space | df -h | > snapshot size |
| Snapshot file | ls -lh | > 0 bytes |
| Restore test | snapshot restore | Success |
| Backup age | stat -c%Y | < 24 hours |

Verification

```bash
# After resolving the snapshot issue

# 1. Create snapshot
consul snapshot save backup.snap
# Success! Snapshot saved

# 2. Verify file size
ls -lh backup.snap
# File exists with content

# 3. Test restore (on a test cluster)
consul snapshot restore backup.snap
# Restored successfully

# 4. Check backup schedule
ls -la /backup/consul/*.snap
# Recent backup exists

# 5. Verify that the leader is stable
consul operator raft list-peers
# Leader present

# 6. Monitor backup logs
tail /var/log/consul-backup.log
# No errors
```

  • [Fix Consul KV Store Not Responding](/articles/fix-consul-kv-store-not-responding)
  • [Fix Consul Agent Not Starting](/articles/fix-consul-agent-not-starting)
  • [Fix Consul Service Not Registering](/articles/fix-consul-service-not-registering)
