Introduction
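Consul snapshot backups usually fail for one of three reasons: the cluster has no Raft leader, the ACL token lacks the required permissions, or the target storage is full or unwritable. Snapshots are the basis for disaster recovery and cluster restoration, so a failing backup job should be fixed promptly.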
Symptoms
Snapshot save failure:
```bash
$ consul snapshot save backup.snap
Error: failed to save snapshot: raft: no leader
```
Permission denied:
```bash
$ consul snapshot save backup.snap
Error: failed to save snapshot: Permission denied
```
Write error:
```bash
$ consul snapshot save /backup/consul.snap
Error: failed to save snapshot: write error: no space left on device
```
Common Causes
1. No Raft leader: a leader is required to take a snapshot
2. ACL restrictions: the token lacks permission to take snapshots
3. Disk space: no room for the snapshot file
4. Quorum issues: too few servers alive to reach consensus
5. Large state: the KV store is too large to save before a timeout
6. Network issues: the leader cannot be reached
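The causes above map fairly directly onto the error strings shown under Symptoms. The following is a minimal sketch of that mapping, assuming the error text resembles those samples. The function name `classify_snapshot_error` and the category labels are hypothetical, and real Consul versions may word their errors differently.

```bash
# Map a snapshot error message to a likely cause from the list above.
# The patterns come from the sample errors in this article; treat the
# result as a hint for where to start, not as a diagnosis.
classify_snapshot_error() {
  case "$1" in
    *"no leader"*)                 echo "no-leader" ;;
    *[Pp]"ermission denied"*)      echo "permission" ;;
    *"no space left on device"*)   echo "disk-space" ;;
    *timeout*|*"timed out"*)       echo "timeout" ;;
    *)                             echo "unknown" ;;
  esac
}
```

A backup wrapper could capture stderr from `consul snapshot save` and pass it to this function to decide which of the steps below to run first.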
Step-by-Step Fix
Step 1: Check Leader Status
```bash
# Check for a leader
consul operator raft list-peers

# Output:
# Node      ID   Address        State   Voter
# consul-1  xxx  10.0.0.1:8300  leader  true

# If there is no leader, list the cluster members
consul members

# Count the servers that are alive
consul members | awk '$3 == "alive" && $4 == "server"' | wc -l

# A majority is needed for leader election
# 3 servers: need 2 alive
# 5 servers: need 3 alive

# If servers are permanently down, restore quorum first.
# Remove the failed peer from the Raft configuration:
consul operator raft remove-peer -id=<failed-id>
# A replacement server is added by starting it as a server agent and
# joining it to the cluster (run on the new server):
consul join <existing-server-address>

# Retry the snapshot once a leader is elected
consul snapshot save backup.snap
```
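The majority rule quoted above (3 servers need 2 alive, 5 need 3) is plain integer arithmetic. A minimal sketch follows; the function names `quorum_size` and `has_quorum` are illustrative and not part of Consul.

```bash
# Raft needs a strict majority of voting servers: floor(N / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds when the alive servers can still elect a leader.
# $1 = total voting servers, $2 = servers currently alive
has_quorum() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}
```

For example, `has_quorum 5 2 || echo "quorum lost"` prints the warning, because a five-server cluster needs three servers alive.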
Step 2: Check ACL Permissions
```bash
# Check whether ACLs are enabled (with ACLs on and no valid token,
# this returns a permission error)
consul acl policy list

# Check the token's policies (-id takes the token's accessor ID)
consul acl token read -id=<accessor-id>
# Or inspect the token currently in use:
consul acl token read -self

# Snapshots require a management-level token, because a snapshot contains
# the ACL data itself. The built-in global-management policy grants this.

# Create a dedicated backup token with that policy:
consul acl token create -policy-name=global-management -description="Backup token"

# Pass the token explicitly (flags go before the file name):
consul snapshot save -token=<management-token> backup.snap

# Or export it so it does not appear in the process list:
export CONSUL_HTTP_TOKEN=<management-token>
consul snapshot save backup.snap
```
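Permission errors in scheduled jobs often come from the token not being present in the job's environment at all. As a pre-flight check, the sketch below reports where the `consul` CLI would find a token when no `-token` flag is given. `CONSUL_HTTP_TOKEN` and `CONSUL_HTTP_TOKEN_FILE` are the environment variables the CLI reads; the function name `token_source` is hypothetical.

```bash
# Report where the consul CLI would pick up its ACL token from when
# no -token or -token-file flag is passed on the command line.
token_source() {
  if [ -n "${CONSUL_HTTP_TOKEN:-}" ]; then
    echo "env"
  elif [ -n "${CONSUL_HTTP_TOKEN_FILE:-}" ] && [ -r "${CONSUL_HTTP_TOKEN_FILE}" ]; then
    echo "file"
  else
    echo "none"
  fi
}
```

A backup script might abort early with `[ "$(token_source)" != "none" ] || exit 1` instead of failing later with a permission error.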
Step 3: Check Disk Space
```bash
# Check disk space at the snapshot location
df -h /backup

# Consul snapshots can be large. They include:
# - KV store data
# - Service catalog
# - ACL tokens and policies
# - Prepared queries
# - Sessions

# Check the size of the Consul data directory:
du -sh /opt/consul/data

# Rough indicator of KV volume (each entry adds to the snapshot size):
consul kv get -recurse | wc -l

# Create a backup directory on a volume with enough space:
mkdir -p /backup/consul
df -h /backup/consul

# Save the snapshot to that volume:
consul snapshot save /backup/consul/snapshot.snap

# Or stream to a remote server through the HTTP API
# (consul snapshot save writes to a file, not to stdout):
curl -sf --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
  http://localhost:8500/v1/snapshot | ssh backup-server "cat > /backup/consul.snap"
```
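The disk check above can be turned into a guard that a backup script runs before saving. This is a sketch under the assumption that a size estimate in KiB is available, for example from `du -sk` on the data directory; the function name `has_free_space` is illustrative.

```bash
# Succeed when the file system holding directory $1 has at least $2 KiB free.
# df -P forces the POSIX single-line format, so "Available" is always field 4.
has_free_space() {
  local dir="$1" required_kb="$2" available_kb
  available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$available_kb" -ge "$required_kb" ]
}
```

Usage could look like `has_free_space /backup/consul "$(du -sk /opt/consul/data | cut -f1)" || echo "not enough space"`. Using the data directory size as the estimate is a conservative assumption, since it also contains Raft logs.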
Step 4: Check Server Health
```bash
# Check that all servers are alive
consul members

# Output (abridged):
# Node      Status
# consul-1  alive
# consul-2  alive
# consul-3  alive

# Check server load
ssh consul-1 "top -b -n 1 | head"

# Check the Consul process on each server
ssh consul-1 "ps aux | grep '[c]onsul'"

# Check the Raft state
consul operator raft list-peers

# If a server is overloaded, the leader may not respond.
# Check that the leader is responsive:
curl http://<leader>:8500/v1/status/leader

# Restart an overloaded server (one at a time, so quorum is kept):
ssh consul-2 "systemctl restart consul"
```
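Reading the `consul members` table by eye gets tedious on larger clusters. The sketch below counts only nodes that are both alive and of type server. It assumes the default column layout of `consul members` (Node, Address, Status, Type, ...); the function name `count_alive_servers` is hypothetical.

```bash
# Count servers reported as alive. Reads `consul members` output on stdin
# and assumes Status is column 3 and Type is column 4. The header row is
# skipped, and "n + 0" prints 0 when nothing matches.
count_alive_servers() {
  awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}
```

Run it as `consul members | count_alive_servers`. Matching on exact columns avoids counting failed servers or client nodes whose names happen to contain "server".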
Step 5: Increase Snapshot Timeout
```bash
# Large KV stores can take longer to snapshot than the surrounding
# tooling allows (cron wrappers, proxies, HTTP client timeouts).

# Bound how long a save may run with timeout(1):
timeout 120 consul snapshot save backup.snap

# Or use the HTTP API with a longer client-side timeout:
curl -X GET "http://localhost:8500/v1/snapshot" \
  --max-time 120 \
  --output backup.snap

# For very large clusters, allow more time:
timeout 300 consul snapshot save backup.snap

# Verify the snapshot size:
ls -lh backup.snap
```
Step 6: Verify Snapshot Integrity
```bash
# After a successful snapshot, verify that it is valid

# Check the snapshot file size
ls -lh backup.snap

# An empty file means the snapshot failed
test -s backup.snap && echo "non-empty" || echo "EMPTY"

# Inspect the snapshot metadata (index, term, size)
consul snapshot inspect backup.snap

# Test a restore on a separate test cluster, never on production:
consul snapshot restore backup.snap

# Output:
# Restored snapshot

# Verify that the data was restored:
consul kv get -recurse

# Check that ACLs were restored:
consul acl policy list

# Check that catalog nodes were restored:
consul catalog nodes

# Record a checksum so corruption after copying can be detected:
sha256sum backup.snap > backup.snap.sha256
sha256sum -c backup.snap.sha256
```
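Checksums are most useful when a snapshot is compared against a digest recorded at save time, rather than against a different snapshot. A minimal sketch of that pattern, assuming GNU `sha256sum` is available; the function names `write_checksum` and `verify_checksum` are illustrative.

```bash
# Write a SHA-256 sidecar file next to the snapshot. Working inside the
# snapshot's directory keeps the recorded file name relative, so the pair
# can be copied to another host and still verify there.
write_checksum() {
  ( cd "$(dirname "$1")" &&
    sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Succeed only if the snapshot is non-empty and still matches its sidecar.
verify_checksum() {
  [ -s "$1" ] || return 1
  ( cd "$(dirname "$1")" &&
    sha256sum -c "$(basename "$1").sha256" ) >/dev/null 2>&1
}
```

Call `write_checksum backup.snap` right after saving, and `verify_checksum backup.snap` on the backup server after the copy.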
Step 7: Automate Snapshot Backups
```bash
# Create the backup script
cat << 'EOF' > /usr/local/bin/consul-backup.sh
#!/bin/bash

BACKUP_DIR="/backup/consul"
DATE=$(date +%Y%m%d-%H%M%S)
SNAPSHOT_FILE="${BACKUP_DIR}/consul-${DATE}.snap"
# The consul CLI reads its token from CONSUL_HTTP_TOKEN
export CONSUL_HTTP_TOKEN="${CONSUL_MANAGEMENT_TOKEN}"

# Check that a leader exists
LEADER=$(consul operator raft list-peers 2>/dev/null | grep leader)
if [ -z "$LEADER" ]; then
  echo "ERROR: No leader, cannot create snapshot"
  exit 1
fi

# Create the snapshot, giving up after 120 seconds
if timeout 120 consul snapshot save "${SNAPSHOT_FILE}"; then
  echo "Snapshot saved: ${SNAPSHOT_FILE}"

  # Verify that the snapshot is not empty
  SIZE=$(stat -c%s "${SNAPSHOT_FILE}")
  if [ "$SIZE" -gt 0 ]; then
    echo "Snapshot valid: ${SIZE} bytes"

    # Remove old backups (keep 7 days)
    find "${BACKUP_DIR}" -name "*.snap" -mtime +7 -delete

    # Copy to remote storage
    scp "${SNAPSHOT_FILE}" backup-server:/backup/consul/
  else
    echo "ERROR: Snapshot is empty"
    rm -f "${SNAPSHOT_FILE}"
    exit 1
  fi
else
  echo "ERROR: Snapshot failed"
  exit 1
fi
EOF

chmod +x /usr/local/bin/consul-backup.sh

# Schedule a daily backup at 02:00. CONSUL_MANAGEMENT_TOKEN must be set in
# the cron environment, since cron does not load your login shell profile.
cat << 'EOF' > /etc/cron.d/consul-backup
0 2 * * * root /usr/local/bin/consul-backup.sh >> /var/log/consul-backup.log 2>&1
EOF
```
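Age-based cleanup with `find -mtime +7` deletes every backup if the job silently stops running for a week. An alternative is count-based retention, sketched below. It assumes the `consul-<timestamp>.snap` naming used by the script above, where lexical order equals chronological order; the function name `prune_old_snapshots` is hypothetical.

```bash
# Keep only the newest $2 snapshots in directory $1 and delete the rest.
# File names embed a sortable timestamp, so the first lines of the sorted
# list are the oldest files.
prune_old_snapshots() {
  local dir="$1" keep="$2" total excess f
  total=$(find "$dir" -maxdepth 1 -name 'consul-*.snap' | wc -l)
  excess=$((total - keep))
  [ "$excess" -gt 0 ] || return 0
  find "$dir" -maxdepth 1 -name 'consul-*.snap' | sort | head -n "$excess" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}
```

With `prune_old_snapshots /backup/consul 7`, the seven most recent snapshots always survive, however old they are.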
Step 8: Handle Snapshot Restore
```bash
# consul snapshot restore talks to a running cluster that has a leader.
# To rebuild a lost cluster, start fresh servers first, then restore.

# On each server, stop Consul and remove the old data
systemctl stop consul
rm -rf /opt/consul/data/*

# Start the first server in bootstrap mode:
consul agent -server -bootstrap-expect=1 \
  -data-dir=/opt/consul/data \
  -bind=10.0.0.1

# Wait for leader election
consul operator raft list-peers

# Restore the snapshot through the running server:
consul snapshot restore backup.snap

# Output:
# Restored snapshot

# Start the other servers and join them to the first:
consul agent -server -retry-join=10.0.0.1 \
  -data-dir=/opt/consul/data \
  -bind=10.0.0.2

# Verify the restore:
consul kv get -recurse
consul members
consul acl policy list
```
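The "wait for leader election" step is easy to get wrong in scripts, because `consul operator raft list-peers` fails until a leader exists. The sketch below polls with a deadline instead of sleeping for a fixed time. `wait_until` and `has_leader` are illustrative names; `has_leader` relies on the word "leader" appearing in the peer list, as in the sample output in Step 1.

```bash
# Poll a command about once per second until it succeeds or the deadline
# passes. $1 = timeout in seconds; remaining arguments = command to run.
wait_until() {
  local remaining="$1"
  shift
  while ! "$@" >/dev/null 2>&1; do
    remaining=$((remaining - 1))
    [ "$remaining" -gt 0 ] || return 1
    sleep 1
  done
}

# Example check: succeeds once the peer list shows a leader.
has_leader() {
  consul operator raft list-peers 2>/dev/null | grep -q leader
}
```

In a restore script this becomes `wait_until 60 has_leader || { echo "no leader after 60s"; exit 1; }` before the remaining steps run.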
Step 9: Check Network Connectivity
```bash
# Find the leader. The API returns a quoted "ip:port" string that names
# the Raft port, so strip the quotes and the port.
LEADER=$(curl -s http://localhost:8500/v1/status/leader | tr -d '"')
LEADER_HOST=${LEADER%:*}
echo "Leader: $LEADER"

# Check network reachability
ping -c 3 "$LEADER_HOST"

# Check port 8500 (HTTP API)
nc -zv "$LEADER_HOST" 8500

# Check port 8300 (server RPC and Raft)
nc -zv "$LEADER_HOST" 8300

# Check firewall rules
iptables -L -n | grep -E '8500|8300'

# Allow the API port if it is blocked:
iptables -I INPUT -p tcp --dport 8500 -j ACCEPT

# Test a snapshot via the leader's HTTP API (port 8500):
curl -X GET "http://${LEADER_HOST}:8500/v1/snapshot" --output test.snap

# Verify the file
ls -lh test.snap
```
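One detail that trips up scripts here: `/v1/status/leader` returns a JSON string such as `"10.0.0.1:8300"`, quotes included, and the port is the Raft port rather than the HTTP port. A small sketch of the parsing step, with the hypothetical helper name `leader_host`:

```bash
# Turn the raw /v1/status/leader response into a bare host.
# Strips the surrounding JSON quotes, then everything from the last colon
# on, so "10.0.0.1:8300" becomes 10.0.0.1.
leader_host() {
  local addr
  addr=$(printf '%s' "$1" | tr -d '"')
  printf '%s\n' "${addr%:*}"
}
```

For instance, `leader_host "$(curl -s http://localhost:8500/v1/status/leader)"` yields a value that can be passed to `ping` or `nc`. An empty result means the cluster currently reports no leader.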
Step 10: Monitor Backup Health
```bash
# Create the backup monitoring script
cat << 'EOF' > /usr/local/bin/check-consul-backup.sh
#!/bin/bash

BACKUP_DIR="/backup/consul"

# Check that a backup exists
LAST_BACKUP=$(ls -t "${BACKUP_DIR}"/*.snap 2>/dev/null | head -1)

if [ -z "$LAST_BACKUP" ]; then
  echo "WARNING: No backup found"
  exit 1
fi

# Check the backup age in hours
BACKUP_AGE=$(( ($(date +%s) - $(stat -c%Y "$LAST_BACKUP")) / 3600 ))
if [ "$BACKUP_AGE" -gt 24 ]; then
  echo "WARNING: Last backup is ${BACKUP_AGE} hours old"
fi

# Check the backup size
BACKUP_SIZE=$(stat -c%s "$LAST_BACKUP")
if [ "$BACKUP_SIZE" -lt 100 ]; then
  echo "ERROR: Backup too small: ${BACKUP_SIZE} bytes"
  exit 1
fi

echo "OK: Latest backup ${LAST_BACKUP}, size ${BACKUP_SIZE}, age ${BACKUP_AGE}h"
EOF

chmod +x /usr/local/bin/check-consul-backup.sh

# Prometheus alert rule for stale backups. consul_backup_age_hours is not a
# built-in Consul metric; it has to be exported by your own tooling.
# - alert: ConsulBackupMissing
#   expr: consul_backup_age_hours > 24
#   for: 1h
#   labels:
#     severity: warning
#   annotations:
#     summary: "Consul snapshot backup missing or old"
```
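The alert above depends on a `consul_backup_age_hours` metric that something has to publish. One common approach, sketched here as an assumption rather than a prescribed setup, is to write the value in Prometheus text format to a file that a textfile-style collector (such as the one in node_exporter) picks up. The function name `write_backup_age_metric` and the output path are hypothetical.

```bash
# Write the age of a snapshot, in whole hours, in Prometheus text format.
# $1 = snapshot file, $2 = output .prom file read by a textfile collector.
# Uses GNU stat (-c %Y), as the scripts above do.
write_backup_age_metric() {
  local snapshot="$1" out="$2" now mtime age_hours
  now=$(date +%s)
  mtime=$(stat -c %Y "$snapshot") || return 1
  age_hours=$(( (now - mtime) / 3600 ))
  # Write to a temp file and rename, so the collector never reads a
  # partially written file.
  printf 'consul_backup_age_hours %s\n' "$age_hours" > "$out.tmp" &&
    mv "$out.tmp" "$out"
}
```

The monitoring script could call this with `$LAST_BACKUP` and a path inside the collector's configured directory on every run.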
Consul Snapshot Backup Checklist
| Check | Command | Expected |
|---|---|---|
| Leader exists | `consul operator raft list-peers` | One peer in `leader` state |
| ACL permissions | `consul acl token read -self` | Management-level token |
| Disk space | `df -h` | Free space > snapshot size |
| Snapshot file | `ls -lh` | > 0 bytes |
| Restore test | `consul snapshot restore` (test cluster) | Success |
| Backup age | `stat -c%Y` | < 24 hours |
Verification
```bash
# After resolving the snapshot issue

# 1. Create a snapshot
consul snapshot save backup.snap
# Expected: snapshot saved

# 2. Verify the file size
ls -lh backup.snap
# Expected: file exists with content

# 3. Test a restore (on a test cluster)
consul snapshot restore backup.snap
# Expected: restored successfully

# 4. Check the backup schedule
ls -la /backup/consul/*.snap
# Expected: a recent backup exists

# 5. Verify that the leader is stable
consul operator raft list-peers
# Expected: leader present

# 6. Monitor the backup logs
tail /var/log/consul-backup.log
# Expected: no errors
```
Related Issues
- [Fix Consul KV Store Not Responding](/articles/fix-consul-kv-store-not-responding)
- [Fix Consul Agent Not Starting](/articles/fix-consul-agent-not-starting)
- [Fix Consul Service Not Registering](/articles/fix-consul-service-not-registering)
Fix Consul Snapshot Backup Failed. FixWikiHub Editorial Team, FixWikiHub, published 2026-04-05. https://www.fixwikihub.com/fix-consul-snapshot-backup-failed