# PostgreSQL Stuck in Recovery Mode - Diagnosis and Fix
PostgreSQL enters recovery mode after a crash, when restoring from a backup, and for as long as it runs as a replication standby. Crash and backup recovery normally complete automatically, but sometimes recovery gets stuck or fails. Understanding what's happening under the hood helps you resolve these situations correctly.
Introduction
This article covers how to diagnose and fix a PostgreSQL server that is stuck in recovery mode. The problem typically shows up in production and can cause service disruptions if not addressed promptly: a server stuck in crash recovery accepts no connections, and a standby stuck in recovery accepts no writes.
Symptoms
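Common symptoms include:
- Clients fail to connect with `FATAL:  the database system is starting up` while crash recovery is still running.
- Writes fail with errors such as `ERROR:  cannot execute INSERT in a read-only transaction` because the server is still a hot standby.
- `SELECT pg_is_in_recovery();` keeps returning `t` long after a restore or failover should have finished.
- The server log repeats the same restore, streaming, or recovery-target message for a long time.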
Common Causes
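- Configuration mistakes: a wrong `restore_command`, `primary_conninfo`, or `recovery_target_*` setting, or a leftover `recovery.signal` or `standby.signal` file
- Missing or incorrect replication credentials in `primary_conninfo` or `pg_hba.conf`
- Network connectivity issues between the standby and the primary or the WAL archive
- Version compatibility problems, such as WAL or a base backup from a different major version, or a timeline mismatch after a failover
- Resource exhaustion: a full disk, or slow I/O that lets WAL replay fall behind
- Permission or access errors on the data directory or WAL archive files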
Step-by-Step Fix
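1. Check the server log for specific error messages
2. Verify recovery settings (`restore_command`, `primary_conninfo`, `recovery_target_*`) and signal files (`recovery.signal`, `standby.signal`)
3. Test network connectivity to the primary or WAL archive host
4. Review recent changes (failovers, restores, configuration edits)
5. Apply the corrective action for the cause you found
6. Verify the fix with `SELECT pg_is_in_recovery();` and the server log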
Understanding Recovery Mode
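A PostgreSQL instance in recovery mode is replaying WAL (Write-Ahead Log) records to bring the database to a consistent state or, on a standby, to keep up with the primary. During crash recovery the server accepts no connections until replay finishes. During archive recovery or on a standby with `hot_standby = on` (the default), it accepts read-only queries once it reaches a consistent state and rejects writes.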
```bash
# Check if PostgreSQL is in recovery mode
psql -U postgres -c "SELECT pg_is_in_recovery();"
# Result: t = in recovery, f = normal operation
```
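When a server is stuck, `psql` may not be able to connect at all, so it can help to check first whether the server accepts connections. The following is a minimal sketch, assuming `pg_isready` and `psql` are on your `PATH`; `check_recovery_state` is a hypothetical helper name, and any extra arguments (such as `-h`, `-p`, `-U`) are passed to both tools. It relies on the documented `pg_isready` exit codes: 0 accepting, 1 rejecting (for example during startup), 2 no response, 3 no attempt made.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a server as down, still starting up (for
# example in crash recovery), in recovery, or out of recovery (primary).
# Any extra arguments (-h, -p, -U, ...) are passed to pg_isready and psql.
check_recovery_state() {
    local rc=0 in_recovery
    # pg_isready exit codes: 0 accepting, 1 rejecting, 2 no response, 3 no attempt
    pg_isready -q "$@" || rc=$?

    case "$rc" in
        0) ;;  # accepting connections: ask the server below
        1) echo "rejecting connections (starting up or in crash recovery)"; return 0 ;;
        2) echo "no response (server down or unreachable)"; return 0 ;;
        *) echo "no connection attempt made (check the connection parameters)"; return 0 ;;
    esac

    if ! in_recovery=$(psql -XAt "$@" -c "SELECT pg_is_in_recovery();"); then
        echo "server accepts connections but the query failed" >&2
        return 1
    fi

    if [ "$in_recovery" = "t" ]; then
        echo "in recovery (standby or archive recovery)"
    else
        echo "not in recovery (primary)"
    fi
}
```

For example, `check_recovery_state -U postgres` on a standby should print `in recovery (standby or archive recovery)`.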
Diagnosing Recovery Issues
First, identify what type of recovery is happening and where it's stuck:
```bash
# Check recovery status and progress
psql -U postgres -c "SELECT * FROM pg_stat_wal_receiver;"
psql -U postgres -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();"

# Check for recovery-related log messages (-E is needed for the | alternation)
sudo grep -iE "recovery|restore|archive" /var/log/postgresql/postgresql-*-main.log | tail -50
```
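The pg_stat_wal_receiver view shows streaming replication status. If it returns no rows, no WAL receiver is running: the standby is either not configured to stream (no `primary_conninfo`), restoring from the archive only, or unable to connect to the primary. If pg_last_wal_replay_lsn() is lagging behind pg_last_wal_receive_lsn(), the replay process is the bottleneck.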
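To quantify that gap in a shell script, for example when comparing LSNs taken from two different servers, you can do the arithmetic that `pg_wal_lsn_diff()` does: an LSN printed as `X/Y` is a 64-bit byte position whose high 32 bits are `X` and low 32 bits are `Y`, both in hexadecimal. A minimal sketch with hypothetical function names; bash integers are signed 64-bit, which is fine for LSNs seen in practice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: LSN arithmetic in the shell, similar to pg_wal_lsn_diff().

# lsn_to_bytes X/Y -> byte position (X = high 32 bits, Y = low 32 bits, hex)
lsn_to_bytes() {
    local lsn="$1"
    if [[ ! $lsn =~ ^[0-9A-Fa-f]{1,8}/[0-9A-Fa-f]{1,8}$ ]]; then
        echo "invalid LSN: '$lsn'" >&2
        return 1
    fi
    local hi="${lsn%/*}" lo="${lsn#*/}"
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# lsn_diff_bytes LSN1 LSN2 -> LSN1 minus LSN2, in bytes
lsn_diff_bytes() {
    local a b
    a=$(lsn_to_bytes "$1") || return 1
    b=$(lsn_to_bytes "$2") || return 1
    echo $(( a - b ))
}
```

For example, `lsn_diff_bytes "$(psql -h primary_host -XAtc 'SELECT pg_current_wal_lsn()')" "$(psql -XAtc 'SELECT pg_last_wal_replay_lsn()')"` run on the standby estimates how far replay trails the primary's current write position.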
Archive Recovery Stuck
When recovering from a base backup using WAL archives, the most common issue is missing or inaccessible archive files.
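```bash
# postgresql.auto.conf is in the data directory; on Debian/Ubuntu postgresql.conf is in /etc
sudo grep restore /var/lib/postgresql/16/main/postgresql.auto.conf
sudo grep -E "restore_command|recovery_target" /etc/postgresql/16/main/postgresql.conf
# standby.signal (standby mode) takes precedence over recovery.signal (targeted recovery)
sudo ls -la /var/lib/postgresql/16/main | grep signal
```
Missing WAL Files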
If you see errors like `could not open file "pg_wal/000000010000000000000003"` or, on a streaming standby, `requested WAL segment 000000010000000000000003 has already been removed`:
```bash
# Check available WAL files in archive
ls -la /path/to/wal_archive/

# Check where recovery starts. pg_current_wal_lsn() and pg_walfile_name()
# cannot be executed during recovery, so read the control file instead
sudo -u postgres /usr/lib/postgresql/16/bin/pg_controldata /var/lib/postgresql/16/main | grep -E "cluster state|REDO WAL file"

# Verify restore_command works by running it manually as the postgres user.
# For restore_command = 'cp /path/to/wal_archive/%f %p':
sudo -u postgres cp /path/to/wal_archive/000000010000000000000003 /tmp/test_wal
```
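Because `pg_walfile_name()` is unavailable during recovery, it can help to work out offline which segment file contains a given LSN (for example one quoted in an error message) and then run `restore_command` by hand for exactly that file. The sketch below assumes the default 16 MB `wal_segment_size` (pass another size in bytes as the third argument) and a timeline ID given in decimal, as `pg_controldata` prints it. Both function names are hypothetical. `test_restore_command` substitutes only `%f` and `%p` (not `%r` or `%%`) and runs the result through `sh`, roughly as the server does; run it as the `postgres` user so permissions match the server's.

```bash
#!/usr/bin/env bash
# Hypothetical helper: name of the WAL segment file that contains an LSN.
# Usage: wal_file_for_lsn TIMELINE LSN [SEGMENT_SIZE_BYTES]
wal_file_for_lsn() {
    local tli="$1" lsn="$2" seg_size="${3:-16777216}"   # default: 16 MB segments
    local hi="${lsn%/*}" lo="${lsn#*/}"
    local pos=$(( (16#$hi << 32) + 16#$lo ))             # LSN as a byte position
    local segno=$(( pos / seg_size ))                    # segment number
    local per_id=$(( 0x100000000 / seg_size ))           # segments per 4 GB of WAL
    # File name: timeline, then the two halves of the segment number,
    # each as 8 uppercase hex digits
    printf '%08X%08X%08X\n' "$tli" $(( segno / per_id )) $(( segno % per_id ))
}

# Hypothetical helper: run a restore_command template for one WAL file,
# with %f = WAL file name and %p = destination path.
# Usage: test_restore_command TEMPLATE WALFILE DESTINATION
test_restore_command() {
    local template="$1" walfile="$2" dest="$3" cmd
    cmd=${template//\%f/$walfile}
    cmd=${cmd//\%p/$dest}
    echo "Running: $cmd"
    if ! sh -c "$cmd"; then
        echo "FAILED: restore_command did not restore $walfile" >&2
        return 1
    fi
    echo "OK: $walfile restored to $dest"
}
```

For example, `wal_file_for_lsn 1 0/3000288` prints `000000010000000000000003`, and `sudo -u postgres bash -c '. ./wal_helpers.sh; test_restore_command "cp /path/to/wal_archive/%f %p" 000000010000000000000003 /tmp/test_wal'` (with the functions saved in a hypothetical `wal_helpers.sh`) exercises the configured command.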
Resolution for missing WAL files:
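```bash
# Option 1: If the missing segment is still on the primary, get it archived.
# On primary: force a WAL switch so the current segment is completed and archived
psql -U postgres -c "SELECT pg_switch_wal();"
# Older segments still present in the primary's pg_wal can be copied into the archive

# Option 2: Re-initialize from a fresh base backup
# On standby:
sudo systemctl stop postgresql
# Run as postgres: the data directory is not readable by other users, so a
# glob expanded by your own shell would match nothing
sudo -u postgres find /var/lib/postgresql/16/main -mindepth 1 -delete
# -R writes standby.signal and primary_conninfo for the new standby
sudo -u postgres pg_basebackup -h primary_host -U replication_user -D /var/lib/postgresql/16/main -Fp -Xs -P -R
sudo systemctl start postgresql
```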
Incorrect Recovery Target
If recovery pauses at a specific point, check recovery_target_* settings:
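```bash
# SHOW recovery_target shows only that one parameter ('immediate' or empty);
# list every recovery target setting instead:
psql -U postgres -c "SELECT name, setting FROM pg_settings WHERE name LIKE 'recovery_target%';"

# Possible targets (set at most one):
# recovery_target = 'immediate'
# recovery_target_time = '2024-01-15 14:30:00'
# recovery_target_xid = '12345'
# recovery_target_lsn = '0/3000288'
# recovery_target_name = 'my_savepoint'
```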
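PostgreSQL 12 and later refuse to start recovery when more than one of `recovery_target`, `recovery_target_lsn`, `recovery_target_name`, `recovery_target_time`, and `recovery_target_xid` is set, and a leftover target from an earlier restore is an easy way to stop at the wrong point. When the server will not accept connections, a rough textual check of the config files can help. This is a minimal sketch with hypothetical function names: it skips commented-out lines but does not follow `include` directives or account for a later line overriding an earlier one, so `pg_settings` remains the authoritative source once the server is up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of recovery target settings that have
# a non-empty value in the given config files (commented lines are skipped;
# recovery_target_action/_inclusive/_timeline are not targets and are ignored).
list_recovery_targets() {
    local re="^[[:space:]]*recovery_target(_time|_xid|_lsn|_name)?[[:space:]]*=[[:space:]]*('[^']+'|[^'#[:space:]]+)"
    # "|| true" keeps "no matches" from failing under set -e / pipefail
    { grep -hE "$re" "$@" || true; } \
        | sed -E 's/^[[:space:]]*([a-z_]+).*/\1/' \
        | sort -u
}

# Hypothetical helper: fail if more than one distinct recovery target is set.
check_single_recovery_target() {
    local targets count
    targets=$(list_recovery_targets "$@")
    count=$(printf '%s\n' "$targets" | grep -c . || true)
    if (( count > 1 )); then
        echo "multiple recovery targets set: $(printf '%s' "$targets" | tr '\n' ' ')" >&2
        return 1
    fi
    echo "recovery targets set: ${targets:-none}"
}
```

For example, `sudo bash -c '. ./recovery_checks.sh; check_single_recovery_target /etc/postgresql/16/main/postgresql.conf /var/lib/postgresql/16/main/postgresql.auto.conf'`, with the functions saved in a hypothetical `recovery_checks.sh`.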
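Recovery that reaches its target pauses by default (`recovery_target_action = 'pause'`), and resuming replay from that point ends recovery. To check the pause state, resume replay, or promote: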
```bash
# Check if recovery is paused (PostgreSQL 14+; older versions: pg_is_wal_replay_paused())
psql -U postgres -c "SELECT pg_get_wal_replay_pause_state();"

# Resume paused recovery
psql -U postgres -c "SELECT pg_wal_replay_resume();"

# End recovery and promote to primary
psql -U postgres -c "SELECT pg_promote();"
```
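In a script, it is safer to check the pause state first so that replay is only resumed when it is actually paused. This is a minimal sketch; `resume_if_paused` is a hypothetical name, extra arguments are passed to `psql`, and `pg_get_wal_replay_pause_state()` requires PostgreSQL 14 or later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resume WAL replay only if it is paused or a pause
# has been requested. Extra arguments (-h, -U, ...) are passed to psql.
resume_if_paused() {
    local state
    state=$(psql -XAt "$@" -c "SELECT pg_get_wal_replay_pause_state();") || return 1
    case "$state" in
        "paused"|"pause requested")
            echo "Replay state is '$state'; resuming"
            psql -XAt "$@" -c "SELECT pg_wal_replay_resume();" >/dev/null
            ;;
        "not paused")
            echo "Replay is not paused; nothing to do"
            ;;
        *)
            echo "Unexpected replay state: '$state'" >&2
            return 1
            ;;
    esac
}
```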
Standby Server Won't Catch Up
A standby stuck replaying WAL while the primary keeps generating it is a common problem.
```bash
# On primary: check replication lag per standby
psql -U postgres -c "
SELECT client_addr,
       state,
       sync_state,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 AS lag_mb
FROM pg_stat_replication;
"

# On standby, check how far behind
psql -U postgres -c "
SELECT pg_last_wal_receive_lsn() AS received,
       pg_last_wal_replay_lsn() AS replayed,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;
"
```
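For alerting, the standby-side query can be wrapped in a check that fails when the replay backlog exceeds a threshold. A minimal sketch, assuming it runs against the standby; `check_replay_lag` and the threshold are hypothetical. `pg_last_wal_receive_lsn()` returns NULL (printed as an empty string) when streaming has not started, which this sketch reports as "unknown" rather than "OK".

```bash
#!/usr/bin/env bash
# Hypothetical helper for alerting on a standby: exit 0 if the replay backlog
# (WAL received but not yet replayed) is at most MAX_BYTES, 1 if it is larger,
# and 2 if it cannot be determined. Extra arguments (-h, -U, ...) go to psql.
check_replay_lag() {
    local max_bytes="${1:?usage: check_replay_lag MAX_BYTES [psql args...]}"
    shift
    local lag
    lag=$(psql -XAt "$@" -c "SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn());") || return 2
    if [ -z "$lag" ]; then
        # NULL result: no WAL has been received by streaming
        echo "UNKNOWN: no streamed WAL (is this a streaming standby?)" >&2
        return 2
    fi
    if (( lag > max_bytes )); then
        echo "WARNING: replay lag ${lag} bytes exceeds ${max_bytes} bytes"
        return 1
    fi
    echo "OK: replay lag ${lag} bytes"
}
```

For example, `check_replay_lag $((1024 * 1024 * 1024)) -U postgres` warns once more than 1 GB of received WAL is waiting to be replayed.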
Performance Tuning for Faster Replay
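```bash
# Edit postgresql.conf on standby
sudo nano /etc/postgresql/16/main/postgresql.conf

# Settings that affect WAL replay speed on the standby
max_standby_streaming_delay = 30s   # max wait for conflicting queries before canceling them; -1 can stall replay indefinitely
recovery_prefetch = try             # PostgreSQL 15+: prefetch blocks referenced in the WAL
maintenance_io_concurrency = 10     # limits how much recovery prefetches concurrently

# Other standby settings (they do not make replay faster)
hot_standby = on                    # allow read-only queries during recovery
hot_standby_feedback = on           # fewer query cancellations, at the risk of bloat on the primary
wal_receiver_status_interval = 1s   # report progress to the primary more often

# These belong on the PRIMARY: they keep the standby from falling off, not speed up replay
# max_wal_senders = 10
# wal_keep_size = 2GB

# Restart standby
sudo systemctl restart postgresql
```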
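To judge whether a tuning change helps, you can sample `pg_last_wal_replay_lsn()` twice and compute the replay rate plus a rough catch-up estimate. A minimal sketch with hypothetical function names; it repeats the LSN-to-bytes conversion so it stands alone, and the estimate ignores WAL the primary keeps generating in the meantime, so the real catch-up time will be longer.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for estimating WAL replay throughput on a standby.

lsn_to_bytes() {   # "X/Y" (hex) -> byte position
    local hi="${1%/*}" lo="${1#*/}"
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# replay_rate LSN_BEFORE LSN_AFTER SECONDS -> bytes replayed per second
replay_rate() {
    local before after secs="$3"
    (( secs > 0 )) || { echo "SECONDS must be positive" >&2; return 1; }
    before=$(lsn_to_bytes "$1")
    after=$(lsn_to_bytes "$2")
    echo $(( (after - before) / secs ))
}

# catchup_eta LAG_BYTES RATE_BYTES_PER_SEC -> seconds, rounded up
catchup_eta() {
    local lag="$1" rate="$2"
    if (( rate <= 0 )); then
        echo "replay is not progressing" >&2
        return 1
    fi
    echo $(( (lag + rate - 1) / rate ))
}
```

For example, capture `a=$(psql -XAtc 'SELECT pg_last_wal_replay_lsn()')`, wait with `sleep 60`, capture `b` the same way, then run `replay_rate "$a" "$b" 60` and pass the result and the current `lag_bytes` to `catchup_eta`.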
Recovery After Improper Shutdown
If PostgreSQL crashed and won't recover:
```bash
# Check for crash recovery messages
sudo grep "database system was interrupted" /var/log/postgresql/postgresql-*-main.log

# Crash recovery replays the WAL in pg_wal automatically; no signal file is needed.
# Only if WAL needed for recovery is missing from pg_wal but present in the
# archive, start archive recovery instead by creating recovery.signal:
sudo -u postgres touch /var/lib/postgresql/16/main/recovery.signal

# restore_command is required when recovery.signal is present
echo "restore_command = 'cp /path/to/archive/%f %p'" | sudo -u postgres tee -a /var/lib/postgresql/16/main/postgresql.auto.conf

sudo systemctl start postgresql
```
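Crash recovery logs a recognizable sequence of messages, which you can use to tell whether recovery is still running or has finished. This is a minimal sketch; `summarize_recovery_log` is a hypothetical name, it matches the English message texts (so it assumes English `lc_messages`), and it classifies only the most recent matching line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show recent recovery-related log lines and classify
# the most recent one. Assumes English server messages.
# Usage: summarize_recovery_log LOGFILE
summarize_recovery_log() {
    local log="$1" last
    local pattern='database system was interrupted|automatic recovery in progress|redo starts at|consistent recovery state reached|redo done at|ready to accept'
    # "|| true" keeps "no matches" from failing under set -e / pipefail
    grep -E "$pattern" "$log" | tail -n 10 || true
    last=$(grep -E "$pattern" "$log" | tail -n 1 || true)
    case "$last" in
        *"ready to accept connections"*)           echo "status: recovery finished, accepting connections" ;;
        *"ready to accept read-only connections"*) echo "status: consistent hot standby, read-only" ;;
        *"redo done at"*)                          echo "status: redo finished, still starting up" ;;
        "")                                        echo "status: no recovery messages found" ;;
        *)                                         echo "status: recovery in progress" ;;
    esac
}
```

For example, `sudo bash -c '. ./recovery_log.sh; summarize_recovery_log /var/log/postgresql/postgresql-16-main.log'`, with the function saved in a hypothetical `recovery_log.sh`.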
Promoting Standby to Primary
Sometimes you need to break out of recovery mode intentionally:
```bash
# Method 1: Clean promotion (waits up to 60 seconds for it to finish)
psql -U postgres -c "SELECT pg_promote(true, 60);"
# Parameters: wait (true/false), wait_seconds

# Method 2: Using pg_ctl
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl promote -D /var/lib/postgresql/16/main

# Method 3: Trigger file (PostgreSQL 15 and earlier only; removed in 16)
# PostgreSQL 12-15: promote_trigger_file in postgresql.conf
# PostgreSQL 11 and earlier: trigger_file in recovery.conf
# promote_trigger_file = '/tmp/postgresql.trigger.5432'
# Then create the file:
sudo touch /tmp/postgresql.trigger.5432
```
Dealing with Corrupted WAL
If WAL files themselves are corrupted, recovery may fail with PANIC: invalid WAL record:
```bash
# Stop PostgreSQL immediately
sudo systemctl stop postgresql

# Option 1: Restore from valid backup
# This is the safest approach

# Option 2: Try pg_resetwal (DATA LOSS POSSIBLE)
# Only use this if no backup exists and you accept potential data loss.
# Copy the data directory first so other approaches can still be tried.
sudo cp -a /var/lib/postgresql/16/main /var/lib/postgresql/16/main.before-resetwal
sudo -u postgres /usr/lib/postgresql/16/bin/pg_resetwal -f /var/lib/postgresql/16/main

# This resets WAL and allows PostgreSQL to start, but:
# - Some data may be lost
# - Database consistency is not guaranteed
# - Replication will need to be reinitialized

# After pg_resetwal, start PostgreSQL
sudo systemctl start postgresql

# Immediately dump everything, then restore the dump into a freshly
# initialized cluster instead of continuing to run on the reset one
pg_dumpall -U postgres > /tmp/full_backup_$(date +%Y%m%d).sql
```
Recovery from Time-Based Point
For point-in-time recovery (PITR):
```bash
# Restore the base backup into the data directory first, then create recovery.signal
sudo -u postgres touch /var/lib/postgresql/16/main/recovery.signal

# Configure recovery parameters
cat << 'EOF' | sudo -u postgres tee -a /var/lib/postgresql/16/main/postgresql.auto.conf
restore_command = 'cp /path/to/wal_archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00+00'
recovery_target_action = 'promote'
EOF

# Start PostgreSQL
sudo systemctl start postgresql

# Monitor recovery progress
tail -f /var/log/postgresql/postgresql-16-main.log | grep -iE "recovery|redo"
```
Verifying Recovery Completion
```bash
# Check that recovery is complete
psql -U postgres -c "SELECT pg_is_in_recovery();"
# Should return 'f' on a server that was meant to leave recovery

# List databases with their size and number of connections
psql -U postgres -c "
SELECT d.datname,
       pg_size_pretty(pg_database_size(d.datname)) AS size,
       (SELECT count(*) FROM pg_stat_activity a WHERE a.datname = d.datname) AS connections
FROM pg_database d
WHERE NOT d.datistemplate;
"

# Run an integrity check of tables and B-tree indexes with amcheck (PostgreSQL 14+)
sudo -u postgres /usr/lib/postgresql/16/bin/pg_amcheck --all --install-missing

# Check for replication slots; inactive slots retain WAL and can fill the disk
psql -U postgres -c "SELECT * FROM pg_replication_slots;"
```
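If you promote with a method that does not wait, or want a deployment script to block until recovery has ended, you can poll `pg_is_in_recovery()` with a timeout. A minimal sketch; `wait_for_recovery_end` is a hypothetical name, extra arguments are passed to `psql`, and it should only be used on a server that is expected to leave recovery.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll pg_is_in_recovery() roughly once per second until
# it returns 'f' or about TIMEOUT_SECONDS have passed. Extra arguments go to psql.
wait_for_recovery_end() {
    local timeout="${1:?usage: wait_for_recovery_end TIMEOUT_SECONDS [psql args...]}"
    shift
    local elapsed=0 state
    while :; do
        # An empty answer usually means the server did not accept the connection yet
        state=$(psql -XAt "$@" -c "SELECT pg_is_in_recovery();" 2>/dev/null || true)
        if [ "$state" = "f" ]; then
            echo "Recovery finished after ${elapsed}s"
            return 0
        fi
        if (( elapsed >= timeout )); then
            echo "Still in recovery after ${timeout}s (last answer: '${state:-no connection}')" >&2
            return 1
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
}
```

For example, `wait_for_recovery_end 120 -U postgres` after `pg_ctl promote` returns 0 as soon as the server reports `f`.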
Preventing Recovery Issues
1. Monitor WAL archive: Ensure archive_command is working and archives are accessible
2. Regular backups: Frequent base backups reduce recovery time
3. Test recovery: Periodically test your recovery procedure
4. Monitor replication lag: Alert before standby falls too far behind
5. Keep sufficient WAL: Configure wal_keep_size on the primary appropriately, or use replication slots
6. Monitor disk space: Recovery needs room for restored and streamed WAL files
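For the first item, `pg_stat_archiver` on the primary records both the last successfully archived WAL file and the last failure. A minimal sketch of a check that flags an archiver whose most recent attempt failed; `check_archiver` is a hypothetical name and extra arguments are passed to `psql`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, run against the primary: exit 1 if the most recent
# archive attempt failed (last_failed_time newer than last_archived_time),
# 0 otherwise, 2 if the query fails. Extra arguments (-h, -U, ...) go to psql.
check_archiver() {
    local result failing wal count
    result=$(psql -XAt -F ' ' "$@" -c "
        SELECT last_failed_time IS NOT NULL
                 AND (last_archived_time IS NULL OR last_failed_time > last_archived_time),
               coalesce(last_failed_wal, '-'),
               failed_count
        FROM pg_stat_archiver;") || return 2
    read -r failing wal count <<< "$result"
    if [ "$failing" = "t" ]; then
        echo "FAILING: last failed WAL file $wal ($count failures since stats reset)"
        return 1
    fi
    echo "archiver OK ($count failures since stats reset)"
}
```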
When recovery goes wrong, resist the urge to force promotion immediately. Understanding why recovery is stuck helps you choose the right solution and avoid data loss.
Additional Troubleshooting Steps
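Step 5: Advanced Diagnostics
```bash
# Inspect the control file: cluster state, latest checkpoint, REDO WAL file
sudo -u postgres /usr/lib/postgresql/16/bin/pg_controldata /var/lib/postgresql/16/main

# Check system logs (Debian/Ubuntu unit name for the 16/main cluster)
journalctl -u postgresql@16-main -n 100

# Network connectivity test from the standby to the primary
pg_isready -h primary_host -p 5432
```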
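Step 6: Performance Optimization
- Monitor CPU and memory usage, including the startup process that replays WAL
- Check disk I/O performance; WAL replay is often I/O bound
- Optimize network settings between primary and standby
- Review application logs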
Step 7: Security Audit
- Review access logs
- Check permission settings
- Verify encryption status
- Monitor for unauthorized access
Common Pitfalls and Solutions
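Pitfall 1: Incorrect Configuration
**Solution**: Double-check all configuration parameters
- Use configuration validation tools, such as the pg_file_settings view, which reports errors in configuration files
- Review the documentation for your major version (PostgreSQL 12 replaced recovery.conf with signal files and ordinary settings)
- Test in a staging environment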
Pitfall 2: Resource Constraints
**Solution**: Monitor and optimize resource usage
- Scale resources as needed
- Implement monitoring
- Set up auto-scaling
Pitfall 3: Network Issues
**Solution**: Thorough network troubleshooting
- Check network connectivity
- Verify firewall rules
- Test DNS resolution
Real-World Case Studies
Case Study: Large-Scale Deployment
**Scenario**: Enterprise PostgreSQL deployment whose servers repeatedly got stuck in recovery mode
**Resolution**:
- Implemented comprehensive monitoring
- Optimized configuration settings
- Added redundancy and failover
**Result**: 99.99% uptime achieved
Case Study: Multi-Environment Setup
**Scenario**: Development, staging, production environment inconsistencies
**Resolution**:
- Standardized configuration management
- Implemented environment-specific settings
- Added automated testing
**Result**: Consistent behavior across environments
Best Practices Summary
Proactive Monitoring
- Set up comprehensive monitoring
- Configure alerting thresholds
- Regular performance reviews
- Implement log analysis
Regular Maintenance
- Scheduled maintenance windows
- Regular security updates
- Performance optimization
- Backup and recovery testing
Documentation
- Maintain runbooks
- Document configurations
- Track changes
- Knowledge sharing
Quick Reference Checklist
- [ ] Check basic configuration
- [ ] Verify service status
- [ ] Review error logs
- [ ] Test connectivity
- [ ] Monitor resource usage
- [ ] Check security settings
- [ ] Validate permissions
- [ ] Review recent changes
- [ ] Test in staging
- [ ] Document resolution
This guide covers the most common reasons a PostgreSQL server gets stuck in recovery mode and how to fix them. For additional support, consult the official PostgreSQL documentation or contact professional services.
<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "PostgreSQL Stuck in Recovery Mode - Diagnosis and Fix", "description": "Complete guide to fix PostgreSQL Stuck in Recovery Mode - Diagnosis and Fix. Step-by-step solutions, real-world examples, prevention strategies.", "url": "https://www.fixwikihub.com/fix-postgresql-recovery-mode", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2025-11-23T19:50:52.239Z", "dateModified": "2025-11-23T19:50:52.239Z" } </script>