Fix Memcached Failover Issues

Fix Memcached failover and connection issues. Configure client failover, handle server failures, and ensure cache availability in distributed environments.

Published: Apr 27, 2026 | 9 min read | By FixWikiHub Editorial Team

Memcached doesn't have built-in clustering or automatic failover. Failover behavior depends entirely on your client library configuration.

Introduction

When a Memcached server goes down, your application crashes, returns errors, or slows down sharply. Because Memcached itself offers no clustering or automatic failover, the fix lies in the client: timeouts, dead-server handling, key distribution, and a fallback path to the primary data store.

Understanding the Memcached architecture is important:

- No master/slave: all nodes are equal
- No replication: data exists on one node only
- No automatic failover: the client handles node failures
- Data loss on node failure: the cache must be repopulated
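To make these consequences concrete, here is a minimal sketch that models three nodes as plain dictionaries. The `nodes` mapping, the `node_for` helper, and the CRC32-modulo placement are illustrative assumptions, not how any particular client library places keys.

```python
import zlib

# Each "node" is modelled as a plain dict; real nodes are separate processes.
nodes = {"memcached1": {}, "memcached2": {}, "memcached3": {}}


def node_for(key, names):
    """Pick exactly one node for a key (simple CRC32 modulo placement)."""
    ordered = sorted(names)
    return ordered[zlib.crc32(key.encode()) % len(ordered)]


def cache_set(key, value):
    nodes[node_for(key, nodes)][key] = value


def cache_get(key):
    return nodes[node_for(key, nodes)].get(key)


for i in range(30):
    cache_set(f"user:{i}", {"id": i})

# Every key lives on one node only: there is no second copy anywhere.
total_copies = sum(len(store) for store in nodes.values())

# Simulate a node failure: its share of the data is simply gone.
lost = len(nodes.pop("memcached2"))
surviving = sum(1 for i in range(30) if cache_get(f"user:{i}") is not None)
print(f"copies={total_copies} lost={lost} still_readable={surviving}")
```

The number of readable keys can drop by more than the failed node's share, because modulo placement also remaps keys that were stored on healthy nodes.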

Symptoms

Memcached failover issues present with:

- "Connection refused" errors when servers fail
- Application crashes on cache server failure
- Stale connections to dead servers
- Cache misses during failover
- Performance degradation during server failures
- Timeout errors reaching cache servers
- Inconsistent cache behavior across nodes

To narrow down which of these applies, run the diagnosis commands under Step-by-Step Fix below.

Common Causes

Client or server misconfiguration
Missing or incorrect credentials
Network connectivity issues
Version compatibility problems
Resource exhaustion or limits
Permission or access denied

Step-by-Step Fix

Check Memcached server status:

```bash
# Check if Memcached is running
systemctl status memcached

# Check multiple servers
for server in memcached1 memcached2 memcached3; do
    echo "=== $server ==="
    ssh "$server" "systemctl status memcached"
done

# Check Memcached stats
echo "stats" | nc localhost 11211 | head -20

# Check server connectivity
nc -zv memcached1 11211
nc -zv memcached2 11211
nc -zv memcached3 11211
```

Check from application:

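```python
import memcache

# Connect to cluster
mc = memcache.Client(['memcached1:11211', 'memcached2:11211', 'memcached3:11211'])

# Test connectivity: get_stats() returns one (server, stats) pair per server
# that answered, so a server missing from the output is unreachable.
# (A single mc.get() only talks to the one server that owns the key.)
for server, stats in mc.get_stats():
    print(f"Server {server}: reachable ({len(stats)} stats returned)")
```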

Common Issues and Solutions

Issue 1: Application Crashes on Server Failure

```python
# Error
# Connection refused, server unavailable
```

Cause: Client not configured for failover.

Solution: Configure client with failover settings:

```python
# Python - python-memcached
import memcache

mc = memcache.Client(
    ['memcached1:11211', 'memcached2:11211', 'memcached3:11211'],
    dead_retry=30,      # Retry dead servers after 30 seconds
    socket_timeout=5,   # Socket timeout in seconds
    debug=0,
)
# python-memcached has no separate failover flag: it skips servers it has
# marked dead and rehashes their keys to the remaining servers.

# Always handle exceptions
try:
    value = mc.get('my_key')
except Exception:
    value = None  # Fallback to database
```

Issue 2: Stale Connections to Dead Servers

```python
# Client keeps trying dead server
```

Cause: Client doesn't mark server as dead.

Solution: Configure retry and timeout:

```python
# Python - pymemcache
from pymemcache.client.base import Client
from pymemcache.client.hash import HashClient

# Single client with timeout
client = Client(
    ('memcached1', 11211),
    timeout=5,
    connect_timeout=5,
    ignore_exc=True,  # Treat errors on reads as cache misses (return None)
)

# Hash client for multiple servers
hash_client = HashClient(
    [('memcached1', 11211), ('memcached2', 11211), ('memcached3', 11211)],
    timeout=5,
    connect_timeout=5,
    ignore_exc=True,
    retry_attempts=2,  # Failed attempts before a server is marked dead
    retry_timeout=1,   # Seconds to wait between those attempts
    dead_timeout=30,   # Retry a dead server after 30s
)
```
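The bookkeeping behind "mark dead, retry later" can be sketched in a few lines. `DeadServerTracker` is a hypothetical helper written for illustration, not part of pymemcache or any other library; the injectable `clock` argument is an assumption made so the behavior can be checked without waiting.

```python
import time


class DeadServerTracker:
    """Hypothetical helper: remembers which servers failed and when to retry them."""

    def __init__(self, dead_timeout=30.0, failure_limit=2, clock=time.monotonic):
        self.dead_timeout = dead_timeout
        self.failure_limit = failure_limit
        self.clock = clock
        self._failures = {}    # server -> consecutive failure count
        self._dead_until = {}  # server -> time at which a retry is allowed

    def record_failure(self, server):
        count = self._failures.get(server, 0) + 1
        self._failures[server] = count
        if count >= self.failure_limit:
            self._dead_until[server] = self.clock() + self.dead_timeout

    def record_success(self, server):
        self._failures.pop(server, None)
        self._dead_until.pop(server, None)

    def is_available(self, server):
        retry_at = self._dead_until.get(server)
        return retry_at is None or self.clock() >= retry_at


# Usage with a fake clock so the timeline is explicit
now = [0.0]
tracker = DeadServerTracker(dead_timeout=30, failure_limit=2, clock=lambda: now[0])
tracker.record_failure("memcached1:11211")
tracker.record_failure("memcached1:11211")
print(tracker.is_available("memcached1:11211"))  # False: marked dead
now[0] = 31.0
print(tracker.is_available("memcached1:11211"))  # True: retry window reached
```

Requiring more than one failure before marking a server dead avoids ejecting a node over a single dropped packet, at the cost of a few extra slow requests.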

Issue 3: Data Loss on Failover

```python
# Key exists on failed server, now missing
```

Cause: Memcached doesn't replicate data.

Solution: Implement fallback logic:

```python
def get_with_fallback(key, db_query_func):
    """Get from cache, fallback to database on failure."""
    try:
        value = mc.get(key)
    except Exception:
        value = None  # Cache unavailable, treat as a miss
    if value is None:
        # Cache miss or server failure: load from the database
        value = db_query_func()
        try:
            mc.set(key, value, time=3600)  # python-memcached names the TTL `time`
        except Exception:
            pass  # Repopulating the cache is best effort
    return value

# Usage
user = get_with_fallback(
    f'user:{user_id}',
    lambda: db.query_user(user_id)
)
```
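The fallback pattern can be exercised without a real server. In the sketch below, `FlakyCache` is a hypothetical stand-in for a cache client that can be switched to "down", and `cache_aside` and `load_user` are illustrative names; the point is only to show that a miss and an outage both end at the loader.

```python
class FlakyCache:
    """Hypothetical stand-in for a cache client that can be switched to 'down'."""

    def __init__(self):
        self.store = {}
        self.down = False

    def get(self, key):
        if self.down:
            raise ConnectionError("cache unavailable")
        return self.store.get(key)

    def set(self, key, value, ttl):
        if self.down:
            raise ConnectionError("cache unavailable")
        self.store[key] = value


def cache_aside(cache, key, loader, ttl=3600):
    """Return the cached value, or load it and try to repopulate the cache."""
    try:
        value = cache.get(key)
    except ConnectionError:
        return loader()          # cache is down: skip it entirely
    if value is None:
        value = loader()         # cache miss: go to the source of truth
        try:
            cache.set(key, value, ttl)
        except ConnectionError:
            pass                 # repopulating is best effort
    return value


calls = []


def load_user():
    calls.append(1)              # count how often the "database" is hit
    return {"id": 42}


cache = FlakyCache()
cache_aside(cache, "user:42", load_user)            # miss: loads and stores
cache_aside(cache, "user:42", load_user)            # hit: loader is not called
cache.down = True
result = cache_aside(cache, "user:42", load_user)   # outage: loader is called again
print(result, "loader calls:", len(calls))
```

During an outage every request reaches the loader, so the data store behind the cache needs enough headroom to absorb that load.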

Issue 4: Hash Distribution Changes

When a server fails, keys redistribute to remaining servers:

```python
# Key 'user:1' was on memcached1
# Now memcached1 is down, key goes to memcached2
# But memcached2 doesn't have the data
```

Solution: Use consistent hashing:

```python
# Python - pymemcache with consistent hashing
from pymemcache.client.hash import HashClient
from pymemcache.client.rendezvous import RendezvousHash

client = HashClient(
    [('memcached1', 11211), ('memcached2', 11211), ('memcached3', 11211)],
    hasher=RendezvousHash,  # The default; only keys on the failed node are remapped
    dead_timeout=30,        # Retry a dead server after 30s
)
```
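To see why the hashing scheme matters, the sketch below compares simple modulo placement with rendezvous hashing (one form of consistent hashing) when one of three nodes disappears. The helper names and the MD5-based score are assumptions chosen for illustration; real clients use their own hash functions.

```python
import hashlib


def _score(*parts):
    """Stable 64-bit hash of the given strings."""
    digest = hashlib.md5("|".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_node(key, nodes):
    return nodes[_score(key) % len(nodes)]


def rendezvous_node(key, nodes):
    # Highest-random-weight: the node with the largest hash(node, key) wins.
    return max(nodes, key=lambda node: _score(node, key))


def moved_keys(picker, keys, before, after):
    """Count keys that lived on a surviving node but map elsewhere afterwards."""
    return sum(
        1 for k in keys
        if picker(k, before) in after and picker(k, before) != picker(k, after)
    )


servers = ["memcached1", "memcached2", "memcached3"]
survivors = ["memcached2", "memcached3"]
keys = [f"user:{i}" for i in range(1000)]

print("modulo:", moved_keys(modulo_node, keys, servers, survivors))
print("rendezvous:", moved_keys(rendezvous_node, keys, servers, survivors))  # 0
```

With rendezvous hashing a key on a surviving node keeps its winner, so only the failed node's keys become misses; modulo placement also reshuffles keys that were never on the failed node.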

Issue 5: Java Client Configuration

```java
// Java - spymemcached
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

// The constructor throws IOException; declare or handle it in the caller
MemcachedClient client = new MemcachedClient(
    new ConnectionFactoryBuilder()
        .setFailureMode(FailureMode.Redistribute)  // Redistribute on failure
        .setOpTimeout(5000)                        // Operation timeout 5s
        .setTimeoutExceptionThreshold(10)          // Act after 10 consecutive timeouts
        .build(),
    AddrUtil.getAddresses("memcached1:11211 memcached2:11211 memcached3:11211")
);
```

Issue 6: PHP Client Configuration

```php
// PHP - Memcached extension
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$m->setOption(Memcached::OPT_REMOVE_FAILED_SERVERS, true);
$m->setOption(Memcached::OPT_RETRY_TIMEOUT, 30);
$m->setOption(Memcached::OPT_CONNECT_TIMEOUT, 5000);
$m->setOption(Memcached::OPT_SERVER_FAILURE_LIMIT, 2);

$m->addServers([
    ['memcached1', 11211, 33],
    ['memcached2', 11211, 33],
    ['memcached3', 11211, 33]
]);
```

Issue 7: Node.js Client Configuration

```javascript
// Node.js - memcached
const Memcached = require('memcached');

const memcached = new Memcached({
  'memcached1:11211': { weight: 1 },
  'memcached2:11211': { weight: 1 },
  'memcached3:11211': { weight: 1 }
}, {
  retries: 2,       // Socket allocation retries per request
  timeout: 5000,    // Operation timeout in milliseconds
  remove: true,     // Remove failed servers
  failOverServers: ['memcached-backup:11211'],  // Replacements for removed servers
  retry: 30000      // Milliseconds before a failed server is tried again
});
```

Issue 8: Connection Pool Exhaustion

```python
# Too many connections to remaining servers
```

Solution: Configure connection limits:

```python
# Use connection pooling
from pymemcache.client.base import PooledClient
from pymemcache.client.hash import HashClient

# Pooled client
pool = PooledClient(
    ('memcached1', 11211),
    max_pool_size=10,
    timeout=5,
)

# Or HashClient with pool per server
client = HashClient(
    [('memcached1', 11211), ('memcached2', 11211)],
    use_pooling=True,   # One pool per server
    max_pool_size=10,   # Cap connections per server
    timeout=5,
    connect_timeout=5,
)
```
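What a connection limit buys you is easiest to see in isolation. `BoundedPool` below is a hypothetical, simplified pool built on the standard library, not the pymemcache implementation: once every slot is in use, further callers fail fast instead of opening more sockets to the surviving servers.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical pool: hands out at most max_size connections at a time."""

    def __init__(self, factory, max_size=10, wait_timeout=1.0):
        self._factory = factory
        self._wait_timeout = wait_timeout
        self._slots = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._slots.put(None)  # None marks a slot with no connection yet

    @contextmanager
    def connection(self):
        try:
            conn = self._slots.get(timeout=self._wait_timeout)
        except queue.Empty:
            raise TimeoutError("connection pool exhausted") from None
        if conn is None:
            try:
                conn = self._factory()  # create lazily, on first use of the slot
            except Exception:
                self._slots.put(None)   # give the slot back if connecting fails
                raise
        try:
            yield conn
        finally:
            self._slots.put(conn)       # always return the slot


# Usage: object() stands in for a real socket or client
pool = BoundedPool(object, max_size=2, wait_timeout=0.01)
with pool.connection() as first, pool.connection() as second:
    try:
        with pool.connection():
            pass
    except TimeoutError as exc:
        print("third caller was refused:", exc)
```

A short `wait_timeout` turns pool exhaustion into a quick, catchable error that the application can treat like any other cache miss.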

Monitoring and Health Checks

Server Health Check Script

```python
import socket
import time

def check_memcached(host, port=11211, timeout=5):
    """Check if Memcached server is healthy."""
    try:
        # create_connection applies the timeout to the connect and to later reads
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b'stats\r\n')
            response = sock.recv(1024)
        # A healthy server answers with "STAT ..." lines
        return response.startswith(b'STAT')
    except OSError:
        return False

def monitor_servers(servers):
    """Monitor Memcached servers."""
    while True:
        for host, port in servers:
            status = check_memcached(host, port)
            print(f"{host}:{port} - {'OK' if status else 'DOWN'}")
        time.sleep(60)

servers = [('memcached1', 11211), ('memcached2', 11211), ('memcached3', 11211)]
monitor_servers(servers)
```
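The `stats` reply carries more than a liveness signal. As a rough sketch, the helpers below parse the text-protocol reply (`STAT <name> <value>` lines terminated by `END`) and derive a hit ratio; `parse_stats` and `hit_ratio` are illustrative names, and the sample bytes are made up for the example.

```python
def parse_stats(raw):
    """Parse the text-protocol reply to `stats` into a dict of strings."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").split("\r\n"):
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_ratio(stats):
    """Return get_hits / (get_hits + get_misses), or None if there were no gets."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Made-up sample reply, shortened to three counters
sample = b"STAT uptime 120\r\nSTAT get_hits 90\r\nSTAT get_misses 10\r\nEND\r\n"
print(hit_ratio(parse_stats(sample)))  # 0.9
```

A sudden drop in hit ratio on the surviving servers is a useful secondary signal that a node has failed and keys are being repopulated.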

Prometheus Metrics

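```python
import time

import memcache
from prometheus_client import Gauge, start_http_server

mc = memcache.Client(['memcached1:11211', 'memcached2:11211', 'memcached3:11211'])

# Metrics
cache_hits = Gauge('memcached_hits', 'Cache hits')
cache_misses = Gauge('memcached_misses', 'Cache misses')
server_status = Gauge('memcached_server_status', 'Server status', ['server'])

def collect_metrics():
    for server in mc.servers:
        # connect() returns 1 on success and 0 on failure, never None
        alive = bool(server.connect())
        server_status.labels(server=f"{server.ip}:{server.port}").set(1 if alive else 0)
    # Sum hit and miss counters over the servers that answered
    stats = [s for _name, s in mc.get_stats()]
    cache_hits.set(sum(int(s.get('get_hits', 0)) for s in stats))
    cache_misses.set(sum(int(s.get('get_misses', 0)) for s in stats))

start_http_server(8000)
while True:  # Keep the exporter alive and refresh the gauges
    collect_metrics()
    time.sleep(15)
```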

High Availability Setup

Multiple Memcached Instances

```bash
# Run multiple Memcached instances per server
memcached -d -p 11211 -m 512 -c 1024
memcached -d -p 11212 -m 512 -c 1024
memcached -d -p 11213 -m 512 -c 1024
```

Client configuration:

```python
import memcache

mc = memcache.Client([
    'server1:11211', 'server1:11212', 'server1:11213',
    'server2:11211', 'server2:11212', 'server2:11213',
])
```

Twemproxy (Nutcracker)

Twemproxy provides a proxy layer that automatically ejects failed servers and retries them later:

```yaml
# nutcracker.yml
alpha:
  listen: 127.0.0.1:11211
  hash: crc32a
  distribution: ketama
  auto_eject_hosts: true
  timeout: 500
  server_retry_timeout: 30000
  server_failure_limit: 2
  servers:
    - memcached1:11211:1
    - memcached2:11211:1
    - memcached3:11211:1
```

Run Twemproxy:

```bash
nutcracker -c nutcracker.yml -d
```

Client connects to Twemproxy:

```python
mc = memcache.Client(['127.0.0.1:11211'])
```

Mcrouter

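Mcrouter provides more advanced failover through route handles such as FailoverRoute, which sends a request to the next child when the previous one returns an error. The `backup` pool below is illustrative:

```json
{
  "pools": {
    "memcached": {
      "servers": [
        "memcached1:11211",
        "memcached2:11211",
        "memcached3:11211"
      ]
    },
    "backup": {
      "servers": [
        "memcached-backup:11211"
      ]
    }
  },
  "route": {
    "type": "FailoverRoute",
    "children": [
      "PoolRoute|memcached",
      "PoolRoute|backup"
    ]
  }
}
```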

Verification

```bash
# Test failover manually
# Stop one server
systemctl stop memcached

# Test from client
python test_memcached.py

# Check logs
tail -f /var/log/memcached.log

# Restart server
systemctl start memcached

# Verify client reconnects
python test_memcached.py
```

Test script:

```python
import memcache
import time

mc = memcache.Client(
    ['memcached1:11211', 'memcached2:11211', 'memcached3:11211'],
    dead_retry=30
)

# Set test key
mc.set('test_key', 'test_value')

# Get test key repeatedly
for i in range(100):
    try:
        value = mc.get('test_key')
        print(f"Attempt {i}: {value}")
    except Exception as e:
        print(f"Attempt {i}: Error - {e}")
    time.sleep(1)
```

Prevention

1. [ ] Multiple Memcached servers configured
2. [ ] Client configured with failover settings
3. [ ] Timeout and retry settings appropriate
4. [ ] Fallback logic implemented in application
5. [ ] Consistent hashing enabled
6. [ ] Connection pooling configured
7. [ ] Health monitoring in place
8. [ ] Twemproxy or Mcrouter for HA (optional)
9. [ ] Test failover manually
10. [ ] Document failover behavior

[Fix Memcached Binary Protocol Sasl Authentication Failure in Memcached](memcached-binary-protocol-sasl-authentication-failure)
[Fix Memcached Cas Mismatch Concurrent Update Operations Issue in Memcached](memcached-cas-mismatch-concurrent-update-operations)
[Fix Memcached Cluster Node Failure Cache Miss Spike Issue in Memcached](memcached-cluster-node-failure-cache-miss-spike)
[Fix Memcached Connection Limit Maxconns Reached High Load Issue in Memcached](memcached-connection-limit-maxconns-reached-high-load)
[Fix Memcached Eviction Memory Pressure Hot Keys Issue in Memcached](memcached-eviction-memory-pressure-hot-keys)

Written by

FixWikiHub Editorial Team

Our editorial team consists of experienced DevOps engineers, systems administrators, and cloud architects with hands-on experience in production environments across AWS, Azure, GCP, and on-premises infrastructure.

Every guide undergoes technical review for accuracy and is updated when software versions, commands, or best practices change.

Last updated: Apr 27, 2026

