Introduction

API gateway circuit breakers protect backend services from cascading failures. While a circuit is open, the gateway rejects every request for that backend immediately instead of forwarding it, so the route stays unavailable until the breaker passes through the half-open state and closes again.

Symptoms

Circuit open error:

```json
{
  "error": "CircuitBreakerOpen",
  "message": "Circuit breaker is open for service 'payment-service'",
  "timestamp": "2024-04-15T10:00:00Z"
}
```

API Gateway response:

```bash
# -i prints the status line and headers along with the body
$ curl -i https://api.example.com/payments

HTTP/1.1 503 Service Unavailable

{
  "error": "Service temporarily unavailable",
  "reason": "circuit_breaker_open"
}
```

Monitoring alert:

```text
Circuit breaker 'payment-service' is OPEN
Failure rate: 75% (threshold: 50%)
Last failure: Connection refused to 10.0.0.5:8080
```

Common Causes

  1. Backend service down - Target service not running or unreachable
  2. High error rate - Exceeding failure threshold
  3. Timeout issues - Slow responses triggering timeouts
  4. Network connectivity - Network partition between gateway and backend
  5. Resource exhaustion - Backend overloaded or out of resources
  6. Configuration error - Wrong endpoint or health check URL
  7. Inadequate thresholds - Too sensitive circuit breaker settings
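As a rough illustration of how the causes above can be told apart from a single failure observation, here is a minimal sketch. The `likelyCause` helper and its input fields (`code`, `status`, `latencyMs`, `timeoutMs`) are hypothetical; the error codes are the standard Node.js system error codes, and the mapping is a heuristic starting point, not a diagnosis.

```javascript
// Hypothetical triage helper: maps one failure observation to the most
// likely cause category from the list above. Field names are assumptions.
function likelyCause(failure) {
  const { code, status, latencyMs, timeoutMs } = failure;
  if (code === 'ECONNREFUSED') return 'backend service down';
  if (code === 'ENOTFOUND') return 'configuration error'; // hostname did not resolve
  if (code === 'EHOSTUNREACH' || code === 'ENETUNREACH') return 'network connectivity';
  if (code === 'ETIMEDOUT' || (timeoutMs && latencyMs >= timeoutMs)) return 'timeout issues';
  if (status === 503 || status === 429) return 'resource exhaustion';
  if (status >= 500) return 'high error rate';
  return 'unknown';
}
```

If most recent failures land in one category, that category is a reasonable place to start in the steps below. Threshold sensitivity cannot be inferred from a single failure, so it is not covered here.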

Step-by-Step Fix

Step 1: Check Circuit Breaker Status

```bash
# Check circuit breaker state (varies by gateway)
# Kong:
curl -s http://localhost:8001/plugins | jq '.data[] | select(.name=="circuit-breaker")'

# AWS API Gateway - check CloudWatch metrics:
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 5XXError \
  --dimensions Name=ApiName,Value=my-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum

# Check application logs
kubectl logs -l app=api-gateway | grep -i "circuit"
```

Step 2: Verify Backend Service Health

```bash
# Check if backend is running
curl -I http://backend-service:8080/health

# Check service endpoints
nslookup backend-service
dig backend-service

# Test direct connection
nc -zv backend-service 8080

# Check Kubernetes pods
kubectl get pods -l app=backend-service
kubectl describe pod backend-service-xxx
```

Step 3: Check Backend Logs

```bash
# Check backend application logs
kubectl logs -l app=backend-service --tail=100

# Check for errors (-E enables the | alternation)
kubectl logs -l app=backend-service | grep -iE "error|exception|failed"

# Check resource usage
kubectl top pods -l app=backend-service
kubectl describe pod backend-service-xxx | grep -E -A5 "Limits:|Requests:"
```

Step 4: Check Error Thresholds

```yaml
# Common circuit breaker configurations:

# Kong:
plugins:
  - name: circuit-breaker
    config:
      error_threshold: 50       # 50% error rate triggers open
      volume_threshold: 10      # Minimum requests before evaluating
      half_open_timeout: 30000  # 30 seconds before retry

# Spring Cloud Gateway:
resilience4j:
  circuitbreaker:
    instances:
      payment-service:
        failureRateThreshold: 50
        slowCallDurationThreshold: 2s
        slowCallRateThreshold: 100
        permittedNumberOfCallsInHalfOpenState: 3
        slidingWindowSize: 10

# AWS App Mesh:
circuitBreaker:
  maxConnections: 100
  maxPendingRequests: 100
  maxRequests: 100
  maxRetries: 3
```

Step 5: Manually Close Circuit (Emergency)

```bash
# Kong - disable plugin temporarily
# ("enabled" is a top-level plugin field, not part of "config")
curl -X PATCH http://localhost:8001/plugins/{plugin_id} \
  -d "enabled=false"

# Spring Cloud - actuator endpoint
# (the exact path and payload depend on your Resilience4j version; verify before relying on it)
curl -X POST http://localhost:8080/actuator/circuitbreakers/payment-service/close

# Custom circuit breaker reset
curl -X POST http://api-gateway:8080/admin/circuit-breaker/payment-service/reset
```

Step 6: Fix Underlying Backend Issues

```bash
# If backend crashed, restart it
kubectl rollout restart deployment/backend-service

# If overloaded, scale up
kubectl scale deployment backend-service --replicas=5

# Check database connections
kubectl exec backend-service-xxx -- netstat -an | grep ESTABLISHED | wc -l

# Check for memory issues (-E enables the | alternation)
kubectl logs backend-service-xxx | grep -iE "OutOfMemoryError|memory"
```

Step 7: Adjust Timeout Settings

Increase timeouts to prevent premature circuit opening.

Kong:

```yaml
plugins:
  - name: circuit-breaker
    config:
      timeout: 60000  # 60 second timeout
```

Nginx:

```nginx
location /api/ {
    proxy_pass http://backend;
    proxy_connect_timeout 60s;
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}
```

Envoy:

```yaml
clusters:
  - name: backend_service
    connect_timeout: 60s
    per_connection_buffer_limit_bytes: 32768
```

Step 8: Implement Graceful Degradation

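```javascript
// Return a fallback response when the circuit is open.
// CircuitOpenError is assumed to be the error class thrown by your breaker library.
async function callPaymentService(order) {
  try {
    return await paymentService.process(order);
  } catch (err) {
    // Only handle open-circuit failures; let everything else propagate
    if (!(err instanceof CircuitOpenError)) throw err;
    // Fallback: queue for later processing
    await queueService.enqueue('payments', order);
    return {
      status: 'queued',
      message: 'Payment will be processed shortly'
    };
  }
}
```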
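Queueing suits writes such as payments. For read paths, serving the last good response is often a better fallback. The sketch below assumes an in-memory `Map` as the cache; `withCachedFallback` is a hypothetical helper and, for brevity, falls back on any error, not only an open circuit.

```javascript
// Wrap a fetch function so that failures are answered from the last
// successful response for the same key, marked as stale.
function withCachedFallback(fetchFresh, cache = new Map()) {
  return async function get(key) {
    try {
      const value = await fetchFresh(key);
      cache.set(key, { value, storedAt: Date.now() });
      return { value, stale: false };
    } catch (err) {
      const hit = cache.get(key);
      if (!hit) throw err; // nothing cached: surface the original error
      return { value: hit.value, stale: true, ageMs: Date.now() - hit.storedAt };
    }
  };
}
```

Returning the `stale` flag lets callers decide whether an old value is acceptable, for example showing a cached exchange rate but refusing to charge against it.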

Step 9: Configure Proper Recovery

```yaml
# Half-open state configuration for safe recovery
circuitBreaker:
  failureRateThreshold: 50
  slowCallDurationThreshold: 2s
  permittedNumberOfCallsInHalfOpenState: 5
  slidingWindowType: COUNT_BASED
  slidingWindowSize: 10
  minimumNumberOfCalls: 5
  waitDurationInOpenState: 30s  # Wait before trying half-open

# Automatic recovery monitoring
management:
  endpoints:
    web:
      exposure:
        include: circuitbreakers,health
  endpoint:
    health:
      show-details: always
```

Step 10: Set Up Monitoring and Alerts

```yaml
# Prometheus metrics for circuit breaker
# application.yml
management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: prometheus,health

# Prometheus alert rule
groups:
  - name: circuit_breaker
    rules:
      - alert: CircuitBreakerOpen
        expr: resilience4j_circuitbreaker_state{state="open"} == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Circuit breaker {{ $labels.name }} is open"
```

Grafana dashboard queries:

```text
resilience4j_circuitbreaker_state{state="open"}
resilience4j_circuitbreaker_calls_seconds_count{kind="failed"}
```

```bash
# Check circuit breaker metrics
curl http://localhost:8080/actuator/prometheus | grep circuitbreaker
```

Circuit Breaker States

| State | Behavior | Recovery |
|---|---|---|
| Closed | Requests pass through | Normal operation |
| Open | Requests fail immediately | Wait for timeout |
| Half-Open | Limited test requests | Success → Closed, Fail → Open |
| Forced-Open | Manual open | Manual reset required |
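The transitions between these states can be sketched as a small state machine. This is a deliberately simplified model: it opens after a number of consecutive failures, where real libraries use rate-based rules, and it does not limit concurrent trial calls in the half-open state. The `CircuitBreaker` class, its options, and the injectable `now` clock are hypothetical.

```javascript
// Minimal breaker covering the four states listed above.
class CircuitBreaker {
  constructor({ failureThreshold = 3, openMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.now = now;
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  forceOpen() { this.state = 'FORCED_OPEN'; }
  reset() { this.state = 'CLOSED'; this.failures = 0; }

  async call(fn) {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.openMs) {
      this.state = 'HALF_OPEN'; // wait elapsed: allow a trial request
    }
    if (this.state === 'OPEN' || this.state === 'FORCED_OPEN') {
      throw new Error('CircuitBreakerOpen'); // fail fast, backend untouched
    }
    try {
      const result = await fn();
      this.reset(); // a success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Note that `FORCED_OPEN` never times out in this model: only an explicit `reset()` leaves it, matching the "Manual reset required" row.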

Verification

```bash
# After fixing backend and adjusting settings
# Check circuit breaker status
curl http://localhost:8080/actuator/circuitbreakers

# Should show:
# {"circuitBreakers":{"payment-service":{"state":"CLOSED"}}}

# Test API endpoint
curl https://api.example.com/payments

# Should return 200 OK

# Monitor failure rate
curl http://localhost:8080/actuator/prometheus | grep circuitbreaker_failure_rate

# Should be below threshold
```
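When several breakers are configured, the status check is easier to automate than to read by eye. The sketch below assumes the response body has the shape shown in the expected output above (`circuitBreakers` keyed by name, each with a `state`); `notClosed` is a hypothetical helper.

```javascript
// Return the names of breakers that are not CLOSED, given the parsed
// JSON body of the circuit breaker status endpoint.
function notClosed(body) {
  const breakers = (body && body.circuitBreakers) || {};
  return Object.entries(breakers)
    .filter(([, info]) => info.state !== 'CLOSED')
    .map(([name]) => name);
}
```

An empty array means every breaker has recovered; anything else names the services that still need attention.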

Prevention

To keep circuit breakers from tripping repeatedly, put these proactive measures in place:

1. Monitor Circuit Breaker State

```yaml
groups:
- name: api-gateway
  rules:
  - alert: CircuitBreakerOpen
    expr: |
      resilience4j_circuitbreaker_state{state="open"} == 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Circuit breaker {{ $labels.name }} is open"
```

2. Configure Appropriate Thresholds

```yaml
# application.yml - Resilience4j configuration
resilience4j:
  circuitbreaker:
    configs:
      default:
        failureRateThreshold: 50
        slowCallRateThreshold: 100
        slowCallDurationThreshold: 2s
        minimumNumberOfCalls: 10
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 100
```

3. Implement Health Endpoints

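```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Spring Boot health check for backend
@RestController
public class HealthController {

    @GetMapping("/health")
    public ResponseEntity<String> health() {
        // Check database connectivity
        // Check downstream services
        return ResponseEntity.ok("OK");
    }
}

// Configure gateway to check health before routing
```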

Best Practices Checklist

  • [ ] Monitor circuit breaker state
  • [ ] Configure appropriate thresholds
  • [ ] Implement health endpoints
  • [ ] Test backend services regularly
  • [ ] Use fallback strategies
  • [ ] Document circuit breaker configuration
