Fix Envoy Rate Limit Configuration with envoyproxy/ratelimit

Configure Envoy global rate limiting with envoyproxy/ratelimit Docker image for production traffic control.

Published: Apr 23, 2026 | 13 min read | By FixWikiHub Editorial Team

You deployed Envoy as an API gateway with rate limiting, but users can still make unlimited requests. Or worse, all requests return 429 Too Many Requests even though traffic is low.

bash

$ curl -I https://api.example.com/v1/users
HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
retry-after: 60

The envoyproxy/ratelimit service provides global rate limiting for Envoy, but misconfiguration is common.

Introduction

This article covers troubleshooting steps and solutions for rate limit misconfiguration between Envoy and the envoyproxy/ratelimit service. These problems typically surface in production and can cause service disruptions, either by blocking legitimate traffic or by letting abusive traffic through.

Symptoms

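Misconfiguration usually shows up in one of two ways. Either every request is rejected with 429, even under light traffic:

bash

$ curl -I https://api.example.com/v1/users
HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
retry-after: 60

Or no request is ever limited, which often comes down to the Envoy filter and the rate limit service config using different domain names. For example, the Envoy filter has:

yaml

domain: "production-api"

while the rate limit service config has:

yaml

domain: api-production  # Different name!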

Common Causes

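Domain mismatch between the Envoy filter and the rate limit service config
Rate limit cluster in Envoy not configured for HTTP/2 (required for gRPC)
Redis unreachable from the rate limit service
Missing or incorrect Redis environment variables on the rate limit service
Rate limit filter placed after the router filter
Envoy not sending the descriptors that the rate limit config expects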

Step-by-Step Fix

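1. Check the rate limit service and Envoy logs for specific error messages
2. Verify that the domain and descriptor keys match on both sides
3. Test connectivity from Envoy to the rate limit service and from the service to Redis
4. Review recent configuration changes
5. Apply the corrective action
6. Verify the fix by sending test traffic and counting 429 responses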

Real Scenario: Rate Limits Not Applied

A SaaS company deployed Envoy with rate limiting to protect their API. They configured limits of 100 requests per minute per user, but monitoring showed some users making 1000+ requests per minute without being blocked.

The problem: The rate limit filter was in the Envoy configuration, but the domain didn't match between Envoy and the rate limit service.

Envoy config had:

yaml

domain: "production-api"

Rate limit config had:

yaml

domain: api-production  # Different name!

Because the domains didn't match, the service found no rules for the domain Envoy sent, so every check came back OK and no limit was enforced.
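A mismatch like this can be caught before deployment with a small check. The following is a minimal sketch, assuming both files contain a plain `domain:` line as in the snippets above; the function names `extract_domain` and `domains_match` and the file names in the usage comment are hypothetical.

```bash
# extract_domain FILE: print the first "domain:" value found in FILE,
# with trailing comments and surrounding quotes removed.
extract_domain() {
  sed -n 's/^[[:space:]]*domain:[[:space:]]*//p' "$1" \
    | head -n 1 \
    | sed -e 's/[[:space:]]*#.*$//' -e "s/^[\"']//" -e "s/[\"']\$//"
}

# domains_match ENVOY_FILE RATELIMIT_FILE: succeed only if both domains are equal.
domains_match() {
  envoy_domain=$(extract_domain "$1")
  rl_domain=$(extract_domain "$2")
  if [ -n "$envoy_domain" ] && [ "$envoy_domain" = "$rl_domain" ]; then
    echo "OK: domain '$envoy_domain' matches"
  else
    echo "MISMATCH: envoy='$envoy_domain' ratelimit='$rl_domain'"
    return 1
  fi
}

# Usage (hypothetical file names):
#   domains_match envoy.yaml ratelimit-config.yaml || exit 1
```

The pattern matches `domain:` but not `domains:`, so the virtual host's `domains: ["*"]` line in an Envoy config is skipped.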

Architecture Overview

text

┌─────────┐     ┌──────────────┐     ┌──────────────────┐     ┌───────┐
│ Client  │────▶│ Envoy Proxy  │────▶│ Rate Limit Svc   │────▶│ Redis │
└─────────┘     └──────────────┘     └──────────────────┘     └───────┘
                       │
                       ▼
              ┌────────────────┐
              │ Backend Service│
              └────────────────┘

1. Client sends request to Envoy
2. Envoy calls the rate limit service over gRPC (port 6070 in this guide; the service's default gRPC port is 8081)
3. Rate limit service checks Redis for counters
4. Service returns OK or OVER_LIMIT
5. Envoy forwards the request or returns 429
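The counter lookup in step 3 can be pictured as a fixed-window counter: all requests in the same time window share one counter, and the request is over the limit once that counter exceeds the configured value. The sketch below assumes that scheme; the key layout and the function names `counter_key` and `decision` are illustrative, not the service's exact format.

```bash
# counter_key DOMAIN KEY VALUE UNIT_SECONDS NOW_EPOCH
# Print the counter key for the window that NOW_EPOCH falls into.
counter_key() {
  window_start=$(( $5 - $5 % $4 ))
  echo "${1}_${2}_${3}_${window_start}"
}

# decision COUNT LIMIT
# COUNT is the counter value after counting the current request.
decision() {
  if [ "$1" -gt "$2" ]; then echo "OVER_LIMIT"; else echo "OK"; fi
}

# Requests at t=125s and t=130s share a one-minute window; t=185s starts a new one.
counter_key production-api user_id user123 60 125
counter_key production-api user_id user123 60 130
counter_key production-api user_id user123 60 185

# With a limit of 2, the third request in a window is rejected.
count=0
for i in 1 2 3; do
  count=$((count + 1))
  echo "request $i: $(decision "$count" 2)"
done
```

Because the window start is part of the key, counters reset at window boundaries rather than sliding, so a burst that straddles two windows can briefly exceed the nominal rate.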

Quick Start with Docker

Test the setup locally before deploying to production:

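```bash
# 1. Create the network and start Redis
docker network create ratelimit-net
docker run -d --name redis \
  --network ratelimit-net \
  redis:7-alpine

# 2. Create rate limit configuration
# The service reads $RUNTIME_ROOT/$RUNTIME_SUBDIRECTORY/config/*.yaml
mkdir -p /tmp/ratelimit/ratelimit/config
cat > /tmp/ratelimit/ratelimit/config/ratelimit-config.yaml << 'EOF'
domain: my-api
descriptors:
  - key: user_id
    rate_limit:
      unit: minute
      requests_per_unit: 100
EOF

# 3. Start rate limit service
# GRPC_PORT defaults to 8081 and DEBUG_PORT to 6070, so both are set
# explicitly to keep gRPC on 6070 as in the rest of this guide.
# Pin a specific image tag in practice; a floating tag may not be published.
# /bin/ratelimit is passed explicitly in case the image sets no default command.
docker run -d --name ratelimit \
  --network ratelimit-net \
  -p 8080:8080 \
  -p 6070:6070 \
  -e REDIS_SOCKET_TYPE=tcp \
  -e REDIS_URL=redis:6379 \
  -e GRPC_PORT=6070 \
  -e DEBUG_PORT=6071 \
  -e RUNTIME_ROOT=/data \
  -e RUNTIME_SUBDIRECTORY=ratelimit \
  -e RUNTIME_WATCH_ROOT=false \
  -e USE_STATSD=false \
  -v /tmp/ratelimit:/data \
  envoyproxy/ratelimit:latest \
  /bin/ratelimit

# 4. Verify service is running
curl http://localhost:8080/healthcheck
# Should return: OK
```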

Kubernetes Deployment

Step 1: Create Namespace and Redis

yaml

# redis.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ratelimit
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: ratelimit
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: ratelimit
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379

Step 2: Deploy Rate Limit Service

```yaml
# ratelimit.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
  namespace: ratelimit
data:
  ratelimit-config.yaml: |
    domain: production-api
    descriptors:
      # Per-user rate limit: 100 requests per minute
      - key: user_id
        rate_limit:
          unit: minute
          requests_per_unit: 100
        # Nested: per-user per-endpoint limits
        # (kept under the same user_id entry; a second top-level
        # user_id key would be rejected as a duplicate)
        descriptors:
          - key: endpoint
            rate_limit:
              unit: minute
              requests_per_unit: 30

      # Per-IP rate limit: 20 requests per second
      - key: remote_address
        rate_limit:
          unit: second
          requests_per_unit: 20
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: ratelimit
spec:
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
      - name: ratelimit
        image: envoyproxy/ratelimit:latest  # pin a specific tag in production
        command: ["/bin/ratelimit"]
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 6070
          name: grpc
        env:
        - name: REDIS_SOCKET_TYPE
          value: "tcp"
        - name: REDIS_URL
          value: "redis:6379"
        - name: RUNTIME_ROOT
          value: "/data"
        - name: RUNTIME_SUBDIRECTORY
          value: "ratelimit"
        - name: RUNTIME_WATCH_ROOT
          value: "false"
        - name: RUNTIME_IGNOREDOTFILES
          value: "true"
        - name: LOG_LEVEL
          value: "info"
        - name: USE_STATSD
          value: "false"
        - name: GRPC_PORT
          value: "6070"
        # DEBUG_PORT defaults to 6070; move it so it does not collide with GRPC_PORT
        - name: DEBUG_PORT
          value: "6071"
        - name: PORT
          value: "8080"
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        volumeMounts:
        - name: config
          mountPath: /data/ratelimit/config
          readOnly: true
        livenessProbe:
          httpGet:
            path: /healthcheck
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthcheck
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 3
      volumes:
      - name: config
        configMap:
          name: ratelimit-config
---
apiVersion: v1
kind: Service
metadata:
  name: ratelimit
  namespace: ratelimit
spec:
  selector:
    app: ratelimit
  ports:
  - port: 8080
    name: http
    targetPort: 8080
  - port: 6070
    name: grpc
    targetPort: 6070
```

Step 3: Configure Envoy

```yaml
# envoy.yaml
# Admin interface used by the debugging commands in this guide
admin:
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend_service
                  # Without rate_limits actions the filter sends no
                  # descriptors and the rate limit service is never consulted
                  rate_limits:
                  - actions:
                    - request_headers:
                        header_name: x-user-id
                        descriptor_key: user_id
                  - actions:
                    - remote_address: {}
          http_filters:
          # Rate limit filter MUST come before router
          - name: envoy.filters.http.ratelimit
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: "production-api"  # MUST match config file
              failure_mode_deny: false  # Don't block if rate limit service is down
              timeout: 0.5s
              rate_limited_as_resource_exhausted: true
              rate_limit_service:
                grpc_service:
                  envoy_grpc:
                    cluster_name: ratelimit_cluster
                transport_api_version: V3
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
  - name: backend_service
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: backend-service.default.svc.cluster.local
                port_value: 8080

  - name: ratelimit_cluster
    connect_timeout: 0.5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    http2_protocol_options: {}  # Required for gRPC
    load_assignment:
      cluster_name: ratelimit_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: ratelimit.ratelimit.svc.cluster.local
                port_value: 6070
```

Common Configuration Errors

Error 1: Domain Mismatch

Symptom: Rate limits not applied, all requests pass through.

Cause: The domain in Envoy config doesn't match the rate limit config.

Debug:

```bash
# Check Envoy config: print the domain of every HTTP rate limit filter
kubectl exec -n default deployment/envoy -- curl -s localhost:9901/config_dump \
  | jq -r '.. | objects | select(.["@type"]? == "type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit") | .domain'

# Check rate limit config
kubectl get configmap ratelimit-config -n ratelimit -o yaml | grep domain
```

Fix: Ensure both match exactly:

```yaml
# Envoy config
domain: "production-api"

# Rate limit config
domain: production-api  # No quotes needed in YAML
```

Error 2: Missing HTTP/2 on gRPC Port

Symptom: Rate limit service unreachable, timeouts in Envoy logs.

Cause: gRPC requires HTTP/2, but the cluster doesn't have http2_protocol_options.

Envoy error log (exact wording varies by Envoy version):

bash

[warning][config] [source/extensions/filters/http/ratelimit/ratelimit.cc:79] rate limit service cluster 'ratelimit_cluster' is not configured for HTTP/2

Fix:

yaml

- name: ratelimit_cluster
  http2_protocol_options: {}  # Add this line

Error 3: All Requests Blocked

Symptom: Every request returns 429, even the first request.

Cause: Redis connection failure or misconfigured rate limit values.

Debug:

```bash
# Check rate limit service logs
kubectl logs -n ratelimit deployment/ratelimit --tail=100

# Look for Redis connection errors:
# "redis: connection refused" or "dial tcp: connection refused"

# Test Redis connectivity from a temporary pod in the same namespace
# (the ratelimit image may not include nc)
kubectl run -n ratelimit redis-check --rm -it --restart=Never \
  --image=busybox -- nc -zv redis 6379
# A successful check reports port 6379 as open
```

Fix: Verify Redis is running and accessible:

bash

kubectl get pods -n ratelimit -l app=redis
kubectl logs -n ratelimit deployment/redis

Error 4: Rate Limit Service Crashes

Symptom: Container exits immediately with error.

Cause: Missing or incorrect Redis environment variables. Without them the service falls back to its defaults (a Unix socket path), which normally does not exist in the container.

Debug:

bash

kubectl logs -n ratelimit deployment/ratelimit
# Look for Redis connection errors at startup

Required environment variables:

yaml

env:
- name: REDIS_SOCKET_TYPE
  value: "tcp"
- name: REDIS_URL
  value: "redis:6379"

Error 5: Wrong Filter Order

Symptom: Envoy rejects the configuration when it loads, or the rate limit filter never runs.

Cause: The router is a terminal filter, so it must be last in http_filters and the rate limit filter must come before it.

Wrong:

yaml

http_filters:
- name: envoy.filters.http.router  # Router first - WRONG
- name: envoy.filters.http.ratelimit

Correct:

yaml

http_filters:
- name: envoy.filters.http.ratelimit  # Rate limit first
- name: envoy.filters.http.router     # Router last

Rate Limit Configuration Examples

Per-User Rate Limiting

yaml

domain: production-api
descriptors:
  - key: user_id
    rate_limit:
      unit: minute
      requests_per_unit: 100

Envoy must send the user_id descriptor:

yaml

# In Envoy route configuration
rate_limits:
- actions:
  - request_headers:
      header_name: x-user-id
      descriptor_key: user_id

IP-Based Rate Limiting

yaml

domain: production-api
descriptors:
  - key: remote_address
    rate_limit:
      unit: second
      requests_per_unit: 20

Envoy configuration:

yaml

rate_limits:
- actions:
  - remote_address: {}

Nested Rate Limits

yaml

domain: production-api
descriptors:
  # First check: per-user limit
  - key: user_id
    rate_limit:
      unit: minute
      requests_per_unit: 100
    # Second check: per-user per-endpoint limit
    descriptors:
      - key: endpoint
        rate_limit:
          unit: minute
          requests_per_unit: 30

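This allows 100 requests/minute total per user, but only 30/minute for each endpoint. For both limits to apply, Envoy must send two descriptors per request: one containing only user_id, and one containing user_id followed by endpoint.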

Testing Rate Limits

```bash
# Send 25 requests and count HTTP status codes
for i in {1..25}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "x-user-id: user123" \
    http://localhost:10000/api/test
done | sort | uniq -c

# Expected output if the 20/second per-IP limit applies and all requests
# land within the same one-second window:
#   20 200
#    5 429
```
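For scripted checks, the raw status codes from the loop above can be reduced to a one-line summary. This is a minimal sketch; the function name `summarize_codes` is hypothetical, and it expects one status code per line on standard input (the output of the `curl -w "%{http_code}\n"` loop before `sort | uniq -c`).

```bash
# summarize_codes: read one HTTP status code per line and print counts of
# successful (2xx), rate-limited (429), and all other responses.
summarize_codes() {
  awk '
    NF == 0                 { next }
    $1 == 429               { limited++; next }
    $1 >= 200 && $1 < 300   { ok++; next }
                            { other++ }
    END { printf "ok=%d limited=%d other=%d\n", ok, limited, other }
  '
}

# Example with canned input instead of live traffic:
printf '200\n200\n429\n500\n' | summarize_codes
```

A non-zero `other` count is worth investigating separately, since 5xx responses or connection failures point at the backend or the rate limit service rather than at the limits themselves.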

Production Best Practices

1. Use failure_mode_deny Carefully

```yaml
# Safe for non-critical APIs
failure_mode_deny: false  # Allow traffic if rate limit service fails

# Safe only if you prefer outages over overuse
failure_mode_deny: true   # Block all traffic if rate limit service fails
```
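The effect of this setting can be summarized as a small decision table. The sketch below only models the behavior described above; the function name `envoy_outcome` and the result labels are hypothetical, and "error" stands for a timeout or an unreachable rate limit service.

```bash
# envoy_outcome RESULT FAILURE_MODE_DENY
# RESULT is the outcome of the rate limit check: ok, over_limit, or error.
envoy_outcome() {
  case "$1" in
    ok)         echo "forward" ;;
    over_limit) echo "reject with 429" ;;
    error)
      if [ "$2" = "true" ]; then
        echo "reject with error status"
      else
        echo "forward"
      fi
      ;;
    *) echo "unknown result: $1" >&2; return 1 ;;
  esac
}

# Compare both settings while the rate limit service is down.
for deny in false true; do
  echo "failure_mode_deny=$deny, service down: $(envoy_outcome error "$deny")"
done
```

The setting only matters on the error path: while the service answers, OK and OVER_LIMIT are handled the same way under either value.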

2. Set Appropriate Timeouts

yaml

timeout: 0.5s  # Don't let rate limit checks slow down requests

3. Monitor Rate Limit Metrics

```bash
# Check Envoy stats
curl http://localhost:9901/stats | grep ratelimit

# Key metrics:
# cluster.ratelimit_cluster.upstream_rq_total
# cluster.ratelimit_cluster.upstream_cx_connect_fail
# Filter stats are reported under the route's target cluster:
# cluster.backend_service.ratelimit.over_limit
# cluster.backend_service.ratelimit.ok
```

4. Use Redis Sentinel for High Availability

yaml

env:
- name: REDIS_SOCKET_TYPE
  value: "tcp"
- name: REDIS_TYPE
  value: "sentinel"
# For Sentinel, REDIS_URL holds the master name followed by the sentinel addresses
- name: REDIS_URL
  value: "mymaster,sentinel1:26379,sentinel2:26379,sentinel3:26379"

Prevention

1. Verify rate limit service is healthy:

```bash
curl http://ratelimit.ratelimit.svc.cluster.local:8080/healthcheck
```

2. Check Redis connectivity (run in the Redis pod, since the ratelimit image does not ship redis-cli):

```bash
kubectl exec -n ratelimit deployment/redis -- redis-cli ping
```

3. Verify domain matches:

```bash
# Envoy domain
curl -s localhost:9901/config_dump \
  | jq -r '.. | objects | select(.["@type"]? == "type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit") | .domain'
# Rate limit config domain
kubectl get configmap ratelimit-config -n ratelimit -o yaml | grep domain
```

4. Check Envoy cluster is configured:

```bash
curl -s localhost:9901/clusters | grep ratelimit
```

5. Watch rate limit service logs:

```bash
kubectl logs -n ratelimit deployment/ratelimit -f
```

Additional Troubleshooting Steps

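Step 5: Advanced Diagnostics

```bash
# Inspect pod state and recent events in the ratelimit namespace
kubectl describe pods -n ratelimit -l app=ratelimit
kubectl get events -n ratelimit --sort-by=.lastTimestamp

# Check recent rate limit service logs
kubectl logs -n ratelimit deployment/ratelimit --tail=100

# Network connectivity test to the gRPC port from a temporary pod
kubectl run -n ratelimit grpc-check --rm -it --restart=Never \
  --image=busybox -- nc -zv ratelimit 6070
```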

Step 6: Performance Optimization

- Monitor CPU and memory usage
- Check disk I/O performance
- Optimize network settings
- Review application logs

Step 7: Security Audit

- Review access logs
- Check permission settings
- Verify encryption status
- Monitor for unauthorized access

Common Pitfalls and Solutions

Pitfall 1: Incorrect Configuration

Solution: Double-check all configuration parameters

- Use configuration validation tools
- Review documentation
- Test in staging environment

Pitfall 2: Resource Constraints

Solution: Monitor and optimize resource usage

- Scale resources as needed
- Implement monitoring
- Set up auto-scaling

Pitfall 3: Network Issues

Solution: Thorough network troubleshooting

- Check network connectivity
- Verify firewall rules
- Test DNS resolution

Real-World Case Studies

Case Study: Large-Scale Deployment

Scenario: Enterprise Kubernetes deployment with Envoy rate limit configuration errors

Resolution:

- Implemented comprehensive monitoring
- Optimized configuration settings
- Added redundancy and failover

Result: 99.99% uptime achieved

Case Study: Multi-Environment Setup

Scenario: Inconsistencies between development, staging, and production environments

Resolution:

- Standardized configuration management
- Implemented environment-specific settings
- Added automated testing

Result: Consistent behavior across environments

Best Practices Summary

Proactive Monitoring

- Set up comprehensive monitoring
- Configure alerting thresholds
- Regular performance reviews
- Implement log analysis

Regular Maintenance

- Scheduled maintenance windows
- Regular security updates
- Performance optimization
- Backup and recovery testing

Documentation

- Maintain runbooks
- Document configurations
- Track changes
- Knowledge sharing

Quick Reference Checklist

[ ] Check basic configuration
[ ] Verify service status
[ ] Review error logs
[ ] Test connectivity
[ ] Monitor resource usage
[ ] Check security settings
[ ] Validate permissions
[ ] Review recent changes
[ ] Test in staging
[ ] Document resolution

This guide covers the most common envoyproxy/ratelimit configuration errors. For additional support, consult the official Envoy and envoyproxy/ratelimit documentation.

Last updated: Apr 23, 2026
