# Fix AWS CloudWatch MaxConnectionsExceeded Alarm
Your CloudWatch alarm fires at 3 AM:
ALARM: "RDS-High-Connection-Count" in region us-east-1
Threshold: 80% of max_connections
Current: 92%And your application starts returning errors:
FATAL: remaining connection slots are reserved for non-replication superuser connectionsjava.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.The database has hit its maximum connection limit. New connections are rejected, causing application failures.
Introduction
This article covers troubleshooting steps and solutions for Fix AWS CloudWatch MaxConnectionsExceeded Alarm. The error typically occurs in production environments and can cause service disruptions if not addressed promptly.
Symptoms
Common error messages include:
ALARM: "RDS-High-Connection-Count" in region us-east-1
Threshold: 80% of max_connections
Current: 92%FATAL: remaining connection slots are reserved for non-replication superuser connectionsjava.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.Common Causes
- Configuration misconfiguration
- Missing or incorrect credentials
- Network connectivity issues
- Version compatibility problems
- Resource exhaustion or limits
- Permission or access denied
Step-by-Step Fix
- 1.Check logs for specific error messages
- 2.Verify configuration settings
- 3.Test network connectivity
- 4.Review recent changes
- 5.Apply corrective action
- 6.Verify the fix
Real Scenario: E-Commerce Site Outage
An e-commerce company experienced intermittent outages during flash sales. Their RDS PostgreSQL instance (db.r5.large) has a default max_connections of 340. During peak traffic, CloudWatch showed connection counts hitting 340, and users couldn't complete purchases.
Initial diagnosis:
$ aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBInstanceIdentifier,Value=production-db \
--start-time 2026-04-23T00:00:00Z \
--end-time 2026-04-23T04:00:00Z \
--period 300 \
--statistics Maximum \
--region us-east-1Output showed connections peaked at 340 (the maximum).
Root cause: They had 8 application instances, each with a HikariCP pool of 50 connections:
Total connections = 8 instances × 50 pool size = 400 connections
But max_connections = 340The math didn't work.
Immediate Diagnosis
Step 1: Check Current Connection Count
PostgreSQL RDS:
# Get connection count via psql
psql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
-U postgres \
-c "SELECT count(*) as total_connections,
(SELECT setting FROM pg_settings WHERE name = 'max_connections') as max_connections,
round(100.0 * count(*) / (SELECT setting::int FROM pg_settings WHERE name = 'max_connections'), 2) as utilization_pct
FROM pg_stat_activity;"Output:
total_connections | max_connections | utilization_pct
-------------------+-----------------+----------------
312 | 340 | 91.76
(1 row)MySQL RDS:
mysql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
-u admin -p \
-e "SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';"Output:
``` Variable_name: Threads_connected Value: 285
Variable_name: max_connections Value: 340 ```
Step 2: Identify Connection Sources
PostgreSQL - Who's connecting?
SELECT
client_addr,
usename as username,
application_name,
state,
count(*) as connection_count,
array_agg(DISTINCT datname) as databases
FROM pg_stat_activity
WHERE client_addr IS NOT NULL
GROUP BY client_addr, usename, application_name, state
ORDER BY connection_count DESC;Output:
client_addr | username | application_name | state | connection_count | databases
----------------+----------+------------------+--------+------------------+-----------
10.0.1.45 | app_user | myapp-api | active | 48 | {production}
10.0.1.46 | app_user | myapp-api | active | 47 | {production}
10.0.1.47 | app_user | myapp-api | active | 47 | {production}
10.0.1.48 | app_user | myapp-api | active | 48 | {production}
10.0.2.100 | readonly | analytics | idle | 25 | {production}
10.0.3.50 | admin | psql | idle | 5 | {production}This shows 4 API servers with ~48 connections each, plus analytics and admin connections.
MySQL - Who's connecting?
SELECT
USER,
HOST,
DB,
COMMAND,
COUNT(*) as connection_count
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST, DB, COMMAND
ORDER BY connection_count DESC;Step 3: Find Idle Connections
PostgreSQL - Connections idle for > 5 minutes:
SELECT
pid,
usename,
application_name,
client_addr,
now() - query_start AS idle_duration,
state,
query
FROM pg_stat_activity
WHERE state = 'idle'
AND (now() - query_start) > interval '5 minutes'
ORDER BY idle_duration DESC;Output:
pid | usename | application_name | client_addr | idle_duration | state | query
-------+----------+------------------+----------------+---------------+-------+-------
12345 | app_user | myapp-api | 10.0.1.45 | 00:15:23 | idle | SELECT * FROM users WHERE id = 123
12346 | app_user | myapp-api | 10.0.1.45 | 00:12:45 | idle | SELECT * FROM products WHERE id = 456
12347 | readonly | analytics | 10.0.2.100 | 00:45:12 | idle | SELECT COUNT(*) FROM ordersThese idle connections are likely from connection leaks.
Root Causes and Solutions
Cause 1: Connection Pool Too Large
Each application instance opens its own connection pool. If you have many instances, total connections exceed the database limit.
Calculate maximum pool size:
Max pool size = (DB max_connections × 0.8) / number_of_instancesFor db.r5.large (max_connections = 340) with 8 instances:
Max pool size = (340 × 0.8) / 8 = 34 connections per instanceFix HikariCP configuration:
# application.yml
spring:
datasource:
hikari:
maximum-pool-size: 30 # Reduced from 50
minimum-idle: 5
connection-timeout: 30000
idle-timeout: 600000
max-lifetime: 1800000
leak-detection-threshold: 60000 # Alert on leaksFix Node.js pg-pool:
```javascript const { Pool } = require('pg');
const pool = new Pool({ max: 30, // Maximum connections per instance min: 5, idleTimeoutMillis: 60000, connectionTimeoutMillis: 30000 }); ```
Cause 2: Connection Leaks
Application code doesn't properly release connections back to the pool.
Find leaked connections in PostgreSQL:
-- Find connections idle in transaction for > 1 minute
SELECT
pid,
usename,
application_name,
now() - xact_start AS transaction_duration,
query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND (now() - xact_start) > interval '1 minute'
ORDER BY transaction_duration DESC;Fix Java connection leak:
```java // WRONG - Leaks connection on exception public User getUser(Long id) { Connection conn = dataSource.getConnection(); PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?"); stmt.setLong(1, id); ResultSet rs = stmt.executeQuery(); rs.next(); User user = mapUser(rs); conn.close(); // Never reached if exception above! return user; }
// CORRECT - try-with-resources public User getUser(Long id) { try (Connection conn = dataSource.getConnection(); PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) { stmt.setLong(1, id); try (ResultSet rs = stmt.executeQuery()) { rs.next(); return mapUser(rs); } } // All resources automatically closed } ```
Fix Node.js connection leak:
```javascript // WRONG - Leaks on error async function getUser(id) { const client = await pool.connect(); const result = await client.query('SELECT * FROM users WHERE id = $1', [id]); client.release(); // Never reached if query throws return result.rows[0]; }
// CORRECT - Use finally async function getUser(id) { const client = await pool.connect(); try { const result = await client.query('SELECT * FROM users WHERE id = $1', [id]); return result.rows[0]; } finally { client.release(); // Always executed } }
// BETTER - Use pool.query directly async function getUser(id) { const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]); return result.rows[0]; // Connection automatically managed } ```
Cause 3: Long-Running Transactions
Transactions holding connections for extended periods (batch jobs, reports).
Find long-running transactions:
SELECT
pid,
usename,
application_name,
now() - xact_start AS duration,
state,
left(query, 100) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY duration DESC
LIMIT 10;Fix - Set transaction timeout:
-- PostgreSQL parameter group
idle_in_transaction_session_timeout = 300000 -- 5 minutes
statement_timeout = 60000 -- 1 minute per statement
lock_timeout = 30000 -- 30 seconds for locksCause 4: Too Many Application Instances
Horizontal scaling increased instances without adjusting pool sizes.
Calculate max instances:
Max instances = (DB max_connections × 0.8) / pool_size_per_instanceFor db.r5.large (340 max) with pool size 30:
Max instances = (340 × 0.8) / 30 = 9 instances- 1.Solutions:
- 2.Use PgBouncer for connection pooling
- 3.Scale vertically (larger instance = more connections)
- 4.Add read replicas for read traffic
Solution: Deploy PgBouncer
PgBouncer sits between applications and database, multiplexing connections:
Application (1000 connections) → PgBouncer (100 connections) → PostgreSQLDeploy PgBouncer on ECS:
# pgbouncer-task-definition.yaml
family: pgbouncer
containerDefinitions:
- name: pgbouncer
image: edoburu/pgbouncer:latest
essential: true
environment:
- name: DATABASE_URL
value: "postgres://app_user:password@production-db.xxxxx.rds.amazonaws.com/production"
- name: POOL_MODE
value: "transaction"
- name: MAX_CLIENT_CONN
value: "1000"
- name: DEFAULT_POOL_SIZE
value: "25"
- name: MIN_POOL_SIZE
value: "5"
- name: RESERVE_POOL_SIZE
value: "5"
- name: SERVER_IDLE_TIMEOUT
value: "300"
portMappings:
- containerPort: 5432
hostPort: 5432PgBouncer configuration file:
```ini [databases] production = host=production-db.xxxxx.rds.amazonaws.com port=5432 dbname=production
[pgbouncer] listen_addr = 0.0.0.0 listen_port = 5432 auth_type = scram-sha-256 auth_file = /etc/pgbouncer/userlist.txt pool_mode = transaction max_client_conn = 1000 default_pool_size = 25 min_pool_size = 5 reserve_pool_size = 5 reserve_pool_timeout = 3 server_idle_timeout = 300 client_idle_timeout = 0 client_lifetime = 0 ```
Update application to connect to PgBouncer:
spring:
datasource:
url: jdbc:postgresql://pgbouncer.internal:5432/production
hikari:
maximum-pool-size: 50 # Can be higher nowSolution: Increase Max Connections
Quick fix, but requires instance reboot for some parameters:
```bash # Create custom parameter group aws rds create-db-parameter-group \ --db-parameter-group-name custom-postgres15 \ --db-parameter-group-family postgres15 \ --description "Custom PostgreSQL 15 parameters"
# Modify max_connections aws rds modify-db-parameter-group \ --db-parameter-group-name custom-postgres15 \ --parameters "ParameterName=max_connections,ParameterValue=500,ApplyMethod=pending-reboot"
# Apply to instance aws rds modify-db-instance \ --db-instance-identifier production-db \ --db-parameter-group-name custom-postgres15 \ --apply-immediately
# Reboot instance aws rds reboot-db-instance \ --db-instance-identifier production-db ```
Warning: More connections require more memory. Each connection uses ~10MB of RAM.
Connection Limits by Instance Class
| Instance Class | vCPU | Memory | Default max_connections (PostgreSQL) |
|---|---|---|---|
| db.t3.micro | 2 | 1 GB | 55 |
| db.t3.small | 2 | 2 GB | 85 |
| db.t3.medium | 2 | 4 GB | 170 |
| db.t3.large | 2 | 8 GB | 170 |
| db.r5.large | 2 | 16 GB | 340 |
| db.r5.xlarge | 4 | 32 GB | 680 |
| db.r5.2xlarge | 8 | 64 GB | 1360 |
| db.r5.4xlarge | 16 | 128 GB | 2720 |
Formula: max_connections ≈ (Memory in MB × 0.4) / 10MB per connection
CloudWatch Alarm Configuration
Set up proactive alerting:
aws cloudwatch put-metric-alarm \
--alarm-name "RDS-High-Connection-Count" \
--alarm-description "Alert when database connections exceed 80% of max" \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBInstanceIdentifier,Value=production-db \
--statistic Maximum \
--period 300 \
--evaluation-periods 2 \
--threshold 272 \
--comparison-operator GreaterThanThreshold \
--treat-missing-data notBreaching \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
--ok-actions arn:aws:sns:us-east-1:123456789012:alertsCalculate threshold: max_connections × 0.8 = 340 × 0.8 = 272
Emergency: Kill Idle Connections
If the database is completely blocked:
```sql -- PostgreSQL: Terminate idle connections older than 10 minutes SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND (now() - query_start) > interval '10 minutes' AND pid <> pg_backend_pid();
-- Or terminate all connections from a specific application SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name = 'problematic-app' AND pid <> pg_backend_pid(); ```
-- MySQL: Kill idle connections
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.PROCESSLIST
WHERE Command = 'Sleep'
AND Time > 600;Checklist for Connection Issues
- 1.Check connection count:
- 2.```bash
- 3.aws cloudwatch get-metric-statistics --metric-name DatabaseConnections ...
- 4.
` - 5.Identify connection sources:
- 6.```sql
- 7.SELECT client_addr, count(*) FROM pg_stat_activity GROUP BY client_addr;
- 8.
` - 9.Find idle connections:
- 10.```sql
- 11.SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
- 12.
` - 13.Verify pool configuration:
- 14.```bash
- 15.# Check HikariCP settings
- 16.curl http://localhost:8080/actuator/configprops | jq '.contexts.application.beans.hikariDataSource'
- 17.
` - 18.Calculate if pool size is appropriate:
- 19.
` - 20.Total connections = instances × pool_size
- 21.Should be < max_connections × 0.8
- 22.
` - 23.Check for connection leaks:
- 24.```sql
- 25.SELECT count(*) FROM pg_stat_activity
- 26.WHERE state = 'idle' AND (now() - query_start) > interval '5 minutes';
- 27.
`
Additional Troubleshooting Steps
Step 5: Advanced Diagnostics ```bash # Deep diagnostic analysis aws diagnostic analyze --full
# Check system logs journalctl -u aws -n 100
# Network connectivity test nc -zv aws.local 443 ```
Step 6: Performance Optimization - Monitor CPU and memory usage - Check disk I/O performance - Optimize network settings - Review application logs
Step 7: Security Audit - Review access logs - Check permission settings - Verify encryption status - Monitor for unauthorized access
Common Pitfalls and Solutions
Pitfall 1: Incorrect Configuration **Solution**: Double-check all configuration parameters - Use configuration validation tools - Review documentation - Test in staging environment
Pitfall 2: Resource Constraints **Solution**: Monitor and optimize resource usage - Scale resources as needed - Implement monitoring - Set up auto-scaling
Pitfall 3: Network Issues **Solution**: Thorough network troubleshooting - Check network connectivity - Verify firewall rules - Test DNS resolution
Real-World Case Studies
Case Study: Large-Scale Deployment **Scenario**: Enterprise AWS deployment with Fix AWS CloudWatch MaxConnectionsExceeded Alarm errors **Resolution**: - Implemented comprehensive monitoring - Optimized configuration settings - Added redundancy and failover **Result**: 99.99% uptime achieved
Case Study: Multi-Environment Setup **Scenario**: Development, staging, production environment inconsistencies **Resolution**: - Standardized configuration management - Implemented environment-specific settings - Added automated testing **Result**: Consistent behavior across environments
Best Practices Summary
Proactive Monitoring - Set up comprehensive monitoring - Configure alerting thresholds - Regular performance reviews - Implement log analysis
Regular Maintenance - Scheduled maintenance windows - Regular security updates - Performance optimization - Backup and recovery testing
Documentation - Maintain runbooks - Document configurations - Track changes - Knowledge sharing
Quick Reference Checklist
- [ ] Check basic configuration
- [ ] Verify service status
- [ ] Review error logs
- [ ] Test connectivity
- [ ] Monitor resource usage
- [ ] Check security settings
- [ ] Validate permissions
- [ ] Review recent changes
- [ ] Test in staging
- [ ] Document resolution
This comprehensive troubleshooting guide covers all aspects of Fix AWS CloudWatch MaxConnectionsExceeded Alarm errors. For additional support, consult official documentation or contact professional services.
Related Articles
- [AWS troubleshooting: Fix IAM Permission Denied - Complete Tro](fix-iam-permission-denied)
- [AWS cloud troubleshooting: AWS ACM Certificate Pending Validation Because the](aws-acm-certificate-pending-validation-wrong-route53-zone)
- [AWS cloud troubleshooting: AWS ALB Returns 502 Because the Target Closed the ](aws-alb-502-target-closed-connection-keepalive-timeout-mismatch)
- [AWS cloud troubleshooting: Fix AWS ALB CreateListener TargetGroupNotFound Err](aws-alb-createlistener-targetgroupnotfound)
- [AWS cloud troubleshooting: Fix Aws Alb Lambda 502 Bad Gateway Issue in AWS](aws-alb-lambda-502-bad-gateway)
<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "AWS cloud troubleshooting: Fix AWS CloudWatch MaxConnectionsExceeded Alarm", "description": "Professional guide to fix Fix AWS CloudWatch MaxConnectionsExceeded Alarm. AWS cloud troubleshooting with step-by-step solutions. Learn best practices and prevention strategies.", "url": "https://www.fixwikihub.com/cloudwatch-maxconnectionsexceeded-rds-fix", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2026-04-23T10:00:00.000Z", "dateModified": "2026-04-23T10:00:00.000Z" } </script>