Introduction
Auto Scaling Groups monitor instance health and replace unhealthy instances. When health checks fail, ASG terminates the "unhealthy" instance and launches a replacement. But if the root cause isn't fixed, the new instance also fails health checks, creating a replacement loop.
Symptoms
In the AWS Console:
Instance health: Unhealthy
Last status: ELB health check failed

Via AWS CLI:
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name my-asg \
--query 'AutoScalingGroups[*].Instances[*].[InstanceId,HealthStatus]'
[
["i-abc123", "Unhealthy"]
]

Activity history shows:
Terminating instance due to health check failure
Launching new instance

Common Causes
1. Application not ready - App takes longer than the grace period to start
2. Health check endpoint wrong - Path, port, or protocol misconfigured
3. Security group blocking - ELB can't reach the instance on the health check port
4. Instance startup failure - Application crashes or fails to start
5. Network issues - Instance can't reach its dependencies
6. Resource limits - Instance runs out of CPU or memory during startup
7. DNS/routing issues - Health check hostname doesn't resolve correctly
Step-by-Step Fix
1. Check logs for specific error messages
2. Verify configuration settings
3. Test network connectivity
4. Review recent changes
5. Apply corrective action
6. Verify the fix
Step 1: Check Health Check Configuration
For ALB/NLB target groups:
aws elbv2 describe-target-groups --names my-target-group \
--query 'TargetGroups[*].[HealthCheckPath,HealthCheckPort,HealthCheckIntervalSeconds,HealthCheckTimeoutSeconds,HealthyThresholdCount,UnhealthyThresholdCount]'

For Classic Load Balancers:
aws elb describe-load-balancers --load-balancer-names my-elb \
--query 'LoadBalancerDescriptions[*].HealthCheck'

Common issues:
- HealthCheckPath returns 404 or 403
- HealthCheckPort doesn't match application port
- HealthCheckTimeoutSeconds too short for slow responses
- HealthyThresholdCount requires too many successful checks
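
To judge whether the interval, timeout, and thresholds are reasonable, it can help to put numbers on them. The following is a rough sketch, not AWS's exact algorithm: it assumes a target changes state after the threshold number of consecutive results, spaced one interval apart. The function name, the warning texts, and the cutoff of 5 checks are arbitrary choices made for illustration.

```python
def review_health_check(interval, timeout, healthy_threshold, unhealthy_threshold):
    """Estimate state-change times and flag settings worth a second look."""
    warnings = []
    if timeout >= interval:
        warnings.append("timeout should be shorter than the interval")
    if healthy_threshold > 5:  # arbitrary cutoff, chosen for illustration
        warnings.append("new instances need many passing checks before serving")
    return {
        # Approximation: threshold consecutive checks, one per interval
        "seconds_to_healthy": interval * healthy_threshold,
        "seconds_to_unhealthy": interval * unhealthy_threshold,
        "warnings": warnings,
    }


report = review_health_check(interval=30, timeout=5,
                             healthy_threshold=5, unhealthy_threshold=2)
print(report)
```

With these example values a new instance needs roughly 150 seconds of passing checks before it counts as healthy, which matters when comparing against the grace period.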
Step 2: Check Target Health Details
aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/12345

Output shows the specific failure reason:
{
"Target": {
"Id": "i-abc123",
"Port": 80
},
"TargetHealth": {
"State": "unhealthy",
"Reason": "Target.ResponseCodeMismatch",
"Description": "Health checks failed with this reason: Code mismatch"
}
}

Common reason codes:
| Reason | Description |
|---|---|
| Target.ResponseCodeMismatch | App returned unexpected HTTP code |
| Target.Timeout | Health check timed out |
| Target.FailedHealthChecks | Could not establish a connection to the target, or the response was malformed |
| Elb.InternalError | ELB-side issue |
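
When a target group has many targets, reading the JSON by eye gets tedious. Here is a minimal sketch that pulls the unhealthy targets out of saved `describe-target-health` output and pairs each reason code with a next step from this guide. The `NEXT_STEP` mapping and the `triage` function are hypothetical helpers, and the sample data is the example output wrapped in the `TargetHealthDescriptions` key the CLI returns.

```python
import json

# Suggestions paraphrase this guide; the mapping itself is illustrative.
NEXT_STEP = {
    "Target.ResponseCodeMismatch": "Check the health check path and the status code the app returns",
    "Target.Timeout": "Check app response time and the health check timeout",
    "Target.FailedHealthChecks": "Check security groups and that the app listens on the port",
    "Elb.InternalError": "Retry later; the problem is on the load balancer side",
}


def triage(describe_output):
    """Return (instance_id, reason, suggestion) for each unhealthy target."""
    rows = []
    for item in describe_output.get("TargetHealthDescriptions", []):
        health = item["TargetHealth"]
        if health["State"] != "unhealthy":
            continue
        reason = health.get("Reason", "unknown")
        suggestion = NEXT_STEP.get(reason, "See the Description field")
        rows.append((item["Target"]["Id"], reason, suggestion))
    return rows


sample = json.loads("""
{"TargetHealthDescriptions": [
  {"Target": {"Id": "i-abc123", "Port": 80},
   "TargetHealth": {"State": "unhealthy",
                    "Reason": "Target.ResponseCodeMismatch",
                    "Description": "Health checks failed with this reason: Code mismatch"}}
]}
""")

for row in triage(sample):
    print(row)
```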
Step 3: Verify Security Group Rules
The instance must accept traffic from the load balancer:
```bash
# Get instance security group
aws ec2 describe-instances --instance-ids i-abc123 \
  --query 'Reservations[*].Instances[*].SecurityGroups[*].GroupId'

# Check security group rules
aws ec2 describe-security-groups --group-ids sg-12345 \
  --query 'SecurityGroups[*].IpPermissions'
```
Required rule:
- Allow inbound on the health check port from the ELB security group (or the VPC CIDR)
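
Whether that rule already exists can be checked against the `IpPermissions` list for one security group. The sketch below is illustrative only: `allows_from_group` is a hypothetical helper, the sample rules and group IDs are made up, and it looks only at security group references, not at CIDR ranges.

```python
def allows_from_group(ip_permissions, port, source_group_id):
    """True if any rule admits TCP traffic on `port` from `source_group_id`."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        groups = [pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])]
        if source_group_id in groups:
            return True
    return False


# Made-up rules: SSH from a CIDR range, HTTP from the load balancer's group
sample_rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "UserIdGroupPairs": []},
    {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
     "IpRanges": [], "UserIdGroupPairs": [{"GroupId": "sg-elb"}]},
]

print(allows_from_group(sample_rules, 80, "sg-elb"))
print(allows_from_group(sample_rules, 8080, "sg-elb"))
```

If the check returns `False` for the health check port, the `authorize-security-group-ingress` command in this step adds the missing rule.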
aws ec2 authorize-security-group-ingress \
--group-id sg-instance \
--protocol tcp \
--port 80 \
--source-group sg-elb

Step 4: Test Health Check Endpoint Manually
SSH into the instance and test:
```bash
# Test health check endpoint
curl -v http://localhost:80/health

# Check if application is listening
netstat -tlnp | grep :80

# Check application logs
sudo journalctl -u myapp -f
```
The endpoint should return:
- HTTP 200 status code
- Response within timeout
- Valid response body (if configured)
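
The same pass/fail criteria can be scripted with the Python standard library, which is handy for repeated checks during startup. This is a sketch under assumptions: `classify` and `probe` are hypothetical helpers, the result labels only loosely mirror the load balancer's reason codes, and the expected status of 200 should be adjusted if the target group's matcher is configured differently.

```python
import time
import urllib.error
import urllib.request


def classify(status, elapsed, timeout, expected=(200,)):
    """Apply the criteria above: answer in time, with an expected status code."""
    if elapsed > timeout:
        return "timeout"
    if status not in expected:
        return "code mismatch"
    return "pass"


def probe(url, timeout=5.0):
    """Request `url` once and classify the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # a 4xx/5xx reply is still a response
    except OSError:
        return "no response"  # connection refused, DNS failure, or timeout
    return classify(status, time.monotonic() - start, timeout)


print(classify(200, 0.12, 5.0))
print(classify(503, 0.12, 5.0))
```

On the instance, `probe("http://localhost:80/health")` exercises the same URL as the `curl` command.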
Step 5: Adjust Health Check Grace Period
If the application takes time to start, increase grace period:
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name my-asg \
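--health-check-grace-period 300

The default is 300 seconds for groups created in the console (groups created through the AWS CLI or an SDK default to 0). Increase it if:
- Application has a long startup time
- Dependencies need time to connect
- Database migrations run on startup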
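
One way to pick a value is to add the measured startup time to the time the load balancer needs to record enough passing checks, then pad the result. This is a rule of thumb rather than an AWS formula; the function name and the 25% margin are assumptions for illustration.

```python
import math


def suggested_grace_period(startup_seconds, interval, healthy_threshold, margin=0.25):
    """Startup time plus time to pass the required checks, padded by `margin`."""
    base = startup_seconds + interval * healthy_threshold
    return math.ceil(base * (1 + margin))


# Example: app needs 240 s to boot, checks run every 30 s, 2 passes required
print(suggested_grace_period(startup_seconds=240, interval=30, healthy_threshold=2))  # 375
```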
Step 6: Check Instance Logs
```bash
# Get console output
aws ec2 get-console-output --instance-id i-abc123 --output text

# For instances with SSM, use Session Manager
aws ssm start-session --target i-abc123
```
Check for:
- Application startup errors
- Missing configuration
- Dependency connection failures
- Permission errors
Step 7: Review ASG Activity History
aws autoscaling describe-scaling-activities \
--auto-scaling-group-name my-asg \
--max-records 20

Look for patterns:
- Consistent failures after specific timing
- Specific error messages
- Multiple replacement attempts
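
These patterns can be pulled out of saved `describe-scaling-activities` output. The sketch below is illustrative: the helper names are hypothetical, the sample activities and timestamps are made up, and matching on the words "health check" in the `Cause` text is a heuristic, not a documented contract.

```python
from datetime import datetime


def health_terminations(activities):
    """Sorted start times of terminations whose Cause mentions a health check."""
    times = []
    for act in activities:
        is_termination = act.get("Description", "").startswith("Terminating")
        if is_termination and "health check" in act.get("Cause", "").lower():
            stamp = act["StartTime"].replace("Z", "+00:00")
            times.append(datetime.fromisoformat(stamp))
    return sorted(times)


def gaps_in_seconds(times):
    """Seconds between consecutive terminations; regular gaps hint at a loop."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Made-up sample data for illustration
cause = "An instance was taken out of service in response to an ELB system health check failure."
sample_activities = [
    {"Description": "Terminating EC2 instance: i-abc123", "Cause": cause,
     "StartTime": "2024-01-01T10:00:00Z"},
    {"Description": "Launching a new EC2 instance: i-def456", "Cause": "",
     "StartTime": "2024-01-01T10:00:30Z"},
    {"Description": "Terminating EC2 instance: i-def456", "Cause": cause,
     "StartTime": "2024-01-01T10:06:00Z"},
    {"Description": "Terminating EC2 instance: i-ghi789", "Cause": cause,
     "StartTime": "2024-01-01T10:12:00Z"},
]

times = health_terminations(sample_activities)
print(len(times), gaps_in_seconds(times))
```

Evenly spaced terminations, as in this sample, usually mean each new instance fails at the same point after launch, which is worth comparing with the grace period and the application's startup time.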
Step 8: Create Proper Health Check Endpoint
If your app doesn't have a dedicated health endpoint:
```python
# FastAPI example
from fastapi import FastAPI, HTTPException

app = FastAPI()


@app.get("/health")
async def health_check():
    # Check database connection (db is your application's async database client)
    try:
        await db.execute("SELECT 1")
    except Exception:
        raise HTTPException(status_code=503, detail="Database unavailable")

    # Check Redis connection (redis is your application's async Redis client)
    try:
        await redis.ping()
    except Exception:
        raise HTTPException(status_code=503, detail="Redis unavailable")

    return {"status": "healthy"}
```
```javascript
// Express.js example
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
  };

  const healthy = Object.values(checks).every(v => v);
  res.status(healthy ? 200 : 503).json(checks);
});
```
Step 9: Set Up Proper Monitoring
Create CloudWatch alarm for health check failures:
aws cloudwatch put-metric-alarm \
--alarm-name "asg-health-check-failures" \
--metric-name UnHealthyHostCount \
--namespace AWS/ApplicationELB \
--dimensions Name=TargetGroup,Value=targetgroup/my-tg/12345 Name=LoadBalancer,Value=app/my-alb/12345 \
--statistic Average \
--period 60 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 3

Verification
```bash
# Check instance health status
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-name my-asg \
  --query 'AutoScalingGroups[*].Instances[*].[InstanceId,HealthStatus]'

# Should show "Healthy" for running instances

# Check target health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/12345
```
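
For a scripted check, the JSON from the first command can be reduced to the instances that are still not healthy. This is a minimal sketch: `unhealthy_instances` is a hypothetical helper, it assumes the `[InstanceId, HealthStatus]` pairs may be nested inside outer lists, and every instance ID other than `i-abc123` is made up.

```python
def iter_pairs(node):
    """Yield [InstanceId, HealthStatus] pairs from arbitrarily nested lists."""
    if node and isinstance(node[0], str):
        yield tuple(node)
    else:
        for child in node:
            yield from iter_pairs(child)


def unhealthy_instances(query_output):
    """Instance IDs whose status is anything other than Healthy."""
    return sorted(iid for iid, status in iter_pairs(query_output) if status != "Healthy")


# Made-up output: one group with two instances
sample_output = [[["i-abc123", "Healthy"], ["i-def456", "Unhealthy"]]]
print(unhealthy_instances(sample_output))
```

An empty list means every instance in the output reports `Healthy`.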
Related Issues
- [Fix AWS ALB Target Unhealthy](/articles/fix-aws-alb-target-unhealthy)
- [Fix AWS EC2 Instance Not Responding](/articles/fix-aws-ec2-instance-not-responding)
- [Fix AWS Auto Scaling Not Triggering](/articles/fix-aws-auto-scaling-not-triggering)