Introduction

Auto Scaling Groups monitor instance health and replace unhealthy instances. When health checks fail, ASG terminates the "unhealthy" instance and launches a replacement. But if the root cause isn't fixed, the new instance also fails health checks, creating a replacement loop.

Symptoms

In the AWS Console:

bash
Instance health: Unhealthy
Last status: ELB health check failed

Via AWS CLI:

bash
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name my-asg \
  --query 'AutoScalingGroups[*].Instances[*].[InstanceId,HealthStatus]'
[
  ["i-abc123", "Unhealthy"]
]

Activity history shows:

bash
Terminating instance due to health check failure
Launching new instance

Common Causes

  1. 1.Application not ready - App takes longer than grace period to start
  2. 2.Health check endpoint wrong - Path, port, or protocol misconfigured
  3. 3.Security group blocking - ELB can't reach instance on health check port
  4. 4.Instance startup failure - Application crashes or fails to start
  5. 5.Network issues - Instance can't reach dependencies
  6. 6.Resource limits - Instance runs out of CPU/memory during startup
  7. 7.DNS/routing issues - Health check hostname doesn't resolve correctly

Step-by-Step Fix

  1. 1.Check logs for specific error messages
  2. 2.Verify configuration settings
  3. 3.Test network connectivity
  4. 4.Review recent changes
  5. 5.Apply corrective action
  6. 6.Verify the fix

Step 1: Check Health Check Configuration

For ALB/NLB target groups:

bash
aws elbv2 describe-target-groups --names my-target-group \
  --query 'TargetGroups[*].[HealthCheckPath,HealthCheckPort,HealthCheckIntervalSeconds,HealthCheckTimeoutSeconds,HealthyThresholdCount,UnhealthyThresholdCount]'

For Classic Load Balancers:

bash
aws elb describe-load-balancers --load-balancer-names my-elb \
  --query 'LoadBalancerDescriptions[*].HealthCheck'

Common issues: - HealthCheckPath returns 404 or 403 - HealthCheckPort doesn't match application port - HealthCheckTimeoutSeconds too short for slow responses - HealthyThresholdCount requires too many successful checks

Step 2: Check Target Health Details

bash
aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/12345

Output shows the specific failure reason:

json
{
  "Target": {
    "Id": "i-abc123",
    "Port": 80
  },
  "TargetHealth": {
    "State": "unhealthy",
    "Reason": "Target.ResponseCodeMismatch",
    "Description": "Health checks failed with this reason: Code mismatch"
  }
}

Common reason codes:

ReasonDescription
Target.ResponseCodeMismatchApp returned unexpected HTTP code
Target.TimeoutHealth check timed out
Target.ConnectionFailedCould not establish TCP connection
Elb.InternalErrorELB-side issue

Step 3: Verify Security Group Rules

The instance must accept traffic from the load balancer:

```bash # Get instance security group aws ec2 describe-instances --instance-ids i-abc123 \ --query 'Reservations[*].Instances[*].SecurityGroups[*].GroupId'

# Check security group rules aws ec2 describe-security-groups --group-ids sg-12345 \ --query 'SecurityGroups[*].IpPermissions' ```

Required rule: - Allow inbound on health check port from ELB security group (or VPC CIDR)

bash
aws ec2 authorize-security-group-ingress \
  --group-id sg-instance \
  --protocol tcp \
  --port 80 \
  --source-group sg-elb

Step 4: Test Health Check Endpoint Manually

SSH into the instance and test:

```bash # Test health check endpoint curl -v http://localhost:80/health

# Check if application is listening netstat -tlnp | grep :80

# Check application logs sudo journalctl -u myapp -f ```

The endpoint should return: - HTTP 200 status code - Response within timeout - Valid response body (if configured)

Step 5: Adjust Health Check Grace Period

If the application takes time to start, increase grace period:

bash
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --health-check-grace-period 300

Default is 300 seconds. Increase if: - Application has long startup time - Dependencies need time to connect - Database migrations run on startup

Step 6: Check Instance Logs

```bash # Get console output aws ec2 get-console-output --instance-id i-abc123 --output text

# For instances with SSM, use Session Manager aws ssm start-session --target i-abc123 ```

Check for: - Application startup errors - Missing configuration - Dependency connection failures - Permission errors

Step 7: Review ASG Activity History

bash
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-asg \
  --max-records 20

Look for patterns: - Consistent failures after specific timing - Specific error messages - Multiple replacement attempts

Step 8: Create Proper Health Check Endpoint

If your app doesn't have a dedicated health endpoint:

```python # FastAPI example @app.get("/health") async def health_check(): # Check database connection try: await db.execute("SELECT 1") except Exception as e: raise HTTPException(status_code=503, detail="Database unavailable")

# Check Redis connection try: await redis.ping() except Exception as e: raise HTTPException(status_code=503, detail="Redis unavailable")

return {"status": "healthy"} ```

```javascript // Express.js example app.get('/health', async (req, res) => { const checks = { database: await checkDatabase(), redis: await checkRedis(), };

const healthy = Object.values(checks).every(v => v); res.status(healthy ? 200 : 503).json(checks); }); ```

Step 9: Set Up Proper Monitoring

Create CloudWatch alarm for health check failures:

bash
aws cloudwatch put-metric-alarm \
  --alarm-name "asg-health-check-failures" \
  --metric-name UnHealthyHostCount \
  --namespace AWS/ApplicationELB \
  --dimensions Name=TargetGroup,Value=tg-12345 Name=LoadBalancer,Value=app/my-alb/12345 \
  --statistic Average \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 3

Verification

```bash # Check instance health status aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-name my-asg \ --query 'AutoScalingGroups[*].Instances[*].[InstanceId,HealthStatus]'

# Should show "Healthy" for running instances

# Check target health aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/12345 ```

  • [Fix AWS ALB Target Unhealthy](/articles/fix-aws-alb-target-unhealthy)
  • [Fix AWS EC2 Instance Not Responding](/articles/fix-aws-ec2-instance-not-responding)
  • [Fix AWS Auto Scaling Not Triggering](/articles/fix-aws-auto-scaling-not-triggering)
  • [AWS troubleshooting: Fix IAM Permission Denied - Complete Tro](fix-iam-permission-denied)
  • [AWS cloud troubleshooting: AWS ACM Certificate Pending Validation Because the](aws-acm-certificate-pending-validation-wrong-route53-zone)
  • [AWS cloud troubleshooting: AWS ALB Returns 502 Because the Target Closed the ](aws-alb-502-target-closed-connection-keepalive-timeout-mismatch)
  • [AWS cloud troubleshooting: Fix AWS ALB CreateListener TargetGroupNotFound Err](aws-alb-createlistener-targetgroupnotfound)
  • [AWS cloud troubleshooting: Fix Aws Alb Lambda 502 Bad Gateway Issue in AWS](aws-alb-lambda-502-bad-gateway)

<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "Fix AWS EC2 Auto Scaling Health Check Failed", "description": "Troubleshoot ASG health check failures. Fix ELB target health, adjust grace periods, and resolve instance initialization issues.", "url": "https://www.fixwikihub.com/fix-aws-ec2-auto-scaling-health-check-failed", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2026-03-31T20:48:48.228Z", "dateModified": "2026-03-31T20:48:48.228Z" } </script>