Fix OpenSearch Cluster Red Status

Resolve OpenSearch cluster red status by understanding shard allocation failures, unassigned shards, and cluster health recovery procedures.

Published: Dec 12, 2025 | 8 min read | By FixWikiHub Editorial Team

Introduction

Your OpenSearch cluster reports red status, which means at least one primary shard is unassigned. The data in those shards is unavailable, so searches against the affected indices fail or return partial results, and writes routed to those shards fail.

Symptoms

Cluster health red:

```bash
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty'

{
  "cluster_name" : "my-cluster",
  "status" : "red",
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 45,
  "active_shards" : 90,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0
}
```
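For scripting, the status field can be pulled out of that response without extra tooling. The following is a minimal sketch, not an official client: `health_status` is a hypothetical helper name, and the sample JSON is an abbreviated copy of the response shown above, used so the snippet runs without a live cluster.

```bash
#!/usr/bin/env bash
# Extract the "status" value from _cluster/health JSON read on stdin.
# Uses sed so it works on hosts without jq; assumes "status" appears once.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Abbreviated sample response (assumption: same shape as the output above).
sample='{ "cluster_name" : "my-cluster", "status" : "red", "unassigned_shards" : 5 }'

printf '%s\n' "$sample" | health_status
```

Against a real cluster, the same helper would read from `curl -s 'http://localhost:9200/_cluster/health?pretty' | health_status`.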

Unassigned primary shards:

```bash
$ curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'

index   shard prirep state      unassigned.reason
myindex 0     p      UNASSIGNED NODE_LEFT
myindex 1     p      UNASSIGNED ALLOCATION_FAILED
```
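Only unassigned primaries turn a cluster red; unassigned replicas alone produce yellow. As a rough sketch, the listing can be filtered down to the primaries that matter. The function name `unassigned_primaries` and the sample rows are illustrative assumptions; the column order matches the `h=` parameter used above.

```bash
#!/usr/bin/env bash
# Print "index shard reason" for every unassigned primary in _cat/shards output.
# Expects columns: index shard prirep state unassigned.reason, with a header row (?v).
unassigned_primaries() {
  awk 'NR > 1 && $3 == "p" && $4 == "UNASSIGNED" { print $1, $2, $5 }'
}

# Illustrative sample output (not from a real cluster).
sample_shards() {
  cat <<'EOF'
index   shard prirep state      unassigned.reason
myindex 0     p      UNASSIGNED NODE_LEFT
myindex 0     r      UNASSIGNED NODE_LEFT
myindex 1     p      UNASSIGNED ALLOCATION_FAILED
myindex 2     p      STARTED
EOF
}

sample_shards | unassigned_primaries
```

The replica row and the started primary are dropped, leaving the two shards that keep the cluster red.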

Common Causes

1. Node failure - Node hosting the primary shard left the cluster
2. Disk space - Node exceeded a disk watermark
3. Allocation failure - Shard allocation failed repeatedly
4. Configuration error - Shard allocation settings prevent assignment
5. Network partition - Nodes cannot communicate
6. Corrupted shard - Shard data is corrupted
7. JVM OOM - Node crashed after running out of memory
8. Version mismatch - Incompatible OpenSearch versions

Step-by-Step Fix

1. Check logs for specific error messages
2. Verify configuration settings
3. Test network connectivity
4. Review recent changes
5. Apply corrective action
6. Verify the fix

Step 1: Diagnose Cluster Status

```bash
# Get cluster health:
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

# Get detailed cluster state (can be very large on big clusters):
curl -XGET 'http://localhost:9200/_cluster/state?pretty'

# Check unassigned shards:
curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

# Get allocation explanation:
curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'

# Check node stats:
curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,disk.used_percent'

# Check disk space:
curl -XGET 'http://localhost:9200/_cat/allocation?v'
```
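The node stats listing is easier to act on when it is reduced to the nodes that are short on disk. A minimal sketch follows, assuming the four-column layout requested above (`name,heap.percent,ram.percent,disk.used_percent`); `nodes_over_disk` and the sample rows are hypothetical.

```bash
#!/usr/bin/env bash
# List "name disk.used_percent" for nodes at or above a disk usage threshold.
# Reads _cat/nodes output (with header row) on stdin; disk usage is column 4.
nodes_over_disk() {
  local threshold="$1"
  awk -v t="$threshold" 'NR > 1 && ($4 + 0) >= (t + 0) { print $1, $4 }'
}

# Illustrative sample output (not from a real cluster).
sample_nodes() {
  cat <<'EOF'
name   heap.percent ram.percent disk.used_percent
node-1 45           80          62.10
node-2 71           93          91.37
node-3 52           85          85.00
EOF
}

# 85 is the default low disk watermark.
sample_nodes | nodes_over_disk 85
```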

Step 2: Identify Unassigned Shard Reason

```bash
# Get a detailed explanation for one unassigned shard
# (requests with a JSON body need the Content-Type header):
curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -H 'Content-Type: application/json' \
  -d '{ "index": "myindex", "shard": 0, "primary": true }'

# Common reasons:
# - NODE_LEFT: The node holding the shard left the cluster
# - ALLOCATION_FAILED: Shard allocation failed and retries were exhausted
# - CLUSTER_RECOVERED: Unassigned as a result of a full cluster recovery
# - REINITIALIZED: Shard moved from started back to initializing
# - DANGLING_INDEX_IMPORTED: Unassigned as a result of importing a dangling index

# Check allocation settings:
curl -XGET 'http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty'

# Check disk watermarks (default values only appear with include_defaults=true):
curl -XGET 'http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty' | grep watermark
```
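When many shards are unassigned, a per-reason tally shows which of the later steps to start with. This is a sketch under the assumption that the input is `_cat/shards` output with the columns `index,shard,prirep,state,unassigned.reason`; the helper name `count_unassigned_reasons` and the sample rows are made up for illustration.

```bash
#!/usr/bin/env bash
# Count unassigned shards per unassigned.reason (column 5), sorted by reason name.
count_unassigned_reasons() {
  awk 'NR > 1 && $4 == "UNASSIGNED" { n[$5]++ } END { for (r in n) print n[r], r }' | sort -k2
}

# Illustrative sample output (not from a real cluster).
sample_unassigned() {
  cat <<'EOF'
index   shard prirep state      unassigned.reason
myindex 0     p      UNASSIGNED NODE_LEFT
myindex 0     r      UNASSIGNED NODE_LEFT
myindex 1     p      UNASSIGNED ALLOCATION_FAILED
myindex 2     p      STARTED
EOF
}

sample_unassigned | count_unassigned_reasons
```

A tally dominated by `NODE_LEFT` points at Step 4, while `ALLOCATION_FAILED` points at Steps 3 and 5.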

Step 3: Fix Disk Space Issues

```bash
# Check disk usage per node:
curl -XGET 'http://localhost:9200/_cat/allocation?v'

# Default watermarks: low 85%, high 90%, flood stage 95%.
# If a node is above the low watermark, no new shards are allocated to it.
# Option 1: Free disk space
# Option 2: Adjust watermarks temporarily:

curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%",
      "cluster.routing.allocation.disk.watermark.high": "95%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
    }
  }'

# Reroute shards after freeing space:
curl -XPOST 'http://localhost:9200/_cluster/reroute?retry_failed=true'

# Delete unnecessary indices (permanent; double-check the wildcard first):
curl -XDELETE 'http://localhost:9200/old-index-*'

# Or close indices (frees heap and cluster resources, but not disk space):
curl -XPOST 'http://localhost:9200/old-index/_close'
```
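To make the watermark thresholds concrete, the sketch below classifies a disk usage percentage against the default values (low 85, high 90, flood stage 95). `disk_watermark_level` is a hypothetical helper; the thresholds can be overridden through arguments 2 to 4 if the cluster settings differ, and the comparison treats "at or above" as exceeding, which is an approximation.

```bash
#!/usr/bin/env bash
# Classify a disk usage percentage against the disk watermarks.
# Usage: disk_watermark_level USED [LOW] [HIGH] [FLOOD]
disk_watermark_level() {
  local used="$1" low="${2:-85}" high="${3:-90}" flood="${4:-95}"
  awk -v u="$used" -v l="$low" -v h="$high" -v f="$flood" 'BEGIN {
    u += 0; l += 0; h += 0; f += 0          # force numeric comparison
    if (u >= f)      print "flood_stage"    # indices on the node become read-only
    else if (u >= h) print "high"           # shards are relocated away from the node
    else if (u >= l) print "low"            # no new shards are allocated to the node
    else             print "ok"
  }'
}

for pct in 62.10 85.00 91.37 96; do
  echo "$pct -> $(disk_watermark_level "$pct")"
done
```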

Step 4: Handle Node Failure

```bash
# Check cluster nodes:
curl -XGET 'http://localhost:9200/_cat/nodes?v'

# If a node left, check pending tasks:
curl -XGET 'http://localhost:9200/_cluster/pending_tasks?pretty'

# If the node will not return:
# Replicas lost with the node are reallocated after
# index.unassigned.node_left.delayed_timeout (default 1m).
# A primary with no surviving in-sync copy stays unassigned until the node
# returns, a stale copy is promoted, or the index is restored.

# Last resort: promote a stale copy on a node that still holds one:
curl -XPOST 'http://localhost:9200/_cluster/reroute' \
  -H 'Content-Type: application/json' \
  -d '{
    "commands": [{
      "allocate_stale_primary": {
        "index": "myindex",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }]
  }'

# WARNING: accept_data_loss can lose writes the stale copy never received.
# Only use it when no in-sync copy of the shard is available.

# Check which copies of the index are still assigned:
curl -XGET 'http://localhost:9200/_cat/shards/myindex?v' | grep -v UNASSIGNED
```
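Because `allocate_stale_primary` can discard data, it helps to put a deliberate opt-in between the operator and the request. The sketch below only builds the request body and refuses unless a confirmation flag is passed; `stale_primary_body` and the `--accept-data-loss` flag are hypothetical names for this example, not part of any OpenSearch tool.

```bash
#!/usr/bin/env bash
# Print an allocate_stale_primary reroute body, but only when the caller
# passes --accept-data-loss as the fourth argument.
stale_primary_body() {
  local index="$1" shard="$2" node="$3" confirm="${4:-}"
  if [ "$confirm" != "--accept-data-loss" ]; then
    echo "refusing: pass --accept-data-loss to acknowledge possible data loss" >&2
    return 1
  fi
  printf '{ "commands": [{ "allocate_stale_primary": { "index": "%s", "shard": %d, "node": "%s", "accept_data_loss": true } }] }\n' \
    "$index" "$shard" "$node"
}

# Without the flag nothing is printed to stdout and the function returns 1.
stale_primary_body myindex 0 node-1 || echo "not sent"

# With the flag the JSON body is printed.
stale_primary_body myindex 0 node-1 --accept-data-loss
```

The printed body could then be piped to `curl -XPOST 'http://localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d @-`.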

Step 5: Fix Allocation Failures

```bash
# Check allocation settings:
curl -XGET 'http://localhost:9200/_cluster/settings?pretty'

# Enable allocation if disabled:
curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "cluster.routing.allocation.enable": "all" } }'

# Reset allocation filters:
curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster.routing.allocation.exclude._name": null,
      "cluster.routing.allocation.require._name": null
    }
  }'

# Retry failed allocations:
curl -XPOST 'http://localhost:9200/_cluster/reroute?retry_failed=true'

# Watch shards that are still initializing or relocating:
curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state' | grep -E "INITIALIZING|RELOCATING"
```
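A quick way to confirm whether allocation has been restricted is to look for `cluster.routing.allocation.enable` in the settings response. The following sketch assumes pretty-printed output with one setting per line (as produced by `flat_settings=true&pretty`); `allocation_enable` and the sample response are illustrative.

```bash
#!/usr/bin/env bash
# Print every explicitly set value of cluster.routing.allocation.enable
# found in pretty-printed, flat cluster settings read on stdin.
allocation_enable() {
  sed -n 's/.*"cluster\.routing\.allocation\.enable"[[:space:]]*:[[:space:]]*"\([a-z_]*\)".*/\1/p'
}

# Illustrative sample response (not from a real cluster).
sample_settings() {
  cat <<'EOF'
{
  "persistent" : {
    "cluster.routing.allocation.enable" : "none"
  },
  "transient" : { }
}
EOF
}

sample_settings | allocation_enable
```

Any value other than `all` (for example `primaries`, `new_primaries`, or `none`) restricts allocation; no output means the setting is not overridden.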

Step 6: Handle Corrupted Shards

```bash
# Check for corrupted shards in logs:
grep -i "corrupt" /var/log/opensearch/my-cluster.log

# List shard store copies (a store_exception on a copy indicates corruption):
curl -XGET 'http://localhost:9200/_shard_stores?pretty'

# Replace the corrupted shard with an empty primary (all data in that shard is lost):
curl -XPOST 'http://localhost:9200/_cluster/reroute' \
  -H 'Content-Type: application/json' \
  -d '{
    "commands": [{
      "allocate_empty_primary": {
        "index": "myindex",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }]
  }'

# Better: restore from snapshot if available (see Step 8)
```

Step 7: Resolve Network Partition

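```bash
# Check transport-layer connection stats for each node:
curl -XGET 'http://localhost:9200/_nodes/stats/transport?pretty'

# Check that every node answers over HTTP:
for node in node1 node2 node3; do
  curl -XGET "http://$node:9200/_cluster/health?pretty"
done

# Check firewall rules for the HTTP (9200) and transport (9300) ports:
sudo iptables -L -n | grep 9200
sudo iptables -L -n | grep 9300

# Master quorum to prevent split-brain: (master-eligible nodes / 2) + 1
# OpenSearch and Elasticsearch 7+ manage this quorum automatically.
# Set it by hand only on legacy Elasticsearch 6.x and earlier:
curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "discovery.zen.minimum_master_nodes": 2 } }'

# Restart nodes to rejoin:
systemctl restart opensearch
```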
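The quorum formula from the comments above, (nodes/2) + 1 with integer division, can be checked quickly for a few cluster sizes. This is a small sketch; `quorum` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Majority quorum for a given number of master-eligible nodes:
# integer division by two, plus one.
quorum() {
  local nodes="$1"
  echo $(( nodes / 2 + 1 ))
}

for n in 3 4 5; do
  echo "$n master-eligible nodes -> quorum $(quorum "$n")"
done
```

Three nodes and four nodes tolerate the same single failure (quorums of 2 and 3), which is why odd numbers of master-eligible nodes are the usual choice.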

Step 8: Restore from Snapshot

```bash
# Register snapshot repository
# (the location must be listed under path.repo in opensearch.yml on every node):
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/mnt/backups/my_backup" } }'

# List snapshots:
curl -XGET 'http://localhost:9200/_snapshot/my_backup/_all?pretty'

# Restore snapshot (an existing index with the same name must be closed or deleted first):
curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore' \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "myindex", "ignore_unavailable": true, "include_global_state": false }'

# Restore to a new index name:
curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore' \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "myindex", "rename_pattern": "(.+)", "rename_replacement": "restored_$1" }'

# Check restore status:
curl -XGET 'http://localhost:9200/_cat/recovery?v'
```
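The `rename_pattern` and `rename_replacement` pair is a regular-expression substitution applied to each restored index name. The sketch below previews the result locally with `sed`; it is an approximation, since the restore API uses `$1` for the capture group where `sed` uses `\1`, and `preview_rename` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Preview the effect of rename_pattern "(.+)" with rename_replacement "restored_$1"
# on index names read from stdin, one per line.
preview_rename() {
  sed -E 's/^(.+)$/restored_\1/'
}

# Illustrative index names.
printf '%s\n' myindex logs-2025.12 | preview_rename
```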

Step 9: Prevent Future Issues

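```bash
# Configure replication for new indices:
curl -XPUT 'http://localhost:9200/_template/default_replicas' \
  -H 'Content-Type: application/json' \
  -d '{ "index_patterns": ["*"], "settings": { "number_of_replicas": 1 } }'

# Register a repository for automated snapshots:
curl -XPUT 'http://localhost:9200/_snapshot/automated_backup' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/mnt/backups/automated" } }'

# Create a nightly snapshot policy (01:30 every day).
# The _slm API is Elasticsearch-only; OpenSearch 2.1+ provides Snapshot Management
# instead. Verify the request body against the documentation for your version:
curl -XPOST 'http://localhost:9200/_plugins/_sm/policies/nightly-snapshots' \
  -H 'Content-Type: application/json' \
  -d '{
    "description": "Nightly snapshots",
    "creation": {
      "schedule": { "cron": { "expression": "30 1 * * *", "timezone": "UTC" } }
    },
    "snapshot_config": {
      "repository": "automated_backup",
      "indices": "*"
    }
  }'

# Monitor cluster health (blocks until yellow or better, or until the timeout):
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'
```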
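When `wait_for_status` is used in automation, the response body carries a `timed_out` field that tells whether the requested status was reached before the timeout. A minimal sketch of checking it without jq follows; `reached_status` and the two sample responses are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Succeed (exit 0) if the health response on stdin reports "timed_out": false,
# meaning the requested wait_for_status was reached within the timeout.
reached_status() {
  grep -Eq '"timed_out"[[:space:]]*:[[:space:]]*false'
}

# Illustrative abbreviated responses.
ok_response='{"cluster_name":"my-cluster","status":"yellow","timed_out":false}'
late_response='{"cluster_name":"my-cluster","status":"red","timed_out":true}'

if printf '%s\n' "$ok_response" | reached_status; then
  echo "cluster reached the requested status"
fi

if ! printf '%s\n' "$late_response" | reached_status; then
  echo "timed out waiting for the requested status"
fi
```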

Step 10: Production Cluster Health Monitoring

```bash
# Monitoring script (requires curl and jq):
cat << 'EOF' > /usr/local/bin/opensearch-health.sh
#!/bin/bash

HOST="localhost:9200"

echo "=== OpenSearch Cluster Health ==="

# Cluster status
STATUS=$(curl -s "http://$HOST/_cluster/health" | jq -r '.status')
echo "Status: $STATUS"

if [ "$STATUS" = "red" ]; then
  echo "ALERT: Cluster is RED!"

  echo -e "\nUnassigned shards:"
  curl -s "http://$HOST/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED

  # With no request body, the API explains the first unassigned shard it finds
  echo -e "\nAllocation explain:"
  curl -s "http://$HOST/_cluster/allocation/explain" | jq '{index, shard, primary, unassigned_info, allocate_explanation}'

  echo -e "\nNode disk usage:"
  curl -s "http://$HOST/_cat/allocation?v"
fi

echo -e "\nNode stats:"
curl -s "http://$HOST/_cat/nodes?v&h=name,heap.percent,ram.percent,disk.used_percent,load_1m"

echo -e "\nRed indices (first 10):"
curl -s "http://$HOST/_cat/indices?v&health=red" | head -10
EOF

chmod +x /usr/local/bin/opensearch-health.sh

# Raw node stats for a metrics pipeline.
# Core OpenSearch does not emit Prometheus format; that requires a separate exporter plugin:
curl -XGET 'http://localhost:9200/_nodes/stats/fs,os,jvm,process?pretty'

# Key metrics to monitor:
# - cluster_health_status
# - cluster_health_number_of_unassigned_shards
# - fs_total_disk_used_percent
# - jvm_mem_heap_used_percent
```
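If the health script is wired into a monitoring system, the status usually needs to become an exit code. The sketch below maps it to the common check-plugin convention (0 OK, 1 warning, 2 critical, 3 unknown); that convention and the helper name `status_to_code` are assumptions, not something OpenSearch defines.

```bash
#!/usr/bin/env bash
# Map a cluster health status to a monitoring exit code:
# green -> 0 (OK), yellow -> 1 (warning), red -> 2 (critical), anything else -> 3 (unknown).
status_to_code() {
  case "$1" in
    green)  echo 0 ;;
    yellow) echo 1 ;;
    red)    echo 2 ;;
    *)      echo 3 ;;
  esac
}

for s in green yellow red ""; do
  echo "status '${s}' -> exit code $(status_to_code "$s")"
done
```

A script could end with `exit "$(status_to_code "$STATUS")"`; the empty-status case covers a cluster that did not answer at all.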

OpenSearch Cluster Red Status Checklist

| Check | Command | Expected |
| --- | --- | --- |
| Cluster health | `_cluster/health` | green/yellow |
| Unassigned shards | `_cat/shards` | None UNASSIGNED |
| Disk usage | `_cat/allocation` | < 85% |
| Node status | `_cat/nodes` | All nodes present |
| Allocation enabled | `_cluster/settings` | "all" |
| Replicas | index settings | >= 1 |

Verification

```bash
# After fixing the cluster:

# 1. Check cluster health
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
# Output: "status" : "green" or "yellow"

# 2. Verify no unassigned shards
curl -XGET 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED
# Output: Empty (on a yellow cluster, only replica rows remain)

# 3. Check all indices green
curl -XGET 'http://localhost:9200/_cat/indices?v' | grep -v green
# Output: Header row only; yellow indices are listed if replicas are still unassigned

# 4. Test search operations
curl -XGET 'http://localhost:9200/myindex/_search?size=1'
# Output: Valid search results

# Compare before/after:
# Before: status: red, 5 unassigned shards
# After: status: green, 0 unassigned shards
```

[Fix Elasticsearch Cluster Yellow Status](/articles/fix-elasticsearch-cluster-yellow-status)
[Fix Elasticsearch Shard Allocation Failed](/articles/fix-elasticsearch-shard-allocation-failed)
[Fix Elasticsearch Index Corrupted](/articles/fix-elasticsearch-index-corrupted)


FAQ

Elasticsearch Troubleshooting FAQs

Common questions about troubleshooting and preventing similar issues

How do I know if this troubleshooting guide applies to my situation?

This guide applies when `_cluster/health` reports a red status, meaning at least one primary shard is unassigned. If you see the symptoms described above, start with the diagnostic steps and work through the most common causes first.

Is it safe to follow these troubleshooting steps?

The diagnostic commands in Steps 1 and 2 are read-only. Several fixes are destructive: deleting indices, and rerouting with `allocate_stale_primary` or `allocate_empty_primary` and `accept_data_loss`, can permanently lose data. Take a snapshot where possible before making changes, and verify each step before moving on to the next.

How long does it typically take to resolve a red cluster?

It depends on the root cause. Re-enabling allocation or freeing disk space is quick, while a snapshot restore takes as long as the data needs to copy. Follow the troubleshooting flow to identify the cause before applying a fix.

How can I prevent a red cluster from happening again?

Keep at least one replica per index, take regular snapshots, and monitor cluster health, unassigned shards, disk usage, and JVM heap so that alerts fire before a node fails or hits a disk watermark.

Written by

FixWikiHub Editorial Team

Our editorial team consists of experienced DevOps engineers, systems administrators, and cloud architects with hands-on experience in production environments across AWS, Azure, GCP, and on-premises infrastructure.

Every guide undergoes technical review for accuracy and is updated when software versions, commands, or best practices change.

Last updated: Dec 12, 2025

Important Notice

Disclaimer & Safety Guidelines

The troubleshooting steps in this guide are provided for educational and informational purposes. Before applying any changes to production systems:

Test in a staging environment first — Always verify commands and configurations in a non-production environment before deploying to live systems.
Create backups — Ensure you have current backups of databases, configurations, and critical files before making changes.
Understand the impact — Review how each step may affect your specific environment, dependencies, and users.
Consult official documentation — This guide supplements, but does not replace, official vendor documentation and best practices.

FixWikiHub is not responsible for any damages arising from the use of this content. See our Terms of Use for more information.

