Introduction
Nagios escalation defines how notifications are sent when problems persist or worsen over time. Instead of sending the same alert to the same contacts repeatedly, escalations can widen the notification scope, change notification methods, or involve management after initial alerts go unacknowledged. This tiered notification system ensures critical issues eventually reach someone who can address them.
When escalations fail to trigger, alerts remain confined to the initial contact group, potentially leaving critical issues unnoticed for extended periods. Common causes include misconfigured escalation definitions, incorrect timeperiod references, contact group mismatches, and notification option restrictions that suppress escalation messages. Debugging these failures requires understanding Nagios' notification logic, including how escalation criteria are evaluated and how timeperiods interact with notification windows.
Nagios evaluates each escalation definition against the current notification number, the problem state, and the current time. An escalation applies only when the notification number falls between `first_notification` and `last_notification`, the state is listed in `escalation_options`, and the current time falls inside the escalation's timeperiod. If any element of the escalation chain is misconfigured, the escalation never fires, leaving the problem notification stuck at the base level.
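The notification-number part of that evaluation can be sketched as a small shell function. This is a simplified model for reasoning about a configuration, not Nagios' own implementation: it ignores the timeperiod and state options, and `escalation_applies` is a hypothetical helper name.

```bash
# Simplified model of the notification-number window only.
# $1 = first_notification, $2 = last_notification, $3 = current notification number.
# Returns 0 (true) when the current number is inside the window;
# last_notification=0 is treated as "no upper limit".
escalation_applies() {
    first=$1
    last=$2
    current=$3
    [ "$current" -ge "$first" ] || return 1
    [ "$last" -eq 0 ] || [ "$current" -le "$last" ]
}
```

For example, `escalation_applies 5 0 3` fails because the third notification is still below `first_notification 5`, so that notification goes only to the base contacts.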
Symptoms
When Nagios escalation is not working, you will observe these symptoms:
- Alerts remain stuck sending to the initial contact group despite prolonged problem duration
- Escalation contacts never receive notifications even after problems persist for hours
- Nagios logs show no notifications addressed to escalation contacts despite escalation criteria being met
- Management or on-call personnel are not notified when initial responders fail to acknowledge
- Escalation works for some hosts/services but not others with similar configurations
- Notification history shows repeated alerts to the same contacts without escalation progression
- Escalation triggers after random delays instead of at defined thresholds
Common log patterns indicating escalation issues:
```
# Nagios log showing no escalation
[2026-01-15 10:30:00] SERVICE ALERT: webserver;HTTP;CRITICAL;SOFT;1;Connection refused
[2026-01-15 10:35:00] SERVICE ALERT: webserver;HTTP;CRITICAL;SOFT;2;Connection refused
[2026-01-15 10:40:00] SERVICE ALERT: webserver;HTTP;CRITICAL;HARD;3;Connection refused
# No notifications to escalation contacts despite the problem reaching HARD state

# Expected entry that should appear but doesn't: a notification addressed to an escalation contact
[2026-01-15 11:00:00] SERVICE NOTIFICATION: manager1;webserver;HTTP;CRITICAL;notify-by-email;Connection refused
```
Nagios web interface showing stuck notifications:
```
Service: HTTP on webserver
Status: CRITICAL (HARD state)
Duration: 2h 30m
Last Notification: initial-contact (30 notifications sent)
Expected: Should have escalated to management after 1 hour
```

Common Causes
Several factors cause Nagios escalation failures:
1. Escalation timeperiod mismatch: The escalation's `escalation_period` references a timeperiod that doesn't include the current time. If the escalation is only valid during "workhours" but the problem occurs at night, the escalation never fires.
2. Incorrect first or last notification values: The `first_notification` and `last_notification` parameters define the range of notification numbers in which the escalation is active. If `first_notification` is set too high (e.g., 10), the escalation won't trigger until the 10th notification, which may never occur if the notification interval is large.
3. Notification options restrictions: The service or host definition may have `notification_options` that exclude the problem state. For example, if the escalation is for CRITICAL but `notification_options` only includes WARNING, no CRITICAL notification is sent and the escalation never triggers.
4. Contact group not defined or empty: The escalation references a contact group that doesn't exist or has no members. A nonexistent group is normally reported as an error by `nagios -v`, but an empty group notifies no one without any visible failure.
5. Escalation definition not applied to host/service: The escalation's host/service matching criteria don't match the problematic host or service. This commonly happens after hostgroup/servicegroup changes.
6. Notification interval too long: If the notification interval exceeds the escalation window, notifications may never reach the count required for escalation.
7. Escalations overlap unexpectedly: Multiple escalation definitions may match the same notification. Nagios notifies the contacts of every matching escalation and uses the smallest `notification_interval` among them, so an overlap can change timing or recipients in ways that look like a failed escalation.
8. State type mismatch (SOFT vs HARD): Notifications, and therefore escalations, are only sent for HARD states. If a problem stays in a SOFT state longer than expected due to retry configuration, escalation timing shifts.
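Causes 2 and 6 come down to simple arithmetic, sketched below. The function is a rough estimate under stated assumptions: the first problem notification goes out when the HARD state is reached, every later one follows at a fixed `notification_interval` in minutes, and nothing (acknowledgement, downtime, timeperiod) interrupts the sequence. `minutes_until_notification` is a hypothetical helper name.

```bash
# Rough estimate, in minutes after the first problem notification,
# of when notification number $1 is sent, given a fixed
# notification_interval of $2 minutes.
minutes_until_notification() {
    number=$1
    interval=$2
    echo $(( (number - 1) * interval ))
}
```

With `first_notification 10` and a 60-minute interval, `minutes_until_notification 10 60` prints 540, meaning the escalation cannot start until roughly nine hours into the problem. A notification interval of 0 sends only one notification, so the count never advances far enough to reach a later escalation.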
Step-by-Step Fix
Follow these steps to diagnose and resolve Nagios escalation issues:
Step 1: Verify the escalation definition exists
Check the escalation configuration:
```bash
# Check escalation definitions
grep -A 20 "define serviceescalation" /etc/nagios/objects/escalations.cfg

# Or search all config files
grep -r "define.*escalation" /etc/nagios/

# List escalation objects via Nagios CGI (if web interface available)
# Navigate to: Configuration > Escalation Definitions
```
Example escalation definition:
```
define serviceescalation {
    host_name               webserver
    service_description     HTTP
    first_notification      5
    last_notification       0          ; 0 means no upper limit
    notification_interval   30
    contact_groups          managers,oncall
    escalation_options      c,r        ; c=critical, r=recovery
    escalation_period       24x7       ; timeperiod in which this escalation is valid
}
```

Step 2: Check timeperiod configuration
Verify the timeperiod referenced in the escalation:
```bash
# Check timeperiod definitions
grep -A 10 "define timeperiod" /etc/nagios/objects/timeperiods.cfg | grep -A 10 "24x7"

# Specific timeperiod check
grep -A 15 "timeperiod_name.*workhours" /etc/nagios/objects/timeperiods.cfg
```
Timeperiod definition example:
```
define timeperiod {
    timeperiod_name     workhours
    alias               Normal Work Hours
    monday              09:00-17:00
    tuesday             09:00-17:00
    wednesday           09:00-17:00
    thursday            09:00-17:00
    friday              09:00-17:00
}
```

If the escalation uses "workhours" but the problem occurs outside those hours, the escalation won't fire. Use the 24x7 timeperiod for critical escalations that should trigger at any time.
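To check whether a given incident time would have fallen inside a range such as `09:00-17:00`, a small helper like the following can be used. It is an illustrative sketch, not Nagios' own timeperiod logic: it handles a single `HH:MM-HH:MM` range, ignores the day of the week, and treats the end of the range as exclusive (an assumption of this sketch). `to_minutes` and `time_in_range` are hypothetical names.

```bash
# Convert HH:MM to minutes since midnight. A leading zero is stripped
# first because shell arithmetic would otherwise read "09" as octal.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    h=${h#0}
    m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Returns 0 when time $2 (HH:MM) lies inside range $1 (HH:MM-HH:MM).
time_in_range() {
    start=$(to_minutes "${1%%-*}")
    end=$(to_minutes "${1##*-}")
    now=$(to_minutes "$2")
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
}
```

For instance, `time_in_range "09:00-17:00" "18:30"` fails, which matches the situation where a problem at 18:30 never escalates under the "workhours" timeperiod.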
Step 3: Verify notification count and state
Check how many notifications have been sent:
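```bash
# View current problem state (if nagios-cli is installed)
nagios-cli status | grep webserver

# Check notification count in the Nagios status file.
# status.dat stores each service as a multi-line block, so match on the block's fields.
awk -F= '
  /host_name=/           { host = $2 }
  /service_description=/ { svc = $2 }
  host == "webserver" && svc == "HTTP" && /current_notification_number=|state_type=/
  /^[ \t]*}/             { host = svc = "" }
' /var/log/nagios/status.dat

# Use Nagios web interface
# Navigate to: Services > [Service] > Extended Information
# Look for: "Current Notification Number" and "State Type"
```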
Status.dat entries showing notification state:
```
current_notification_number=3
state_type=1 ; 1=HARD state, 0=SOFT state
last_notification=1734218400
next_notification=1734219000
```

Step 4: Check contact group membership
Verify contact groups have valid members:
```bash
# Check contact group definitions
grep -A 10 "define contactgroup" /etc/nagios/objects/contacts.cfg

# Verify specific escalation contact group
grep -A 5 "contactgroup_name.*managers" /etc/nagios/objects/contacts.cfg

# Check contact definitions
grep -A 15 "define contact" /etc/nagios/objects/contacts.cfg | grep -E "contact_name|email|pager"
```
Contact group definition:
```
define contactgroup {
    contactgroup_name   managers
    alias               Management Team
    members             manager1,manager2,manager3
}
```

Step 5: Verify escalation applies to the correct host/service
Check host/service matching in escalation:
```bash
# Check if escalation host_name matches actual host
grep "host_name" /etc/nagios/objects/hosts.cfg | grep webserver

# If using hostgroups, verify hostgroup membership
grep -A 20 "define hostgroup" /etc/nagios/objects/hostgroups.cfg | grep -E "hostgroup_name|members"

# For service escalations using wildcards (a literal * in service_description)
grep -A 10 'service_description.*\*' /etc/nagios/objects/escalations.cfg
```
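To see which escalation definitions name a particular host directly, the blocks can be filtered with awk. The function below is a minimal sketch with clear limits: it only reads the `host_name` directive (a comma-separated list without spaces, or `*`), and it does not resolve escalations attached through `hostgroup_name`. `escalations_for_host` is a hypothetical helper name.

```bash
# Print every serviceescalation block in file $1 whose host_name
# directive lists host $2 (or the wildcard *).
escalations_for_host() {
    awk -v host="$2" '
        /^[ \t]*define[ \t]+serviceescalation/ { inblk = 1; blk = ""; hit = 0 }
        inblk { blk = blk $0 "\n" }
        inblk && $1 == "host_name" {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == host || names[i] == "*") hit = 1
        }
        inblk && /}/ { if (hit) printf "%s", blk; inblk = 0 }
    ' "$1"
}
```

Running `escalations_for_host /etc/nagios/objects/escalations.cfg webserver` and getting no output suggests the escalation is either missing or attached through a hostgroup, which is the next thing to check.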
Step 6: Test escalation by forcing a test notification
Trigger a test to verify notification flow:
```bash
# Force Nagios to process an external command
echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;webserver;HTTP;2;Test escalation" > /var/spool/nagios/nagios.cmd

# Or use nagios-cli if available
nagios-cli submit webserver HTTP 2 "Test escalation trigger"

# Watch the log for state changes and outgoing notifications
tail -f /var/log/nagios/nagios.log | grep -E "SERVICE ALERT|SERVICE NOTIFICATION"
```
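The external command line is easy to mistype, so it can help to build it with a small function. This is a convenience sketch; `passive_result_line` is a hypothetical helper name, and the optional timestamp argument exists only to make the output reproducible.

```bash
# Build a PROCESS_SERVICE_CHECK_RESULT line for the external command file.
# $1 = host, $2 = service description, $3 = return code (0-3),
# $4 = plugin output, $5 = optional Unix timestamp (defaults to now).
passive_result_line() {
    ts=${5:-$(date +%s)}
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$ts" "$1" "$2" "$3" "$4"
}
```

A typical use is `passive_result_line webserver HTTP 2 "Test escalation" > /var/spool/nagios/nagios.cmd`, followed later by the same call with return code 0 to clear the test problem. The service must accept passive check results for the command to have any effect.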
Step 7: Fix the escalation configuration
Update the escalation definition to correct issues:
```bash
# Edit escalation configuration
vim /etc/nagios/objects/escalations.cfg
```

Corrected escalation configuration:
```
define serviceescalation {
    host_name               webserver
    service_description     HTTP
    first_notification      3          ; Trigger from the 3rd notification
    last_notification       0          ; No upper limit
    notification_interval   30         ; 30 min between escalation notifications
    contact_groups          managers,oncall
    escalation_options      c,w,r      ; c=critical, w=warning, r=recovery
    escalation_period       24x7       ; Always active, not workhours only
}
```

Step 8: Verify configuration and restart Nagios
Validate and apply the corrected configuration:
```bash
# Verify configuration syntax
nagios -v /etc/nagios/nagios.cfg

# Expected output
# Total Warnings: 0
# Total Errors: 0

# If validation passes, restart Nagios
systemctl restart nagios
# Or
service nagios restart

# Watch startup for errors
tail -f /var/log/nagios/nagios.log
```
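Validation and restart can be chained so that a broken configuration never reaches the running daemon. The sketch below assumes that `nagios -v` exits with a non-zero status when it finds errors and that the service is managed by systemd under the name `nagios`; `restart_if_valid` is a hypothetical helper name.

```bash
# Restart Nagios only when the pre-flight check succeeds.
# $1 = main config file (defaults to the path used in this article).
restart_if_valid() {
    cfg=${1:-/etc/nagios/nagios.cfg}
    if nagios -v "$cfg" >/dev/null 2>&1; then
        systemctl restart nagios
    else
        echo "Configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}
```

When the function reports a failure, rerun `nagios -v` by hand to read the actual error messages, since the sketch discards them.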
Verification
After fixing escalation configuration, verify it works correctly:
```bash
# Trigger a test alert that should escalate
echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;webserver;HTTP;2;Critical test" > /var/spool/nagios/nagios.cmd

# Wait for notifications to accumulate
sleep 300

# Check the log for notifications sent to escalation contacts
grep "SERVICE NOTIFICATION" /var/log/nagios/nagios.log | tail -20

# Expected output showing a notification addressed to an escalation contact
# [1734221400] SERVICE NOTIFICATION: manager1;webserver;HTTP;CRITICAL;notify-by-email;Critical test

# Verify contact received notification
grep "manager1" /var/log/nagios/nagios.log | tail -10

# Check current notification state in status.dat (stored as multi-line blocks)
awk -F= '
  /host_name=/           { host = $2 }
  /service_description=/ { svc = $2 }
  host == "webserver" && svc == "HTTP" && /current_notification_number=|state_type=/
  /^[ \t]*}/             { host = svc = "" }
' /var/log/nagios/status.dat
```
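The status file lookup can be wrapped in a reusable function for any host and service pair. This is a sketch that assumes the usual `key=value` block layout of status.dat, where `host_name` and `service_description` appear before the other fields of a `servicestatus` block; `notification_state` is a hypothetical helper name.

```bash
# Print current_notification_number and state_type for one service.
# $1 = status file, $2 = host name, $3 = service description.
notification_state() {
    awk -F= -v host="$2" -v svc="$3" '
        { key = $1; gsub(/^[ \t]+/, "", key) }
        key == "host_name"           { h = $2 }
        key == "service_description" { s = $2 }
        h == host && s == svc && (key == "current_notification_number" || key == "state_type") {
            print key "=" $2
        }
        /^[ \t]*}/ { h = ""; s = "" }
    ' "$1"
}
```

Calling `notification_state /var/log/nagios/status.dat webserver HTTP` before and after the wait shows whether the notification number is advancing toward the escalation's `first_notification` value.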
Prevention
To prevent Nagios escalation issues:
1. Use 24x7 timeperiod for critical escalations: Don't limit escalation triggers to business hours for critical systems.

```
define serviceescalation {
    escalation_period       24x7
    ...
}
```

2. Set appropriate first_notification values: Don't set `first_notification` too high. For critical escalations, use values like 2-3.

```
define serviceescalation {
    first_notification      2          ; Escalate quickly
    last_notification       0          ; Continue until resolved
}
```

3. Include all relevant state options: Ensure `escalation_options` includes the states you want to escalate.

```
escalation_options      c,w,u,r    ; critical, warning, unknown, recovery
```

4. Test escalations after configuration changes: After modifying escalations, trigger test alerts to verify the chain works.

```bash
# Test script for escalation verification
for i in 1 2 3 4 5; do
    echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;testhost;HTTP;2;Test alert $i" > /var/spool/nagios/nagios.cmd
    sleep 60
    grep "SERVICE NOTIFICATION" /var/log/nagios/nagios.log | tail -1
done
```

5. Monitor escalation logs: Set up regular checks for escalation activity alongside regular alert monitoring.

```bash
# Daily escalation check script: count notifications sent to an escalation contact
grep "SERVICE NOTIFICATION: manager1;" /var/log/nagios/nagios.log | wc -l
# If count is 0 over a week with alerts, escalation may be broken
```

6. Document escalation chains: Keep clear documentation of escalation tiers, timing, and contacts.

```
## Escalation Chain for webserver HTTP
- Level 0: initial-contact (first 3 notifications)
- Level 1: managers (notifications 4+, 30min interval)
- Level 2: oncall+pagers (notifications 8+, 15min interval)
```

7. Validate contact group membership: Periodically verify contact groups have active members.

```bash
# Check contact groups for an empty members directive.
# Groups populated only through a contact's contactgroups directive are not covered here.
grep -A 5 "define contactgroup" /etc/nagios/objects/contacts.cfg | grep -E "^[[:space:]]*members" | while read -r key value; do
    if [[ -z "$value" ]]; then
        echo "WARNING: Empty contact group found"
    fi
done
```

Related Articles
- [WordPress troubleshooting: Fix IAM Timeout Error - Complete Trouble](fix-iam-timeout-error)
- [Technical troubleshooting: Fix Cloudwatch Alarm Not Triggering Issue in Monit](cloudwatch-alarm-not-triggering)
- [Fix Datadog Agent Not Sending Metrics Issue in Monitoring](datadog-agent-not-sending-metrics)
- [Fix Elasticsearch Cluster Red Yellow Status Issue in Monitoring](elasticsearch-cluster-red-yellow-status)
- [Fix Alertmanager Notification Failed](fix-alertmanager-notification-failed)