Alerts are firing in Prometheus, but you're not receiving notifications. This critical gap means your incident response is compromised. Let's systematically diagnose and fix Alertmanager notification failures.

Introduction

Alertmanager notification failures can occur at several points:

  • Alert routing and grouping
  • Receiver configuration
  • Network connectivity to notification services
  • Authentication with external services
  • Template rendering issues

Common error patterns:

```
notify retry for *slack.Notifier: unexpected status code 404
notify retry for *email.Email: dial tcp: lookup smtp.gmail.com: no such host
notify retry for *pagerduty.PagerDuty: unexpected status code 401
```
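To speed up triage, these patterns can be mapped onto the failure points listed earlier. The following is a minimal sketch, not part of Alertmanager: the function name `classify_notify_error` and its category labels are invented for illustration, and the matching relies only on the message fragments quoted in this article.

```bash
# classify_notify_error: map one log line to a likely failure area.
# The category names are this sketch's own labels, not Alertmanager output.
classify_notify_error() {
  case "$1" in
    *"no such host"*)
      echo "dns" ;;
    *"i/o timeout"*|*"connection refused"*)
      echo "network" ;;
    *"status code 401"*|*"status code 403"*|*invalid_auth*|*"535 "*)
      echo "credentials" ;;
    *"status code 404"*)
      echo "endpoint-url" ;;
    *"template:"*)
      echo "template" ;;
    *)
      echo "unknown" ;;
  esac
}
```

Passing a single log line as the first argument prints one label, which tells you which of the sections below to read first.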

Common Causes

  • Misconfigured routes or receivers
  • Missing or incorrect credentials
  • Network connectivity issues
  • Version compatibility problems
  • Resource exhaustion or limits
  • Permission or access denied

Step-by-Step Fix

  1. Check logs for specific error messages
  2. Verify configuration settings
  3. Test network connectivity
  4. Review recent changes
  5. Apply corrective action
  6. Verify the fix

Initial Diagnosis

Start by checking Alertmanager's status and logs:

```bash
# Check Alertmanager UI
# Navigate to http://alertmanager:9093

# Check Alertmanager status via API
curl -s http://localhost:9093/api/v2/status | jq '.'

# View active alerts
curl -s http://localhost:9093/api/v2/alerts | jq '.[] | {labels: .labels, status: .status}'

# Check Alertmanager logs (-E is required for the | alternation to work)
kubectl logs -l app=alertmanager -n monitoring | grep -Ei "notify|error|failed"

# Or for systemd
journalctl -u alertmanager -f | grep -Ei "notify|error"
```

Common Cause 1: Slack Notification Failures

Slack is one of the most common notification channels, and failures usually stem from webhook URL issues or permission problems.

Error pattern:

```
notify retry for *slack.Notifier: unexpected status code 404
notify retry for *slack.Notifier: invalid_auth
```

Diagnosis:

```bash
# Test Slack webhook directly
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert from Alertmanager"}' \
  https://hooks.slack.com/services/YOUR/WEBHOOK/URL

# Check Alertmanager configuration (-r prints the YAML with real line breaks)
curl -s http://localhost:9093/api/v2/status | jq -r '.config.original'

# Look for Slack-specific errors in logs
kubectl logs -l app=alertmanager -n monitoring | grep -i slack
```

Solution:

Verify and update Slack configuration:

```yaml
# alertmanager.yml
route:
  receiver: 'slack-notifications'
  routes:
    - match:
        severity: critical
      # 'slack-critical' must also be defined under receivers
      receiver: 'slack-critical'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXX'
        channel: '#alerts'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
        text: >-
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }}
          *Severity:* {{ .Labels.severity }}
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* {{ .Value }}
          {{ end }}
          {{ end }}
```
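Because a 404 from Slack usually points at the webhook URL itself, it can help to check the URL's shape before sending anything. This is a rough sketch: the helper name `looks_like_slack_webhook` is hypothetical, and the pattern is inferred from the placeholder URL above (a `T…` segment, a `B…` segment, then a token), not from a Slack guarantee.

```bash
# looks_like_slack_webhook: succeed if the URL has the three-segment shape
# of the placeholder used in this article (T.../B.../token).
# The character classes are an assumption, not a documented Slack format.
looks_like_slack_webhook() {
  printf '%s\n' "$1" |
    grep -Eq '^https://hooks\.slack\.com/services/T[A-Z0-9]+/B[A-Z0-9]+/[A-Za-z0-9]+$'
}
```

A leftover placeholder such as `https://hooks.slack.com/services/YOUR/WEBHOOK/URL` fails this check, which is a common reason a freshly copied configuration never delivers.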

If using Slack App tokens:

```yaml
receivers:
  - name: 'slack-app'
    slack_configs:
      # api_url and api_url_file are mutually exclusive; the bot token
      # belongs in http_config, not in api_url_file.
      - api_url: 'https://slack.com/api/chat.postMessage'
        channel: '#alerts'
        http_config:
          authorization:
            type: Bearer
            credentials_file: '/etc/alertmanager/slack-token'
```
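When the log shows `invalid_auth`, the token can be checked against Slack's `auth.test` method without involving Alertmanager. The sketch below assumes the token file path from the example above and that `curl` is installed; the function name `slack_token_ok` is made up for illustration.

```bash
# slack_token_ok: call Slack's auth.test with the token stored in the given
# file and succeed only if the JSON response contains "ok": true.
slack_token_ok() (
  token_file=$1
  # Strip a trailing newline that editors and `echo` often add
  token=$(tr -d '\r\n' < "$token_file") || exit 1
  response=$(curl -sS -X POST \
    -H "Authorization: Bearer ${token}" \
    https://slack.com/api/auth.test) || exit 1
  printf '%s\n' "$response" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
)
```

Running `slack_token_ok /etc/alertmanager/slack-token && echo valid` separates a bad token from a routing or channel problem.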

Test the configuration:

```bash
# Validate Alertmanager config
amtool check-config alertmanager.yml

# Reload configuration
curl -X POST http://localhost:9093/-/reload
```
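The two commands are safest when chained, so that a broken file is never reloaded. A minimal sketch, assuming `amtool` and `curl` are on the PATH and Alertmanager listens on `localhost:9093` as in the examples above; `reload_if_valid` is a hypothetical helper name.

```bash
# reload_if_valid: run amtool check-config first and POST /-/reload only
# when the file passes validation.
reload_if_valid() (
  config_file=$1
  am_url=${2:-http://localhost:9093}
  if ! amtool check-config "$config_file"; then
    echo "config invalid, not reloading" >&2
    exit 1
  fi
  # -f makes curl return non-zero if Alertmanager rejects the reload
  curl -fsS -X POST "${am_url}/-/reload"
)
```

Call it as `reload_if_valid alertmanager.yml`, optionally passing a different base URL as the second argument.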

Common Cause 2: Email Notification Failures

Email delivery issues are common due to SMTP authentication and network problems.

Error pattern:

```
notify retry for *email.Email: dial tcp: lookup smtp.gmail.com: no such host
notify retry for *email.Email: 535 5.7.8 Username and Password not accepted
```

Diagnosis:

```bash
# Test SMTP connectivity
telnet smtp.gmail.com 587
# Then type: EHLO localhost
# STARTTLS
# etc.

# Or use openssl
openssl s_client -connect smtp.gmail.com:587 -starttls smtp

# Check DNS resolution
nslookup smtp.gmail.com
dig smtp.gmail.com

# Check Alertmanager logs for SMTP errors (-E is required for the | alternation)
grep -Ei "smtp|email|dial" /var/log/alertmanager/alertmanager.log
```
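The DNS and connection checks can be combined into one step that takes the same `host:port` string used for `smtp_smarthost`. This is a sketch under assumptions: `getent` and `nc` must exist on the host running it, and both function names are hypothetical. Run it from the same host or pod as Alertmanager, since DNS and firewall rules may differ elsewhere.

```bash
# split_smarthost: print "host port" for a host:port string, or fail if
# the port is missing or not numeric.
split_smarthost() (
  smarthost=$1
  case "$smarthost" in
    *:*) ;;
    *) exit 1 ;;
  esac
  host=${smarthost%:*}
  port=${smarthost##*:}
  case "$port" in
    ''|*[!0-9]*) exit 1 ;;
  esac
  [ -n "$host" ] || exit 1
  echo "$host $port"
)

# check_smarthost: resolve the host, then attempt a TCP connection.
check_smarthost() (
  parsed=$(split_smarthost "$1") || { echo "expected host:port, got '$1'" >&2; exit 1; }
  host=${parsed% *}
  port=${parsed##* }
  getent hosts "$host" >/dev/null || { echo "DNS lookup failed for $host" >&2; exit 1; }
  nc -z -w 5 "$host" "$port" || { echo "cannot connect to $host:$port" >&2; exit 1; }
  echo "$host:$port is reachable"
)
```

For example, `check_smarthost smtp.gmail.com:587` reports a DNS failure separately from a blocked port, which matches the two error patterns above.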

Solution:

Update email configuration with correct SMTP settings:

```yaml
# alertmanager.yml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alertmanager@yourdomain.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_require_tls: true

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'team@yourdomain.com'
        send_resolved: true
        # Built-in template; reference your own name only if a loaded
        # template file defines it
        html: '{{ template "email.default.html" . }}'
```

For services requiring app passwords:

```bash
# Gmail requires app-specific passwords
# Generate at: https://myaccount.google.com/apppasswords

# Store securely in Kubernetes secret
kubectl create secret generic alertmanager-smtp \
  --from-literal=password='your-app-password' \
  -n monitoring
```

Mount the secret and use it:

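```yaml
# In alertmanager.yml
global:
  smtp_auth_password_file: '/etc/alertmanager/smtp-password'

# In Kubernetes deployment (container spec)
volumeMounts:
  - name: smtp-secret
    mountPath: /etc/alertmanager/smtp-password
    subPath: password

# In Kubernetes deployment (pod spec): the volume the mount refers to
volumes:
  - name: smtp-secret
    secret:
      secretName: alertmanager-smtp
```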

Common Cause 3: PagerDuty Integration Issues

PagerDuty integration failures usually involve API key issues or routing problems.

Error pattern:

```
notify retry for *pagerduty.PagerDuty: unexpected status code 401
notify retry for *pagerduty.PagerDuty: Invalid Routing Key
```

Diagnosis:

```bash
# Test the PagerDuty Events API v2 directly. The routing key travels in the
# request body, so no Authorization header is needed.
curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H 'Content-Type: application/json' \
  -d '{
    "routing_key": "your-routing-key",
    "event_action": "trigger",
    "dedup_key": "test-alert",
    "payload": {
      "summary": "Test alert from Alertmanager",
      "severity": "critical",
      "source": "Alertmanager"
    }
  }'

# Check PagerDuty configuration
amtool config show | grep -A 20 pagerduty
```
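An "Invalid Routing Key" response is often a pasted placeholder or a truncated key, so a quick sanity check before the test call can save a round trip. The sketch below is illustrative: both function names are hypothetical, PagerDuty documents the key as 32 characters long, and the alphanumeric character class is an assumption.

```bash
# looks_like_pd_key: heuristic check for a 32-character alphanumeric key.
looks_like_pd_key() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]{32}$'
}

# pd_test_event: print a minimal Events API v2 "trigger" payload.
# Assumes the key and summary contain no double quotes or backslashes.
pd_test_event() {
  printf '{"routing_key":"%s","event_action":"trigger","dedup_key":"alertmanager-test","payload":{"summary":"%s","severity":"critical","source":"Alertmanager"}}\n' "$1" "$2"
}
```

The payload can be piped into the same endpoint as above, for example `pd_test_event "$KEY" "Test alert" | curl -sS -X POST -H 'Content-Type: application/json' -d @- https://events.pagerduty.com/v2/enqueue`, where `KEY` is a variable you set yourself.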

Solution:

Correct PagerDuty configuration:

```yaml
# alertmanager.yml
receivers:
  - name: 'pagerduty-critical'
    pagerduty_configs:
      # service_key is for the "Prometheus" integration type. Integrations
      # of type "Events API v2" use routing_key instead (see the next example);
      # mixing up the two key types is a common cause of rejected events.
      - service_key: 'your-integration-key'
        url: 'https://events.pagerduty.com/v2/enqueue'
        severity: critical
        class: 'deployment'
        group: 'production'
        component: 'application'
        details:
          firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
          resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
          num_firing: '{{ .Alerts.Firing | len }}'
          num_resolved: '{{ .Alerts.Resolved | len }}'
```

For Events API V2:

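```yaml
receivers:
  - name: 'pagerduty-v2'
    pagerduty_configs:
      - routing_key: 'your-routing-key'
        # PagerDuty accepts only critical, error, warning, or info here, so
        # use the alert's severity label rather than .Status (firing/resolved)
        severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity }}{{ else }}error{{ end }}'
        class: '{{ .CommonLabels.alertname }}'
        component: '{{ .CommonLabels.component }}'
        group: '{{ .CommonLabels.job }}'
```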

Common Cause 4: Webhook Delivery Failures

Custom webhooks can fail due to network issues, authentication problems, or payload format issues.

Error pattern:

```
notify retry for *webhook.Notifier: Post "https://webhook.example.com/": dial tcp: i/o timeout
notify retry for *webhook.Notifier: unexpected status code 500
```

Diagnosis:

```bash
# Test webhook endpoint directly
curl -X POST https://webhook.example.com/alerts \
  -H 'Content-Type: application/json' \
  -d '{"test": true}'

# Check webhook configuration (-r prints real line breaks so grep -A works)
curl -s http://localhost:9093/api/v2/status | jq -r '.config.original' | grep -A 20 webhook

# See which receiver each alert group is routed to
curl -s http://localhost:9093/api/v2/alerts/groups | jq '.[].receiver'
```
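When testing the endpoint directly, the HTTP status code alone usually tells you where to look next. The following sketch wraps the test request and translates the code into a next step; the function names and the suggested actions are illustrative, not Alertmanager behavior.

```bash
# webhook_status: POST a small JSON body and print only the HTTP status code.
# curl prints 000 when no HTTP response was received at all.
webhook_status() {
  curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
    -X POST -H 'Content-Type: application/json' \
    -d '{"test": true}' "$1"
}

# explain_status: translate a status code into the next thing to check.
explain_status() {
  case "$1" in
    2??)     echo "endpoint accepted the request" ;;
    401|403) echo "check http_config credentials" ;;
    404)     echo "check the URL path" ;;
    4??)     echo "endpoint rejected the request; check payload and headers" ;;
    5??)     echo "endpoint failed internally; check the receiving service's logs" ;;
    000)     echo "no HTTP response; check DNS, firewall, and TLS" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}
```

Combine them as `explain_status "$(webhook_status https://webhook.example.com/alerts)"`.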

Solution:

Configure webhook correctly:

```yaml
# alertmanager.yml
receivers:
  - name: 'webhook'
    webhook_configs:
      - url: 'https://webhook.example.com/alerts'
        send_resolved: true
        http_config:
          basic_auth:
            username: 'alertmanager'
            password: 'webhook-password'
          tls_config:
            insecure_skip_verify: false
        max_alerts: 100
```

For endpoints that expect a bearer token instead of basic auth:

```yaml
receivers:
  - name: 'custom-webhook'
    webhook_configs:
      - url: 'https://api.example.com/incidents'
        send_resolved: true
        http_config:
          authorization:
            type: Bearer
            credentials: 'your-api-token'
```

Common Cause 5: Routing and Grouping Issues

Sometimes alerts fire but don't reach the intended receiver due to routing misconfiguration.

Error pattern: Alerts appear in Alertmanager UI but aren't delivered to any receiver.

Diagnosis:

```bash
# Check current route configuration
amtool config show

# Test route matching
amtool config routes test --config.file=alertmanager.yml alertname=HighCPU severity=critical

# Check alert status in Alertmanager
amtool alert query

# View silence rules
amtool silence query
```
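Route tests are more useful when they fail loudly, for example in a pre-deploy check. This is a small sketch built on the `amtool config routes test` command shown above; the helper name `expect_route` is hypothetical, and it assumes the command prints only the matched receiver name or names.

```bash
# expect_route: fail unless amtool routes the given labels to the
# expected receiver.
expect_route() (
  config_file=$1
  expected=$2
  shift 2
  actual=$(amtool config routes test --config.file="$config_file" "$@") || exit 1
  if [ "$actual" != "$expected" ]; then
    echo "labels [$*] route to '$actual', expected '$expected'" >&2
    exit 1
  fi
)
```

For example, `expect_route alertmanager.yml pagerduty-critical alertname=HighCPU severity=critical` exits non-zero as soon as a routing change sends critical alerts somewhere else.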

Solution:

Verify and fix routing:

```yaml
# alertmanager.yml
route:
  receiver: 'default'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Database alerts need special handling. This route comes first because
    # sibling routes are evaluated in order and the first match wins;
    # placed after the severity routes it would never see critical or
    # warning database alerts.
    - match_re:
        service: ^(mysql|postgres|redis)$
      receiver: 'database-team'
      routes:
        - match:
            severity: critical
          receiver: 'database-pagerduty'

    # Critical alerts go to PagerDuty immediately
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      group_wait: 10s
      repeat_interval: 1h

    # Warning alerts go to Slack
    - match:
        severity: warning
      receiver: 'slack-warnings'
      group_wait: 5m
      repeat_interval: 12h

receivers:
  - name: 'default'
    slack_configs:
      - channel: '#alerts'
        api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
  # Every receiver named in a route above must also be defined here,
  # otherwise the configuration fails validation.
```

Common Cause 6: Template Rendering Errors

Invalid notification templates can cause failures.

Error pattern:

```
template: email:1: unexpected EOF
template: slack:2: undefined variable ".Alerts"
```

Diagnosis:

```bash
# Test template rendering
amtool template render \
  --template.glob='/etc/alertmanager/templates/*.tmpl' \
  --template.text='{{ template "slack.title" . }}'

# Check for template syntax errors: check-config also parses the template
# files listed under "templates" in the config
amtool check-config alertmanager.yml
```
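An `unexpected EOF` error typically means a block was opened but never closed with `{{ end }}`. As a quick first pass, the opening and closing actions in a template file can be counted. This is only a heuristic sketch with a hypothetical function name: it does not parse the template, so actions inside comments or string literals are miscounted.

```bash
# template_blocks_balanced: compare the number of block-opening actions
# (define, if, range, with, block) with the number of {{ end }} actions.
# "{{ else if ... }}" is not counted because it does not open a new block.
template_blocks_balanced() (
  file=$1
  opens=$(grep -oE '\{\{-?[[:space:]]*(define|if|range|with|block)[[:space:]]' "$file" | wc -l | tr -d ' ')
  ends=$(grep -oE '\{\{-?[[:space:]]*end[[:space:]]*-?\}\}' "$file" | wc -l | tr -d ' ')
  echo "opens=$opens ends=$ends"
  [ "$opens" -eq "$ends" ]
)
```

A non-zero exit status for a given `.tmpl` file suggests looking for a missing `{{ end }}` in that file before digging further.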

Solution:

Fix template syntax:

```yaml
# In alertmanager.yml
templates:
  - '/etc/alertmanager/templates/*.tmpl'

# templates/slack.tmpl
{{ define "slack.title" }}
{{ if eq .Status "firing" }}
:fire: {{ .Alerts.Firing | len }} alerts firing
{{ else if eq .Status "resolved" }}
:checkered_flag: {{ .Alerts.Resolved | len }} alerts resolved
{{ end }}
{{ end }}

{{ define "slack.text" }}
{{ range .Alerts }}
*Alert:* {{ .Labels.alertname }}
*Severity:* {{ .Labels.severity }}
*Instance:* {{ .Labels.instance }}
*Description:* {{ .Annotations.description }}
*Started:* {{ .StartsAt.Format "2006-01-02 15:04:05" }}
{{ if eq .Status "resolved" }}
*Resolved:* {{ .EndsAt.Format "2006-01-02 15:04:05" }}
{{ end }}
{{ end }}
{{ end }}
```

Common Cause 7: Alertmanager Silences

Active silences can block notifications unexpectedly.

Diagnosis:

```bash
# List all silences
amtool silence query

# Check silences via API
curl -s http://localhost:9093/api/v2/silences | jq '.[] | {id: .id, matchers: .matchers, createdBy: .createdBy, comment: .comment}'

# Check if specific alert is silenced
amtool silence query alertname=HighCPU
```

Solution:

```bash
# Remove unwanted silences
amtool silence expire <silence-id>

# Or via API
curl -X DELETE http://localhost:9093/api/v2/silence/<silence-id>
```
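Expiring silences by hand gets tedious when several match the same alert. The sketch below prints the expire commands for review instead of running them, since removing the wrong silence can page people unnecessarily. It assumes `amtool` already knows the Alertmanager URL (for example from its config file), and `print_expire_commands` is a hypothetical name.

```bash
# print_expire_commands: list silences matching the given matchers and
# print one "amtool silence expire" command per ID. Nothing is expired.
print_expire_commands() {
  # -q makes amtool print only the silence IDs, one per line
  amtool silence query -q "$@" | while read -r id; do
    if [ -n "$id" ]; then
      echo "amtool silence expire $id"
    fi
  done
}
```

Run `print_expire_commands alertname=HighCPU`, check the output, then execute the lines you agree with.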

Verification

After fixing, verify notifications are working:

```bash
# Send a test alert
amtool alert add alertname=TestAlert severity=warning \
  --annotation=summary="Test notification" \
  --generator-url="http://localhost:9090/graph"

# Check alert was received
curl -s http://localhost:9093/api/v2/alerts | jq '.[] | select(.labels.alertname=="TestAlert")'

# Check notification was sent (look at logs; -E is required for the | alternation)
kubectl logs -l app=alertmanager -n monitoring --tail=100 | grep -Ei "TestAlert|notify"

# Verify which receiver the test alert routes to (same labels as the alert above)
amtool config routes test alertname=TestAlert severity=warning
```
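The "alert was received" check can be scripted so that it waits briefly instead of relying on a single request. This is a minimal sketch: the function name `wait_for_alert` and the `AM_URL` variable are hypothetical, and it only confirms that Alertmanager holds the alert, not that a notification was delivered.

```bash
# wait_for_alert: poll the v2 alerts API until an alert with the given
# alertname appears, or give up after the given number of attempts.
# Arguments: alertname [attempts=10] [delay_seconds=3]
wait_for_alert() (
  name=$1
  attempts=${2:-10}
  delay=${3:-3}
  am_url=${AM_URL:-http://localhost:9093}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "${am_url}/api/v2/alerts" |
      grep -Eq "\"alertname\"[[:space:]]*:[[:space:]]*\"${name}\""; then
      echo "alert ${name} is present"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "alert ${name} not seen after ${attempts} attempts" >&2
  exit 1
)
```

After `amtool alert add`, a call such as `wait_for_alert TestAlert` returns as soon as the alert shows up; the log check above is still needed to confirm delivery.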

Prevention

Monitor Alertmanager health:

```yaml
# Prometheus alerting rules for Alertmanager
groups:
  - name: alertmanager_health
    rules:
      - alert: AlertmanagerConfigInconsistent
        # Count the distinct config hashes per cluster; more than one means
        # the instances disagree
        expr: count by (cluster) (count_values by (cluster) ("config_hash", alertmanager_config_hash)) != 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Alertmanager configurations are inconsistent"

      - alert: AlertmanagerNotificationFailed
        expr: rate(alertmanager_notifications_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Alertmanager notification failures detected"

      - alert: AlertmanagerSilenced
        # Fires when any silence has been active for more than an hour
        expr: alertmanager_silences{state="active"} > 0
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "Alerts are being silenced"
```
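The same failure counter can be inspected ad hoc, without waiting for a rule to fire. The sketch below filters Prometheus text-format metrics read from standard input; the function name `failed_integrations` is hypothetical, and it assumes each sample line ends with the value (no trailing timestamp).

```bash
# failed_integrations: read Prometheus text-format metrics on stdin and
# print each alertmanager_notifications_failed_total series whose value
# is greater than zero.
failed_integrations() {
  awk '/^alertmanager_notifications_failed_total[{ ]/ && $NF + 0 > 0 { print }'
}
```

For example, `curl -s http://localhost:9093/metrics | failed_integrations` lists only the integrations that have failed at least once since Alertmanager started.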

Notification failures are usually configuration or connectivity issues. Start by testing the notification channel directly, then verify Alertmanager's configuration and logs.


Published by the FixWikiHub Editorial Team on 2025-11-24 (https://www.fixwikihub.com/fix-alertmanager-notification-failed).