Home / Monitoring / Fix Prometheus HA Pair Issues

Monitoring

Fix Prometheus HA Pair Issues

Resolve Prometheus HA pair synchronization issues, duplicate alerts, and data inconsistency problems

Published: Nov 25, 20257 min readBy FixWikiHub Editorial Team

Abstract illustration for a troubleshooting knowledge base category.

Introduction

You have two Prometheus instances running in HA mode, but you're experiencing issues:

Duplicate alerts firing from both instances
Inconsistent data between the two Prometheus servers
Alertmanager receiving alerts from both without deduplication

bash

level=warn ts=2026-04-04T23:55:12.345Z caller=alerting.go:234 msg="Alert already exists" alert="HighCPU" instance="prometheus-1"
level=error ts=2026-04-04T23:55:13.456Z caller=alerting.go:235 msg="Duplicate alert source" sources="prometheus-0,prometheus-1"

HA pair issues cause alert noise, data gaps, and unreliable monitoring.

Symptoms

Common error messages include:

bash

level=warn ts=2026-04-04T23:55:12.345Z caller=alerting.go:234 msg="Alert already exists" alert="HighCPU" instance="prometheus-1"
level=error ts=2026-04-04T23:55:13.456Z caller=alerting.go:235 msg="Duplicate alert source" sources="prometheus-0,prometheus-1"

bash

# Check external labels on each Prometheus
curl -s http://prometheus-0:9090/api/v1/status/config | jq '.data.global.external_labels'
curl -s http://prometheus-1:9090/api/v1/status/config | jq '.data.global.external_labels'

```bash # Verify both Prometheus instances are connected to Alertmanager curl -s http://alertmanager:9093/api/v2/status | jq '.data'

# Check alert silences and inhibition rules curl -s http://alertmanager:9093/api/v2/silences | jq . ```

Common Causes

Configuration misconfiguration
Missing or incorrect credentials
Network connectivity issues
Version compatibility problems
Resource exhaustion or limits
Permission or access denied

Diagnosis

Check External Labels

bash

# Check external labels on each Prometheus
curl -s http://prometheus-0:9090/api/v1/status/config | jq '.data.global.external_labels'
curl -s http://prometheus-1:9090/api/v1/status/config | jq '.data.global.external_labels'

Check Alertmanager Connections

```bash # Verify both Prometheus instances are connected to Alertmanager curl -s http://alertmanager:9093/api/v2/status | jq '.data'

# Check alert silences and inhibition rules curl -s http://alertmanager:9093/api/v2/silences | jq . ```

Check for Duplicate Alerts

```promql # Count alerts from each Prometheus count by (prometheus) (ALERTS{alertstate="firing"})

# Alerts without replica label ALERTS{alertstate="firing"} unless ALERTS{alertstate="firing",prometheus=~".+"} ```

Check Data Consistency

```promql # Compare metrics from both Prometheus # Query prometheus-0 directly {job="node-exporter"} @ prometheus-0

# Compare timestamps timestamp(up{job="node-exporter"}) @ prometheus-0 == timestamp(up{job="node-exporter"}) @ prometheus-1 ```

Step-by-Step Fix

1. Configure External Labels

Each Prometheus instance must have unique external labels:

```yaml # prometheus-0 configuration # prometheus.yml global: external_labels: prometheus: 'prometheus-0' cluster: 'production' replica: '0'

# prometheus-1 configuration global: external_labels: prometheus: 'prometheus-1' cluster: 'production' replica: '1' ```

These labels are used by Alertmanager to deduplicate alerts.

2. Configure Alertmanager for Deduplication

Alertmanager uses external labels for deduplication:

```yaml # alertmanager.yml global: # Resolve timeout resolve_timeout: 5m

route: group_by: ['alertname', 'cluster', 'prometheus'] group_wait: 30s group_interval: 5m repeat_interval: 1h receiver: 'default'

receivers: - name: 'default' webhook_configs: - url: 'http://notification-service/webhook'

# Inhibition rules for deduplication inhibit_rules: - source_match: severity: 'critical' target_match: severity: 'warning' equal: ['alertname', 'cluster'] ```

The group_by must include the unique replica label.

3. Deduplicate via Thanos/VictoriaMetrics

For long-term storage deduplication:

yaml

# Thanos Receive configuration
receive:
  # Enable deduplication
  dedup_enabled: true
  replica_label: 'prometheus'
  # Hash ring configuration
  hashring:
    members:
      - address: thanos-receive-0
      - address: thanos-receive-1

Or use Victoria Metrics:

bash

# Victoria Metrics deduplication settings
vminsert -dedup.minScrapeInterval=15s

Configure Prometheus remote write:

yaml

# Both Prometheus instances
remote_write:
  - url: "https://thanos-receive:19291/api/v1/write"
    # Ensure external_labels are set globally

4. Fix Scrape Configuration Differences

Both Prometheus should have identical scrape configs:

```bash # Compare configurations diff prometheus-0.yml prometheus-1.yml

# Or via API curl -s http://prometheus-0:9090/api/v1/status/config > config-0.json curl -s http://prometheus-1:9090/api/v1/status/config > config-1.json diff config-0.json config-1.json ```

Ensure identical configs:

```yaml # Shared configuration file for both instances # prometheus.yml global: scrape_interval: 15s evaluation_interval: 15s

# Only difference should be external_labels # Use separate files for external labels or environment variables ```

Using environment variables:

```yaml # prometheus.yml global: external_labels: prometheus: '${PROMETHEUS_REPLICA}' cluster: 'production'

# Set via command line or environment export PROMETHEUS_REPLICA=prometheus-0 prometheus --config.file=prometheus.yml ```

5. Handle Alert Evaluation Timing

Alerts may fire at different times due to timing differences:

```yaml # Sync evaluation intervals global: evaluation_interval: 30s # Same on both

# Use consistent 'for' durations groups: - name: application_alerts rules: - alert: HighCPU expr: rate(process_cpu_seconds_total[1m]) > 0.8 for: 5m # Should be > scrape_interval ```

6. Configure Kubernetes HA

For Kubernetes deployments:

yaml

# prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceName: prometheus
  template:
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/data'
            - '--external.label.prometheus=prometheus-$(POD_NAME)'
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name

Service configuration:

yaml

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: ClusterIP
  ports:
    - port: 9090
  selector:
    app: prometheus
---
# Headless service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: prometheus-headless
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - port: 9090
  selector:
    app: prometheus

7. Alertmanager HA Configuration

For Alertmanager HA:

```yaml # alertmanager.yml for cluster mode cluster: peers: - alertmanager-0:9094 - alertmanager-1:9094 gossip_interval: 10s peer_timeout: 30s

high_availability: enabled: true ```

Kubernetes deployment:

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  replicas: 2
  serviceName: alertmanager
  template:
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager:latest
          args:
            - '--cluster.peer=alertmanager-1:9094'
            - '--cluster.peer=alertmanager-0:9094'
          ports:
            - containerPort: 9093
            - containerPort: 9094

Verification

Check Alert Deduplication

```bash # Verify alerts from both sources curl -s http://alertmanager:9093/api/v2/alerts | jq '.[] | {labels: .labels, fingerprint: .fingerprint}'

# Check Alertmanager cluster status curl -s http://alertmanager:9093/api/v2/status | jq '.cluster' ```

Verify External Labels

```promql # Check both Prometheus have different replica labels count by (prometheus) ({__name__=~"prometheus_.+"})

# Query from each Prometheus {prometheus="prometheus-0"} {prometheus="prometheus-1"} ```

Check Data Consistency

bash

# Compare sample counts
curl -s 'http://prometheus-0:9090/api/v1/query?query=count({__name__=~".+"})' | jq '.data.result[0].value'
curl -s 'http://prometheus-1:9090/api/v1/query?query=count({__name__=~".+"})' | jq '.data.result[0].value'

Prevention

Add monitoring for HA pair:

```yaml groups: - name: ha_pair_alerts rules: - alert: PrometheusReplicaMissingExternalLabel expr: absent({prometheus=~".+"}) for: 5m labels: severity: critical annotations: summary: "Prometheus replica missing external label" description: "Prometheus is missing the 'prometheus' external label required for HA"

alert: AlertmanagerHADown
expr: count(alertmanager_cluster_members) != count(alertmanager_cluster_members_info)
for: 5m
labels:
severity: critical
annotations:
summary: "Alertmanager cluster degraded"
description: "Expected {{ $value }} Alertmanager members but fewer are healthy"

alert: PrometheusHAConfigMismatch
expr: |
count by (job) ({__name__=~".+"}) @ prometheus-0 !=
count by (job) ({__name__=~".+"}) @ prometheus-1
for: 10m
labels:
severity: warning
annotations:
summary: "Prometheus HA configuration mismatch"

alert: DuplicateAlertSources
expr: count by (alertname) (ALERTS{alertstate="firing"}) > 1
for: 1m
labels:
severity: warning
annotations:
summary: "Duplicate alerts detected"
description: "Alert {{ $labels.alertname }} firing from multiple sources"
`

[WordPress troubleshooting: Fix IAM Timeout Error - Complete Trouble](fix-iam-timeout-error)
[Technical troubleshooting: Fix Cloudwatch Alarm Not Triggering Issue in Monit](cloudwatch-alarm-not-triggering)
[Fix Datadog Agent Not Sending Metrics Issue in Monitoring](datadog-agent-not-sending-metrics)
[Fix Elasticsearch Cluster Red Yellow Status Issue in Monitoring](elasticsearch-cluster-red-yellow-status)
[Fix Alertmanager Notification Failed](fix-alertmanager-notification-failed)

Was this guide helpful?

Related search paths

People also search for

If the symptom is close but not identical, these search paths usually surface the right neighboring fixes faster than scrolling the full archive.

Prometheus HA Pair Issues Prometheus HA Pair Issues Monitoring Prometheus HA Pair Issues troubleshooting Prometheus HA Pair Issues fix Resolve Prometheus HA pair synchronization issues, duplicate alerts, and data inconsistency problems Monitoring Resolve Prometheus HA pair synchronization issues, duplicate alerts, and data inconsistency problems

Explore Related Topics

Browse Guides from Other Categories

Discover troubleshooting guides from related categories to expand your knowledge.

FAQ

Monitoring Troubleshooting FAQs

Common questions about troubleshooting and preventing similar issues

How do I know if this monitoring-errors troubleshooting guide applies to my situation?

This guide is designed for monitoring-errors issues. If you're experiencing similar symptoms described in the article, follow the step-by-step instructions. Start with the most common causes and work through the diagnostic process.

Is it safe to follow these monitoring-errors troubleshooting steps?

Yes, all steps are designed to be safe and non-destructive. We recommend creating backups before making significant changes and testing each step before proceeding to the next.

How long does it typically take to resolve this type of monitoring-errors issue?

Most monitoring-errors issues can be resolved within 30 minutes to 2 hours, depending on the complexity and root cause. Follow the troubleshooting flow to identify and fix the problem efficiently.

How can I prevent this monitoring-errors issue from happening again?

Regular maintenance, monitoring, and following best practices for monitoring-errors configuration can help prevent recurrence. Consider implementing automated checks and alerts for early detection.

Written by

FixWikiHub Editorial Team

Our editorial team consists of experienced DevOps engineers, systems administrators, and cloud architects with hands-on experience in production environments across AWS, Azure, GCP, and on-premises infrastructure.

Every guide undergoes technical review for accuracy and is updated when software versions, commands, or best practices change.

Last updated: Nov 25, 2025

About our team

Important Notice

Disclaimer & Safety Guidelines

The troubleshooting steps in this guide are provided for educational and informational purposes. Before applying any changes to production systems:

Test in a staging environment first — Always verify commands and configurations in a non-production environment before deploying to live systems.
Create backups — Ensure you have current backups of databases, configurations, and critical files before making changes.
Understand the impact — Review how each step may affect your specific environment, dependencies, and users.
Consult official documentation — This guide supplements, but does not replace, official vendor documentation and best practices.

FixWikiHub is not responsible for any damages arising from the use of this content. See our Terms of Use for more information.

Resources

Official Documentation & Further Reading

For authoritative information, consult the official documentation for the technologies discussed in this guide. Our troubleshooting content supplements, but does not replace, vendor documentation.

AWS Documentation — Official Amazon Web Services guides and API references
Kubernetes Documentation — Official Kubernetes documentation
Nginx Documentation — Official Nginx web server documentation
Apache Documentation — Official Apache HTTP Server documentation
Docker Documentation — Official Docker container documentation

Fix Prometheus HA Pair Issues

Introduction

Symptoms

Common Causes

Diagnosis

Check External Labels

Check Alertmanager Connections

Check for Duplicate Alerts

Check Data Consistency

Step-by-Step Fix

1. Configure External Labels

2. Configure Alertmanager for Deduplication

3. Deduplicate via Thanos/VictoriaMetrics

4. Fix Scrape Configuration Differences

5. Handle Alert Evaluation Timing

6. Configure Kubernetes HA

7. Alertmanager HA Configuration

Verification

Check Alert Deduplication

Verify External Labels

Check Data Consistency

Prevention

People also search for

Browse Guides from Other Categories

WordPress

SSL

DNS

Monitoring Troubleshooting FAQs

FixWikiHub Editorial Team

Disclaimer & Safety Guidelines

Official Documentation & Further Reading

Fix Prometheus HA Pair Issues

Introduction

Symptoms

Common Causes

Diagnosis

Check External Labels

Check Alertmanager Connections

Check for Duplicate Alerts

Check Data Consistency

Step-by-Step Fix

1. Configure External Labels

2. Configure Alertmanager for Deduplication

3. Deduplicate via Thanos/VictoriaMetrics

4. Fix Scrape Configuration Differences

5. Handle Alert Evaluation Timing

6. Configure Kubernetes HA

7. Alertmanager HA Configuration

Verification

Check Alert Deduplication

Verify External Labels

Check Data Consistency

Prevention

Related Articles

People also search for

Share this guide

More Monitoring Troubleshooting Guides

Browse Guides from Other Categories

Monitoring Troubleshooting FAQs

FixWikiHub Editorial Team

Disclaimer & Safety Guidelines

Official Documentation & Further Reading