Introduction
You have two Prometheus instances running in HA mode, but you're experiencing issues:
- Duplicate alerts firing from both instances
- Inconsistent data between the two Prometheus servers
- Alertmanager receiving alerts from both without deduplication
HA pair issues cause alert noise, data gaps, and unreliable monitoring.
Symptoms
Common error messages include:
```
level=warn ts=2026-04-04T23:55:12.345Z caller=alerting.go:234 msg="Alert already exists" alert="HighCPU" instance="prometheus-1"
level=error ts=2026-04-04T23:55:13.456Z caller=alerting.go:235 msg="Duplicate alert source" sources="prometheus-0,prometheus-1"
```
Common Causes
- Misconfiguration, such as missing external labels or scrape configs that differ between replicas
- Missing or incorrect credentials
- Network connectivity issues
- Version compatibility problems
- Resource exhaustion or limits
- Permission or access denied
Diagnosis
Check External Labels
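```bash
# Check external labels on each Prometheus.
# The config endpoint returns the loaded configuration as a YAML string in .data.yaml.
curl -s http://prometheus-0:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 4 'external_labels'
curl -s http://prometheus-1:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 4 'external_labels'
```
Check Alertmanager Connections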
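```bash
# Verify that each Prometheus instance has discovered the Alertmanager
curl -s http://prometheus-0:9090/api/v1/alertmanagers | jq '.data.activeAlertmanagers'
curl -s http://prometheus-1:9090/api/v1/alertmanagers | jq '.data.activeAlertmanagers'

# Check Alertmanager's own cluster view (the v2 API has no .data wrapper)
curl -s http://alertmanager:9093/api/v2/status | jq '.cluster'

# Check alert silences
curl -s http://alertmanager:9093/api/v2/silences | jq .
```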
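To confirm that every replica delivers to the same Alertmanager instances, the JSON returned by Prometheus's `/api/v1/alertmanagers` endpoint can be checked in a script. The sketch below is illustrative only: the function name, the sample payload, and the URLs are assumptions, and fetching the JSON (for example with `urllib.request`) is left out.

```python
def missing_alertmanagers(payload, expected_urls):
    """Return expected Alertmanager URLs that this replica has not discovered.

    `payload` is the parsed JSON body of GET /api/v1/alertmanagers.
    """
    active = {am["url"] for am in payload["data"]["activeAlertmanagers"]}
    return sorted(set(expected_urls) - active)


# Hypothetical response from one replica; the URLs are placeholders.
sample = {
    "status": "success",
    "data": {
        "activeAlertmanagers": [
            {"url": "http://alertmanager-0:9093/api/v2/alerts"}
        ],
        "droppedAlertmanagers": [],
    },
}
expected = [
    "http://alertmanager-0:9093/api/v2/alerts",
    "http://alertmanager-1:9093/api/v2/alerts",
]

# Lists the Alertmanager this replica is not sending to.
print(missing_alertmanagers(sample, expected))
```

Running the same check against both replicas should return an empty list for each; any difference means one replica is not notifying every Alertmanager.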
Check for Duplicate Alerts
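```promql
# Count alerts from each Prometheus.
# External labels are not attached to locally stored series, so run this against
# a global view (for example Thanos Query with deduplication disabled).
count by (prometheus) (ALERTS{alertstate="firing"})

# Alerts without replica label
ALERTS{alertstate="firing"} unless ALERTS{alertstate="firing",prometheus=~".+"}
```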
Check Data Consistency
```promql
# PromQL cannot address a specific server: the @ modifier takes a timestamp,
# not a hostname. Run each query against prometheus-0 and prometheus-1
# separately (for example via /api/v1/query) and compare the results.
{job="node-exporter"}

# Compare scrape timestamps; small offsets between replicas are expected
timestamp(up{job="node-exporter"})
```
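Comparing the two result sets by eye gets tedious, so the comparison can be scripted. The following is a minimal sketch that assumes both instant-query responses have already been fetched and parsed; the function names and the sample data are hypothetical.

```python
def series_key(sample, ignore=("prometheus", "replica")):
    """Identify a series by its labels, ignoring replica-specific ones."""
    return tuple(sorted((k, v) for k, v in sample["metric"].items() if k not in ignore))


def diff_series(resp_a, resp_b):
    """Return (only_in_a, only_in_b) for two instant-query responses."""
    keys_a = {series_key(s) for s in resp_a["data"]["result"]}
    keys_b = {series_key(s) for s in resp_b["data"]["result"]}
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)


def make_response(instances):
    """Build a fake /api/v1/query response for up{job="node-exporter"}."""
    return {"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "job": "node-exporter", "instance": i},
         "value": [1700000000, "1"]}
        for i in instances
    ]}}


# Hypothetical data: prometheus-1 is not scraping node-b.
resp_0 = make_response(["node-a:9100", "node-b:9100"])
resp_1 = make_response(["node-a:9100"])

only_0, only_1 = diff_series(resp_0, resp_1)
print("only on prometheus-0:", only_0)
print("only on prometheus-1:", only_1)
```

Series that appear on only one side usually point to a scrape configuration or service discovery difference between the replicas.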
Step-by-Step Fix
1. Configure External Labels
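Each Prometheus instance needs an external label that identifies the replica, while all other external labels stay identical:
```yaml
# prometheus-0 configuration
# prometheus.yml
global:
  external_labels:
    prometheus: 'prometheus-0'
    cluster: 'production'
    replica: '0'

# prometheus-1 configuration
global:
  external_labels:
    prometheus: 'prometheus-1'
    cluster: 'production'
    replica: '1'

# Both instances: drop the replica-specific labels from outgoing alerts
alerting:
  alert_relabel_configs:
    - action: labeldrop
      regex: prometheus|replica
```
Thanos uses the replica label to deduplicate stored samples. Alertmanager, by contrast, only deduplicates alerts whose label sets are identical, so the replica-specific labels must be removed from alerts (the `alert_relabel_configs` entry above) before they are sent.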
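Alertmanager treats two alerts as the same alert only when their label sets are identical, so labels that differ per replica have to be removed before alerts leave Prometheus. The sketch below models a relabel `labeldrop` action in plain Python to show the effect; it is a simplified illustration, not Prometheus's implementation, and the function name is an assumption.

```python
import re


def labeldrop(labels, regex):
    """Mimic a relabel `labeldrop` action: remove label names that fully match."""
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


# The same alert as each replica would send it, external labels included.
alert_0 = {"alertname": "HighCPU", "cluster": "production",
           "prometheus": "prometheus-0", "replica": "0"}
alert_1 = {"alertname": "HighCPU", "cluster": "production",
           "prometheus": "prometheus-1", "replica": "1"}

# With the replica labels present, the label sets differ.
differs_before = alert_0 != alert_1

# After dropping them, both replicas send an identical label set.
same_after = labeldrop(alert_0, "prometheus|replica") == labeldrop(alert_1, "prometheus|replica")
print(differs_before, same_after)
```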
2. Configure Alertmanager for Deduplication
Alertmanager deduplicates alerts that arrive with identical label sets, so both replicas must send the same labels:
```yaml
# alertmanager.yml
global:
  # Resolve timeout
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default'

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://notification-service/webhook'

# Inhibition rules suppress related alerts; they do not deduplicate
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster']
```
The group_by list should not include the replica label; otherwise each replica's alerts land in a separate group and produce separate notifications.
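The effect of `group_by` on notification count can be seen with a small model. This sketch buckets alerts by the values of the `group_by` labels, which is a simplification of Alertmanager's grouping and not its actual code; the function name and sample alerts are hypothetical, and the alerts deliberately still carry a per-replica label.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels (simplified model)."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighCPU", "cluster": "production", "prometheus": "prometheus-0"},
    {"alertname": "HighCPU", "cluster": "production", "prometheus": "prometheus-1"},
]

# Grouping on the replica label splits one incident into two groups.
with_replica = group_alerts(alerts, ["alertname", "cluster", "prometheus"])
without_replica = group_alerts(alerts, ["alertname", "cluster"])
print(len(with_replica), len(without_replica))
```

Each group produces its own notification, so the first grouping would page twice for a single incident.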
3. Deduplicate via Thanos/VictoriaMetrics
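For long-term storage, Thanos deduplicates at query time based on the replica labels:
```bash
# Thanos Query: treat series that differ only in these labels as replicas
thanos query \
  --query.replica-label=prometheus \
  --query.replica-label=replica \
  --endpoint=thanos-receive-0:10901 \
  --endpoint=thanos-receive-1:10901
```
Or use VictoriaMetrics:
```bash
# VictoriaMetrics deduplication settings (cluster version):
# set the same value on vmstorage and vmselect, not on vminsert.
# Both replicas must write identical label sets for samples to be deduplicated.
vmstorage -dedup.minScrapeInterval=15s
vmselect -dedup.minScrapeInterval=15s
```
Configure Prometheus remote write:
```yaml
# Both Prometheus instances
remote_write:
  - url: "https://thanos-receive:19291/api/v1/receive"
# Ensure external_labels are set globally
```
4. Fix Scrape Configuration Differences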
Both Prometheus instances should have identical scrape configs:
```bash
# Compare configurations
diff prometheus-0.yml prometheus-1.yml

# Or via API (the loaded config is returned as a YAML string in .data.yaml)
curl -s http://prometheus-0:9090/api/v1/status/config | jq -r '.data.yaml' > config-0.yml
curl -s http://prometheus-1:9090/api/v1/status/config | jq -r '.data.yaml' > config-1.yml
diff config-0.yml config-1.yml
```
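A plain `diff` always reports the external labels, which are expected to differ. As a rough sketch, the comparison can skip that block so that only unintended drift shows up. The helper names are hypothetical, and the indentation-based parsing is an assumption that holds for conventionally formatted YAML only.

```python
import difflib


def strip_external_labels(config_text):
    """Drop the `external_labels:` block from a prometheus.yml string (indent-based)."""
    kept, skip_indent = [], None
    for line in config_text.splitlines():
        indent = len(line) - len(line.lstrip())
        if skip_indent is not None:
            # Still inside the block while lines are blank or indented deeper.
            if not line.strip() or indent > skip_indent:
                continue
            skip_indent = None
        if line.strip() == "external_labels:":
            skip_indent = indent
            continue
        kept.append(line)
    return kept


def config_diff(text_a, text_b):
    """Unified diff of two configs with external_labels removed."""
    return list(difflib.unified_diff(
        strip_external_labels(text_a), strip_external_labels(text_b),
        "prometheus-0", "prometheus-1", lineterm=""))


config_a = """\
global:
  scrape_interval: 15s
  external_labels:
    prometheus: prometheus-0
    cluster: production
scrape_configs:
  - job_name: node-exporter
"""
config_b = config_a.replace("prometheus-0", "prometheus-1")  # differs only in labels
config_c = config_b.replace("15s", "30s")                    # real drift

print(config_diff(config_a, config_b))
print("\n".join(config_diff(config_a, config_c)))
```

An empty diff means the replicas differ only in their external labels, which is the intended state.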
Ensure identical configs:
```yaml
# Shared configuration file for both instances
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Only difference should be external_labels
# Use separate files for external labels or environment variables
```
Using environment variables:
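```yaml
# prometheus.yml
global:
  external_labels:
    prometheus: '${PROMETHEUS_REPLICA}'
    cluster: 'production'
```
```bash
# Set via environment. Prometheus 2.x expands variables in external_labels only
# with the expand-external-labels feature flag; 3.x does so by default.
export PROMETHEUS_REPLICA=prometheus-0
prometheus --config.file=prometheus.yml --enable-feature=expand-external-labels
```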
5. Handle Alert Evaluation Timing
The two replicas scrape targets and evaluate rules at slightly different moments, so the same alert can start firing a few seconds apart:
```yaml
# Sync evaluation intervals (prometheus.yml)
global:
  evaluation_interval: 30s  # Same on both

# Use consistent 'for' durations (rule file)
groups:
  - name: application_alerts
    rules:
      - alert: HighCPU
        expr: rate(process_cpu_seconds_total[1m]) > 0.8
        for: 5m  # Should be > scrape_interval
```
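How far apart the replicas fire can be estimated with a little arithmetic. The sketch below assumes a simplified model in which a rule becomes pending at the first evaluation where the condition holds and fires at the first evaluation at least `for` seconds later; the function name and the numbers are illustrative.

```python
import math


def first_firing_time(condition_start, eval_offset, eval_interval, for_duration):
    """Time (seconds) at which a rule with a `for` clause starts firing.

    Evaluations happen at eval_offset + k * eval_interval.
    """
    def next_eval(t):
        # First evaluation at or after time t.
        k = max(0, math.ceil((t - eval_offset) / eval_interval))
        return eval_offset + k * eval_interval

    pending_since = next_eval(condition_start)
    return next_eval(pending_since + for_duration)


# Condition becomes true at t=100s; 30s evaluation interval; for: 5m.
replica_0 = first_firing_time(100, eval_offset=0, eval_interval=30, for_duration=300)
replica_1 = first_firing_time(100, eval_offset=17, eval_interval=30, for_duration=300)
print(replica_0, replica_1, abs(replica_0 - replica_1))
```

In this example the replicas fire 13 seconds apart, which is less than one evaluation interval. As long as both send identical label sets, Alertmanager treats the two as one alert.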
6. Configure Kubernetes HA
For Kubernetes deployments:
```yaml
# prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceName: prometheus-headless
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/data'
            # Prometheus has no flag for external labels; reference ${POD_NAME}
            # under external_labels in prometheus.yml and enable expansion (2.x)
            - '--enable-feature=expand-external-labels'
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
```
Service configuration:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: ClusterIP
  ports:
    - port: 9090
  selector:
    app: prometheus
---
# Headless service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: prometheus-headless
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - port: 9090
  selector:
    app: prometheus
```
7. Alertmanager HA Configuration
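For Alertmanager HA, clustering is configured with command-line flags rather than in alertmanager.yml. Prometheus should list every Alertmanager instance directly instead of going through a load balancer.
```bash
# Start each Alertmanager with the cluster flags (shown for alertmanager-0)
alertmanager \
  --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager-1:9094 \
  --cluster.gossip-interval=10s \
  --cluster.peer-timeout=30s
```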
Kubernetes deployment:
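```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  replicas: 2
  serviceName: alertmanager
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager:latest
          args:
            - '--config.file=/etc/alertmanager/alertmanager.yml'
            # Pod DNS names resolve through the headless service named in serviceName
            - '--cluster.peer=alertmanager-0.alertmanager:9094'
            - '--cluster.peer=alertmanager-1.alertmanager:9094'
          ports:
            - containerPort: 9093
            - containerPort: 9094
```
Verification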
Check Alert Deduplication
```bash
# Verify alerts from both sources
curl -s http://alertmanager:9093/api/v2/alerts | jq '.[] | {labels: .labels, fingerprint: .fingerprint}'

# Check Alertmanager cluster status
curl -s http://alertmanager:9093/api/v2/status | jq '.cluster'
```
Verify External Labels
```promql
# Check both Prometheus have different replica labels
# (run against a global view; local series do not carry external labels)
count by (prometheus) ({__name__=~"prometheus_.+"})

# Query from each Prometheus
{prometheus="prometheus-0"}
{prometheus="prometheus-1"}
```
Check Data Consistency
```bash
# Compare series counts (-G with --data-urlencode keeps curl from globbing the braces)
curl -sG http://prometheus-0:9090/api/v1/query --data-urlencode 'query=count({__name__=~".+"})' | jq '.data.result[0].value'
curl -sG http://prometheus-1:9090/api/v1/query --data-urlencode 'query=count({__name__=~".+"})' | jq '.data.result[0].value'
```
Prevention
Add monitoring for the HA pair:
```yaml
groups:
  - name: ha_pair_alerts
    rules:
      # Rules that reference the 'prometheus' label must be evaluated against a
      # global view (for example Thanos Ruler); local series lack external labels.
      - alert: PrometheusReplicaMissingExternalLabel
        expr: absent({prometheus=~".+"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus replica missing external label"
          description: "Prometheus is missing the 'prometheus' external label required for HA"

      - alert: AlertmanagerHADown
        # Fires for an instance that sees fewer peers than there are running instances
        expr: |
          alertmanager_cluster_members
            < on (job) group_left
          count by (job) (alertmanager_cluster_members)
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Alertmanager cluster degraded"
          description: "Alertmanager {{ $labels.instance }} sees only {{ $value }} cluster members"

      - alert: PrometheusHAConfigMismatch
        expr: |
          count by (job) ({__name__=~".+", prometheus="prometheus-0"}) !=
          count by (job) ({__name__=~".+", prometheus="prometheus-1"})
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus HA configuration mismatch"

      - alert: DuplicateAlertSources
        # Counts how many distinct replicas are firing each alert
        expr: count by (alertname) (count by (alertname, prometheus) (ALERTS{alertstate="firing"})) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Duplicate alerts detected"
          description: "Alert {{ $labels.alertname }} firing from multiple sources"
```