Introduction

Prometheus logs show recording rule evaluation errors:

bash
level=error ts=2026-04-04T04:20:15.789Z caller=manager.go:456 component="rule manager" msg="Error evaluating rule" rule="job:http_requests:rate5m" err="vector contains metrics with the same name but duplicate labels"
level=error ts=2026-04-04T04:20:15.790Z caller=manager.go:457 component="rule manager" err="unknown metric name \"http_requests_total_5m_rate\""

Recording rules pre-compute frequently needed or computationally expensive expressions. Errors here cascade into broken dashboards and alerts.

Symptoms

Common error messages include:

bash
level=error ts=2026-04-04T04:20:15.789Z caller=manager.go:456 component="rule manager" msg="Error evaluating rule" rule="job:http_requests:rate5m" err="vector contains metrics with the same name but duplicate labels"
level=error ts=2026-04-04T04:20:15.790Z caller=manager.go:457 component="rule manager" err="unknown metric name \"http_requests_total_5m_rate\""

```promql # Recording rule evaluation time prometheus_rule_evaluation_duration_seconds

# Failed rule evaluations prometheus_rule_evaluations_total - prometheus_rule_evaluations_total{result="success"}

# Rule group evaluation errors prometheus_rule_group_rules{state="error"} ```

```bash # List all recording rules with status curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[] | select(.type == "recording") | {name: .name, state: .state, lastError: .lastError}'

# Check for errors curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[] | select(.lastError != null) | {name: .name, error: .lastError}' ```

Common Causes

  • Configuration misconfiguration
  • Missing or incorrect credentials
  • Network connectivity issues
  • Version compatibility problems
  • Resource exhaustion or limits
  • Permission or access denied

Diagnosis

Check Rule Evaluation Status

```promql # Recording rule evaluation time prometheus_rule_evaluation_duration_seconds

# Failed rule evaluations prometheus_rule_evaluations_total - prometheus_rule_evaluations_total{result="success"}

# Rule group evaluation errors prometheus_rule_group_rules{state="error"} ```

Check Rule Groups

```bash # List all recording rules with status curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[] | select(.type == "recording") | {name: .name, state: .state, lastError: .lastError}'

# Check for errors curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[] | select(.lastError != null) | {name: .name, error: .lastError}' ```

View Prometheus Logs

bash
# Check for rule evaluation errors
journalctl -u prometheus --since "1 hour ago" | grep -i "rule evaluation"

Step-by-Step Fix

1. Fix Syntax Errors

Common syntax issues in recording rules:

```yaml # recording_rules.yml groups: - name: http_metrics interval: 30s rules: # WRONG: Using incorrect syntax # - record: job:http_requests:rate5m # expr: rate(http_requests_total[5m]) by (job)

# CORRECT: Proper aggregation syntax - record: job:http_requests:rate5m expr: sum by (job) (rate(http_requests_total[5m])) ```

Validate configuration:

```bash # Check recording rules syntax promtool check rules recording_rules.yml

# Check full config promtool check config prometheus.yml ```

2. Fix Label Collisions

When recording rules produce labels that conflict:

```yaml groups: - name: service_metrics rules: # Problem: Creates duplicate labels # - record: http_requests:rate5m # expr: rate(http_requests_total[5m])

# Solution: Explicitly aggregate labels - record: service:http_requests:rate5m expr: sum without (instance, pod) (rate(http_requests_total[5m]))

# Or keep specific labels - record: job:http_requests:rate5m expr: sum by (job, method, status) (rate(http_requests_total[5m])) ```

3. Fix Non-Existent Metrics

Recording rules referencing metrics that don't exist yet:

```yaml groups: - name: derived_metrics rules: # Problem: Source metric may not exist at startup - record: app:request_rate expr: rate(app_requests_total[5m])

# Solution: Use absent() for default values - record: app:request_rate expr: | rate(app_requests_total[5m]) or vector(0)

# Or use a default with on() - record: app:availability expr: | (sum(rate(http_requests_total{status!~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) or on() vector(1) ```

4. Handle Stale Data

Recording rules with missing data gaps:

```yaml groups: - name: availability rules: # Problem: Returns no data during gaps - record: app:availability_ratio expr: sum(rate(http_requests_total{status="200"}[5m])) / sum(rate(http_requests_total[5m]))

# Solution: Fill gaps with default - record: app:availability_ratio expr: | clamp_max( sum(rate(http_requests_total{status="200"}[5m])) / on() group_left sum(rate(http_requests_total[5m])), 1 ) or on() vector(0) ```

5. Fix Evaluation Order

Rules that depend on other recording rules must be in order:

```yaml groups: - name: derived_metrics # Order matters within a group - earlier rules are available to later ones rules: # First: Base metric - record: job:http_requests:rate5m expr: sum by (job) (rate(http_requests_total[5m]))

# Second: Depends on first - record: job:http_requests:rate5m:increase_1h expr: increase(job:http_requests:rate5m[1h])

# Separate group for independent calculations - name: error_metrics rules: - record: job:http_errors:rate5m expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) ```

6. Optimize Performance

Slow recording rules impact Prometheus:

```yaml groups: - name: optimized_rules interval: 30s # Reduce evaluation frequency rules: # Slow: High cardinality query every 15s # - record: pod:http_requests:rate5m # expr: sum by (pod) (rate(http_requests_total[5m]))

# Better: Less frequent, and reduce labels - record: namespace:http_requests:rate5m expr: sum without (pod, container) (rate(http_requests_total[5m])) ```

Monitor rule performance:

```promql # Slow rule evaluations histogram_quantile(0.95, rate(prometheus_rule_evaluation_duration_seconds_bucket[5m])) > 1

# Rules taking > 1s prometheus_rule_evaluation_duration_seconds > 1 ```

Verification

Verify Recording Rules Work

bash
# Check rule exists and has data
curl -s 'http://localhost:9090/api/v1/query?query=job:http_requests:rate5m' | jq '.data.result'

```promql # Verify recording rule output {__name__="job:http_requests:rate5m"}

# Compare with source sum by (job) (rate(http_requests_total[5m])) == job:http_requests:rate5m ```

Check Rule Status

bash
# All rules should show "ok"
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].rules[] | select(.type == "recording") | {name: .name, health: .health}'

Prevention

Add monitoring for recording rules:

```yaml groups: - name: rule_health rules: - alert: RecordingRuleEvaluationSlow expr: histogram_quantile(0.95, rate(prometheus_rule_evaluation_duration_seconds_bucket[5m])) > 0.5 for: 5m labels: severity: warning annotations: summary: "Recording rules evaluating slowly" description: "95th percentile rule evaluation time is {{ $value }}s"

  • alert: RecordingRuleMissing
  • expr: absent({__name__="job:http_requests:rate5m"})
  • for: 5m
  • labels:
  • severity: warning
  • annotations:
  • summary: "Recording rule job:http_requests:rate5m is missing"
  • alert: RecordingRuleStale
  • expr: time() - timestamp(job:http_requests:rate5m) > 600
  • for: 5m
  • labels:
  • severity: warning
  • annotations:
  • summary: "Recording rule job:http_requests:rate5m is stale"
  • `

Prevention

  1. 1.Naming Convention: Use level:metric:operations format (e.g., job:http_requests:rate5m)
  2. 2.Keep Rules Simple: One calculation per rule
  3. 3.Document Dependencies: Comment rules that depend on others
  4. 4.Monitor Performance: Track evaluation duration
  5. 5.Test Changes: Use promtool test rules to validate
  • [WordPress troubleshooting: Fix IAM Timeout Error - Complete Trouble](fix-iam-timeout-error)
  • [Technical troubleshooting: Fix Cloudwatch Alarm Not Triggering Issue in Monit](cloudwatch-alarm-not-triggering)
  • [Fix Datadog Agent Not Sending Metrics Issue in Monitoring](datadog-agent-not-sending-metrics)
  • [Fix Elasticsearch Cluster Red Yellow Status Issue in Monitoring](elasticsearch-cluster-red-yellow-status)
  • [Fix Alertmanager Notification Failed](fix-alertmanager-notification-failed)

<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "Fix Prometheus Recording Rule Error", "description": "Learn how to diagnose and fix Prometheus recording rule errors including syntax issues, evaluation failures, and performance problems.", "url": "https://www.fixwikihub.com/fix-prometheus-recording-rule", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2025-11-24T09:52:54.857Z", "dateModified": "2025-11-24T09:52:54.857Z" } </script>