Introduction

Prometheus queries timeout when the query takes too long to execute. This happens with large time ranges, high cardinality metrics, or complex PromQL expressions.

Symptoms

Query timeout:

```bash $ curl 'http://prometheus:9090/api/v1/query_range?query=rate(http_requests_total[5m])&start=now-7d&end=now&step=15s'

{ "status": "error", "errorType": "timeout", "error": "query timed out after 30 seconds" } ```

Grafana dashboard error:

bash
Error: Prometheus query timed out

Log messages:

```bash $ journalctl -u prometheus | grep timeout

prometheus: query timed out: "rate(http_requests_total[5m])" prometheus: querier timeout after 30s ```

Common Causes

  1. 1.Large time range - Querying weeks/months of data
  2. 2.High cardinality - Too many unique label combinations
  3. 3.Complex PromQL - Nested functions, joins, aggregations
  4. 4.Insufficient resources - CPU/memory limits too low
  5. 5.Slow storage - SSD vs HDD performance difference
  6. 6.Concurrent queries - Too many simultaneous queries

Step-by-Step Fix

  1. 1.Check logs for specific error messages
  2. 2.Verify configuration settings
  3. 3.Test network connectivity
  4. 4.Review recent changes
  5. 5.Apply corrective action
  6. 6.Verify the fix

Step 1: Check Query Scope

```bash # Check query time range curl 'http://prometheus:9090/api/v1/query_range?query=up&start=now-1h&end=now&step=60s'

# Reduce time range for testing curl 'http://prometheus:9090/api/v1/query_range?query=rate(http_requests_total[5m])&start=now-1h&end=now'

# Check query complexity # Simple query - fast curl 'http://prometheus:9090/api/v1/query?query=up'

# Complex query - slower curl 'http://prometheus:9090/api/v1/query?query=sum(rate(http_requests_total[5m])) by (job)'

# Check query result size curl 'http://prometheus:9090/api/v1/query?query=http_requests_total' | jq '.data.result | length'

# If result > 10000 series, query may timeout ```

Step 2: Check Cardinality

```bash # Check high cardinality metrics curl 'http://prometheus:9090/api/v1/labels' | jq '.data[]' | while read label; do curl "http://prometheus:9090/api/v1/label/$label/values" | jq '.data | length' done

# Check series count per metric curl 'http://prometheus:9090/api/v1/query?query={__name__=~".+"}' | jq '.data.result | length'

# Find metrics with highest cardinality curl 'http://prometheus:9090/api/v1/metadata' | jq '.data | to_entries | sort_by(.value[].cardinality) | reverse | .[] | {metric: .key, cardinality: .value[].cardinality}'

# Check label value explosion curl 'http://prometheus:9090/api/v1/label/user_id/values' | jq '.data | length' # If user_id has thousands of values, queries will be slow

# Drop high cardinality labels in scrape config: scrape_configs: - job_name: 'my-app' metric_relabel_configs: - source_labels: [user_id] regex: '.+' action: drop ```

Step 3: Adjust Timeout Settings

```bash # Check current timeout curl 'http://prometheus:9090/api/v1/status/config' | jq '.data.query_timeout'

# Default: 2 minutes for instant queries, 30s for range queries

# In prometheus.yml: global: evaluation_interval: 15s scrape_interval: 15s

# Increase timeout (not recommended as primary fix): # In Prometheus startup flags: --query.timeout=5m --query.max-samples=50000000

# Or in prometheus.yml (Prometheus 2.39+): query: timeout: 5m max_samples: 50000000 max_concurrency: 20

# Restart Prometheus systemctl restart prometheus ```

Step 4: Optimize PromQL Queries

```bash # Slow query (full scan each evaluation): rate(http_requests_total[5m])

# Faster with recording rule: # In rules.yml: groups: - name: http_requests rules: - record: http_requests:rate5m expr: rate(http_requests_total[5m])

# Then query pre-computed metric: http_requests:rate5m

# Avoid nested rate: # Slow: rate(sum(http_requests_total)[5m])

# Fast: sum(rate(http_requests_total[5m]))

# Use label filters early: # Slow: sum(rate(http_requests_total[5m])) by (job)

# Fast: sum(rate(http_requests_total{job="api"}[5m])) by (job)

# Reduce resolution for large time ranges: # Query with 1h step instead of 15s curl 'http://prometheus:9090/api/v1/query_range?query=rate(http_requests_total[5m])&start=now-7d&end=now&step=1h' ```

Step 5: Add Recording Rules

```bash # Create recording rules file cat << 'EOF' > /etc/prometheus/recording_rules.yml groups: - name: precomputed_metrics interval: 30s rules: # Pre-compute common queries - record: job:http_requests:rate5m expr: sum(rate(http_requests_total[5m])) by (job)

  • record: instance:http_requests:rate5m
  • expr: rate(http_requests_total[5m])

# Pre-aggregate for faster queries - record: namespace:cpu_usage:rate5m expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace) EOF

# Add to prometheus.yml: rule_files: - '/etc/prometheus/recording_rules.yml'

# Restart Prometheus systemctl restart prometheus

# Verify rules loaded curl 'http://prometheus:9090/api/v1/rules' | jq '.data.groups[].name' ```

Step 6: Increase Resources

```bash # Check Prometheus memory usage ps aux | grep prometheus

# Check memory limits curl 'http://prometheus:9090/api/v1/status/tsdb' | jq '.data.headStats'

# For systemd service: cat /etc/systemd/system/prometheus.service

# Increase memory limit: MemoryLimit=8G

# Or in Docker: docker update --memory 8g prometheus

# For Kubernetes: resources: limits: memory: 8Gi cpu: 4 requests: memory: 4Gi cpu: 2

# Check CPU usage during query top -p $(pgrep prometheus)

# Increase query concurrency: --query.max-concurrency=20 ```

Step 7: Optimize Storage

```bash # Check storage performance iostat -x 1 10

# Check Prometheus data directory ls -la /var/lib/prometheus/data/

# Check TSDB stats curl 'http://prometheus:9090/api/v1/status/tsdb' | jq

# Key metrics: # - numSeries: number of active series # - numChunks: number of chunks # - headStats: memory usage

# Reduce retention to limit data size: --storage.tsdb.retention.time=15d

# Or set retention size: --storage.tsdb.retention.size=10GB

# Enable compression: --storage.tsdb.wal-compression

# Check disk usage df -h /var/lib/prometheus ```

Step 8: Use Query Hints

```bash # Grafana query hints: # In Grafana dashboard panel settings: # Query options: # - Max data points: 1000 (reduce resolution) # - Min step: 1m (minimum resolution)

# Use $__interval variable for dynamic step: rate(http_requests_total[$__interval])

# For large time ranges, increase step: # Query: rate(http_requests_total[5m]) # Step: $__interval * 10

# Use query parameter hints: curl 'http://prometheus:9090/api/v1/query_range?query=rate(http_requests_total[5m])&start=now-7d&end=now&step=1h&timeout=60s' ```

Step 9: Split Complex Queries

```bash # Instead of one complex query, split into multiple:

# Complex query (slow): sum(rate(http_requests_total{status=~"5.."}[5m])) by (job) / sum(rate(http_requests_total[5m])) by (job)

# Split approach: # Query 1: Total requests sum(rate(http_requests_total[5m])) by (job)

# Query 2: Error requests sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)

# Calculate ratio in Grafana with transform

# Or use recording rule: - record: job:error_rate:ratio5m expr: | sum(rate(http_requests_total{status=~"5.."}[5m])) by (job) / sum(rate(http_requests_total[5m])) by (job) ```

Step 10: Monitor Query Performance

```bash # Check Prometheus metrics curl 'http://prometheus:9090/metrics' | grep prometheus_engine

# Key metrics: # prometheus_engine_query_duration_seconds: query latency # prometheus_engine_queries_concurrent_max: max concurrent # prometheus_engine_queries: total queries

# Create monitoring dashboard: cat << 'EOF' > /etc/prometheus/alert_rules.yml groups: - name: prometheus_query rules: - alert: PrometheusQuerySlow expr: histogram_quantile(0.9, rate(prometheus_engine_query_duration_seconds_bucket[5m])) > 10 for: 5m labels: severity: warning annotations: summary: "Prometheus queries are slow (>10s at 90th percentile)"

  • alert: PrometheusQueryTimeout
  • expr: rate(prometheus_engine_query_duration_seconds_sum{quantile="timeout"}[5m]) > 0
  • for: 1m
  • labels:
  • severity: critical
  • annotations:
  • summary: "Prometheus query timeouts detected"
  • EOF

# Add to prometheus.yml: rule_files: - '/etc/prometheus/alert_rules.yml'

# Restart Prometheus systemctl restart prometheus ```

Prometheus Query Timeout Checklist

CheckCommandExpected
Time rangequery params< 24h for complex
Cardinality/api/v1/metadata< 10000 per metric
Timeout config--query.timeoutAppropriate
Recording rules/api/v1/rulesPre-computed metrics
Memory usageps aux< 80% limit
Disk I/OiostatLow latency

Verification

```bash # After optimizing queries

# 1. Test problematic query curl 'http://prometheus:9090/api/v1/query_range?query=http_requests:rate5m&start=now-1h&end=now' // Returns result within timeout

# 2. Check query latency curl 'http://prometheus:9090/metrics' | grep prometheus_engine_query_duration_seconds_bucket // P90 latency < 5s

# 3. Verify no timeouts journalctl -u prometheus | grep timeout // No recent timeout errors

# 4. Test Grafana dashboard # Open dashboard in browser // All panels render correctly

# 5. Check recording rules curl 'http://prometheus:9090/api/v1/rules' | jq '.data.groups[].rules[].name' // Pre-computed metrics available

# 6. Monitor resource usage top -p $(pgrep prometheus) // Stable CPU/memory during queries ```

  • [Fix Prometheus Scrape Target Down](/articles/fix-prometheus-scrape-target-down)
  • [Fix Prometheus Memory Limit](/articles/fix-prometheus-memory-limit)
  • [Fix Prometheus High Cardinality](/articles/fix-prometheus-metrics-cardinality-too-high-deep)
  • [WordPress troubleshooting: Fix IAM Timeout Error - Complete Trouble](fix-iam-timeout-error)
  • [Technical troubleshooting: Fix Cloudwatch Alarm Not Triggering Issue in Monit](cloudwatch-alarm-not-triggering)
  • [Fix Datadog Agent Not Sending Metrics Issue in Monitoring](datadog-agent-not-sending-metrics)
  • [Fix Elasticsearch Cluster Red Yellow Status Issue in Monitoring](elasticsearch-cluster-red-yellow-status)
  • [Fix Alertmanager Notification Failed](fix-alertmanager-notification-failed)

<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "Fix Prometheus Query Timeout", "description": "Troubleshoot Prometheus query timeout. Adjust settings, optimize queries, reduce cardinality.", "url": "https://www.fixwikihub.com/fix-prometheus-query-timeout", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2026-04-05T13:34:27.645Z", "dateModified": "2026-04-05T13:34:27.645Z" } </script>