Introduction
Prometheus queries time out when they take longer to execute than the configured limit (`--query.timeout`, 2 minutes by default). This typically happens with large time ranges, high-cardinality metrics, or complex PromQL expressions.
Symptoms
Query timeout:
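```bash
# The API takes Unix or RFC 3339 timestamps, not "now-7d"
$ END=$(date +%s); START=$((END - 7*24*3600))
# step=1m keeps the request under the API's 11,000-points-per-series limit
$ curl -sG 'http://prometheus:9090/api/v1/query_range' \
    --data-urlencode 'query=rate(http_requests_total[5m])' \
    --data-urlencode "start=$START" --data-urlencode "end=$END" \
    --data-urlencode 'step=1m'
{
  "status": "error",
  "errorType": "timeout",
  "error": "query timed out in expression evaluation"
}
```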
Grafana dashboard error:
Error: Prometheus query timed out

Log messages:
```bash
$ journalctl -u prometheus | grep timeout
prometheus: query timed out: "rate(http_requests_total[5m])"
prometheus: querier timeout after 30s
```
Common Causes
1. Large time range: querying weeks or months of data in a single request
2. High cardinality: too many unique label combinations per metric
3. Complex PromQL: nested functions, joins, and aggregations
4. Insufficient resources: CPU or memory limits set too low
5. Slow storage: TSDB data on HDDs or network volumes instead of local SSDs
6. Concurrent queries: too many queries running at the same time
Step-by-Step Fix
1. Check logs for specific error messages
2. Verify configuration settings
3. Test network connectivity
4. Review recent changes
5. Apply corrective action
6. Verify the fix
Step 1: Check Query Scope
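```bash
# Time window for the tests (the API takes Unix or RFC 3339 timestamps, not "now-1h")
END=$(date +%s); START=$((END - 3600))
PROM='http://prometheus:9090'

# Baseline: a cheap range query over the last hour
# (-G with --data-urlencode avoids curl's own handling of [ ] and { } in URLs)
curl -sG "$PROM/api/v1/query_range" --data-urlencode 'query=up' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60s'

# Re-run the failing query over the same one-hour window
curl -sG "$PROM/api/v1/query_range" --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60s'

# Compare query complexity
# Simple instant query: fast
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=up'

# Aggregation over a rate: slower
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=sum(rate(http_requests_total[5m])) by (job)'

# Count the series a selector returns
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=http_requests_total' | jq '.data.result | length'

# A result with more than roughly 10,000 series is a likely timeout candidate
```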
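The size of a range query can be estimated before it is sent. The sketch below is a minimal illustration, not part of Prometheus: `points_per_series` and `min_step` are hypothetical helper names, and the 11,000 default reflects the per-series point limit that the `query_range` endpoint enforces.

```bash
# Hypothetical helpers: estimate the size of a range query before sending it.

# points_per_series START END STEP  (all in seconds)
# Number of evaluation steps Prometheus performs for one series.
points_per_series() {
  echo $(( ($2 - $1) / $3 + 1 ))
}

# min_step RANGE_SECONDS [MAX_POINTS]
# Smallest whole-second step that keeps RANGE/STEP at or below MAX_POINTS.
# 11000 is the per-series limit enforced by the query_range endpoint.
min_step() {
  max_points=${2:-11000}
  echo $(( ($1 + max_points - 1) / max_points ))
}
```

A seven-day range at a 15s step asks for 40321 points per series (`points_per_series 0 604800 15`), while `min_step 604800` suggests a step of at least 55 seconds for the same range.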
Step 2: Check Cardinality
```bash
PROM='http://prometheus:9090'

# Count the values of every label name (-r strips the JSON quotes)
curl -s "$PROM/api/v1/labels" | jq -r '.data[]' | while read -r label; do
  count=$(curl -s "$PROM/api/v1/label/$label/values" | jq '.data | length')
  echo "$label $count"
done | sort -k2 -nr | head

# Total number of active series (count() avoids returning every series)
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=count({__name__=~".+"})' \
  | jq -r '.data.result[0].value[1]'

# Metrics with the most series, from the TSDB status endpoint
curl -s "$PROM/api/v1/status/tsdb" | jq '.data.seriesCountByMetricName'

# Check a suspect label for value explosion
curl -s "$PROM/api/v1/label/user_id/values" | jq '.data | length'
# Thousands of user_id values mean thousands of series per metric, and slow queries

# Drop the high-cardinality label at scrape time (prometheus.yml).
# labeldrop removes only the label; action: drop with source_labels would
# discard every series that carries it.
scrape_configs:
  - job_name: 'my-app'
    metric_relabel_configs:
      - action: labeldrop
        regex: user_id
```
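To see why one label can dominate, multiply the value counts of a metric's labels: the product is the worst-case number of series. The helper below is a hypothetical sketch (`max_series` is not a Prometheus tool), and the counts in the usage note are assumed values for illustration.

```bash
# max_series COUNT...
# Worst-case series for one metric: the product of the value counts of its labels.
max_series() {
  total=1
  for n in "$@"; do
    total=$(( total * n ))
  done
  echo "$total"
}
```

With, say, 5 methods, 4 status codes, and 1000 `user_id` values, `max_series 5 4 1000` prints 20000; without the `user_id` label the same metric tops out at 20 series.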
Step 3: Adjust Timeout Settings
```bash
# Check the effective timeout (it is a command-line flag, so read the flags endpoint)
curl -s 'http://prometheus:9090/api/v1/status/flags' | jq -r '.data["query.timeout"]'

# Default: 2m, for both instant and range queries.
# A 30s timeout usually comes from another layer, such as Grafana or a reverse proxy.

# These prometheus.yml settings control scraping and rule evaluation, not query timeouts:
global:
  evaluation_interval: 15s
  scrape_interval: 15s

# Raise the limits (not recommended as the primary fix).
# Query limits are startup flags only; prometheus.yml has no query section:
--query.timeout=5m
--query.max-samples=50000000   # already the default
--query.max-concurrency=20     # already the default

# Restart Prometheus
systemctl restart prometheus
```
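A query usually passes through several timeouts (for example a dashboard, a proxy, and Prometheus itself), and the shortest one wins. The sketch below compares them; `duration_to_seconds` and `min_timeout` are hypothetical helpers that handle only single-unit durations such as `30s` or `5m`, which is an assumption made to keep the example short.

```bash
# duration_to_seconds DURATION
# Converts a single-unit duration such as 30s, 5m, 2h or 1d to seconds.
duration_to_seconds() {
  value=${1%[smhd]}
  unit=${1#"$value"}
  case "$value" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# min_timeout DURATION...
# Prints the shortest of the given timeouts, in seconds.
min_timeout() {
  min=
  for d in "$@"; do
    s=$(duration_to_seconds "$d") || return 1
    if [ -z "$min" ] || [ "$s" -lt "$min" ]; then min=$s; fi
  done
  echo "$min"
}
```

For instance, `min_timeout 2m 30s 60s` prints 30: raising `--query.timeout` alone changes nothing while a 30s limit sits in front of Prometheus.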
Step 4: Optimize PromQL Queries
```bash
# Slow: the rate is recomputed from raw samples on every evaluation
rate(http_requests_total[5m])

# Faster: pre-compute it with a recording rule (rules.yml)
groups:
  - name: http_requests
    rules:
      - record: http_requests:rate5m
        expr: rate(http_requests_total[5m])

# Then query the pre-computed series
http_requests:rate5m

# Take the rate before aggregating.
# Wrong: rate(sum(http_requests_total)[5m:])  (needs a subquery and hides counter resets)
# Right: sum(rate(http_requests_total[5m]))

# Filter by label in the selector when you only need part of the data.
# All jobs:     sum(rate(http_requests_total[5m])) by (job)
# One job only: sum(rate(http_requests_total{job="api"}[5m])) by (job)

# Reduce resolution for large time ranges: 1h step instead of 15s
END=$(date +%s); START=$((END - 7*24*3600))
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=1h'
```
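A larger step has a side effect worth checking: when the step is longer than the `rate()` window, each point reads only a slice of the timeline and the rest is never looked at. The hypothetical helper below (a sketch, with an invented name) expresses that as a percentage.

```bash
# window_coverage_pct WINDOW_SECONDS STEP_SECONDS
# Share of the timeline that rate(metric[WINDOW]) actually reads at a given step.
window_coverage_pct() {
  if [ "$1" -ge "$2" ]; then
    echo 100
  else
    echo $(( $1 * 100 / $2 ))
  fi
}
```

`window_coverage_pct 300 3600` prints 8, so a `[5m]` window at a 1h step can miss spikes between points; widening the window to match the step (`[1h]`) brings coverage back to 100.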
Step 5: Add Recording Rules
```bash
# Create the recording rules file
cat << 'EOF' > /etc/prometheus/recording_rules.yml
groups:
  - name: precomputed_metrics
    interval: 30s
    rules:
      # Pre-compute common queries
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

      - record: instance:http_requests:rate5m
        expr: rate(http_requests_total[5m])

      # Pre-aggregate for faster queries
      - record: namespace:cpu_usage:rate5m
        expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
EOF

# Validate the file before loading it
promtool check rules /etc/prometheus/recording_rules.yml

# Add to prometheus.yml:
rule_files:
  - '/etc/prometheus/recording_rules.yml'

# Restart Prometheus
systemctl restart prometheus

# Verify the rule groups loaded
curl -s 'http://prometheus:9090/api/v1/rules' | jq '.data.groups[].name'
```
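The gain from a recording rule can be estimated from how many stored samples a query has to read. This is a rough sketch: `samples_scanned` is a hypothetical helper, the series counts in the usage note are assumed, and the result approximates samples in the range rather than the engine's exact accounting.

```bash
# samples_scanned SERIES RANGE_SECONDS SAMPLE_INTERVAL_SECONDS
# Rough number of stored samples that fall inside the queried range.
samples_scanned() {
  echo $(( $1 * ($2 / $3) ))
}
```

Over seven days, 10,000 raw series scraped every 15s give `samples_scanned 10000 604800 15` = 403200000 samples, while 20 pre-aggregated series recorded every 30s give `samples_scanned 20 604800 30` = 403200.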
Step 6: Increase Resources
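```bash
# Check Prometheus memory usage (the [p] keeps grep from matching itself)
ps aux | grep '[p]rometheus'

# Check the head series count, the main driver of memory usage
curl -s 'http://prometheus:9090/api/v1/status/tsdb' | jq '.data.headStats'

# For a systemd service, inspect the unit file
cat /etc/systemd/system/prometheus.service

# Raise the memory limit in the [Service] section
# (MemoryMax replaces the deprecated MemoryLimit)
MemoryMax=8G

# Or in Docker
docker update --memory 8g prometheus

# For Kubernetes
resources:
  limits:
    memory: 8Gi
    cpu: 4
  requests:
    memory: 4Gi
    cpu: 2

# Watch CPU usage while the query runs (-d, joins multiple PIDs with commas for top)
top -p "$(pgrep -d, -x prometheus)"

# Query concurrency is capped by this flag (20 is the default; raise it only if CPU allows)
--query.max-concurrency=20
```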
Step 7: Optimize Storage
```bash
# Check storage performance (10 one-second samples)
iostat -x 1 10

# Check the Prometheus data directory
ls -la /var/lib/prometheus/data/

# Check TSDB stats
curl -s 'http://prometheus:9090/api/v1/status/tsdb' | jq

# Key fields under headStats:
# - numSeries: number of active series
# - chunkCount: number of chunks in the head block
# - numLabelPairs: number of distinct label pairs

# Reduce retention to limit data size
--storage.tsdb.retention.time=15d

# Or cap retention by size
--storage.tsdb.retention.size=10GB

# WAL compression (enabled by default since Prometheus 2.20)
--storage.tsdb.wal-compression

# Check disk usage
df -h /var/lib/prometheus
```
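When choosing a retention setting, the Prometheus storage documentation gives a rule of thumb: needed disk space is retention time in seconds, times ingested samples per second, times bytes per sample (roughly 1 to 2). The helper below is a hypothetical sketch of that formula, with 2 bytes per sample assumed as the default.

```bash
# disk_bytes_needed RETENTION_SECONDS SAMPLES_PER_SECOND [BYTES_PER_SAMPLE]
# Rule-of-thumb disk estimate; BYTES_PER_SAMPLE defaults to 2 (assumed upper end).
disk_bytes_needed() {
  echo $(( $1 * $2 * ${3:-2} ))
}
```

For 15 days of retention (1296000 seconds) at an assumed 100,000 samples per second, `disk_bytes_needed 1296000 100000` prints 259200000000, or about 259 GB. The ingestion rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`.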
Step 8: Use Query Hints
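```bash
# Grafana query options (dashboard panel settings > Query options):
# - Max data points: 1000 (caps resolution)
# - Min interval: 1m (finest step Grafana will request)

# Use $__rate_interval for rate() windows so the window always spans enough scrapes:
rate(http_requests_total[$__rate_interval])

# For large time ranges, increase the step:
# Query:    rate(http_requests_total[5m])
# Min step: 5m or more (set in the query editor options)

# The HTTP API also accepts a per-request timeout; it cannot exceed --query.timeout:
END=$(date +%s); START=$((END - 7*24*3600))
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" \
  --data-urlencode 'step=1h' --data-urlencode 'timeout=60s'
```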
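The way these Grafana options interact can be approximated with two small calculations. Both helpers are hypothetical sketches: `approx_interval` ignores the rounding to "nice" values that Grafana applies, and `rate_interval` follows the rule Grafana documents for `$__rate_interval`, the larger of interval plus scrape interval and four times the scrape interval.

```bash
# approx_interval RANGE_SECONDS MAX_DATA_POINTS MIN_INTERVAL_SECONDS
# Approximate step: the range divided by max data points, never below the minimum.
approx_interval() {
  interval=$(( ($1 + $2 - 1) / $2 ))
  if [ "$interval" -lt "$3" ]; then interval=$3; fi
  echo "$interval"
}

# rate_interval INTERVAL_SECONDS SCRAPE_INTERVAL_SECONDS
# max(interval + scrape, 4 * scrape)
rate_interval() {
  a=$(( $1 + $2 ))
  b=$(( 4 * $2 ))
  if [ "$a" -gt "$b" ]; then echo "$a"; else echo "$b"; fi
}
```

A seven-day panel with 1000 max data points and a 1m minimum gives `approx_interval 604800 1000 60` = 605 seconds, and with a 15s scrape interval `rate_interval 605 15` = 620 seconds for the `rate()` window.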
Step 9: Split Complex Queries
```bash
# Instead of one complex query, split it into several.

# Complex query (slow):
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
/
sum(rate(http_requests_total[5m])) by (job)

# Split approach:
# Query 1: total requests
sum(rate(http_requests_total[5m])) by (job)

# Query 2: error requests
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)

# Calculate the ratio in Grafana with a transformation

# Or use a recording rule:
- record: job:error_rate:ratio5m
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
    /
    sum(rate(http_requests_total[5m])) by (job)
```
Step 10: Monitor Query Performance
```bash
# Check the query engine's own metrics
curl -s 'http://prometheus:9090/metrics' | grep prometheus_engine

# Key metrics:
# prometheus_engine_query_duration_seconds: query latency (a summary, split by slice)
# prometheus_engine_queries_concurrent_max: maximum concurrent queries
# prometheus_engine_queries: queries currently executing or waiting

# Create alert rules for slow and failing queries:
cat << 'EOF' > /etc/prometheus/alert_rules.yml
groups:
  - name: prometheus_query
    rules:
      - alert: PrometheusQuerySlow
        # The metric is a summary, so read its 0.9 quantile directly
        expr: prometheus_engine_query_duration_seconds{slice="inner_eval",quantile="0.9"} > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus queries are slow (>10s at 90th percentile)"

      - alert: PrometheusQueryTimeout
        # Assumption: timeouts surface as HTTP 503 on the query endpoints
        expr: sum(rate(prometheus_http_requests_total{handler=~"/api/v1/query(_range)?",code="503"}[5m])) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus query timeouts detected"
EOF

# Add to prometheus.yml:
rule_files:
  - '/etc/prometheus/alert_rules.yml'

# Restart Prometheus
systemctl restart prometheus
```
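For a quick check without waiting on alert evaluation, the latency quantile can be read straight from the `/metrics` text. The sketch below assumes `prometheus_engine_query_duration_seconds` is exposed as a summary with `slice` and `quantile` labels; `slow_query_quantile` is a hypothetical helper name.

```bash
# slow_query_quantile THRESHOLD_SECONDS
# Reads Prometheus /metrics text on stdin; succeeds (exit 0) if the 0.9 quantile
# of the inner_eval slice exceeds THRESHOLD_SECONDS.
slow_query_quantile() {
  awk -v limit="$1" '
    index($0, "prometheus_engine_query_duration_seconds{") == 1 &&
    index($0, "slice=\"inner_eval\"") && index($0, "quantile=\"0.9\"") {
      found = 1
      if ($NF + 0 > limit + 0) slow = 1
    }
    END { exit (found && slow) ? 0 : 1 }
  '
}
```

A possible use is `curl -s http://prometheus:9090/metrics | slow_query_quantile 10 && echo "queries are slow"`. Matching with `index()` rather than a regular expression keeps the check independent of label order.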
Prometheus Query Timeout Checklist
| Check | Command | Expected |
|---|---|---|
| Time range | query params (start, end) | < 24h for complex queries |
| Cardinality | /api/v1/status/tsdb | < 10000 series per metric |
| Timeout config | /api/v1/status/flags (--query.timeout) | 2m default; raised only deliberately |
| Recording rules | /api/v1/rules | Pre-computed metrics listed |
| Memory usage | ps aux | < 80% of limit |
| Disk I/O | iostat -x | Low await and utilization |
Verification
```bash
# After optimizing queries

# 1. Test the problematic query: it should return within the timeout
END=$(date +%s); START=$((END - 3600))
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=http_requests:rate5m' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60s'

# 2. Check query latency: the 0.9 quantile should be under 5s
curl -s 'http://prometheus:9090/metrics' | grep 'prometheus_engine_query_duration_seconds{'

# 3. Verify there are no recent timeout errors
journalctl -u prometheus --since '1 hour ago' | grep -i timeout

# 4. Test the Grafana dashboard: open it in a browser and confirm all panels render

# 5. Check recording rules: the pre-computed metrics should be listed
curl -s 'http://prometheus:9090/api/v1/rules' | jq '.data.groups[].rules[].name'

# 6. Monitor resource usage: CPU and memory should stay stable during queries
top -p "$(pgrep -d, -x prometheus)"
```
Related Issues
- [Fix Prometheus Scrape Target Down](/articles/fix-prometheus-scrape-target-down)
- [Fix Prometheus Memory Limit](/articles/fix-prometheus-memory-limit)
- [Fix Prometheus High Cardinality](/articles/fix-prometheus-metrics-cardinality-too-high-deep)