Fix Prometheus Query Timeout

Resolve Prometheus query timeout issues by adjusting timeout settings, optimizing queries, and reducing data cardinality.

Published: Apr 5, 2026 | 7 min read | By FixWikiHub Editorial Team

Introduction

Prometheus queries time out when evaluation takes longer than the configured limit (`--query.timeout`, 2 minutes by default). This typically happens with large time ranges, high-cardinality metrics, or complex PromQL expressions.
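To make the time-range factor concrete, the sketch below estimates how many evaluation steps a range query asks for. The function name `range_points` is hypothetical and only illustrates the arithmetic; for reference, Prometheus rejects `query_range` requests that would return more than 11,000 points per series.

```bash
# range_points RANGE_SECONDS STEP_SECONDS
# Prints the number of evaluation steps in a range query:
# floor(range / step) + 1, because both endpoints are evaluated.
range_points() {
  local range_s=$1 step_s=$2
  echo $(( range_s / step_s + 1 ))
}

# 7 days at a 15s step versus a 1h step
range_points $((7 * 24 * 3600)) 15     # 40321
range_points $((7 * 24 * 3600)) 3600   # 169
```

Every step is a separate evaluation of the expression, so widening the step is often the cheapest way to shrink a long-range query.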

Symptoms

Query timeout:

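```bash
$ END=$(date +%s); START=$((END - 7*24*3600))
$ curl -sG 'http://prometheus:9090/api/v1/query_range' \
    --data-urlencode 'query=rate(http_requests_total[5m])' \
    --data-urlencode "start=$START" --data-urlencode "end=$END" \
    --data-urlencode 'step=60s'

{
  "status": "error",
  "errorType": "timeout",
  "error": "query timed out in expression evaluation"
}
```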

Grafana dashboard error:

`Error: Prometheus query timed out`

Log messages:

```bash
$ journalctl -u prometheus | grep -Ei 'timeout|timed out'

prometheus: query timed out: "rate(http_requests_total[5m])"
prometheus: querier timeout after 30s
```

Common Causes

1. Large time range - Querying weeks or months of data
2. High cardinality - Too many unique label combinations
3. Complex PromQL - Nested functions, joins, aggregations
4. Insufficient resources - CPU or memory limits too low
5. Slow storage - HDDs or network volumes instead of local SSDs
6. Concurrent queries - Too many simultaneous queries
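The cardinality cause is easiest to see as a multiplication: in the worst case a metric produces one series per combination of label values. The sketch below shows that upper bound; the function name `max_series` and the label counts are hypothetical.

```bash
# max_series COUNT...
# Multiplies the number of distinct values of each label to get the
# worst-case number of series a single metric can produce.
max_series() {
  local total=1 n
  for n in "$@"; do
    total=$(( total * n ))
  done
  echo "$total"
}

# Hypothetical metric with 5 methods, 10 status codes, and 200 instances
max_series 5 10 200        # 10000
# The same metric after adding a user_id label with 5000 values
max_series 5 10 200 5000   # 50000000
```

Real series counts are usually lower because not every combination occurs, but a single unbounded label still dominates the product.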

Step-by-Step Fix

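1. Check the Prometheus log and the API response for the specific timeout message
2. Verify the query timeout and sample limit settings
3. Narrow the query scope (time range, step, label filters) to isolate the slow part
4. Review recent changes such as new dashboards, labels, or rules
5. Apply the corrective action
6. Verify the fix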

Step 1: Check Query Scope

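```bash
# The API expects Unix timestamps or RFC 3339 times; "now-1h" is not accepted
END=$(date +%s)
START=$((END - 3600))

# Check query time range
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=up' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60s'

# Reduce time range for testing (step is required for range queries)
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60s'

# Check query complexity
# Simple query - fast
curl -sG 'http://prometheus:9090/api/v1/query' --data-urlencode 'query=up'

# Complex query - slower
curl -sG 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total[5m])) by (job)'

# Check query result size
curl -sG 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=http_requests_total' | jq '.data.result | length'

# If the result exceeds roughly 10000 series, the query may time out
```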
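Because the query API takes absolute `start` and `end` values (Unix timestamps or RFC 3339), a small helper can turn a lookback window into the two numbers. This is a minimal sketch; `range_bounds` is a hypothetical name, and the optional second argument exists only so the output is reproducible.

```bash
# range_bounds LOOKBACK_SECONDS [NOW_EPOCH]
# Prints "START END" as Unix timestamps. NOW_EPOCH defaults to the current time.
range_bounds() {
  local lookback=$1 now=${2:-$(date +%s)}
  echo "$(( now - lookback )) $now"
}

# Usage: split the pair, then pass the values as start= and end=
bounds=$(range_bounds 3600)
START=${bounds% *}
END=${bounds#* }
echo "start=$START end=$END"
```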

Step 2: Check Cardinality

```bash
# Count values per label (high counts indicate high cardinality)
curl -s 'http://prometheus:9090/api/v1/labels' | jq -r '.data[]' | while read -r label; do
  count=$(curl -s "http://prometheus:9090/api/v1/label/$label/values" | jq '.data | length')
  echo "$label $count"
done

# Check total series count (count() returns one number instead of every series)
curl -sG 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=count({__name__=~".+"})' | jq -r '.data.result[0].value[1]'

# Find metrics with highest cardinality
# (/api/v1/metadata carries no cardinality data; the TSDB status endpoint does)
curl -s 'http://prometheus:9090/api/v1/status/tsdb' | jq '.data.seriesCountByMetricName'

# Check label value explosion
curl -s 'http://prometheus:9090/api/v1/label/user_id/values' | jq '.data | length'
# If user_id has thousands of values, queries will be slow

# Drop the high cardinality label in the scrape config.
# labeldrop removes only the label; "action: drop" would discard every series carrying it.
# Make sure series are still unique once the label is gone.
scrape_configs:
  - job_name: 'my-app'
    metric_relabel_configs:
      - regex: user_id
        action: labeldrop
```
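Once per-label value counts have been collected, a short filter can surface the labels worth investigating. The sketch below assumes input lines of the form `name count` (an assumed format, for example gathered from the label values endpoint); `flag_high_cardinality` and the sample numbers are hypothetical.

```bash
# flag_high_cardinality THRESHOLD
# Reads "name count" pairs on stdin and prints the pairs whose count
# exceeds THRESHOLD, largest first.
flag_high_cardinality() {
  awk -v limit="$1" '$2 + 0 > limit + 0 { print $2, $1 }' |
    sort -rn |
    awk '{ print $2, $1 }'
}

printf '%s\n' 'job 12' 'user_id 48210' 'instance 340' 'path 2975' |
  flag_high_cardinality 1000
# user_id 48210
# path 2975
```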

Step 3: Adjust Timeout Settings

```bash
# Check current timeout (it is a startup flag, not part of prometheus.yml)
curl -s 'http://prometheus:9090/api/v1/status/flags' | jq -r '.data["query.timeout"]'

# Default: 2m, applied to both instant and range queries
# A shorter limit (for example 30s) may come from a client or proxy in front of Prometheus

# prometheus.yml has no query timeout setting; the global block only controls
# scrape and rule evaluation intervals:
global:
  evaluation_interval: 15s
  scrape_interval: 15s

# Increase timeout (not recommended as primary fix)
# In Prometheus startup flags (defaults: 2m, 50000000 samples, 20 concurrent queries):
--query.timeout=5m
--query.max-samples=50000000
--query.max-concurrency=20

# Restart Prometheus
systemctl restart prometheus
```
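When several layers each enforce a timeout, the shortest one wins, so it helps to compare them in the same unit. The sketch below converts single-unit durations to seconds; `duration_to_seconds` is a hypothetical helper, and the 30s client value is an assumed example rather than a measured setting.

```bash
# duration_to_seconds DURATION
# Converts a single-unit duration such as 30s, 2m, 1h, or 1d into seconds.
# Compound values like 1h30m are not handled in this sketch.
duration_to_seconds() {
  local value=${1%[smhd]} unit=${1#"${1%?}"}
  case $unit in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

server=$(duration_to_seconds 2m)    # value of --query.timeout
client=$(duration_to_seconds 30s)   # assumed dashboard or proxy timeout
if [ "$client" -lt "$server" ]; then
  echo "client gives up first after ${client}s"
fi
```

If the client limit is the smaller one, raising `--query.timeout` alone will not change the error the user sees.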

Step 4: Optimize PromQL Queries

```bash
# Slow query (full scan each evaluation):
rate(http_requests_total[5m])

# Faster with recording rule:
# In rules.yml:
groups:
  - name: http_requests
    rules:
      - record: http_requests:rate5m
        expr: rate(http_requests_total[5m])

# Then query pre-computed metric:
http_requests:rate5m

# Take rate() before sum(), never the other way round:
# Wrong (not valid without a subquery, and it hides counter resets):
rate(sum(http_requests_total)[5m])

# Right:
sum(rate(http_requests_total[5m]))

# Use label filters early:
# Slow:
sum(rate(http_requests_total[5m])) by (job)

# Fast:
sum(rate(http_requests_total{job="api"}[5m])) by (job)

# Reduce resolution for large time ranges:
# Query with a coarse 1h step instead of a fine-grained one
END=$(date +%s); START=$((END - 7*24*3600))
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=1h'
```

Step 5: Add Recording Rules

```bash
# Create recording rules file
cat << 'EOF' > /etc/prometheus/recording_rules.yml
groups:
  - name: precomputed_metrics
    interval: 30s
    rules:
      # Pre-compute common queries
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

      - record: instance:http_requests:rate5m
        expr: rate(http_requests_total[5m])

      # Pre-aggregate for faster queries
      - record: namespace:cpu_usage:rate5m
        expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
EOF

# Validate the file before loading it
promtool check rules /etc/prometheus/recording_rules.yml

# Add to prometheus.yml:
rule_files:
  - '/etc/prometheus/recording_rules.yml'

# Restart Prometheus
systemctl restart prometheus

# Verify rules loaded
curl -s 'http://prometheus:9090/api/v1/rules' | jq '.data.groups[].name'
```
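The rule names used here follow the `level:metric:operations` pattern that the Prometheus documentation recommends for recording rules. As a rough sketch, a name can be checked for that shape before it goes into a rules file; `is_rule_name` is a hypothetical helper that only counts colon-separated parts and does not validate the characters inside them.

```bash
# is_rule_name NAME
# Succeeds when NAME has exactly three non-empty, colon-separated parts.
is_rule_name() {
  case $1 in
    *:*:*:*)    return 1 ;;  # more than two colons
    :*|*:|*::*) return 1 ;;  # an empty part
    *:*:*)      return 0 ;;  # exactly level:metric:operations
    *)          return 1 ;;  # fewer than two colons
  esac
}

for name in job:http_requests:rate5m http_requests_rate5m; do
  if is_rule_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: does not match level:metric:operations"
  fi
done
# job:http_requests:rate5m: ok
# http_requests_rate5m: does not match level:metric:operations
```

The pattern is a naming convention, not a requirement; Prometheus accepts any valid metric name as a recording rule name.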

Step 6: Increase Resources

```bash
# Check Prometheus memory usage
ps aux | grep '[p]rometheus'

# Check head block size (active series and chunks drive memory use)
curl -s 'http://prometheus:9090/api/v1/status/tsdb' | jq '.data.headStats'

# For systemd service:
cat /etc/systemd/system/prometheus.service

# Increase memory limit in the [Service] section
# (MemoryMax= replaces the deprecated MemoryLimit= on cgroup v2 systems):
MemoryMax=8G

# Or in Docker:
docker update --memory 8g prometheus

# For Kubernetes:
resources:
  limits:
    memory: 8Gi
    cpu: 4
  requests:
    memory: 4Gi
    cpu: 2

# Check CPU usage during query
top -p "$(pgrep -d, -x prometheus)"

# Increase query concurrency (the default is 20):
--query.max-concurrency=40
```

Step 7: Optimize Storage

```bash
# Check storage performance
iostat -x 1 10

# Check Prometheus data directory
ls -la /var/lib/prometheus/data/

# Check TSDB stats
curl -s 'http://prometheus:9090/api/v1/status/tsdb' | jq .

# Key fields:
# - headStats.numSeries: number of active series
# - headStats.chunkCount: number of chunks in the head block
# - seriesCountByMetricName: metrics with the most series

# Reduce retention to limit data size:
--storage.tsdb.retention.time=15d

# Or set retention size:
--storage.tsdb.retention.size=10GB

# Enable WAL compression (on by default since Prometheus 2.20):
--storage.tsdb.wal-compression

# Check disk usage
df -h /var/lib/prometheus
```
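To pick a retention value, it can help to estimate disk use first. The Prometheus storage documentation gives a rough formula: retention time in seconds, times ingested samples per second, times bytes per sample (about 1 to 2 bytes on average). The sketch below applies it; `estimate_disk_bytes` is a hypothetical name and the ingestion rate is an assumed figure, which on a live server could be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`.

```bash
# estimate_disk_bytes RETENTION_DAYS SAMPLES_PER_SECOND BYTES_PER_SAMPLE
# Prints retention_seconds * samples_per_second * bytes_per_sample.
estimate_disk_bytes() {
  local days=$1 samples_per_s=$2 bytes_per_sample=$3
  echo $(( days * 86400 * samples_per_s * bytes_per_sample ))
}

# 15 days of retention at an assumed 10000 samples/s and 2 bytes per sample
estimate_disk_bytes 15 10000 2   # 25920000000 (about 26 GB)
```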

Step 8: Use Query Hints

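```bash
# Grafana query hints:
# In Grafana dashboard panel settings:
# Query options:
#   - Max data points: 1000 (reduce resolution)
#   - Min step: 1m (minimum resolution)

# Use $__rate_interval for rate() so the window scales with the step
# (a plain $__interval can fall below the scrape interval and return no data):
rate(http_requests_total[$__rate_interval])

# For large time ranges, increase step:
# Query: rate(http_requests_total[5m])
# Min step: raise it (for example to 10m) so fewer points are evaluated

# Use the per-request timeout parameter (capped by --query.timeout):
END=$(date +%s); START=$((END - 7*24*3600))
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" \
  --data-urlencode 'step=1h' --data-urlencode 'timeout=60s'
```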
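The idea behind a "max data points" setting can be sketched as a step calculation: divide the time range by the point budget and round up. This is an illustration of the principle only, not Grafana's exact algorithm (which also applies a minimum interval); `step_for_range` is a hypothetical name.

```bash
# step_for_range RANGE_SECONDS MAX_POINTS
# Prints the smallest whole-second step that keeps the number of
# step intervals at or below MAX_POINTS: ceil(range / max_points).
step_for_range() {
  local range_s=$1 max_points=$2
  echo $(( (range_s + max_points - 1) / max_points ))
}

step_for_range $((7 * 24 * 3600)) 1000   # 605
step_for_range 3600 1000                 # 4
```

Passing the result as `step=605s` keeps a 7-day query at roughly 1000 evaluations per series regardless of how far back the window reaches.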

Step 9: Split Complex Queries

```bash
# Instead of one complex query, split into multiple:

# Complex query (slow):
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
  / sum(rate(http_requests_total[5m])) by (job)

# Split approach:
# Query 1: Total requests
sum(rate(http_requests_total[5m])) by (job)

# Query 2: Error requests
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)

# Calculate ratio in Grafana with transform

# Or use recording rule:
- record: job:error_rate:ratio5m
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
    /
    sum(rate(http_requests_total[5m])) by (job)
```
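When the two halves are fetched separately, the ratio has to be computed on the client side. The sketch below shows that last step for a single pair of values; `error_ratio` is a hypothetical helper and the inputs are made-up rates.

```bash
# error_ratio ERRORS TOTAL
# Prints ERRORS / TOTAL with four decimals, or NaN when TOTAL is zero
# (mirroring what PromQL returns for 0 / 0).
error_ratio() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t + 0 == 0) { print "NaN"; exit }
    printf "%.4f\n", e / t
  }'
}

error_ratio 3.5 140   # 0.0250
error_ratio 0 0       # NaN
```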

Step 10: Monitor Query Performance

```bash
# Check Prometheus metrics
curl -s 'http://prometheus:9090/metrics' | grep prometheus_engine

# Key metrics:
# prometheus_engine_query_duration_seconds: query latency (summary with quantile and slice labels)
# prometheus_engine_queries_concurrent_max: max concurrent queries
# prometheus_engine_queries: queries currently executing or waiting

# Create alert rules:
cat << 'EOF' > /etc/prometheus/alert_rules.yml
groups:
  - name: prometheus_query
    rules:
      # The duration metric is a summary, so read its 0.9 quantile directly
      - alert: PrometheusQuerySlow
        expr: prometheus_engine_query_duration_seconds{slice="inner_eval", quantile="0.9"} > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus queries are slow (>10s at 90th percentile)"

      # The query API answers timed-out queries with HTTP 503, so count those responses
      - alert: PrometheusQueryTimeout
        expr: sum(rate(prometheus_http_requests_total{handler=~"/api/v1/query(_range)?", code="503"}[5m])) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus query timeouts detected"
EOF

# Add to prometheus.yml:
rule_files:
  - '/etc/prometheus/alert_rules.yml'

# Restart Prometheus
systemctl restart prometheus
```
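For a quick check without a dashboard, the 90th percentile can be read straight from the `/metrics` text output. The sketch below assumes the summary form of `prometheus_engine_query_duration_seconds` with `slice` and `quantile` labels; `query_p90` is a hypothetical helper and the sample values are invented.

```bash
# query_p90 [SLICE]
# Reads Prometheus text exposition on stdin and prints the 0.9 quantile of
# prometheus_engine_query_duration_seconds for SLICE (default: inner_eval).
query_p90() {
  awk -v slice="${1:-inner_eval}" '
    index($0, "prometheus_engine_query_duration_seconds{") == 1 &&
    index($0, "slice=\"" slice "\"") &&
    index($0, "quantile=\"0.9\"") { print $NF }'
}

query_p90 inner_eval <<'EOF'
prometheus_engine_query_duration_seconds{slice="inner_eval",quantile="0.5"} 0.0012
prometheus_engine_query_duration_seconds{slice="inner_eval",quantile="0.9"} 0.0450
prometheus_engine_query_duration_seconds{slice="queue_time",quantile="0.9"} 0.0001
prometheus_engine_query_duration_seconds_sum{slice="inner_eval"} 12.5
EOF
# 0.0450
```

Against a live server the same function can be fed with `curl -s http://prometheus:9090/metrics | query_p90`.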

Prometheus Query Timeout Checklist

| Check | Command | Expected |
|---|---|---|
| Time range | query params | < 24h for complex queries |
| Cardinality | /api/v1/status/tsdb | < 10000 series per metric |
| Timeout config | --query.timeout | Appropriate for the workload |
| Recording rules | /api/v1/rules | Pre-computed metrics present |
| Memory usage | ps aux | < 80% of limit |
| Disk I/O | iostat | Low latency |

Verification

```bash
# After optimizing queries

# 1. Test problematic query
END=$(date +%s); START=$((END - 3600))
curl -sG 'http://prometheus:9090/api/v1/query_range' \
  --data-urlencode 'query=http_requests:rate5m' \
  --data-urlencode "start=$START" --data-urlencode "end=$END" --data-urlencode 'step=60s'
# Returns result within timeout

# 2. Check query latency
curl -s 'http://prometheus:9090/metrics' | grep 'prometheus_engine_query_duration_seconds{'
# P90 latency < 5s

# 3. Verify no timeouts
journalctl -u prometheus --since '1 hour ago' | grep -Ei 'timeout|timed out'
# No recent timeout errors

# 4. Test Grafana dashboard
# Open dashboard in browser
# All panels render correctly

# 5. Check recording rules
curl -s 'http://prometheus:9090/api/v1/rules' | jq '.data.groups[].rules[].name'
# Pre-computed metrics available

# 6. Monitor resource usage
top -p "$(pgrep -d, -x prometheus)"
# Stable CPU/memory during queries
```
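The "no timeouts" check can be turned into a pass/fail step for a script. This is a minimal sketch that counts matching lines on stdin; `count_timeouts` is a hypothetical name, and the match string `timed out` is an assumption about how the messages are worded in your logs.

```bash
# count_timeouts
# Reads log lines on stdin, prints how many contain "timed out"
# (case-insensitive), and succeeds only when that number is zero.
count_timeouts() {
  local n
  n=$(grep -ci 'timed out' || true)
  echo "$n"
  [ "$n" -eq 0 ]
}

printf '%s\n' 'level=info msg="Server is ready"' | count_timeouts   # 0
```

In practice the input would come from something like `journalctl -u prometheus --since '1 hour ago' | count_timeouts`.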

[Fix Prometheus Scrape Target Down](/articles/fix-prometheus-scrape-target-down)
[Fix Prometheus Memory Limit](/articles/fix-prometheus-memory-limit)
[Fix Prometheus High Cardinality](/articles/fix-prometheus-metrics-cardinality-too-high-deep)

FAQ

Monitoring Troubleshooting FAQs

Common questions about troubleshooting and preventing similar issues

How do I know if this troubleshooting guide applies to my situation?

This guide covers Prometheus query timeouts. If you see the symptoms described above (timeout errors from the HTTP API, in Grafana panels, or in the Prometheus log), follow the steps in order, starting with the most common causes.

Is it safe to follow these troubleshooting steps?

The diagnostic commands are read-only. Restarting Prometheus, dropping labels, and reducing retention change or discard data, so back up your configuration and test those steps before applying them in production.

How long does it typically take to resolve this type of issue?

Most query timeout issues can be resolved within 30 minutes to 2 hours, depending on the root cause.

How can I prevent this issue from happening again?

Keep label cardinality in check, precompute expensive expressions with recording rules, and alert on query latency so regressions surface early.

Written by

FixWikiHub Editorial Team

Our editorial team consists of experienced DevOps engineers, systems administrators, and cloud architects with hands-on experience in production environments across AWS, Azure, GCP, and on-premises infrastructure.

Every guide undergoes technical review for accuracy and is updated when software versions, commands, or best practices change.

Last updated: Apr 5, 2026
