# Fix Thanos Sidecar Upload Failures

You're monitoring Thanos and see `thanos_sidecar_upload_failures_total` increasing, or your Prometheus blocks aren't reaching object storage. The Thanos sidecar is responsible for uploading Prometheus TSDB blocks to object storage for long-term retention.

## Introduction

The Thanos sidecar runs alongside Prometheus and:

  1. Uploads TSDB blocks to object storage (S3, GCS, Azure)
  2. Serves Prometheus data via the StoreAPI
  3. Optionally watches the Prometheus configuration and rule files and triggers a reload when they change

When uploads fail, you lose long-term metric retention.

## Symptoms

The usual symptoms are a rising `thanos_sidecar_upload_failures_total` counter, a growing upload queue, and blocks missing from the bucket. Common error messages in the sidecar logs include:

  • `Access Denied` or `403 Forbidden`
  • `Connection timeout` or `Network unreachable`
  • `context deadline exceeded`
  • `No space left on device`
  • `Block already exists`

## Common Causes

  • Misconfigured object storage settings
  • Missing or incorrect credentials
  • Network connectivity issues
  • Version compatibility problems
  • Resource exhaustion or limits
  • Permission or access denied
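As a rough first triage, a log line can be mapped to one of these causes by matching the error strings quoted in this guide. The sketch below is illustrative only: `classify_upload_error` is a hypothetical helper, and real sidecar log wording may differ from the patterns used here.

```bash
# classify_upload_error LINE
# Maps one sidecar log line to a likely cause category.
# The patterns are the error strings quoted in this guide (an assumption
# about log wording), so treat the result as a starting point only.
classify_upload_error() {
  line=$1
  case "$line" in
    *"Access Denied"*|*"403 Forbidden"*|*"Invalid credentials"*)
      echo "credentials-or-permissions" ;;
    *"context deadline exceeded"*|*"Upload timeout"*)
      echo "upload-timeout" ;;
    *"Connection timeout"*|*"Network unreachable"*)
      echo "network" ;;
    *"No space left on device"*)
      echo "disk-space" ;;
    *"Block already exists"*)
      echo "upload-conflict" ;;
    *"checksum mismatch"*|*"Invalid block"*)
      echo "invalid-block" ;;
    *)
      echo "unknown" ;;
  esac
}
```

Feeding each log line through the function and piping the results into `sort | uniq -c` gives a quick count per category.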

## Step-by-Step Fix

Check Thanos sidecar logs:

```bash
# Kubernetes (the sidecar runs as a container in the Prometheus pod)
kubectl logs -l app=prometheus -c thanos-sidecar -n monitoring

# Docker
docker logs thanos-sidecar

# Systemd
journalctl -u thanos-sidecar -f
```

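Check upload metrics. Metric names can differ between Thanos versions; upstream Thanos exposes the shipper counters as `thanos_shipper_uploads_total` and `thanos_shipper_upload_failures_total`, so confirm the names on the sidecar's `/metrics` endpoint before relying on the queries below:

```bash
# List the upload-related metrics the sidecar actually exposes
curl -s http://localhost:10902/metrics | grep -i upload

# Query Prometheus for upload failures
curl -s 'http://localhost:9090/api/v1/query?query=thanos_sidecar_upload_failures_total' | jq

# Check upload success rate (-G with --data-urlencode keeps curl from treating [5m] as a URL glob)
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=rate(thanos_sidecar_upload_successes_total[5m])' | jq

# Check queued blocks
curl -s 'http://localhost:9090/api/v1/query?query=thanos_sidecar_queue_length' | jq
```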
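Raw counters only grow, so a ratio of failures to total attempts is often easier to read than two separate numbers. A minimal sketch, assuming you have already extracted the two counter values from the query results; `upload_failure_ratio` is a hypothetical helper name.

```bash
# upload_failure_ratio FAILURES SUCCESSES
# Prints failures / (failures + successes) with four decimals,
# or "n/a" when no upload has been attempted yet.
upload_failure_ratio() {
  awk -v f="$1" -v s="$2" 'BEGIN {
    total = f + s
    if (total == 0) { print "n/a"; exit }
    printf "%.4f\n", f / total
  }'
}
```

A value that stays above zero across several block cycles points at a persistent problem and not a one-off retry.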

Check object storage connectivity:

```bash
# For S3
aws s3 ls s3://your-thanos-bucket/

# For GCS
gsutil ls gs://your-thanos-bucket/

# Test bucket access
aws s3api head-bucket --bucket your-thanos-bucket
```

## Common Causes and Solutions

### Cause 1: Object Storage Credentials Invalid

```bash
# Error: Access Denied, Invalid credentials
```

Solution: Verify credentials configuration:

```yaml
# Kubernetes secret for S3
apiVersion: v1
kind: Secret
metadata:
  name: thanos-object-storage
type: Opaque
stringData:
  object-storage.yaml: |
    type: S3
    config:
      bucket: your-thanos-bucket
      endpoint: s3.amazonaws.com
      region: us-east-1
      access_key: AKIAIOSFODNN7EXAMPLE
      secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

Test credentials:

```bash
# Verify AWS credentials
aws sts get-caller-identity

# Test bucket write
echo "thanos write test" > /tmp/test.txt
aws s3 cp /tmp/test.txt s3://your-thanos-bucket/test.txt
aws s3 rm s3://your-thanos-bucket/test.txt
```
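Before testing credentials against AWS, it can help to confirm that the object storage file itself contains the keys used in the S3 example above. The sketch below is a plain text match, not a YAML parser, and `check_objstore_config` is a hypothetical helper; the key list (`type`, `bucket`, `endpoint`) is taken from this guide's example and may not cover other providers.

```bash
# check_objstore_config FILE
# Reports which of the keys from this guide's S3 example are missing
# or empty in FILE. Text match only: indentation and nesting are not
# validated.
check_objstore_config() {
  file=$1
  missing=""
  for key in type bucket endpoint; do
    # Match "key: value" at the start of a line, allowing leading spaces.
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]" "$file"; then
      missing="$missing $key"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Running it against the file mounted into the sidecar (for example `/etc/thanos/object-storage.yaml`) catches empty or misspelled keys before the sidecar starts.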

### Cause 2: Bucket Permissions Missing

```bash
# Error: 403 Forbidden
```

Solution: Add proper bucket policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/thanos-role"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-thanos-bucket",
        "arn:aws:s3:::your-thanos-bucket/*"
      ]
    }
  ]
}
```

Apply policy:

```bash
aws s3api put-bucket-policy \
  --bucket your-thanos-bucket \
  --policy file://bucket-policy.json
```
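A quick way to spot an incomplete policy file is to check that it names each of the four actions listed above. This is a text-level sketch only: `missing_policy_actions` is a hypothetical helper, it does not evaluate `Effect`, `Principal`, or `Resource`, and a wildcard such as `s3:*` would be reported as missing.

```bash
# missing_policy_actions FILE
# Prints each S3 action from this guide's policy that FILE does not
# mention as a quoted string. Prints nothing when all four are present.
missing_policy_actions() {
  for action in s3:ListBucket s3:GetObject s3:PutObject s3:DeleteObject; do
    grep -q "\"$action\"" "$1" || echo "$action"
  done
}
```

For example, `missing_policy_actions bucket-policy.json` should print nothing for the policy shown above.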

### Cause 3: Network Connectivity Issues

```bash
# Error: Connection timeout, Network unreachable
```

Solution: Check network configuration:

```bash
# Test connectivity
curl -I https://s3.amazonaws.com

# Check DNS resolution
nslookup s3.amazonaws.com

# For Kubernetes, check egress (busybox ships no curl, so use an image that does)
kubectl run test --image=curlimages/curl --rm -it --restart=Never -- curl -I https://s3.amazonaws.com
```

For private endpoints:

```yaml
# object-storage.yaml
type: S3
config:
  bucket: your-thanos-bucket
  endpoint: s3.internal.company.com  # Private endpoint
  region: us-east-1
  insecure: false
  signature_version2: false
```

### Cause 4: Prometheus Not Ready

```bash
# Error: Prometheus not ready, TSDB not initialized
```

Solution: Ensure Prometheus is fully started:

```bash
# Check Prometheus health
curl http://localhost:9090/-/healthy

# Check readiness (succeeds only once Prometheus can serve traffic)
curl http://localhost:9090/-/ready

# Check TSDB status
curl http://localhost:9090/api/v1/status/tsdb

# Verify Prometheus is ready before the sidecar starts uploading.
# In Kubernetes, give the Prometheus container a readiness probe.
```
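Outside Kubernetes, the same "wait until ready" idea can be scripted as a small polling loop. The sketch below is one possible shape, not something Thanos ships: `wait_until_ready` is a hypothetical helper, and the example URL assumes Prometheus listens on `localhost:9090`.

```bash
# wait_until_ready MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is used up.
# Returns 0 on success, 1 if the command never succeeded.
wait_until_ready() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "not ready after $max_attempts attempt(s)"
  return 1
}

# Example (assumes Prometheus listens on localhost:9090):
# wait_until_ready 30 2 curl -sf http://localhost:9090/-/ready
```

In a systemd unit or wrapper script, the sidecar would be started only after this call returns 0.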

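```yaml
# Kubernetes deployment
spec:
  containers:
  - name: prometheus
    readinessProbe:
      httpGet:
        path: /-/ready
        port: 9090
      initialDelaySeconds: 30
  - name: thanos-sidecar
    args:
    - sidecar
    - --prometheus.url=http://localhost:9090
    # How long the sidecar waits for Prometheus to start up
    - --prometheus.ready_timeout=10m
```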

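### Cause 5: Block Upload Timeout

```bash
# Error: Upload timeout, context deadline exceeded
```

Solution: Raise the object storage client timeouts. Upstream Thanos does not document sidecar flags such as `--upload.timeout`, and an unrecognized flag stops the sidecar at startup, so confirm what your version supports with `thanos sidecar --help`. For S3, the client timeouts are set under `http_config` in the object storage configuration:

```yaml
# object-storage.yaml
type: S3
config:
  bucket: your-thanos-bucket
  endpoint: s3.amazonaws.com
  region: us-east-1
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 5m
```

In Kubernetes, pass that file to the sidecar:

```yaml
args:
  - sidecar
  - --prometheus.url=http://localhost:9090
  - --tsdb.path=/var/lib/prometheus/data
  - --objstore.config-file=/etc/thanos/object-storage.yaml
```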

### Cause 6: Disk Space Issues

```bash
# Error: No space left on device
```

Solution: Check and clean disk space:

```bash
# Check Prometheus data directory
df -h /var/lib/prometheus

# Check TSDB blocks
ls -la /var/lib/prometheus/data/

# The sidecar does not delete local blocks; Prometheus retention does.
# Inspect head statistics to see how much data is still in memory
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.headStats'
```
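To catch a filling disk before uploads start failing, the `df` check can be turned into a threshold test. This is a minimal sketch that parses the POSIX `df -P` layout; `disk_usage_alert` is a hypothetical helper, the 85% threshold in the example is arbitrary, and mount points containing spaces are not handled.

```bash
# disk_usage_alert THRESHOLD_PERCENT
# Reads `df -P` output on stdin and prints every mount point whose
# use percentage is at or above the threshold.
disk_usage_alert() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= limit + 0) print $6 " " use "%"
  }'
}

# Example: df -P /var/lib/prometheus | disk_usage_alert 85
```

An empty result means every listed filesystem is below the threshold.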

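Configure retention. In Prometheus 2.x, retention is set with command-line flags, not in `prometheus.yml`:

```yaml
# Prometheus container arguments
args:
  - --storage.tsdb.retention.time=15d
  - --storage.tsdb.retention.size=50GB
```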

### Cause 7: Concurrent Upload Conflicts

```bash
# Error: Block already exists, Conflict
```

Solution: This is usually harmless because Thanos handles conflicts. Still, check that no two Prometheus instances share the same external labels:

```bash
# Check whether several Prometheus pods (each with a sidecar) are running
kubectl get pods -l app=prometheus -n monitoring

# Ensure unique instance labels
# Each Prometheus should have unique external_labels
```

```yaml
# Prometheus configuration
global:
  external_labels:
    cluster: 'prod'
    replica: 'prometheus-1'  # Unique per instance
```
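Once you have collected the `cluster` and `replica` values from each Prometheus instance, duplicates are easy to spot mechanically. A minimal sketch, assuming one `cluster replica` pair per line on stdin; `duplicate_label_sets` is a hypothetical helper, and how you gather the pairs depends on your setup.

```bash
# duplicate_label_sets
# Reads one "cluster replica" pair per line on stdin and prints every
# pair that occurs more than once.
duplicate_label_sets() {
  sort | uniq -d
}
```

Any line it prints identifies a label set shared by at least two instances, which should be given distinct `replica` values.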

### Cause 8: Invalid Block Data

```bash
# Error: Invalid block, checksum mismatch
```

Solution: Verify TSDB integrity:

```bash
# List blocks with their time ranges and sizes
promtool tsdb list /var/lib/prometheus/data

# Analyze a specific block (promtool has no `tsdb check` or `tsdb verify` subcommand)
promtool tsdb analyze /var/lib/prometheus/data <block-id>
```
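A cheaper first check is structural: a complete TSDB block directory contains `meta.json`, `index`, and a `chunks` directory. The sketch below flags block directories that lack one of these. `incomplete_blocks` is a hypothetical helper, and it identifies blocks only by their 26-character directory names, so it says nothing about the contents of the files.

```bash
# incomplete_blocks DATA_DIR
# Prints the name of every block directory under DATA_DIR that lacks
# meta.json, index, or a chunks directory. Directories without a
# 26-character name (wal, chunks_head, temporary dirs) are skipped.
incomplete_blocks() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    [ "${#name}" -eq 26 ] || continue
    if [ ! -f "$dir/meta.json" ] || [ ! -f "$dir/index" ] || [ ! -d "$dir/chunks" ]; then
      echo "$name"
    fi
  done
}
```

For example, `incomplete_blocks /var/lib/prometheus/data` prints nothing when every block has the expected layout.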

### Cause 9: Thanos Version Mismatch

```bash
# Error: Incompatible version, unsupported format
```

Solution: Ensure Thanos and Prometheus versions are compatible:

```bash
# Check Thanos version
thanos --version

# Check Prometheus version
prometheus --version

# Recommended: Thanos v0.30+ with Prometheus v2.40+
```

## Complete Thanos Sidecar Configuration

### Kubernetes Deployment

The sidecar needs `--tsdb.path` pointing at the Prometheus data directory, and Prometheus must use equal minimum and maximum block durations for uploads to work:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.45.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/var/lib/prometheus/data
            - --storage.tsdb.retention.time=15d
            # Equal min and max block durations disable local compaction
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-remote-write-receiver
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /var/lib/prometheus/data
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
        - name: thanos-sidecar
          image: thanosio/thanos:v0.32.0
          args:
            - sidecar
            - --prometheus.url=http://localhost:9090
            - --tsdb.path=/var/lib/prometheus/data
            - --objstore.config-file=/etc/thanos/object-storage.yaml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
          volumeMounts:
            - name: object-storage
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/lib/prometheus/data
      volumes:
        - name: config
          configMap:
            name: prometheus-config
        - name: data
          emptyDir: {}
        - name: object-storage
          secret:
            secretName: thanos-object-storage
```

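### Prometheus Configuration

```yaml
global:
  external_labels:
    cluster: 'prod'
    replica: 'prometheus-1'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Scrape the sidecar so its upload metrics can be queried from Prometheus
  - job_name: 'thanos-sidecar'
    static_configs:
      - targets: ['localhost:10902']

# Only needed if you also run Thanos Receive; sidecar uploads do not use remote_write
remote_write:
  - url: http://thanos-receive:10908/api/v1/receive
```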

## Verification

After fixing issues:

```bash
# Check upload success rate (query Prometheus; the sidecar's HTTP port serves /metrics, not the query API)
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=rate(thanos_sidecar_upload_successes_total[5m])' | jq

# Verify no failures
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=thanos_sidecar_upload_failures_total' | jq

# Check blocks in object storage (each uploaded block directory holds its own meta.json)
aws s3 ls s3://your-thanos-bucket/ --recursive | grep "meta.json"

# Verify Thanos query can access data
curl -s 'http://thanos-query:10902/api/v1/query?query=up' | jq
```
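To turn the bucket listing into a number you can compare over time, count the distinct block directories that contain a `meta.json`. A minimal sketch, assuming the usual `aws s3 ls --recursive` output where the object key is the last field; `count_uploaded_blocks` is a hypothetical helper and only recognizes block directories named with uppercase letters and digits.

```bash
# count_uploaded_blocks
# Reads `aws s3 ls --recursive` output on stdin and prints how many
# distinct block directories contain a meta.json object.
count_uploaded_blocks() {
  awk '$NF ~ /(^|\/)[0-9A-Z]+\/meta\.json$/ {
    n = split($NF, parts, "/")
    seen[parts[n - 1]] = 1
  }
  END {
    count = 0
    for (id in seen) count++
    print count
  }'
}

# Example: aws s3 ls s3://your-thanos-bucket/ --recursive | count_uploaded_blocks
```

With the default two-hour block duration, the count should grow by roughly one every two hours per Prometheus instance once uploads are healthy.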

## Monitoring Alerts

```yaml
# Alert for upload failures
groups:
  - name: thanos-sidecar
    rules:
      - alert: ThanosSidecarUploadFailures
        expr: rate(thanos_sidecar_upload_failures_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Thanos sidecar upload failures"
          description: "Thanos sidecar {{ $labels.instance }} is failing to upload blocks"

      - alert: ThanosSidecarUploadQueueGrowing
        expr: thanos_sidecar_queue_length > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Thanos sidecar upload queue growing"
          description: "Upload queue length is {{ $value }}"

      # Prometheus cuts a block every 2h by default, so a 1h window would fire between normal uploads
      - alert: ThanosSidecarNoUploads
        expr: increase(thanos_sidecar_upload_successes_total[3h]) == 0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "No Thanos sidecar uploads"
          description: "No blocks uploaded in the last 3 hours"
```