Fix Kubernetes Node Disk Pressure

Resolve Kubernetes node disk pressure by cleaning up unused resources, expanding storage, and configuring eviction thresholds.

Published: Apr 5, 2026 | 13 min read | By FixWikiHub Editorial Team

Introduction

A Kubernetes node reports the DiskPressure condition when available disk space or free inodes fall below a kubelet eviction threshold. While the condition is true, the kubelet may evict pods, and new pods are not scheduled on the node.

Symptoms

Node condition:

```bash
$ kubectl describe node node-1

Conditions:
  Type          Status  Reason                  Message
  ----          ------  ------                  -------
  DiskPressure  True    KubeletHasDiskPressure  kubelet has disk pressure
```

Pod evictions:

```bash
$ kubectl get events

default/14m Normal NodeHasDiskPressure Node node-1 kubelet has disk pressure
default/14m Normal EvictingImage Pod app-pod Pod app-pod has disk pressure
```

Node unschedulable:

```bash
$ kubectl get nodes

NAME     STATUS                     ROLES    AGE   VERSION
node-1   Ready,SchedulingDisabled   <none>   10d   v1.28.0
```

Common Causes

1. Disk full - Node storage capacity exceeded
2. Large container logs - Unrotated logs filling the disk
3. Old images - Unused container images never cleaned up
4. Volume data - Persistent volumes consuming space
5. Eviction threshold too high - The kubelet demands more free space than the node can keep, so pressure triggers early
6. No cleanup configured - Automatic cleanup not enabled

Step-by-Step Fix

Step 1: Check Disk Usage

```bash
# Check node disk condition
kubectl describe node node-1 | grep -A 10 Conditions

# SSH into node
ssh node-1

# Check disk usage
df -h

# Check specific paths
# (on containerd nodes, image data usually lives in /var/lib/containerd instead of /var/lib/docker)
df -h /var/lib/docker
df -h /var/lib/kubelet
df -h /var/log

# Find large directories
du -sh /var/lib/docker/* | sort -h
du -sh /var/lib/kubelet/* | sort -h
du -sh /var/log/* | sort -h

# Check inode usage
df -i
```
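To make the `df` check repeatable, the helper below is a minimal sketch that reads `df -P` output and prints every filesystem at or above a usage percentage. The name `mounts_over` and the 85% default are illustrative assumptions, not part of any Kubernetes tooling, and mount points containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Sketch: list mount points whose usage is at or above a threshold.
# mounts_over is a hypothetical helper name; it reads `df -P` output on stdin.
mounts_over() {
  local threshold="${1:-85}"
  # Skip the header row; column 5 is the usage (e.g. 91%), column 6 the mount point.
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= t + 0) printf "%s %s%%\n", $6, use
  }'
}

# Usage on a node:
#   df -P | mounts_over 85
```

Running the same pipeline with `df -Pi` reports inode usage instead of block usage, which covers the inode check as well.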

Step 2: Clean Up Docker Resources

```bash
# Check Docker disk usage
# (applies to nodes running Docker; on containerd nodes, `crictl rmi --prune` removes unused images)
docker system df

# Example output (abridged):
#   Images:        50GB
#   Containers:    10GB
#   Local Volumes: 20GB
#   Build Cache:   5GB

# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

# Remove unused volumes
docker volume prune

# Remove build cache
docker builder prune

# Full cleanup (destructive: also deletes unused volumes and their data)
docker system prune -a --volumes

# Remove dangling images
docker rmi $(docker images -f "dangling=true" -q)

# Remove unused images older than 24 hours
docker image prune -a --filter "until=24h"
```

Step 3: Clean Up Container Logs

```bash
# Check container log sizes
find /var/lib/docker/containers -name "*.log" -exec du -sh {} \; | sort -h

# Find large log files
find /var/lib/docker/containers -name "*.log" -size +100M

# Truncate large log files (frees space immediately; the log contents are lost)
truncate -s 0 /var/lib/docker/containers/*/*-json.log

# Configure log rotation in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

# Restart Docker (the new log options apply only to containers created afterwards)
systemctl restart docker

# Check journal logs
journalctl --disk-usage
journalctl --vacuum-size=100M
```
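Rotation limits translate directly into a worst-case disk budget: each container can hold at most max-size times max-file of logs. The sketch below does that arithmetic so you can size the limits against the node's disk. `log_budget_mb` is a hypothetical helper, and the container count is whatever you expect to run on the node.

```bash
#!/usr/bin/env bash
# Sketch: worst-case disk used by rotated json-file logs on one node.
# log_budget_mb is a hypothetical helper; arguments are
#   $1 = max-size per file in MB, $2 = max-file, $3 = containers on the node.
log_budget_mb() {
  local max_size_mb="$1" max_file="$2" containers="$3"
  echo $(( max_size_mb * max_file * containers ))
}

# With max-size 10m, max-file 3, and 100 containers:
#   log_budget_mb 10 3 100    # prints 3000
```

If the result is a large share of the node's disk, lower `max-size` or `max-file` rather than relying on manual truncation.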

Step 4: Clean Up Kubelet Resources

```bash
# Check kubelet data directory
du -sh /var/lib/kubelet/*

# Remove old pod logs
find /var/log/pods -name "*.log" -mtime +7 -delete

# Clean up empty pod directories
# (drain the node first: empty volume directories of running pods also match)
find /var/lib/kubelet/pods -type d -empty -delete

# Clear kubelet cache
rm -rf /var/lib/kubelet/cache

# Check for orphaned volumes: pod directories with no matching pod in the cluster
ls -la /var/lib/kubelet/pods

# List orphaned pod directories (run where kubectl can reach the API server)
live_uids=$(kubectl get pods -A -o jsonpath='{.items[*].metadata.uid}')
for pod in /var/lib/kubelet/pods/*; do
  pod_uid=$(basename "$pod")
  if ! printf '%s\n' "$live_uids" | grep -qF "$pod_uid"; then
    echo "Orphaned: $pod"
    # rm -rf "$pod"   # uncomment only after verifying the directory is unused
  fi
done
```
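The orphan check can be separated from the API call so it is easy to review and rerun. The sketch below compares directory names against a file of live pod UIDs and only prints candidates; it never deletes anything. `orphaned_pod_dirs` and the `/tmp/live-uids` path are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print pod directories whose UID is not in a list of live pod UIDs.
# orphaned_pod_dirs is a hypothetical helper:
#   $1 = kubelet pods directory, $2 = file with one live pod UID per line.
orphaned_pod_dirs() {
  local pods_dir="$1" live_uids="$2" dir uid
  for dir in "$pods_dir"/*/; do
    [ -d "$dir" ] || continue        # glob matched nothing
    uid=$(basename "$dir")
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF "$uid" "$live_uids" || echo "$uid"
  done
}

# Usage on a node (read-only; review the output before deleting anything):
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}' > /tmp/live-uids
#   orphaned_pod_dirs /var/lib/kubelet/pods /tmp/live-uids
```

Treat the output as candidates only. Static pods, for example, may be stored under a UID that differs from the one the API reports for their mirror pod.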

Step 5: Clean Up Old Kubernetes Objects

```bash
# List completed jobs
kubectl get jobs -A --field-selector status.successful=1

# Delete completed jobs
kubectl delete jobs -A --field-selector status.successful=1

# Delete failed pods
kubectl delete pods -A --field-selector status.phase=Failed

# Delete evicted pods (status.reason is not a supported field selector, so filter with jq)
kubectl get pods -A -o json \
  | jq -r '.items[] | select(.status.reason == "Evicted") | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read -r ns name; do kubectl delete pod -n "$ns" "$name"; done

# Delete PVCs that are not Bound (review the list first; Pending claims may still be wanted)
kubectl get pvc -A --no-headers \
  | awk '$3 != "Bound" {print $1, $2}' \
  | while read -r ns name; do kubectl delete pvc -n "$ns" "$name"; done

# Clean up completed jobs older than 1 day
cutoff=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)
kubectl get jobs -A -o json \
  | jq -r --arg cutoff "$cutoff" '.items[] | select(.status.completionTime != null and .status.completionTime < $cutoff) | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read -r ns name; do kubectl delete job -n "$ns" "$name"; done
```
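The age filter works because Kubernetes reports `completionTime` as a UTC RFC 3339 string, and strings in that fixed format sort in time order. The sketch below isolates that comparison so it can be checked without a cluster. `jobs_older_than` is a hypothetical helper, and the input format (namespace, name, completion time per line) is an assumption of this example.

```bash
#!/usr/bin/env bash
# Sketch: keep only jobs that completed before a cutoff timestamp.
# jobs_older_than is a hypothetical helper. It reads lines of
#   "<namespace> <name> <completionTime>" on stdin, where completionTime is
# a UTC RFC 3339 string such as 2026-04-03T10:00:00Z.
jobs_older_than() {
  local cutoff="$1"
  # Appending "" forces string comparison; lines without a completion
  # time have fewer than three fields and are skipped.
  awk -v cutoff="$cutoff" 'NF == 3 && ($3 "") < (cutoff "") { print $1, $2 }'
}

# Usage (GNU date assumed for -d):
#   cutoff=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)
#   kubectl get jobs -A -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name} {.status.completionTime}{"\n"}{end}' \
#     | jobs_older_than "$cutoff"
```

Printing the namespace and name as separate fields makes it straightforward to pass each pair to `kubectl delete job -n <namespace> <name>` after reviewing the list.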

Step 6: Configure Eviction Thresholds

```bash
# Check current thresholds
grep -A 10 eviction /var/lib/kubelet/config.yaml

# In kubelet config:
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "10%"

evictionSoft:
  memory.available: "200Mi"
  nodefs.available: "15%"
  imagefs.available: "15%"

evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "1m30s"
  imagefs.available: "1m30s"

evictionMinimumReclaim:
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"

# Adjust thresholds:
# Increase the required free space so eviction triggers earlier.
# Keep every signal listed: signals omitted from evictionHard may not fall back to their defaults.
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "15%"   # Was 10%
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"

# Restart kubelet
systemctl restart kubelet
```
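A percentage threshold is easier to reason about once it is converted to bytes for a specific disk. The sketch below does that conversion and reports whether a given amount of free space would trip the threshold. `under_threshold` is a hypothetical helper that uses integer arithmetic only; it illustrates the comparison and is not the kubelet's own implementation.

```bash
#!/usr/bin/env bash
# Sketch: would a percentage eviction threshold trip for given disk numbers?
# under_threshold is a hypothetical helper:
#   $1 = available bytes, $2 = capacity bytes, $3 = threshold percent (e.g. 10)
# Prints the threshold in bytes and returns 0 when available < threshold.
under_threshold() {
  local avail="$1" capacity="$2" percent="$3"
  local threshold_bytes=$(( capacity * percent / 100 ))
  echo "$threshold_bytes"
  [ "$avail" -lt "$threshold_bytes" ]
}

# Example: 100 GiB disk with 8 GiB free and nodefs.available at 10%
#   under_threshold 8589934592 107374182400 10 && echo "would report disk pressure"
```

On a 100 GiB disk, raising `nodefs.available` from 10% to 15% moves the trigger point from about 10 GiB free to about 15 GiB free.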

Step 7: Configure Image Garbage Collection

```bash
# In kubelet config:
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80

# When disk usage exceeds 85%, image garbage collection runs until usage drops to 80%

# For more aggressive cleanup:
imageGCHighThresholdPercent: 70
imageGCLowThresholdPercent: 60

# Dead-container cleanup: --minimum-container-ttl-duration is a deprecated kubelet
# command-line flag, not a config-file field; set it on the kubelet command line if needed
#   --minimum-container-ttl-duration=0s

# Restart kubelet after changes
systemctl restart kubelet
```
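The two percentages define a band: collection starts at the high threshold and aims for the low one. The sketch below estimates how much image data would need to be removed for a given disk. `image_gc_target` is a hypothetical helper working in whole units of your choice (GiB here); it approximates the behavior described above rather than reproducing the kubelet's code.

```bash
#!/usr/bin/env bash
# Sketch: how much image data GC would aim to free for given thresholds.
# image_gc_target is a hypothetical helper (integer arithmetic, any unit):
#   $1 = capacity, $2 = used, $3 = high threshold %, $4 = low threshold %
image_gc_target() {
  local capacity="$1" used="$2" high="$3" low="$4"
  local usage_pct=$(( used * 100 / capacity ))
  if [ "$usage_pct" -lt "$high" ]; then
    echo 0      # below the high threshold: GC does not start
    return
  fi
  # GC tries to bring usage down to the low threshold.
  echo $(( used - capacity * low / 100 ))
}

# Example: 100 GiB image filesystem, 90 GiB used, thresholds 85/80
#   image_gc_target 100 90 85 80    # prints 10
```

A narrow band (for example 85/80) frees little per run, while a wide band frees more at once but removes more cached images that may need to be pulled again.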

Step 8: Expand Node Storage

```bash
# For VMs with expandable disks:

# Check current disk
lsblk

# Expand partition (example for /dev/sda, partition 1)
growpart /dev/sda 1

# Resize filesystem (ext2/3/4; for XFS use xfs_growfs on the mount point, e.g. xfs_growfs /)
resize2fs /dev/sda1

# For LVM:
lvextend -L +50G /dev/mapper/vg-root
resize2fs /dev/mapper/vg-root

# For cloud instances:
# AWS: Modify volume, then expand
# GCP: Resize disk, then expand partition
# Azure: Expand disk, then resize in OS

# Verify new size
df -h /
```

Step 9: Schedule Regular Cleanup

```bash
# Create cleanup cron job
cat << 'EOF' > /etc/cron.daily/kubernetes-cleanup
#!/bin/bash

# Docker cleanup (the "until" filter cannot be combined with --volumes)
docker system prune -a -f --filter "until=24h"

# Remove old logs
find /var/log/pods -name "*.log" -mtime +7 -delete
find /var/lib/docker/containers -name "*.log" -size +100M -exec truncate -s 0 {} \;

# Clean journal
journalctl --vacuum-size=500M

# Remove completed jobs (requires kubectl and a kubeconfig on the node)
kubectl delete jobs -A --field-selector status.successful=1 2>/dev/null

# Remove failed pods
kubectl delete pods -A --field-selector status.phase=Failed 2>/dev/null

echo "$(date): Cleanup completed"
EOF

chmod +x /etc/cron.daily/kubernetes-cleanup

# Or use a Kubernetes CronJob for API-object cleanup
# (cleanup-sa needs RBAC permission to delete jobs and pods across namespaces)
cat << 'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: node-cleanup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-sa
          containers:
          - name: cleanup
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl delete jobs -A --field-selector status.successful=1
              kubectl delete pods -A --field-selector status.phase=Failed
          restartPolicy: OnFailure
EOF
```

Step 10: Monitor Disk Usage

```bash
# Create monitoring script
cat << 'EOF' > /usr/local/bin/monitor_disk.sh
#!/bin/bash
THRESHOLD=80

df -hP | grep -E '^/dev' | while read -r line; do
  usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
  mount=$(echo "$line" | awk '{print $6}')
  if [ "$usage" -gt "$THRESHOLD" ]; then
    echo "WARNING: $mount at ${usage}%"
    # Send alert
  fi
done

echo "=== Docker Usage ==="
docker system df

echo "=== Large Log Files ==="
find /var/lib/docker/containers -name "*.log" -size +100M -exec ls -lh {} \;

echo "=== Image Count ==="
docker images -q | wc -l
EOF

chmod +x /usr/local/bin/monitor_disk.sh

# Prometheus metrics:
# node_filesystem_avail_bytes
# node_filesystem_size_bytes
# kubelet_volume_stats_available_bytes

# Alert rule:
- alert: NodeDiskPressure
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 15
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk usage > 85%"
```

Kubernetes Node Disk Pressure Checklist

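| Check | Command | Expected |
| --- | --- | --- |
| Disk usage | df -h | < 85% |
| Docker images | docker system df | Reasonable |
| Log files | find /var/lib/docker/containers -name "*.log" -size +100M | None |
| Eviction threshold | kubelet config | Appropriate |
| Garbage collection | kubelet config | Enabled |
| Cleanup jobs | ls /etc/cron.daily/ | Scheduled |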

Verify the Fix

```bash
# After cleaning up disk space

# 1. Check disk usage (expected: usage < 85%)
df -h /

# 2. Check node condition (expected: DiskPressure False)
kubectl describe node node-1 | grep DiskPressure

# 3. Check node ready (expected: STATUS Ready)
kubectl get nodes

# 4. Verify pods running (expected: pods running on the node)
kubectl get pods -A -o wide | grep node-1

# 5. Check no evictions (expected: no recent evictions)
kubectl get events -A --field-selector reason=Evicted

# 6. Monitor disk over time (expected: stable usage)
watch -n 60 df -h
```
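The kubelet may take a short while to clear the condition after space is freed, so a polling loop can be more convenient than rerunning the check by hand. The sketch below reads the DiskPressure status with `kubectl get node ... -o jsonpath` and retries until it reports `False`. `wait_disk_pressure_clear` and its default of 30 attempts at 10-second intervals are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: poll a node until its DiskPressure condition reports "False".
# wait_disk_pressure_clear is a hypothetical helper:
#   $1 = node name, $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
wait_disk_pressure_clear() {
  local node="$1" attempts="${2:-30}" delay="${3:-10}" status="" i
  for (( i = 1; i <= attempts; i++ )); do
    status=$(kubectl get node "$node" \
      -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}')
    if [ "$status" = "False" ]; then
      echo "DiskPressure cleared on $node after $i check(s)"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "DiskPressure still ${status:-unknown} on $node" >&2
  return 1
}

# Usage:
#   wait_disk_pressure_clear node-1 30 10
```

The non-zero return code on timeout lets the function gate later steps in a script, such as uncordoning the node.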

Prevention

To prevent Kubernetes node disk pressure from recurring, implement these proactive measures:

1. Configure Proactive Monitoring

```bash
# Set up Prometheus alerting rules
cat << 'EOF' > disk-pressure-alerts.yaml
groups:
- name: node-disk-alerts
  rules:
  - alert: NodeDiskUsageHigh
    expr: |
      (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.instance }} disk usage above 80%"
      description: "Current usage: {{ $value }}%"

  - alert: NodeDiskPressureImminent
    expr: |
      (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Node {{ $labels.instance }} disk usage critical"
      description: "Immediate action required. Usage: {{ $value }}%"

  - alert: DockerDiskUsageHigh
    expr: |
      (1 - (node_filesystem_avail_bytes{mountpoint="/var/lib/docker"} / node_filesystem_size_bytes{mountpoint="/var/lib/docker"})) * 100 > 75
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Docker storage on {{ $labels.instance }} nearing capacity"
EOF

# Validate the rules, then load the file through rule_files in prometheus.yml.
# This is a plain Prometheus rules file, not a Kubernetes object, so kubectl apply
# rejects it; with the Prometheus Operator, wrap the groups in a PrometheusRule first.
promtool check rules disk-pressure-alerts.yaml
```

2. Implement Automatic Cleanup

```bash
# Deploy a DaemonSet for automatic node cleanup
cat << 'EOF' > node-cleanup-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-cleanup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: node-cleanup
  template:
    metadata:
      labels:
        name: node-cleanup
    spec:
      hostPID: true
      containers:
      - name: cleanup
        # Assumption: an image that ships the docker CLI (plain alpine does not);
        # pin a tag you have verified
        image: docker:cli
        securityContext:
          privileged: true
        command:
        - /bin/sh
        - -c
        - |
          while true; do
            # Clean Docker every 6 hours (the "until" filter cannot be combined with --volumes)
            docker system prune -a -f --filter "until=6h" 2>/dev/null || true
            # Truncate logs over 500MB down to 100MB
            find /var/lib/docker/containers -name "*.log" -size +512000k -exec truncate -s 104857600 {} \; 2>/dev/null || true
            # journalctl is not available in this container; cap the journal on the
            # host instead (SystemMaxUse= in /etc/systemd/journald.conf)
            sleep 21600
          done
        volumeMounts:
        - name: docker
          mountPath: /var/lib/docker
        - name: docker-sock
          mountPath: /var/run/docker.sock
        - name: var-log
          mountPath: /var/log
      volumes:
      - name: docker
        hostPath:
          path: /var/lib/docker
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: var-log
        hostPath:
          path: /var/log
EOF
kubectl apply -f node-cleanup-daemonset.yaml
```

3. Configure Proper Log Rotation

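```bash
# Set up Docker daemon with log limits
# (this overwrites daemon.json; merge by hand if the file already has settings)
cat << 'EOF' > /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
EOF
systemctl restart docker
# To cap each container's writable layer, add "storage-opts": ["overlay2.size=100G"];
# this only works when /var/lib/docker is on XFS mounted with the pquota option.

# Limit kubelet log volume in the journal
mkdir -p /etc/systemd/system/kubelet.service.d
cat << 'EOF' > /etc/systemd/system/kubelet.service.d/10-log.conf
[Service]
StandardOutput=journal
StandardError=journal
LogRateLimitIntervalSec=30s
LogRateLimitBurst=100
EOF
systemctl daemon-reload
systemctl restart kubelet
```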

4. Set Appropriate Eviction Thresholds

```bash
# Configure kubelet with proper thresholds.
# Back up the existing config, then merge the fields below into
# /var/lib/kubelet/config.yaml (do not replace the whole file).
cp /var/lib/kubelet/config.yaml /var/lib/kubelet/config.yaml.bak

evictionHard:
  memory.available: "500Mi"
  nodefs.available: "15%"
  nodefs.inodesFree: "10%"
  imagefs.available: "15%"

evictionSoft:
  memory.available: "750Mi"
  nodefs.available: "20%"
  imagefs.available: "20%"

evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "2m"
  imagefs.available: "2m"

evictionMinimumReclaim:
  memory.available: "200Mi"
  nodefs.available: "1Gi"
  imagefs.available: "2Gi"

imageGCHighThresholdPercent: 75
imageGCLowThresholdPercent: 65

# Restart kubelet to apply
systemctl restart kubelet
```

5. Implement Resource Quotas

```yaml
# Prevent pods from consuming excessive local storage
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: default
spec:
  hard:
    requests.ephemeral-storage: "50Gi"
    limits.ephemeral-storage: "100Gi"
```
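
The quota counts the ephemeral-storage requests and limits that containers declare, so workloads in the namespace should set them. The sketch below prints a pod manifest that does so and also caps its emptyDir volume. Every name in it (`render_scratch_pod`, `scratch-demo`, the `busybox:1.36` image, the sizes) is an illustrative assumption.

```bash
#!/usr/bin/env bash
# Sketch: print a pod manifest that declares ephemeral-storage so a
# namespace quota has something to count. All names and sizes here are
# illustrative assumptions.
render_scratch_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
  namespace: default
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "2Gi"
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: "1Gi"
EOF
}

# Usage:
#   render_scratch_pod | kubectl apply -f -
```

A container that exceeds its ephemeral-storage limit becomes a candidate for eviction on its own, which keeps one noisy pod from pushing the whole node into disk pressure.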

6. Regular Maintenance Schedule

Daily: Automated cleanup cron jobs
Weekly: Review disk usage trends and alerts
Monthly: Audit and remove unused images and volumes
Quarterly: Review and adjust eviction thresholds based on usage patterns

Additional Troubleshooting Steps

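Advanced Diagnostics

```bash
# Read the DiskPressure condition directly
kubectl get node node-1 -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")]}'

# Check kubelet logs
journalctl -u kubelet -n 100 --no-pager

# Network connectivity test to the API server (replace host and port with your cluster's values)
nc -zv kubernetes.local 443
```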

Performance Optimization

- Monitor CPU and memory usage
- Check disk I/O performance
- Optimize network settings
- Review application logs

Security Audit

- Review access logs
- Check permission settings
- Verify encryption status
- Monitor for unauthorized access

Common Pitfalls and Solutions

Pitfall 1: Incorrect Configuration

Solution: Double-check all configuration parameters

- Use configuration validation tools
- Review documentation
- Test in staging environment

Pitfall 2: Resource Constraints

Solution: Monitor and optimize resource usage

- Scale resources as needed
- Implement monitoring
- Set up auto-scaling

Pitfall 3: Network Issues

Solution: Thorough network troubleshooting

- Check network connectivity
- Verify firewall rules
- Test DNS resolution

Real-World Case Studies

Case Study: Large-Scale Deployment

Scenario: Enterprise Kubernetes deployment with recurring node disk pressure

Resolution:

- Implemented comprehensive monitoring
- Optimized configuration settings
- Added redundancy and failover

Result: 99.99% uptime achieved

Case Study: Multi-Environment Setup

Scenario: Inconsistencies between development, staging, and production environments

Resolution:

- Standardized configuration management
- Implemented environment-specific settings
- Added automated testing

Result: Consistent behavior across environments

Best Practices Summary

Proactive Monitoring

- Set up comprehensive monitoring
- Configure alerting thresholds
- Regular performance reviews
- Implement log analysis

Regular Maintenance

- Scheduled maintenance windows
- Regular security updates
- Performance optimization
- Backup and recovery testing

Documentation

- Maintain runbooks
- Document configurations
- Track changes
- Knowledge sharing

Quick Reference Checklist

[ ] Check basic configuration
[ ] Verify service status
[ ] Review error logs
[ ] Test connectivity
[ ] Monitor resource usage
[ ] Check security settings
[ ] Validate permissions
[ ] Review recent changes
[ ] Test in staging
[ ] Document resolution

This guide covers diagnosing, fixing, and preventing Kubernetes node disk pressure. For additional support, consult the official Kubernetes documentation or contact professional services.

FAQ

Kubernetes Troubleshooting FAQs

Common questions about troubleshooting and preventing similar issues

How do I know if this troubleshooting guide applies to my situation?

This guide applies when a node reports DiskPressure as True, or when pods are evicted or cannot be scheduled because a node is low on disk space or inodes. If your symptoms match those in the Symptoms section, start with the most common causes and work through the steps in order.

Is it safe to follow these troubleshooting steps?

The diagnostic commands are read-only, but the cleanup commands are destructive: pruning images and volumes, truncating logs, and deleting pods, jobs, or PVCs cannot be undone. Create backups before making significant changes and verify each step before proceeding to the next.

How long does it typically take to resolve node disk pressure?

Most node disk pressure issues can be resolved within 30 minutes to 2 hours, depending on the complexity and root cause. Follow the troubleshooting flow to identify and fix the problem efficiently.

How can I prevent node disk pressure from happening again?

Log rotation, image garbage collection, appropriate eviction thresholds, and disk usage alerts help prevent recurrence. Consider implementing automated checks and alerts for early detection.

Written by

FixWikiHub Editorial Team

Our editorial team consists of experienced DevOps engineers, systems administrators, and cloud architects with hands-on experience in production environments across AWS, Azure, GCP, and on-premises infrastructure.

Every guide undergoes technical review for accuracy and is updated when software versions, commands, or best practices change.

Last updated: Apr 5, 2026
