Your Kubernetes Job was supposed to run a task and complete, but it's stuck. The status shows the job is running but never reaches completion, or pods are failing and retrying indefinitely. Jobs are designed for finite tasks, but when they don't complete, you need to diagnose whether it's a pod failure, configuration issue, or resource constraint.
Introduction
This article covers troubleshooting steps and solutions for a Kubernetes Job that does not complete. The problem typically shows up in production environments and can cause service disruptions if it is not addressed promptly.
Symptoms
Common symptoms include:
- kubectl get jobs shows the Job as active, but the completions count never reaches the target.
- Job pods stay in Pending, ImagePullBackOff, or ErrImagePull.
- Pods fail and are retried until the Job reports BackoffLimitExceeded.
- The Job is stopped with DeadlineExceeded after running longer than activeDeadlineSeconds.
Common Causes
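- Misconfigured Job spec (completions, parallelism, backoffLimit, activeDeadlineSeconds, restartPolicy)
- Missing or incorrect credentials, ConfigMaps, or Secrets
- Network connectivity issues that make a dependency unavailable
- Version compatibility problems
- Resource exhaustion, limits, or namespace quota
- Permission or access denied errors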
Understanding Job Completion
Jobs create pods to perform a task and track completion. A Job is complete when the specified number of pods successfully terminate (completions). Jobs can run multiple pods (parallelism) and retry failed pods (backoffLimit). Understanding these parameters helps diagnose why a job isn't completing.
Job completion requires: pods must start successfully, pods must complete their task without error, and enough pods must complete to meet the completions count.
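As a rough illustration of how these parameters interact, the sketch below classifies a Job from the counters in its status. It is a simplified model, not the controller's actual logic: the function name job_state is hypothetical, it ignores activeDeadlineSeconds, it assumes completions is set, and it assumes the Job is marked failed once the failed count exceeds backoffLimit (container restarts under restartPolicy: OnFailure are not modelled). The conditions in .status.conditions remain the authoritative signal.

```bash
# Simplified model: classify a Job from its spec and status counters.
# Usage: job_state <completions> <succeeded> <failed> <backoffLimit>
# Empty arguments fall back to defaults (completions 1, counters 0, backoffLimit 6).
job_state() {
  completions=${1:-1}
  succeeded=${2:-0}
  failed=${3:-0}
  backoff_limit=${4:-6}

  if [ "$succeeded" -ge "$completions" ]; then
    echo "complete"        # enough pods terminated successfully
  elif [ "$failed" -gt "$backoff_limit" ]; then
    echo "failed"          # retries exhausted
  else
    echo "in-progress"     # still waiting on pods
  fi
}

# Example with values read from the cluster (job-name and namespace are placeholders):
# job_state \
#   "$(kubectl get job job-name -n namespace -o jsonpath='{.spec.completions}')" \
#   "$(kubectl get job job-name -n namespace -o jsonpath='{.status.succeeded}')" \
#   "$(kubectl get job job-name -n namespace -o jsonpath='{.status.failed}')" \
#   "$(kubectl get job job-name -n namespace -o jsonpath='{.spec.backoffLimit}')"
```

The defaults matter because kubectl prints an empty string for status counters that have not been set yet.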
Step-by-Step Fix
Check Job status:
```bash
# Get job status
kubectl get jobs -n namespace
kubectl describe job job-name -n namespace

# Check job conditions
kubectl get job job-name -n namespace -o jsonpath='{.status.conditions}'
kubectl get job job-name -n namespace -o yaml | grep -A 20 status

# Check completion status
kubectl get job job-name -n namespace -o jsonpath='{.status.succeeded}'
kubectl get job job-name -n namespace -o jsonpath='{.status.failed}'
```
Check Job pods:
```bash
# Get pods created by job
kubectl get pods -n namespace -l job-name=job-name

# Check pod status
kubectl describe pod job-pod -n namespace

# Check pod logs
kubectl logs job-pod -n namespace

# Check previous pod logs (if pod restarted)
kubectl logs job-pod -n namespace --previous
```
Check Job configuration:
```bash
# Check job spec
kubectl get job job-name -n namespace -o yaml | grep -A 30 spec

# Check completions and parallelism
kubectl get job job-name -n namespace -o jsonpath='{.spec.completions}'
kubectl get job job-name -n namespace -o jsonpath='{.spec.parallelism}'
kubectl get job job-name -n namespace -o jsonpath='{.spec.backoffLimit}'
```
Common Solutions
Solution 1: Fix Pod Failing to Start
Job pods might fail to start due to image or resource issues:
```bash
# Check pod status
kubectl get pods -n namespace -l job-name=job-name

# Look for ImagePullBackOff, ErrImagePull, Pending
kubectl describe pod job-pod -n namespace
```
Fix image issues:
```bash
# Check job image configuration
kubectl get job job-name -n namespace -o jsonpath='{.spec.template.spec.containers[*].image}'

# A Job's pod template is immutable, so the image cannot be changed in place.
# Correct the image in the manifest, then delete and recreate the Job.
kubectl delete job job-name -n namespace
kubectl apply -f job.yaml
```
Fix resource constraints:
```yaml
# Add appropriate resource requests
spec:
  template:
    spec:
      containers:
      - name: task
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
```
Solution 2: Fix Pod Task Failure
Pod starts but the task fails:
```bash
# Check pod logs for error
kubectl logs job-pod -n namespace

# Check pod exit code
kubectl get pod job-pod -n namespace -o jsonpath='{.status.containerStatuses[*].state.terminated.exitCode}'

# Exit code 0 = success, non-zero = failure
```
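Beyond zero versus non-zero, the numeric value often hints at the cause. The helper below is a small sketch with a hypothetical name, explain_exit_code, based on common shell conventions (126, 127, and 128 plus the signal number). An application is free to use these codes differently, so treat the output as a hint rather than a diagnosis.

```bash
# Map a container exit code to a likely meaning (shell conventions, not Kubernetes rules).
explain_exit_code() {
  code=$1
  case "$code" in
    "")  echo "no exit code (container has not terminated)" ;;
    0)   echo "success" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (often the OOM killer)" ;;
    143) echo "terminated by SIGTERM" ;;
    *)
      # Codes above 128 conventionally mean "128 + signal number".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application error"
      fi
      ;;
  esac
}

# Example (job-pod and namespace are placeholders):
# explain_exit_code "$(kubectl get pod job-pod -n namespace \
#   -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')"
```

An exit code of 137 combined with a limit in the pod's resources section is usually worth checking against the memory limit first.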
Fix task errors:
```bash
# Identify what's failing in the task
kubectl logs job-pod -n namespace

# Common issues:
# - Missing environment variables
# - Missing config files
# - Permission errors
# - Dependency unavailable
```
Add error handling to job command:
```yaml
spec:
  template:
    spec:
      containers:
      - name: task
        # Log the outcome and exit with the task's own exit code so the Job sees failures
        command: ["sh", "-c", "your-command && echo 'Success' || { rc=$?; echo \"Failed: $rc\"; exit $rc; }"]
```
Solution 3: Fix BackoffLimit Exceeded
A Job has a retry limit that might be exceeded:
```bash
# Check backoff limit
kubectl get job job-name -n namespace -o jsonpath='{.spec.backoffLimit}'
# Default is 6

# Check if backoffLimit exceeded
kubectl describe job job-name -n namespace | grep -A 5 "BackoffLimitExceeded"
```
Increase backoffLimit:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  backoffLimit: 10  # Increase retries
  template:
    spec:
      restartPolicy: Never  # Jobs require Never or OnFailure
      containers:
      - name: task
        image: myimage
```
Check job status for backoff:
```bash
# Look for "BackoffLimitExceeded" condition
kubectl get job job-name -n namespace -o yaml | grep -A 10 conditions
```
Solution 4: Fix ActiveDeadline Exceeded
A Job can have a maximum runtime limit:
```bash
# Check activeDeadlineSeconds
kubectl get job job-name -n namespace -o jsonpath='{.spec.activeDeadlineSeconds}'
```
Fix timeout issues:
```yaml
spec:
  activeDeadlineSeconds: 3600  # 1 hour max runtime
---
# Increase if task takes longer
spec:
  activeDeadlineSeconds: 7200  # 2 hours
```
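Because activeDeadlineSeconds is measured from the Job's start time, the remaining budget of a running Job is simple arithmetic. The sketch below assumes times are passed as Unix epoch seconds. The function name deadline_remaining is hypothetical, and converting .status.startTime to epoch seconds is left to the caller because date parsing flags differ between platforms.

```bash
# Report how much of the activeDeadlineSeconds budget is left.
# Usage: deadline_remaining <start_epoch> <activeDeadlineSeconds> [now_epoch]
deadline_remaining() {
  start=$1
  deadline=$2
  now=${3:-$(date +%s)}   # default to the current time

  remaining=$(( start + deadline - now ))
  if [ "$remaining" -le 0 ]; then
    echo "exceeded by $(( 0 - remaining ))s"
  else
    echo "${remaining}s remaining"
  fi
}

# Example: a Job that started at epoch 1000 with a 3600 second deadline,
# checked at epoch 2000.
deadline_remaining 1000 3600 2000
```

If a healthy run regularly gets close to the limit, that is a sign the deadline is too tight for the task rather than a sign that the task is stuck.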
Solution 5: Fix Completions Count Issues
Jobs with multiple completions need each pod to succeed:
```bash
# Check completions required
kubectl get job job-name -n namespace -o jsonpath='{.spec.completions}'

# Check succeeded count
kubectl get job job-name -n namespace -o jsonpath='{.status.succeeded}'
```
For indexed jobs (each pod has a work item):
```yaml
spec:
  completions: 10          # Need 10 pods to complete
  parallelism: 3           # Run 3 at a time
  completionMode: Indexed  # Each pod gets index 0-9
```
Fix completions tracking:
```bash
# Check job progress
kubectl describe job job-name -n namespace | grep -A 5 "Status"

# Verify pods are completing successfully
kubectl get pods -n namespace -l job-name=job-name
```
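For Indexed Jobs, Kubernetes exposes each pod's index through the JOB_COMPLETION_INDEX environment variable, and a common reason for a missing completion is a task that cannot map its index to a work item. The sketch below shows one way a task script might do that mapping. The WORK_ITEMS list and the function name work_item_for_index are hypothetical placeholders.

```bash
# Hypothetical work list; a real Job might read this from a file or ConfigMap.
WORK_ITEMS="alpha beta gamma delta"

# Print the work item for a zero-based completion index.
# Returns non-zero when the index has no matching item, so the pod fails loudly.
work_item_for_index() {
  index=$1
  position=0
  for item in $WORK_ITEMS; do
    if [ "$position" -eq "$index" ]; then
      echo "$item"
      return 0
    fi
    position=$((position + 1))
  done
  echo "no work item for index $index" >&2
  return 1
}

# Inside the pod of an Indexed Job:
# work_item_for_index "${JOB_COMPLETION_INDEX:?not running in an Indexed Job}"
```

Returning a non-zero status for an out-of-range index makes the mismatch visible in the pod logs instead of letting the pod succeed without doing any work.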
Solution 6: Fix Parallelism Issues
Parallelism affects how pods run simultaneously:
```bash
# Check parallelism
kubectl get job job-name -n namespace -o jsonpath='{.spec.parallelism}'
```
Adjust parallelism:
```yaml
spec:
  completions: 10
  parallelism: 5  # Run 5 pods simultaneously
---
# For single pod job
spec:
  completions: 1
  parallelism: 1
---
# For non-indexed job (any pod can complete any work)
spec:
  completions: 10
  parallelism: 5
  completionMode: NonIndexed
```
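To get a feel for how parallelism affects total runtime, the sketch below computes the minimum number of rounds a Job needs. It assumes every pod succeeds on its first attempt and that the cluster can actually schedule that many pods at once. The function name job_rounds is hypothetical.

```bash
# Minimum number of rounds needed: ceil(completions / parallelism).
# Usage: job_rounds <completions> <parallelism>
job_rounds() {
  completions=$1
  parallelism=$2

  # A parallelism of 0 pauses the Job: no pods are started.
  if [ "$parallelism" -le 0 ]; then
    echo "paused"
    return 0
  fi
  echo $(( (completions + parallelism - 1) / parallelism ))
}

# Example: 10 completions, 3 pods at a time
job_rounds 10 3
```

Multiplying the result by the typical runtime of one pod gives a rough lower bound to compare against activeDeadlineSeconds.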
Solution 7: Fix Restart Policy
Job pods must have an appropriate restart policy:
```bash
# Check restart policy
kubectl get job job-name -n namespace -o jsonpath='{.spec.template.spec.restartPolicy}'
```
Job pods only allow OnFailure or Never:
```yaml
spec:
  template:
    spec:
      restartPolicy: OnFailure  # Container is restarted in the same pod on failure
      # Or:
      # restartPolicy: Never    # New pod created on failure
```
Solution 8: Fix Init Container Failure
A failing init container blocks the pod from starting:
```bash
# Check init container status
kubectl get pod job-pod -n namespace -o jsonpath='{.status.initContainerStatuses}'

# Check init container logs
kubectl logs job-pod -n namespace -c init-container-name
```
Fix init container issues:
```yaml
spec:
  template:
    spec:
      initContainers:
      - name: setup
        image: busybox
        command: ["sh", "-c", "setup-command"]
        # Add timeout or error handling
```
Solution 9: Delete and Recreate Job
Sometimes a job is stuck and needs to be recreated:
```bash
# Delete stuck job
kubectl delete job job-name -n namespace

# Recreate job
kubectl apply -f job.yaml

# Or create new job from old spec
kubectl get job job-name -n namespace -o yaml > job-backup.yaml
kubectl delete job job-name -n namespace
# Edit job-backup.yaml (remove status, the uid and resourceVersion metadata,
# spec.selector, and the auto-generated controller-uid labels)
kubectl apply -f job-backup.yaml
```
Solution 10: Check for Resource Quota Blocking
Namespace quota might prevent pod creation:
```bash
# Check resource quota
kubectl get resourcequota -n namespace
kubectl describe resourcequota quota-name -n namespace

# Check if pods are blocked
kubectl get events -n namespace | grep -iE "quota|exceeded"
```
Fix quota or job resources:
```yaml
# Reduce job resource requests if quota tight
spec:
  template:
    spec:
      containers:
      - name: task
        resources:
          requests:
            cpu: "50m"  # Lower request
            memory: "64Mi"
```
Solution 11: Fix Job Dependencies
Job might depend on unavailable resources:
```bash
# Check job environment
kubectl describe pod job-pod -n namespace

# Check if job needs:
# - ConfigMap that doesn't exist
# - Secret that doesn't exist
# - Service that's unavailable
# - PVC that's not bound
```
Fix dependencies:
```bash
# Check referenced ConfigMaps/Secrets
kubectl get job job-name -n namespace -o yaml | grep -E -A 5 'configMapRef|secretRef|configMapKeyRef|secretKeyRef'

# Verify they exist
kubectl get configmap config-name -n namespace
kubectl get secret secret-name -n namespace
```
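When a Job references several objects, checking them one by one gets tedious. The sketch below loops over a list of kind/name pairs and prints the ones that kubectl cannot find. The function name missing_dependencies, the NAMESPACE variable, and the object names in the example are placeholders. It only checks existence, so a PVC that exists but is not bound would still pass.

```bash
# Namespace to check; "namespace" is a placeholder default.
NAMESPACE=${NAMESPACE:-namespace}

# Print every "kind/name" argument that does not exist in $NAMESPACE.
missing_dependencies() {
  for ref in "$@"; do
    kind=${ref%%/*}   # text before the first slash
    name=${ref#*/}    # text after the first slash
    if ! kubectl get "$kind" "$name" -n "$NAMESPACE" >/dev/null 2>&1; then
      echo "$ref"
    fi
  done
}

# Example (object names are placeholders):
# missing_dependencies configmap/config-name secret/secret-name pvc/data-claim
```

An empty result means every listed object exists; any printed line names an object to create before recreating the Job.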
Solution 12: Monitor Job Progress
Watch job status:
```bash
# Watch job status
kubectl get job job-name -n namespace -w

# Watch pods
kubectl get pods -n namespace -l job-name=job-name -w

# Check job events
kubectl get events -n namespace --field-selector involvedObject.name=job-name
```
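In scripts and CI pipelines, watching interactively is not an option, so a polling loop is a common alternative. The sketch below checks the Complete and Failed conditions until one of them is True or the number of checks runs out. The function name wait_for_job and its default values are assumptions made for illustration, not part of kubectl.

```bash
# Poll a Job until it reports Complete or Failed, or until max_checks is used up.
# Usage: wait_for_job <job-name> <namespace> [max_checks] [interval_seconds]
# Returns 0 on complete, 1 on failed, 2 on timeout.
wait_for_job() {
  job=$1
  ns=$2
  max_checks=${3:-60}
  interval=${4:-5}
  checks=0

  while [ "$checks" -lt "$max_checks" ]; do
    complete=$(kubectl get job "$job" -n "$ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')
    failed=$(kubectl get job "$job" -n "$ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')

    if [ "$complete" = "True" ]; then echo "complete"; return 0; fi
    if [ "$failed" = "True" ]; then echo "failed"; return 1; fi

    checks=$((checks + 1))
    sleep "$interval"
  done

  echo "timed out"
  return 2
}

# Example (job-name and namespace are placeholders):
# wait_for_job job-name namespace 120 5
```

Checking both conditions lets the loop stop as soon as the Job fails, whereas waiting only for the Complete condition would keep going until its timeout.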
Verification
After fixing Job issues:
```bash
# Check job completed
kubectl get job job-name -n namespace

# Verify completion condition
kubectl get job job-name -n namespace -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'

# Check succeeded pods
kubectl get job job-name -n namespace -o jsonpath='{.status.succeeded}'

# Verify no failed pods beyond backoffLimit
kubectl describe job job-name -n namespace | grep -A 5 "Failed"
```
Job Completion Status
```bash
# Quick job status check
kubectl get job job-name -n namespace -o custom-columns='NAME:.metadata.name,COMPLETIONS:.spec.completions,PARALLELISM:.spec.parallelism,SUCCEEDED:.status.succeeded,FAILED:.status.failed,ACTIVE:.status.active'
```
Job Not Completing Causes Summary
| Cause | Check | Solution |
|---|---|---|
| Pod image error | kubectl describe pod | Fix image name or registry |
| Pod task fails | kubectl logs pod | Fix task command or dependencies |
| BackoffLimit exceeded | kubectl describe job | Fix the failing pods, then increase backoffLimit if needed |
| ActiveDeadline exceeded | kubectl get job -o yaml | Increase activeDeadlineSeconds |
| Resource quota blocking | kubectl get quota | Reduce requests or increase quota |
| Wrong restart policy | kubectl get job -o yaml | Use OnFailure or Never |
| Init container fails | kubectl logs -c init | Fix init container |
| Missing ConfigMap/Secret | kubectl describe pod | Create missing resources |
| PVC not bound | kubectl get pvc | Fix PVC configuration |
| Completions not reached | Check succeeded count | Fix pods or adjust completions |
Prevention
- Set an appropriate backoffLimit for expected retries.
- Use activeDeadlineSeconds to prevent runaway jobs.
- Set proper resource requests for job pods.
- Test job commands locally before deploying.
- Use meaningful labels for job tracking.
- Implement proper error handling in job scripts.
- Monitor job completion with alerts.
Job not completing usually means the pods are failing to start or the task inside the pod is failing. Check pod status first, then pod logs - these will tell you whether it's an infrastructure issue or a task execution issue.