Introduction
As infrastructure scales from dozens to hundreds or thousands of hosts, Ansible playbook execution time can grow from minutes to hours. A playbook that completes in 2 minutes on 10 hosts might take 4 hours on 500 hosts if not properly optimized. The bottleneck is rarely the tasks themselves but rather the parallelism configuration, fact gathering overhead, SSH connection management, and execution strategy choices.
Slow execution impacts deployment windows, incident response times, and team productivity. Understanding and tuning Ansible's parallelism controls is essential for production-scale automation.
Symptoms
Playbooks take excessively long to complete:
```bash
$ time ansible-playbook deploy.yml -i production

PLAY [Deploy application] ***********
# Each task takes minutes as hosts process sequentially

TASK [Update packages] **************
Tuesday 14:00:00 - changed: [server-001]
Tuesday 14:01:30 - changed: [server-002]
Tuesday 14:03:00 - changed: [server-003]
# ... continuing one at a time ...

real    4h32m15s
user    0m45.123s
sys     0m12.456s
```
Fact gathering dominates execution time:
```
TASK [Gather Facts] ************
ok: [web-server-001]
ok: [web-server-002]
# 10 minutes of fact gathering for 100 hosts

PLAY RECAP *****************
web-server-001 : ok=5 changed=2 unreachable=0 failed=0
# Total time: 45 minutes, of which 30 minutes was fact gathering
```
Low CPU utilization on control node:
```bash
# During playbook run (pgrep -o picks the oldest matching process, the parent)
$ top -p "$(pgrep -o -f ansible-playbook)"
  PID USER   PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
12345 admin  20   0  245624  45124  12345 S   5.2   1.2 145:23.45 ansible-playbook
# Only 5% CPU - control node is mostly idle waiting for hosts
```
Tower job queue backing up:
```
Tower Dashboard:
Running Jobs: 5 (each taking 2+ hours)
Pending Jobs: 47
Average Job Duration: 2h 15m
```
Connection timeouts during large-scale runs:
```
TASK [Deploy application] *******************************************************
fatal: [server-150]: FAILED! => {"msg": "Failed to connect to the host via ssh: Connection timed out during SSH handshake"}
```
Common Causes
1. Default Forks Too Low
The default forks = 5 means only 5 hosts execute simultaneously:
```
# With 100 hosts and forks=5:
# Each task batch processes 5 hosts at a time
# 100 hosts / 5 forks = 20 sequential batches
# If each batch takes 30 seconds: 20 * 30s = 10 minutes per task
```
2. Fact Gathering Not Cached
Every play gathers facts from scratch:
```yaml
- name: Play 1
  hosts: all
  # Implicit: gather_facts: yes (default)
  # Gathers facts from ALL hosts

- name: Play 2
  hosts: all
  # Gathers facts AGAIN from ALL hosts
```
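A rough Python model shows how repeated gathering adds up. The function name, the three-play count, and the 30-second batch time are illustrative assumptions rather than measurements; the sketch simply multiplies plays by fork batches.

```python
import math


def fact_gathering_seconds(plays, hosts, forks, seconds_per_batch, cached=False):
    """Estimate wall-clock time spent gathering facts across one playbook run.

    Assumes each play gathers facts in ceil(hosts / forks) batches and that,
    with caching, only the first play pays the cost.
    """
    batches = math.ceil(hosts / forks)
    gathering_plays = 1 if cached else plays
    return gathering_plays * batches * seconds_per_batch


# Hypothetical numbers: 3 plays, 100 hosts, default forks=5, 30s per batch
uncached = fact_gathering_seconds(3, 100, 5, 30)
cached = fact_gathering_seconds(3, 100, 5, 30, cached=True)
print(f"uncached: {uncached / 60:.0f} min, cached: {cached / 60:.0f} min")
```

Under these assumptions, three plays cost three times the gathering time of one, which is the pattern described in the symptoms above.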
3. SSH Connection Overhead
Without pipelining, each task needs several SSH operations per host (connect, transfer the module, execute it, return the result), and that cost repeats for every task:
```
# Without pipelining:
# 1. SSH connect -> 2. Transfer module -> 3. Execute -> 4. Return result -> 5. Disconnect
# Overhead: 1-3 seconds per host per task

# With 100 hosts and 20 tasks:
# 100 * 20 * 2s = 4000s, about 67 minutes of cumulative overhead
```
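The same arithmetic can be sketched in Python. The function names and the 2-second per-connection cost are assumptions for illustration. The second function is a rough guess at how much of the cumulative cost shows up as wall-clock time once hosts run in parallel batches.

```python
def ssh_overhead_seconds(hosts, tasks, per_connection_s):
    """Cumulative SSH cost if every task on every host pays the full overhead."""
    return hosts * tasks * per_connection_s


def wall_clock_overhead_seconds(hosts, tasks, per_connection_s, forks):
    """Rough wall-clock share of that cost when `forks` hosts run in parallel."""
    batches_per_task = -(-hosts // forks)  # ceiling division
    return tasks * batches_per_task * per_connection_s


total = ssh_overhead_seconds(hosts=100, tasks=20, per_connection_s=2.0)
wall = wall_clock_overhead_seconds(100, 20, 2.0, forks=5)
print(f"cumulative: {total:.0f}s ({total / 60:.1f} min)")
print(f"wall clock at forks=5: {wall:.0f}s ({wall / 60:.1f} min)")
```

The cumulative figure is spread across forks, so the wall-clock impact depends on both the per-connection cost and the fork count.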
4. Serial Execution Limiting Parallelism
Using serial: 1 or low serial values:
```yaml
- name: Rolling deploy
  hosts: webservers
  serial: 1  # Only 1 host at a time!
  # For 50 hosts, this means 50 sequential deployments
```
5. Strategy Plugin Overhead
The linear strategy waits for every host to finish a task before any host starts the next one:
```yaml
- name: Deploy
  hosts: all
  strategy: linear  # Default - waits for slowest host
  # If one host is slow, all others wait
```
6. Callback Plugin Overhead
Some callbacks add significant overhead:
```ini
# Heavy callbacks (profile_tasks and timer ship in the ansible.posix collection)
callbacks_enabled = ansible.posix.profile_tasks, ansible.posix.timer, my_custom_logging
# Each callback processes every event
```
Step-by-Step Fix
Step 1: Diagnose Performance Bottlenecks
Measure where time is spent:
```bash
# Enable profiling
export ANSIBLE_CALLBACKS_ENABLED=profile_tasks,profile_roles

ansible-playbook deploy.yml

# Output shows time per task:
# Tuesday 14:30:00 - TASK: Gather Facts (0:02:15.123)
# Tuesday 14:32:15 - TASK: Install packages (0:05:30.456)
# Tuesday 14:37:45 - TASK: Configure app (0:01:45.789)
```
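Once timings are captured, ranking tasks by duration points at the first thing to fix. The sketch below parses lines in the simplified format shown above. That format is an assumption, and real `profile_tasks` output may be laid out differently, so the regular expression would need adjusting. `slowest_tasks` is a hypothetical helper name.

```python
import re

# Matches "... TASK: <name> (H:MM:SS.mmm)" as in the sample output above
LINE_RE = re.compile(
    r"TASK: (?P<name>.+?) \((?P<h>\d+):(?P<m>\d+):(?P<s>\d+(?:\.\d+)?)\)"
)


def slowest_tasks(lines):
    """Return (task name, seconds) pairs sorted from slowest to fastest."""
    rows = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            seconds = (
                int(match["h"]) * 3600 + int(match["m"]) * 60 + float(match["s"])
            )
            rows.append((match["name"], seconds))
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = [
    "Tuesday 14:30:00 - TASK: Gather Facts (0:02:15.123)",
    "Tuesday 14:32:15 - TASK: Install packages (0:05:30.456)",
    "Tuesday 14:37:45 - TASK: Configure app (0:01:45.789)",
]
for name, seconds in slowest_tasks(sample):
    print(f"{seconds:8.1f}s  {name}")
```

With the sample lines, package installation ranks first, followed by fact gathering.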
Time the playbook components:
```bash
# Time fact gathering only
time ansible all -m setup

# Time a single task
time ansible all -m ping

# Check SSH connection overhead
time ansible all -m command -a "echo test"
```
Monitor control node resources:
```bash
# Watch during playbook run ([a]nsible keeps grep from matching itself)
watch -n 1 "ps aux | grep '[a]nsible'; echo '---'; free -h; echo '---'; uptime"
```
Step 2: Increase Forks Configuration
Set appropriate forks for your environment:
```bash
# Run with more forks
ansible-playbook deploy.yml -f 50

# Or configure permanently in ansible.cfg
```
```ini
# ansible.cfg
[defaults]
# Set forks based on your infrastructure
# Rule of thumb: forks = number of hosts / 10, minimum 10, maximum 100-200
forks = 50

# For very large runs (1000+ hosts)
# forks = 100

# Consider control node resources:
# Each fork uses ~10-30MB memory
# 100 forks = ~1-3GB additional memory
```
Calculate optimal forks:
```bash
#!/bin/bash
# calculate_forks.sh

# --list-hosts prints a "hosts (N):" header line first, so skip it
HOST_COUNT=$(ansible all --list-hosts | tail -n +2 | wc -l)
CONTROL_MEMORY_GB=$(free -g | awk '/^Mem:/{print $2}')

# Estimate: 20MB per fork, leave 50% memory for other processes
MAX_FORKS_BY_MEMORY=$((CONTROL_MEMORY_GB * 1024 / 20 / 2))

# Estimate: forks = host_count / 10
SUGGESTED_FORKS=$((HOST_COUNT / 10))

# Cap at reasonable maximum
FINAL_FORKS=$((SUGGESTED_FORKS < MAX_FORKS_BY_MEMORY ? SUGGESTED_FORKS : MAX_FORKS_BY_MEMORY))
FINAL_FORKS=$((FINAL_FORKS > 100 ? 100 : FINAL_FORKS))
FINAL_FORKS=$((FINAL_FORKS < 10 ? 10 : FINAL_FORKS))

echo "Host count: $HOST_COUNT"
echo "Control node memory: ${CONTROL_MEMORY_GB}GB"
echo "Suggested forks: $FINAL_FORKS"
```
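The same rule of thumb can be expressed in Python to explore what a given fork count means for per-task wall-clock time. This is a sketch under the article's own estimates (20MB per fork, half of memory reserved, hosts / 10, clamped to 10-100). The function names are hypothetical, and the batch model ignores the fact that hosts rarely finish in lockstep.

```python
import math


def suggested_forks(host_count, memory_gb, mb_per_fork=20, floor=10, cap=100):
    """Apply the hosts/10 rule, bounded by memory and by a floor and a cap."""
    by_memory = (memory_gb * 1024) // mb_per_fork // 2  # keep 50% of memory free
    by_hosts = host_count // 10
    return max(floor, min(by_hosts, by_memory, cap))


def task_wall_clock_seconds(host_count, forks, seconds_per_batch):
    """Estimate one task's duration as sequential batches of `forks` hosts."""
    return math.ceil(host_count / forks) * seconds_per_batch


forks = suggested_forks(host_count=500, memory_gb=16)
print(f"suggested forks: {forks}")
print(f"forks=5:  {task_wall_clock_seconds(500, 5, 30)}s per task")
print(f"forks={forks}: {task_wall_clock_seconds(500, forks, 30)}s per task")
```

The 30-second batch time is an assumed figure. Substituting a value measured with the profiling step gives a more realistic estimate.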
Step 3: Enable SSH Pipelining
Reduce SSH connection overhead:
```ini
# ansible.cfg
[defaults]
pipelining = True

[ssh_connection]
# With become, pipelining needs requiretty disabled in sudoers on managed hosts
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=/tmp/ansible-ssh-%h-%p-%r

# For faster file transfers (scp_if_ssh is the older, deprecated form of this setting)
transfer_method = smart
```
Verify pipelining is working:
```bash
# Run with debug output and look for pipelining messages
ANSIBLE_DEBUG=1 ansible all -m ping 2>&1 | grep -i pipelin

# Compare performance
time ansible all -m ping                            # With pipelining
time ANSIBLE_PIPELINING=False ansible all -m ping   # Without pipelining
```
Step 4: Configure Fact Caching
Cache facts to avoid repeated gathering:
```ini
# ansible.cfg
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /var/cache/ansible/facts
# 24 hours (keep the comment on its own line; an inline "#" becomes part of the value)
fact_caching_timeout = 86400

# Or use Redis for distributed caching
# fact_caching = redis
# fact_caching_connection = localhost:6379:0
```
Create the cache directory:
```bash
sudo mkdir -p /var/cache/ansible/facts
sudo chown $USER:$USER /var/cache/ansible/facts
```
Use selective fact gathering:
```yaml
# playbook.yml
- name: First play - gather facts once
  hosts: all
  gather_facts: yes
  # No extra task is needed: with fact_caching configured,
  # gathered facts are written to the cache automatically
  tasks: []

- name: Second play - use cached facts
  hosts: all
  gather_facts: no  # Don't re-gather
  tasks:
    - name: Use cached fact
      debug:
        msg: "IP is {{ ansible_default_ipv4.address }}"
```
Clear cache when needed:
```bash
# Clear cached facts for every host in the inventory at the start of a run
ansible-playbook deploy.yml --flush-cache

# Or remove the jsonfile cache files directly
rm -rf /var/cache/ansible/facts/*
```
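Before wiping the whole cache, it can help to see which entries are actually stale. The sketch below assumes the jsonfile cache keeps one file per host and that a file's modification time reflects when its facts were last gathered. `stale_cache_files` is a hypothetical helper, and the path and timeout mirror the configuration shown earlier.

```python
import time
from pathlib import Path


def stale_cache_files(cache_dir, timeout_s, now=None):
    """Return names of cache files whose mtime is older than `timeout_s`."""
    now = time.time() if now is None else now
    directory = Path(cache_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file() and now - path.stat().st_mtime > timeout_s
    )


if __name__ == "__main__":
    # Same path and 24-hour timeout as the ansible.cfg example
    for name in stale_cache_files("/var/cache/ansible/facts", 86400):
        print(name)
```

Hosts listed here would be re-gathered on the next run anyway, so only the remaining entries need a manual flush after a change such as a re-IP or an OS upgrade.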
Step 5: Optimize Strategy Configuration
Use the free strategy for faster execution:
```yaml
# playbook.yml
- name: Fast parallel deployment
  hosts: all
  strategy: free  # Don't wait for other hosts
  tasks:
    - name: Task 1
      ping:  # placeholder module
      # Hosts proceed to Task 2 as soon as they finish Task 1
      # instead of waiting for all hosts to finish Task 1

    - name: Task 2
      ping:  # placeholder module
      # Some hosts may start this while others are still on Task 1
```
```ini
# ansible.cfg
[defaults]
# Strategy for faster execution
strategy = free

# For ordered execution with some parallelism
# strategy = linear (default)
```
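The difference between the two strategies can be seen with a small model. This is a simplified sketch that assumes forks is at least the number of hosts and ignores scheduling overhead. The host names and durations are made up for illustration.

```python
def linear_makespan(durations):
    """Linear: every host waits at each task, so each task costs its slowest host."""
    task_count = len(next(iter(durations.values())))
    return sum(
        max(times[index] for times in durations.values())
        for index in range(task_count)
    )


def free_makespan(durations):
    """Free: hosts run independently, so the run ends with the slowest host overall."""
    return max(sum(times) for times in durations.values())


# Hypothetical per-task seconds for three hosts
durations = {
    "web-1": [10, 10, 10],
    "web-2": [10, 60, 10],
    "web-3": [60, 10, 10],
}
print(f"linear: {linear_makespan(durations)}s")
print(f"free:   {free_makespan(durations)}s")
```

The gap grows when different hosts are slow on different tasks. If one host is slowest on every task, both strategies finish at about the same time.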
For rolling updates with proper parallelism:
```yaml
- name: Rolling deployment
  hosts: webservers
  serial: "20%"  # Process 20% of hosts at a time
  # For 100 hosts: 20 at a time
  # Better than serial: 1

  tasks:
    - name: Deploy
      # ...
```
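To compare serial settings before a rollout, the batch layout can be computed ahead of time. The sketch below assumes a percentage is rounded down with a minimum of one host per batch. That rounding rule is an assumption about Ansible's behavior, and `serial_batches` is a hypothetical helper that only handles positive values.

```python
def serial_batches(host_count, serial):
    """Return the batch sizes for a fixed or percentage `serial` value."""
    if isinstance(serial, str) and serial.endswith("%"):
        # Assumed rule: round down, but never below one host per batch
        size = int(host_count * int(serial[:-1]) / 100) or 1
    else:
        size = int(serial)
    if size < 1:
        raise ValueError("serial must resolve to at least one host")
    batches = []
    remaining = host_count
    while remaining > 0:
        batch = min(size, remaining)
        batches.append(batch)
        remaining -= batch
    return batches


print(len(serial_batches(50, 1)))    # number of batches with serial: 1
print(serial_batches(100, "20%"))    # batch sizes with serial: "20%"
```

Multiplying the number of batches by the time one batch takes gives a rough rollout duration for each setting.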
Step 6: Optimize Task Execution
Reduce unnecessary operations:
```yaml
# Baseline: idempotent, but still a module call on every host on every run
- name: Install package
  yum:
    name: nginx
    state: present

# Conditional: skip the call where the package is already known to be installed
# (ansible_facts.packages is only populated after package_facts has run)
- name: Install package
  yum:
    name: nginx
    state: present
  when: "'nginx' not in ansible_facts.packages"

# Alternative: probe with a cheap command, then install only where needed
- name: Check package status
  command: rpm -q nginx
  register: nginx_check
  changed_when: false
  failed_when: false

- name: Install package
  yum:
    name: nginx
    state: present
  when: nginx_check.rc != 0
```
Use async for long-running tasks:
```yaml
- name: Long-running update
  yum:
    name: "*"
    state: latest
  async: 3600  # 1 hour timeout
  poll: 0      # Don't wait, move on
  register: update_async

- name: Check update status later
  async_status:
    jid: "{{ update_async.ansible_job_id }}"
  register: update_result
  until: update_result.finished
  retries: 120
  delay: 30
```
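The `retries` and `delay` values should together cover the `async` timeout, otherwise the status check gives up while the job is still running. A small sketch of that arithmetic follows. The helper names are hypothetical, and it ignores the time each status check itself takes.

```python
import math


def retries_needed(async_timeout_s, delay_s):
    """Smallest retry count whose polling window covers the async timeout."""
    return math.ceil(async_timeout_s / delay_s)


def polling_covers_timeout(async_timeout_s, retries, delay_s):
    """True when retries * delay is at least as long as the async timeout."""
    return retries * delay_s >= async_timeout_s


print(retries_needed(3600, 30))
print(polling_covers_timeout(3600, retries=120, delay_s=30))
```

For the values used above, 120 retries at 30 seconds match the 3600-second timeout exactly.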
Step 7: Configure Tower/AWX Performance
For Tower environments:
```python
# /etc/tower/settings.py
# Increase parallel job capacity
# Environment variables are passed to jobs as strings, so quote every value
AWX_TASK_ENV = {
    'ANSIBLE_FORKS': '50',
    'ANSIBLE_PIPELINING': 'True',
    'ANSIBLE_GATHERING': 'smart',
    'ANSIBLE_FACT_CACHING': 'jsonfile',
    'ANSIBLE_FACT_CACHING_CONNECTION': '/var/lib/awx/facts_cache',
}

# Configure instance capacity
# (confirm this setting name exists in your Tower/AWX release before relying on it)
CLUSTER_HOST_CAPACITY = 100  # Max concurrent forks per instance
```
For Kubernetes AWX:
```yaml
# awx-deployment.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  # The AWX operator takes extra environment variables as a YAML string block
  task_extra_env: |
    - name: ANSIBLE_FORKS
      value: "50"
    - name: ANSIBLE_PIPELINING
      value: "True"
```
Verification
Test performance improvements:
```bash
# Run with timing
time ansible-playbook deploy.yml -f 50

# Compare before/after:
# Before (forks=5): 45 minutes
# After (forks=50): 8 minutes

# Check fact cache is working
ls -la /var/cache/ansible/facts/
# Should see files for each host

# Run again to see cache benefit
time ansible-playbook deploy.yml -f 50
# Second run should be faster due to cached facts

# Verify pipelining with debug
ANSIBLE_DEBUG=1 ansible all -m ping 2>&1 | grep -ci "pipelining"
# A non-zero count suggests pipelining is in use
```
Monitor resource usage during execution:
```bash
# Watch control node during playbook run ([a] keeps grep from matching itself)
watch -n 1 "ps aux | grep '[a]nsible-playbook' | head -1; echo '---'; uptime"

# Expected: Higher CPU usage (efficient), stable memory
```
Related Issues
- [ansible-ssh-unreachable-host-key-verification-failed](/articles/ansible-ssh-unreachable-host-key-verification-failed) - SSH connection issues
- [ansible-inventory-dynamic-cloud-source-failed](/articles/ansible-inventory-dynamic-cloud-source-failed) - Inventory performance
- [ansible-handler-not-triggered-notify-missing](/articles/ansible-handler-not-triggered-notify-missing) - Handler execution optimization