WordPress troubleshooting: Ansible Audit Trail Misses Events Under

Fix missing events in Ansible Tower/AWX audit logs during high-volume playbook executions caused by callback buffer overflow, database write throttling, or async task completion gaps.

Published: Feb 12, 2026 | 9 min read | By FixWikiHub Editorial Team


Introduction

When running large-scale Ansible Tower or AWX deployments with hundreds of hosts executing tasks simultaneously, you may notice gaps in the audit trail. Job events, task results, and host status changes fail to appear in the Tower UI or API responses despite the playbook completing successfully. This occurs when the callback plugin's event buffer overflows, when PostgreSQL cannot keep up with the write volume, or when async tasks complete without their results being captured by the event logging system.

The missing events create compliance and debugging challenges - you cannot determine which tasks ran on which hosts, and failed tasks may not appear in the job detail view. This issue is particularly common in environments running more than 500 concurrent hosts or executing more than 50 tasks per playbook.

Symptoms

The Tower job output shows gaps where task events should appear:

```
TASK [Deploy application] ***********
changed: [server-001]
changed: [server-002]
changed: [server-003]
...
[output truncated - 347 hosts missing from output]

TASK [Verify deployment] ***********
ok: [server-001]
```

Checking the API for job events shows fewer events than expected:

```bash
$ curl -s -H "Authorization: Bearer $TOKEN" \
    "https://tower.example.com/api/v2/jobs/12345/events/?page_size=1000" | jq '.count'
487

# But you ran against 500 hosts with 10 tasks each
# Expected: 5000+ events (excluding skipped)
```
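The comparison above is simple arithmetic, sketched here in Python. The helper names `expected_host_events` and `capture_ratio` are illustrative, and the one-event-per-host-per-task assumption is a simplification: real jobs also emit play, task-start, and stats events, so treat the expected figure as a lower bound.

```python
def expected_host_events(host_count: int, task_count: int) -> int:
    """Lower bound on per-host task events: one result per host per task."""
    return host_count * task_count


def capture_ratio(captured: int, host_count: int, task_count: int) -> float:
    """Fraction of the expected per-host events that were actually recorded."""
    expected = expected_host_events(host_count, task_count)
    if expected <= 0:
        raise ValueError("host_count and task_count must both be positive")
    return captured / expected


if __name__ == "__main__":
    # Figures from the example above: 487 events for 500 hosts x 10 tasks.
    ratio = capture_ratio(487, 500, 10)
    print(f"captured {ratio:.1%} of the expected events")
```

A ratio well below 1.0 for a job that finished successfully points to event loss rather than skipped tasks.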

PostgreSQL logs show write throttling or connection timeouts:

```
2026-04-11 14:32:15.123 UTC [awx] LOG:  could not receive data from client: Connection timed out
2026-04-11 14:32:15.456 UTC [awx] ERROR:  remaining connection slots are reserved for non-replication superuser connections
```

Tower task container logs reveal callback issues:

```bash
$ kubectl logs -n awx deployment/awx-task | grep -i callback
awx.main.utils.handlers WARNING Event queue is full, dropping events
awx.main.utils.handlers ERROR Failed to save job event: connection already closed
```

The job detail page in Tower shows incomplete host summaries:

```
Host Summary:
  OK: 12
  Changed: 8
  Failed: 0
  Unreachable: 0
  Total: 20     # But you targeted 100 hosts
```
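To find out which hosts are absent rather than just how many, the event list can be compared with the targeted inventory. The sketch below assumes each event is a dictionary with a `host_name` key, as in job event API results; the function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def events_per_host(events: Iterable[Mapping]) -> Counter:
    """Count events by host name, ignoring events with no host attached."""
    counts: Counter = Counter()
    for event in events:
        name = event.get("host_name")
        if name:
            counts[name] += 1
    return counts


def hosts_without_events(targeted: Iterable[str], events: Iterable[Mapping]) -> List[str]:
    """Return the targeted hosts that never appear in the event list, sorted."""
    seen = events_per_host(events)
    return sorted(host for host in targeted if host not in seen)


if __name__ == "__main__":
    # Hypothetical sample: three targeted hosts, events for only two of them.
    sample_events = [
        {"host_name": "server-001", "event": "runner_on_ok"},
        {"host_name": "server-002", "event": "runner_on_ok"},
        {"host_name": "", "event": "playbook_on_task_start"},
    ]
    targeted_hosts = ["server-001", "server-002", "server-003"]
    print(hosts_without_events(targeted_hosts, sample_events))
```

Feed it every page of the events endpoint, not only the first, or hosts on later pages will be reported as missing.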

Common Causes

1. Callback Plugin Event Buffer Overflow

The default awx_display callback plugin uses an internal buffer to collect events before sending them to the Tower API. Under burst load, this buffer fills faster than events can be dispatched:

```python
# From awx/plugins/callback/awx_display.py
MAX_EVENT_QUEUE_SIZE = 10000  # Default buffer size

# When buffer is full, events are dropped
if self.event_queue.qsize() >= MAX_EVENT_QUEUE_SIZE:
    logger.warning("Event queue full, dropping event")
    return
```
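The overflow behaviour described above can be reproduced with a bounded queue from the Python standard library. This is a minimal sketch of the general mechanism, not the plugin's own code: the function name `enqueue_burst` and the sizes used are hypothetical.

```python
import queue


def enqueue_burst(event_queue: queue.Queue, events: list) -> int:
    """Try to enqueue every event without blocking; return how many were dropped."""
    dropped = 0
    for event in events:
        try:
            event_queue.put_nowait(event)
        except queue.Full:
            # The buffer is at capacity and nothing is draining it, so the event is lost.
            dropped += 1
    return dropped


if __name__ == "__main__":
    buffer = queue.Queue(maxsize=1000)          # small stand-in for the real limit
    burst = [{"seq": i} for i in range(1500)]   # 1500 events arrive before any are consumed
    print("dropped:", enqueue_burst(buffer, burst))
```

Raising the buffer size only delays the drop if the consumer stays slower than the producer, which is why the fix below also addresses workers and the database.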

2. PostgreSQL Connection Pool Exhaustion

Tower uses a connection pool for database writes. During burst loads, all connections are consumed waiting for slow writes:

```bash
# Check PostgreSQL connection count
$ psql -U awx -d awx -c "SELECT count(*) FROM pg_stat_activity WHERE datname='awx';"
 count
-------
   98   # At or near the max_connections limit (typically 100)
```
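A count of 98 is already past the usable limit, because PostgreSQL holds back some slots for superusers (`superuser_reserved_connections`, 3 by default). The sketch below works through that arithmetic; the function name `connection_headroom` is hypothetical, and the defaults should be replaced with the values from your own server.

```python
def connection_headroom(active: int, max_connections: int = 100,
                        superuser_reserved: int = 3) -> int:
    """Connections still available to ordinary roles such as the awx user."""
    usable = max_connections - superuser_reserved
    return max(0, usable - active)


if __name__ == "__main__":
    # With the count shown above, ordinary roles have no slots left.
    print("free slots:", connection_headroom(98))
    # After raising max_connections to 200 the same load leaves room to spare.
    print("free slots:", connection_headroom(98, max_connections=200))
```

Zero headroom matches the "remaining connection slots are reserved" error shown in the symptoms.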

3. Async Task Event Loss

Async tasks (with async: and poll: directives) generate separate events for job initiation and completion. If the async job completes before the poll interval, the completion event may not be captured:

```yaml
# Async task where events can be lost
- name: Long running task
  command: /opt/app/long_process.sh
  async: 3600
  poll: 60  # If task completes in 30s, completion event may be missed
```

4. Tower Task Worker Threading Limits

Tower task workers are configured with a maximum number of concurrent playbook threads. When exceeded, event processing is delayed:

```python
# Tower settings.py (default)
AWX_TASK_ENV = {
    'MAX_EVENT_WORKERS': 4,  # Limited concurrent event processors
}
```
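Whether a given worker count is enough depends on how fast events arrive compared with how fast they are written. The sketch below estimates how long a queue survives a sustained burst; the function name `seconds_until_full` and all rates are hypothetical, so measure your own before drawing conclusions.

```python
import math


def seconds_until_full(produce_rate: float, drain_rate: float,
                       queue_size: int, backlog: int = 0) -> float:
    """Seconds before a bounded queue fills, or math.inf if draining keeps up."""
    net_growth = produce_rate - drain_rate
    if net_growth <= 0:
        return math.inf
    return max(0, queue_size - backlog) / net_growth


if __name__ == "__main__":
    workers = 4
    per_worker_rate = 250                   # events/second per worker (assumed)
    drain = workers * per_worker_rate       # 1000 events/second in total
    print(seconds_until_full(produce_rate=1500, drain_rate=drain, queue_size=10000))
```

With these assumed figures the queue grows by 500 events per second, so a 10000-entry buffer lasts 20 seconds; any burst longer than that starts losing events.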

Step-by-Step Fix

Step 1: Diagnose the Event Loss Source

Check current Tower configuration and identify bottlenecks:

```bash
# Check Tower task settings
docker exec tower_task cat /etc/tower/settings.py | grep -i event

# For AWX on Kubernetes
kubectl exec -n awx deployment/awx-task -- cat /etc/tower/settings.py | grep -i event

# Monitor PostgreSQL during job execution
watch -n 1 "psql -U awx -d awx -c \"SELECT count(*), state FROM pg_stat_activity WHERE datname='awx' GROUP BY state;\""

# Check event queue metrics. The endpoint may return Prometheus text rather
# than JSON, and metric names vary by version, so filter broadly.
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://tower.example.com/api/v2/metrics/" | grep -i queue
```

Step 2: Increase Event Buffer and Worker Limits

Update Tower settings to handle higher event volumes:

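```python
# /etc/tower/settings.py or via Tower Configuration
# Setting names differ between Tower/AWX releases; confirm each one against
# the defaults shipped with your version before relying on it. (AWX's own
# defaults file, for example, uses JOB_EVENT_MAX_QUEUE_SIZE and JOB_EVENT_WORKERS.)

# Increase event queue buffer size
CALLBACK_QUEUE_SIZE = 50000  # Default is 10000

# Increase event worker threads
AWX_TASK_ENV['MAX_EVENT_WORKERS'] = 16  # Default is 4

# Keep database connections open for reuse and allow more time to connect
DATABASES['default']['CONN_MAX_AGE'] = 60
DATABASES['default'].setdefault('OPTIONS', {})['connect_timeout'] = 30
```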

For AWX on Kubernetes, update the AWX custom resource:

```yaml
# awx-deployment.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  task_env:
    - name: MAX_EVENT_WORKERS
      value: "16"
    - name: CALLBACK_QUEUE_SIZE
      value: "50000"
```

Apply the changes:

```bash
# Restart Tower services
ansible-tower-service restart

# Or for AWX
kubectl apply -f awx-deployment.yaml
kubectl rollout restart deployment/awx-task -n awx
```

Step 3: Configure PostgreSQL for Higher Write Throughput

Increase PostgreSQL connection limits and tune for write-heavy workloads:

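```sql
-- Connect to PostgreSQL as superuser first (from the shell): psql -U postgres

-- Increase max connections (takes effect only after a server restart)
ALTER SYSTEM SET max_connections = 200;

-- Increase work memory for sorting during event writes
-- (allocated per sort operation per connection, so size it against available RAM)
ALTER SYSTEM SET work_mem = '64MB';

-- Increase checkpoint completion target for smoother writes
ALTER SYSTEM SET checkpoint_completion_target = 0.9;

-- Increase WAL buffers (takes effect only after a server restart)
ALTER SYSTEM SET wal_buffers = '64MB';

-- Reload configuration; this applies work_mem and checkpoint_completion_target.
-- Restart PostgreSQL afterwards so max_connections and wal_buffers take effect.
SELECT pg_reload_conf();
```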

Update Tower's database connection pool in settings:

```python
# /etc/tower/settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'awx',
        'USER': 'awx',
        'PASSWORD': 'password',
        'HOST': 'localhost',
        'PORT': '5432',
        'CONN_MAX_AGE': 60,
        'OPTIONS': {
            'sslmode': 'prefer',
            'connect_timeout': 30,
            'options': '-c statement_timeout=60000'
        },
    }
}
```

Step 4: Optimize Playbooks to Reduce Event Volume

Split large batch operations into smaller chunks with explicit event flushing:

```yaml
# playbook.yml - Optimized for audit logging
- name: Deploy with audit trail preservation
  hosts: all
  serial: 50  # Process 50 hosts at a time instead of all at once
  gather_facts: no

  tasks:
    - name: Deploy application
      block:
        - name: Copy application files
          copy:
            src: app/
            dest: /opt/app/
          notify: Restart application

        - name: Run deployment script
          shell: /opt/app/deploy.sh
          args:
            creates: /opt/app/.deployed
      rescue:
        - name: Log failure for this host
          debug:
            msg: "Deployment failed on {{ inventory_hostname }}"
          # This ensures the failure is captured in audit

    # clear_host_errors resets the failed state of hosts;
    # it does not itself write or flush job events.
    - name: Clear host error state between batches
      meta: clear_host_errors
      when: ansible_loop.last | default(false)

  handlers:
    - name: Restart application
      systemd:
        name: app
        state: restarted
```

For async tasks, ensure polling catches completion events:

```yaml
# Improved async task configuration
- name: Long running process
  command: /opt/app/process.sh
  async: 3600
  poll: 0  # Start the job and return; completion is tracked by async_status below
  register: async_result

- name: Check async task status
  async_status:
    jid: "{{ async_result.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 240  # 240 * 15 seconds = 1 hour max wait
  delay: 15
```
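The `retries` value has to cover the whole `async` window, otherwise the status check gives up before the job does. A small sketch of that calculation follows; the helper name `retries_for_timeout` is hypothetical.

```python
import math


def retries_for_timeout(timeout_seconds: int, delay_seconds: int) -> int:
    """Smallest retry count whose total wait covers the async timeout."""
    if timeout_seconds < 0 or delay_seconds <= 0:
        raise ValueError("timeout must be non-negative and delay must be positive")
    return math.ceil(timeout_seconds / delay_seconds)


if __name__ == "__main__":
    # Values from the playbook above: async 3600 with a 15 second delay.
    print(retries_for_timeout(3600, 15))
```

For the values used here the result is 240, matching the playbook. Shortening the delay without raising the retries reduces the maximum wait.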

Step 5: Enable Event Persistence Logging

Configure Tower to log dropped events for later reconstruction:

```python
# /etc/tower/settings.py
# Enable event persistence logging
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.FileHandler',
            'filename': '/var/log/tower/event_debug.log',
        },
    },
    'loggers': {
        'awx.main.utils.handlers': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
        'awx.main.models.jobs': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

# Log all events to file for audit reconstruction
AWX_LOGGING_HANDLERS = ['console', 'file']
```

Verification

Run a test deployment and verify event completeness:

```bash
# Start a job and capture the job ID
JOB_ID=$(curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://tower.example.com/api/v2/job_templates/15/launch/" | jq '.id')

# Wait for job to complete (stop on any terminal status)
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $TOKEN" \
    "https://tower.example.com/api/v2/jobs/$JOB_ID/" | jq -r '.status')
  echo "Job status: $STATUS"
  case "$STATUS" in
    successful|failed|error|canceled) break ;;
  esac
  sleep 5
done

# Count events
EVENT_COUNT=$(curl -s -H "Authorization: Bearer $TOKEN" \
  "https://tower.example.com/api/v2/jobs/$JOB_ID/events/?page_size=1" | jq '.count')

# Compare to expected (hosts * tasks); set these to match the job you launched
HOST_COUNT=500
TASK_COUNT=10
EXPECTED_EVENTS=$((HOST_COUNT * TASK_COUNT))
echo "Events captured: $EVENT_COUNT / $EXPECTED_EVENTS"

# List the distinct hosts that appear in the returned events
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://tower.example.com/api/v2/jobs/$JOB_ID/events/?page_size=1000" \
  | jq -r '.results[].host' | sort -u
```

Check PostgreSQL can handle the load:

```bash
# Monitor connections during job
watch -n 1 "psql -U awx -d awx -c \"SELECT count(*) FROM pg_stat_activity WHERE datname='awx';\""

# Should stay below max_connections
```

Verify event queue isn't backing up:

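```bash
# Check Tower metrics. The endpoint may return Prometheus text rather than
# JSON, and metric names vary by version, so filter broadly.
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://tower.example.com/api/v2/metrics/" | grep -i queue
# Queue sizes should be 0 or very small when no jobs are running
```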
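If the metrics endpoint returns Prometheus-style text rather than JSON, the queue figures can be extracted with a few lines of Python. This is a sketch under stated assumptions: samples carry no trailing timestamp, and the metric names in the sample are invented for illustration, not names a real installation is guaranteed to expose.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name{labels} value' lines, skipping comments and blank lines."""
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes the value is the last space-separated field (no timestamp).
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # Not a numeric sample; ignore it
    return metrics


def queue_metrics(text: str, keyword: str = "queue") -> Dict[str, float]:
    """Keep only the metrics whose name contains the keyword."""
    return {name: value for name, value in parse_metrics(text).items() if keyword in name}


if __name__ == "__main__":
    # Hypothetical sample payload; real metric names depend on the version.
    sample = (
        "# HELP demo_event_queue_size Events waiting to be written\n"
        "# TYPE demo_event_queue_size gauge\n"
        'demo_event_queue_size{node="awx-1"} 0\n'
        "demo_jobs_running 2\n"
    )
    print(queue_metrics(sample))
```

Sampling this every few seconds during a job shows whether the backlog drains between bursts or keeps growing.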

Related Issues

[ansible-background-worker-stuck-in-a-retry-loop](/articles/ansible-background-worker-stuck-in-a-retry-loop) - Tower worker processing issues
[ansible-queue-backlog-grows-because-ack-never-reaches-the-broker](/articles/ansible-queue-backlog-grows-because-ack-never-reaches-the-broker) - Message queue issues in Tower
[ansible-duplicate-execution-starts-after-failover](/articles/ansible-duplicate-execution-starts-after-failover) - Job execution tracking problems


Last updated: Feb 12, 2026
