Introduction
Cosmos DB Change Feed provides a persistent log of document changes. When consumer applications process changes slower than they're generated, lag increases, causing delayed data synchronization and stale downstream systems.
Symptoms
Change feed lag increasing:
```bash
# Monitor change feed processor metrics
# Estimated lag = Current LSN - Last processed LSN
```
| Metric | Value |
|---|---|
| ChangeFeedProcessor.Lag | 10000+ documents |
| Processing latency | > 5 minutes |
Processor health check warning:
```csharp
// The change feed estimator reports a high lag
// (ChangeFeedProcessor itself exposes only StartAsync and StopAsync)
ChangeFeedEstimator estimator = container.GetChangeFeedEstimator("myProcessor", leaseContainer);
// ChangeFeedProcessorState.EstimatedLag returns a high value for one or more leases
```
Lease container contention:
```text
# In lease container logs:
"Lease acquisition failed due to contention"
"Another processor instance already owns this lease"
```
Common Causes
1. Consumer throughput insufficient - Can't keep up with change rate
2. Lease container throttling - Lease operations hitting RU limits
3. Too few processor instances - Not enough parallel processing
4. Batch size too small - Processing few documents per batch
5. Slow downstream processing - External service bottleneck
6. Partition skew - Hot partition with more changes
7. Network latency - Cross-region read latency
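Most of the causes above come down to the consumer processing fewer changes per second than the container produces. The following is a minimal sketch of that arithmetic, not an SDK feature: `LagEstimate` and `TimeToDrain` are hypothetical names, and the rates are assumed to be values you measured yourself.

```csharp
using System;

// Hypothetical helper for back-of-envelope planning; not part of the Cosmos DB SDK.
public static class LagEstimate
{
    // Returns the time needed to drain a backlog, or null when the consumer
    // is not faster than the change rate (the lag never shrinks).
    public static TimeSpan? TimeToDrain(
        long backlogItems,
        double changesPerSecond,
        double processedPerSecond)
    {
        if (backlogItems < 0)
            throw new ArgumentOutOfRangeException(nameof(backlogItems));

        // Only the surplus capacity works against the existing backlog.
        double surplus = processedPerSecond - changesPerSecond;
        if (surplus <= 0)
            return null;

        return TimeSpan.FromSeconds(backlogItems / surplus);
    }
}
```

For example, `LagEstimate.TimeToDrain(10000, 500, 600)` yields 100 seconds, while a consumer handling 400 changes per second against the same 500 per second change rate yields `null`, meaning the lag keeps growing until throughput is raised.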
Step-by-Step Fix
1. Check logs for specific error messages
2. Verify configuration settings
3. Test network connectivity
4. Review recent changes
5. Apply corrective action
6. Verify the fix
Step 1: Monitor Change Feed Metrics
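```csharp
using Microsoft.Azure.Cosmos;

// Build and start the change feed processor
ChangeFeedProcessor processor = container
    .GetChangeFeedProcessorBuilder<dynamic>(
        "myProcessor",
        async (changes, cancellationToken) =>
        {
            // Process changes
        })
    .WithInstanceName("instance-1")
    .WithLeaseContainer(leaseContainer)
    .Build();

await processor.StartAsync();

// Check lag with the change feed estimator; the processor has no lag API of its own
ChangeFeedEstimator estimator = container.GetChangeFeedEstimator("myProcessor", leaseContainer);
using FeedIterator<ChangeFeedProcessorState> iterator = estimator.GetCurrentStateIterator();
while (iterator.HasMoreResults)
{
    foreach (ChangeFeedProcessorState state in await iterator.ReadNextAsync())
    {
        Console.WriteLine($"Lease {state.LeaseToken}: estimated lag {state.EstimatedLag}");
    }
}
```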
Step 2: Check Lease Container RU/s
```bash
# Get lease container throughput
# (use "throughput show"; "container show" returns the container definition, not its RU/s)
az cosmosdb sql container throughput show \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name lease-db \
  --name leases \
  --query '{Throughput:resource.throughput}'

# Lease container needs sufficient RU/s
# Each processor instance periodically updates its lease
# Rule of thumb: 100-500 RU/s per processor instance
```
Step 3: Increase Lease Container RU/s
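```bash
# Increase lease container throughput (manual throughput)
az cosmosdb sql container throughput update \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name lease-db \
  --name leases \
  --throughput 1000

# Or enable autoscale: migrate the container from manual to autoscale first
az cosmosdb sql container throughput migrate \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name lease-db \
  --name leases \
  --throughput-type autoscale

# Then set the autoscale maximum
az cosmosdb sql container throughput update \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name lease-db \
  --name leases \
  --max-throughput 2000
```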
Step 4: Optimize Batch Size
```csharp
// Increase batch size for more throughput
var processor = container.GetChangeFeedProcessorBuilder<dynamic>(
        "myProcessor",
        async (changes, cancellationToken) =>
        {
            foreach (var change in changes)
            {
                await ProcessChangeAsync(change);
            }
        })
    .WithInstanceName("instance-1")
    .WithLeaseContainer(leaseContainer)
    .WithMaxItems(1000) // Increase from default 100
    .WithPollInterval(TimeSpan.FromMilliseconds(100)) // Shorter wait after an empty read
    .Build();
```
The poll interval is the delay applied after the processor has drained all pending changes, so shortening it reduces latency when the feed is quiet but does not help a processor that is already behind.
Step 5: Scale Processor Instances
```csharp
// Deploy more processor instances
// Each instance handles a subset of partitions
// Instances beyond the number of leases (one per physical partition) stay idle

// Run multiple processor instances (same processor name!)
// They will automatically distribute partitions among themselves

// Instance 1:
var processor1 = container.GetChangeFeedProcessorBuilder<dynamic>(
        "myProcessor", // Same processor name
        ProcessChangesAsync)
    .WithInstanceName("instance-1") // Unique instance name
    .WithLeaseContainer(leaseContainer)
    .Build();

// Instance 2 (normally deployed on a separate host):
var processor2 = container.GetChangeFeedProcessorBuilder<dynamic>(
        "myProcessor", // Same processor name
        ProcessChangesAsync)
    .WithInstanceName("instance-2") // Different instance name
    .WithLeaseContainer(leaseContainer)
    .Build();
```
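To judge whether adding instances will help, it can be useful to work out how the leases would spread across them. The sketch below assumes an even spread, which is what the processor aims for; `LeaseMath` and its methods are hypothetical names, and the lease count is assumed to equal the number of documents the processor has created in the lease container.

```csharp
using System;

// Hypothetical planning helper; not part of the Cosmos DB SDK.
public static class LeaseMath
{
    // Leases owned by the busiest instance under an even spread (ceiling division).
    public static int MaxLeasesPerInstance(int leaseCount, int instanceCount)
    {
        if (leaseCount < 0)
            throw new ArgumentOutOfRangeException(nameof(leaseCount));
        if (instanceCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(instanceCount));

        return (leaseCount + instanceCount - 1) / instanceCount;
    }

    // Instances left without any lease when there are more instances than leases.
    public static int IdleInstances(int leaseCount, int instanceCount)
    {
        if (leaseCount < 0)
            throw new ArgumentOutOfRangeException(nameof(leaseCount));
        if (instanceCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(instanceCount));

        return Math.Max(0, instanceCount - leaseCount);
    }
}
```

With 10 leases, `LeaseMath.MaxLeasesPerInstance(10, 4)` gives 3, and `LeaseMath.IdleInstances(10, 12)` gives 2: the two extra instances add cost without adding throughput.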
Step 6: Optimize Downstream Processing
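```csharp
// Use batch processing instead of individual items
async Task ProcessChangesAsync(
    IReadOnlyCollection<dynamic> changes,
    CancellationToken cancellationToken)
{
    // Batch process all changes
    var batch = new List<WriteModel<BsonDocument>>();

    foreach (var change in changes)
    {
        // Extension methods such as ToBsonDocument() are not resolved on dynamic
        // values, so convert through the JSON text instead
        string id = (string)change.id;
        string json = change.ToString();

        // IsUpsert is a property of ReplaceOneModel, not a constructor argument
        batch.Add(new ReplaceOneModel<BsonDocument>(
            Builders<BsonDocument>.Filter.Eq("_id", id),
            BsonDocument.Parse(json))
        {
            IsUpsert = true
        });
    }

    if (batch.Count > 0)
    {
        await mongoCollection.BulkWriteAsync(batch, cancellationToken: cancellationToken);
    }
}
```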
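When the downstream system has no bulk API, another option is to process the items of a batch concurrently with a fixed upper bound. The following is a minimal sketch assuming .NET 6 or later (for `Parallel.ForEachAsync`); `BoundedBatchProcessor` and `ProcessAsync` are hypothetical names, and the handler stands in for whatever per-item call your application makes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of the Cosmos DB SDK.
public static class BoundedBatchProcessor
{
    // Runs the handler for every item, with at most maxConcurrency calls in flight.
    // The returned task completes when all items are done and faults if any handler throws.
    public static Task ProcessAsync<T>(
        IReadOnlyCollection<T> changes,
        int maxConcurrency,
        Func<T, CancellationToken, ValueTask> handler,
        CancellationToken cancellationToken)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxConcurrency,
            CancellationToken = cancellationToken
        };

        return Parallel.ForEachAsync(changes, options, handler);
    }
}
```

Concurrent handling gives up the order in which changes arrived within the batch, so this only fits workloads where items can be applied independently. If order per document or per partition key matters, group the batch by that key first and keep each group sequential.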
Step 7: Use In-Memory Lease Container (Development)
```csharp
// For development/testing, use in-memory lease container
// WARNING: Not for production - leases lost on restart, and only one instance can run

var processor = container.GetChangeFeedProcessorBuilder<dynamic>(
        "myProcessor",
        ProcessChangesAsync)
    .WithInstanceName("instance-1")
    .WithInMemoryLeaseContainer()
    // Read from the beginning of the feed
    // Or WithStartTime(DateTime.UtcNow.AddHours(-1))
    .WithStartTime(DateTime.MinValue.ToUniversalTime())
    .Build();
```
Step 8: Check for Blocked Processing
```csharp
// Add timeout to processing
async Task ProcessChangesAsync(
    IReadOnlyCollection<dynamic> changes,
    CancellationToken cancellationToken)
{
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
    cts.CancelAfter(TimeSpan.FromMinutes(5)); // Timeout for the whole batch

    try
    {
        foreach (var change in changes)
        {
            cts.Token.ThrowIfCancellationRequested();
            await ProcessChangeAsync(change, cts.Token);
        }
    }
    catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
    {
        // Timeout rather than host shutdown: log it, don't block processor
        _logger.LogWarning("Change feed processing timed out");
        throw; // Will retry
    }
}
```
Step 9: Monitor Processor Health
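```csharp
// Implement health check on top of the change feed estimator
public class ChangeFeedHealthCheck : IHealthCheck
{
    private readonly ChangeFeedEstimator _estimator;

    public ChangeFeedHealthCheck(ChangeFeedEstimator estimator) => _estimator = estimator;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Sum the estimated lag across all leases (await instead of blocking on .Result)
        long lag = 0;
        using FeedIterator<ChangeFeedProcessorState> iterator = _estimator.GetCurrentStateIterator();
        while (iterator.HasMoreResults)
        {
            foreach (ChangeFeedProcessorState state in await iterator.ReadNextAsync(cancellationToken))
            {
                lag += state.EstimatedLag;
            }
        }

        if (lag < 1000)
            return HealthCheckResult.Healthy($"Lag: {lag}");
        if (lag < 10000)
            return HealthCheckResult.Degraded($"Lag: {lag}");
        return HealthCheckResult.Unhealthy($"Lag: {lag}");
    }
}
```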
Step 10: Use Push Model for Real-time
```csharp
// Azure Functions Cosmos DB trigger is built on the change feed processor
// and manages leases and instance scaling for you

[FunctionName("ProcessChanges")]
public static async Task Run(
    [CosmosDBTrigger(
        databaseName: "my-db",
        containerName: "my-container",
        Connection = "CosmosConnection",
        LeaseContainerName = "leases",
        CreateLeaseContainerIfNotExists = true)]
    IReadOnlyList<dynamic> changes,
    ILogger log)
{
    foreach (var change in changes)
    {
        await ProcessChangeAsync(change);
    }
}
```
Change Feed Processor Configuration
| Setting | Default | Recommendation |
|---|---|---|
| MaxItems | 100 | 500-1000 for throughput |
| PollInterval | 5s | 100-500ms for low latency |
| LeaseAcquireInterval | 13s | Lower for faster scaling |
| LeaseExpirationInterval | 60s | Lower for faster failover |
| LeaseRenewInterval | 17s | Adjust based on workload |
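The three lease intervals in the table are set together through `WithLeaseConfiguration` on the processor builder. The sketch below shows one way to apply them; `ProcessorFactory` and `Build` are hypothetical names, and the interval values are illustrative assumptions rather than recommendations, chosen so that the renew interval stays well below the expiration interval.

```csharp
using System;
using Microsoft.Azure.Cosmos;

// Hypothetical factory; only the builder calls come from the Cosmos DB .NET SDK v3.
public static class ProcessorFactory
{
    public static ChangeFeedProcessor Build(
        Container monitoredContainer,
        Container leaseContainer,
        Container.ChangesHandler<dynamic> onChanges)
    {
        return monitoredContainer
            .GetChangeFeedProcessorBuilder<dynamic>("myProcessor", onChanges)
            .WithInstanceName(Environment.MachineName) // Unique per host
            .WithLeaseContainer(leaseContainer)
            .WithMaxItems(500)
            .WithPollInterval(TimeSpan.FromMilliseconds(500))
            .WithLeaseConfiguration(
                acquireInterval: TimeSpan.FromSeconds(5),     // Illustrative: pick up free leases sooner
                expirationInterval: TimeSpan.FromSeconds(30), // Illustrative: faster failover
                renewInterval: TimeSpan.FromSeconds(10))      // Must leave room before expiration
            .Build();
    }
}
```

Shorter intervals mean more frequent reads and writes against the lease container, so any tightening here should be weighed against the lease container's provisioned RU/s.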
Verification
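```csharp
// After changes, monitor lag; it should decrease over successive checks
ChangeFeedEstimator estimator = container.GetChangeFeedEstimator("myProcessor", leaseContainer);
using FeedIterator<ChangeFeedProcessorState> iterator = estimator.GetCurrentStateIterator();
while (iterator.HasMoreResults)
{
    foreach (ChangeFeedProcessorState state in await iterator.ReadNextAsync())
    {
        Console.WriteLine($"Lease {state.LeaseToken} ({state.InstanceName}): lag {state.EstimatedLag}");
    }
}
```
| Metric | Target |
|---|---|
| ChangeFeedProcessor.Lag | < 1000 |
| Processing latency | < 1 second |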
Related Issues
- [Fix Azure Cosmos DB Throttling](/articles/fix-azure-cosmos-db-throttling)
- [Fix Azure Cosmos DB Partition Hotspot](/articles/fix-azure-cosmos-db-partition-hotspot)
- [Fix Azure Cosmos DB SQL Query Slow](/articles/fix-azure-cosmos-db-sql-query-slow)