Introduction

Azure SQL Database geo-replication asynchronously replicates data from primary to secondary regions. When replication lag increases, the secondary database falls behind the primary, affecting disaster recovery RPO and read-scale workloads.

Symptoms

High replication lag:

```sql -- Query replication link status sqlcmd -S my-server.database.windows.net -d master -U admin -P password -Q " SELECT partner_server, partner_database, state_desc, replication_lag_sec FROM sys.dm_geo_replication_link_status "

# replication_lag_sec shows high value (> 60 seconds) ```

Replication state warning:

```bash $ az sql db replica show \ --name my-database \ --server my-server \ --resource-group my-rg \ --query '{State:replicationState,Lag:replicationLagSeconds}'

{ "State": "CATCH_UP", "Lag": 120 # Seconds behind primary } ```

Read-scale timeout:

json
{
  "error": {
    "code": "QueryTimeout",
    "message": "Query against geo-replica timed out - replica is catching up"
  }
}

Common Causes

  1. 1.High write workload - Transaction log generation exceeds replication bandwidth
  2. 2.Secondary under-provisioned - Compute tier lower than primary
  3. 3.Network latency - Cross-region network slow or congested
  4. 4.Long-running transactions - Large transactions increase log queue
  5. 5.Secondary throttling - Secondary DTU/vCore limit hit
  6. 6.Maintenance operations - Index rebuilds, stats updates increase log
  7. 7.Cross-region distance - Physical distance adds latency

Step-by-Step Fix

Step 1: Check Replication Lag

```sql -- Check current replication lag sqlcmd -S my-server.database.windows.net -d master -U admin -P password -Q " SELECT database_id, start_date, modify_date, partner_server, partner_database, state_desc, synchronization_state_desc, last_replication_date, replication_lag_sec FROM sys.dm_geo_replication_link_status "

-- replication_lag_sec: Seconds behind primary -- > 60s indicates potential issues ```

Step 2: Compare Primary and Secondary Compute

```bash # Check primary tier az sql db show \ --name my-database \ --server my-server \ --resource-group my-rg \ --query '{Sku:sku,Capacity:sku.capacity}'

# Check secondary tier az sql db show \ --name my-database-secondary \ --server my-server-secondary \ --resource-group my-rg \ --query '{Sku:sku,Capacity:sku.capacity}'

# Secondary should match or exceed primary compute for low lag ```

Step 3: Upgrade Secondary Compute

```bash # If secondary has lower tier, upgrade it az sql db update \ --name my-database-secondary \ --server my-server-secondary \ --resource-group my-rg \ --sku Standard \ --capacity 200 # Match primary S3 tier

# For vCore model az sql db update \ --name my-database-secondary \ --server my-server-secondary \ --resource-group my-rg \ --capacity 4 \ --family Gen5 ```

Step 4: Analyze Write Workload

```sql -- Check transaction log generation rate sqlcmd -S my-server.database.windows.net -d my-database -U admin -P password -Q " SELECT database_name, log_generation_rate_mb_per_sec, log_backup_rate_mb_per_sec, log_repl_rate_mb_per_sec FROM sys.dm_database_log_rate_stats "

-- If log_generation_rate >> log_repl_rate, replication can't keep up

-- Find high-write tables sqlcmd -S my-server.database.windows.net -d my-database -U admin -P password -Q " SELECT TOP 10 OBJECT_NAME(object_id) as table_name, SUM(user_updates) as total_updates, SUM(user_seeks + user_scans) as total_reads FROM sys.dm_db_index_usage_stats WHERE database_id = DB_ID() GROUP BY object_id ORDER BY total_updates DESC " ```

Step 5: Optimize Write Operations

```sql -- Batch large inserts -- Instead of: INSERT INTO LargeTable VALUES (1, 'data'); INSERT INTO LargeTable VALUES (2, 'data'); -- ... millions of individual inserts

-- Use bulk insert: BULK INSERT LargeTable FROM 'data.csv' WITH (TABLOCK);

-- Or batch inserts: INSERT INTO LargeTable SELECT * FROM StagingTable;

-- Reduce transaction size -- Instead of one large transaction, batch into smaller chunks BEGIN TRANSACTION; DELETE TOP (10000) FROM LargeTable WHERE Date < '2023-01-01'; COMMIT TRANSACTION; -- Repeat until done ```

Step 6: Check Network Connectivity

```bash # Check regions distance az account list-locations \ --query "[?name=='eastus' || name=='westus'].{Name:name,Region:regionalDisplayName}"

# Test network latency between regions # From primary region VM to secondary region storage az storage account show \ --name testlatency \ --resource-group my-rg \ --query 'location'

# Cross-region latency should be < 100ms for acceptable replication # If latency high, consider closer region pair ```

Step 7: Monitor Secondary Throttling

```sql -- Check secondary resource usage sqlcmd -S my-server-secondary.database.windows.net -d my-database-secondary -U admin -P password -Q " SELECT AVG(avg_cpu_percent) as avg_cpu, MAX(avg_cpu_percent) as max_cpu, AVG(avg_data_io_percent) as avg_io, MAX(avg_data_io_percent) as max_io FROM sys.dm_db_resource_stats WHERE end_time > DATEADD(hour, -1, GETDATE()) "

-- If max values > 90%, secondary is throttling ```

Step 8: Pause Heavy Maintenance During Peak

```sql -- Index rebuilds generate significant log -- Schedule during low activity

-- Check for running index operations sqlcmd -S my-server.database.windows.net -d my-database -U admin -P password -Q " SELECT r.session_id, r.status, r.command, r.percent_complete, r.estimated_completion_time/60000 as minutes_remaining FROM sys.dm_exec_requests r WHERE r.command IN ('CREATE INDEX', 'ALTER INDEX', 'UPDATE STATISTICS') "

-- Cancel if needed KILL 53; -- session_id from above

-- Use ONLINE index operations when possible CREATE INDEX IX_Table_Col ON Table(Col) WITH (ONLINE = ON); ```

Step 9: Configure Auto-Failover Group

```bash # Auto-failover groups provide better replication management az sql failover-group show \ --name my-failover-group \ --resource-group my-rg \ --server my-server \ --query '{Databases:databases,State:replicationState}'

# Create failover group if not exists az sql failover-group create \ --name my-failover-group \ --resource-group my-rg \ --server my-server \ --partner-server my-server-secondary \ --database my-database \ --grace-period 1 # minutes before forced failover ```

Step 10: Set Up Replication Lag Alerts

```bash # Create alert for high replication lag az monitor metrics alert create \ --name sql-replica-lag \ --resource-group my-rg \ --scopes /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.Sql/servers/my-server/databases/my-database \ --condition "avg replication_lag_seconds > 60" \ --window-size 5m

Replication Lag Thresholds

LagStatusAction
< 30sNormalNo action needed
30-60sElevatedMonitor
60-300sHighInvestigate and optimize
> 300sCriticalReduce write load or upgrade secondary

Verification

```sql -- After reducing write load or upgrading secondary sqlcmd -S my-server.database.windows.net -d master -U admin -P password -Q " SELECT partner_server, state_desc, synchronization_state_desc, replication_lag_sec FROM sys.dm_geo_replication_link_status "

-- Should show: -- state_desc: ONLINE -- synchronization_state_desc: SYNCHRONIZED -- replication_lag_sec: < 30

-- Monitor for 24 hours -- Query hourly lag SELECT DATEPART(hour, modify_date) as hour, AVG(replication_lag_sec) as avg_lag FROM sys.dm_geo_replication_link_status_history GROUP BY DATEPART(hour, modify_date) ORDER BY hour; ```

Prevention

To prevent Azure SQL database geo-replica lag issues from recurring, implement these proactive measures:

1. Monitor Replication Lag

yaml
groups:
- name: azure-sql-replication
  rules:
  - alert: AzureSQLGeoReplicaLag
    expr: |
      azure_sql_geo_replica_lag_seconds > 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Azure SQL geo-replica lag exceeds 30 seconds"

2. Size Secondary Appropriately

```bash # Ensure secondary has same or higher compute tier az sql db create --name my-db --partner-server my-secondary-server --resource-group my-rg --server my-server --service-objective S3

# Monitor DTU usage on primary az monitor metrics list --resource /subscriptions/.../servers/my-server/databases/my-db --metric dtu_consumption_percent

# Scale secondary if needed az sql db update --name my-db --server my-secondary-server --resource-group my-rg --service-objective S4 ```

3. Optimize Write Workload

```sql -- Reduce write volume on primary -- Use bulk operations instead of individual inserts BULK INSERT orders FROM 'orders.csv' WITH (ROWTERMINATOR = '\n');

-- Batch updates UPDATE orders SET status = 'processed' WHERE id BETWEEN 1 AND 1000; UPDATE orders SET status = 'processed' WHERE id BETWEEN 1001 AND 2000; ```

Best Practices Checklist

  • [ ] Monitor replication lag
  • [ ] Size secondary appropriately
  • [ ] Optimize write workload
  • [ ] Use same compute tier on primary and secondary
  • [ ] Monitor network latency
  • [ ] Test failover regularly
  • [Fix Azure SQL Database DTU Limit](/articles/fix-azure-sql-database-dtu-limit)
  • [Fix Azure SQL Transaction Log Full](/articles/fix-azure-sql-transaction-log-full)
  • [Fix Azure SQL Failover Group Not Syncing](/articles/fix-azure-sql-failover-group-not-syncing)
  • [Technical troubleshooting: Fix Azure Aks Pod Crashloopbackoff Issue in Azure](azure-aks-pod-crashloopbackoff)
  • [Technical troubleshooting: Fix Azure Api Management Policy Expression Runtime](azure-api-management-policy-expression-runtime-error)
  • [Technical troubleshooting: Fix Azure App Configuration Feature Flag Not Refre](azure-app-configuration-feature-flag-not-refreshing)
  • [Technical troubleshooting: Fix Azure App Service 503 Always On Disabled Issue](azure-app-service-503-always-on-disabled)
  • [Technical troubleshooting: Fix Azure Application Gateway Err SSL Unrecognized](azure-application-gateway-err-ssl-unrecognized-name-alert)

<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "Fix Azure SQL Database Geo-Replica Lag", "description": "Troubleshoot Azure SQL geo-replica lag. Optimize write workload, increase secondary compute, and check network connectivity.", "url": "https://www.fixwikihub.com/fix-azure-sql-database-geo-replica-lag", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2026-04-03T18:01:18.143Z", "dateModified": "2026-04-03T18:01:18.143Z" } </script>