Introduction
Azure SQL Database geo-replication asynchronously replicates data from primary to secondary regions. When replication lag increases, the secondary database falls behind the primary, affecting disaster recovery RPO and read-scale workloads.
Symptoms
High replication lag:
```sql -- Query replication link status sqlcmd -S my-server.database.windows.net -d master -U admin -P password -Q " SELECT partner_server, partner_database, state_desc, replication_lag_sec FROM sys.dm_geo_replication_link_status "
# replication_lag_sec shows high value (> 60 seconds) ```
Replication state warning:
```bash $ az sql db replica show \ --name my-database \ --server my-server \ --resource-group my-rg \ --query '{State:replicationState,Lag:replicationLagSeconds}'
{ "State": "CATCH_UP", "Lag": 120 # Seconds behind primary } ```
Read-scale timeout:
{
"error": {
"code": "QueryTimeout",
"message": "Query against geo-replica timed out - replica is catching up"
}
}Common Causes
- 1.High write workload - Transaction log generation exceeds replication bandwidth
- 2.Secondary under-provisioned - Compute tier lower than primary
- 3.Network latency - Cross-region network slow or congested
- 4.Long-running transactions - Large transactions increase log queue
- 5.Secondary throttling - Secondary DTU/vCore limit hit
- 6.Maintenance operations - Index rebuilds, stats updates increase log
- 7.Cross-region distance - Physical distance adds latency
Step-by-Step Fix
Step 1: Check Replication Lag
```sql -- Check current replication lag sqlcmd -S my-server.database.windows.net -d master -U admin -P password -Q " SELECT database_id, start_date, modify_date, partner_server, partner_database, state_desc, synchronization_state_desc, last_replication_date, replication_lag_sec FROM sys.dm_geo_replication_link_status "
-- replication_lag_sec: Seconds behind primary -- > 60s indicates potential issues ```
Step 2: Compare Primary and Secondary Compute
```bash # Check primary tier az sql db show \ --name my-database \ --server my-server \ --resource-group my-rg \ --query '{Sku:sku,Capacity:sku.capacity}'
# Check secondary tier az sql db show \ --name my-database-secondary \ --server my-server-secondary \ --resource-group my-rg \ --query '{Sku:sku,Capacity:sku.capacity}'
# Secondary should match or exceed primary compute for low lag ```
Step 3: Upgrade Secondary Compute
```bash # If secondary has lower tier, upgrade it az sql db update \ --name my-database-secondary \ --server my-server-secondary \ --resource-group my-rg \ --sku Standard \ --capacity 200 # Match primary S3 tier
# For vCore model az sql db update \ --name my-database-secondary \ --server my-server-secondary \ --resource-group my-rg \ --capacity 4 \ --family Gen5 ```
Step 4: Analyze Write Workload
```sql -- Check transaction log generation rate sqlcmd -S my-server.database.windows.net -d my-database -U admin -P password -Q " SELECT database_name, log_generation_rate_mb_per_sec, log_backup_rate_mb_per_sec, log_repl_rate_mb_per_sec FROM sys.dm_database_log_rate_stats "
-- If log_generation_rate >> log_repl_rate, replication can't keep up
-- Find high-write tables sqlcmd -S my-server.database.windows.net -d my-database -U admin -P password -Q " SELECT TOP 10 OBJECT_NAME(object_id) as table_name, SUM(user_updates) as total_updates, SUM(user_seeks + user_scans) as total_reads FROM sys.dm_db_index_usage_stats WHERE database_id = DB_ID() GROUP BY object_id ORDER BY total_updates DESC " ```
Step 5: Optimize Write Operations
```sql -- Batch large inserts -- Instead of: INSERT INTO LargeTable VALUES (1, 'data'); INSERT INTO LargeTable VALUES (2, 'data'); -- ... millions of individual inserts
-- Use bulk insert: BULK INSERT LargeTable FROM 'data.csv' WITH (TABLOCK);
-- Or batch inserts: INSERT INTO LargeTable SELECT * FROM StagingTable;
-- Reduce transaction size -- Instead of one large transaction, batch into smaller chunks BEGIN TRANSACTION; DELETE TOP (10000) FROM LargeTable WHERE Date < '2023-01-01'; COMMIT TRANSACTION; -- Repeat until done ```
Step 6: Check Network Connectivity
```bash # Check regions distance az account list-locations \ --query "[?name=='eastus' || name=='westus'].{Name:name,Region:regionalDisplayName}"
# Test network latency between regions # From primary region VM to secondary region storage az storage account show \ --name testlatency \ --resource-group my-rg \ --query 'location'
# Cross-region latency should be < 100ms for acceptable replication # If latency high, consider closer region pair ```
Step 7: Monitor Secondary Throttling
```sql -- Check secondary resource usage sqlcmd -S my-server-secondary.database.windows.net -d my-database-secondary -U admin -P password -Q " SELECT AVG(avg_cpu_percent) as avg_cpu, MAX(avg_cpu_percent) as max_cpu, AVG(avg_data_io_percent) as avg_io, MAX(avg_data_io_percent) as max_io FROM sys.dm_db_resource_stats WHERE end_time > DATEADD(hour, -1, GETDATE()) "
-- If max values > 90%, secondary is throttling ```
Step 8: Pause Heavy Maintenance During Peak
```sql -- Index rebuilds generate significant log -- Schedule during low activity
-- Check for running index operations sqlcmd -S my-server.database.windows.net -d my-database -U admin -P password -Q " SELECT r.session_id, r.status, r.command, r.percent_complete, r.estimated_completion_time/60000 as minutes_remaining FROM sys.dm_exec_requests r WHERE r.command IN ('CREATE INDEX', 'ALTER INDEX', 'UPDATE STATISTICS') "
-- Cancel if needed KILL 53; -- session_id from above
-- Use ONLINE index operations when possible CREATE INDEX IX_Table_Col ON Table(Col) WITH (ONLINE = ON); ```
Step 9: Configure Auto-Failover Group
```bash # Auto-failover groups provide better replication management az sql failover-group show \ --name my-failover-group \ --resource-group my-rg \ --server my-server \ --query '{Databases:databases,State:replicationState}'
# Create failover group if not exists az sql failover-group create \ --name my-failover-group \ --resource-group my-rg \ --server my-server \ --partner-server my-server-secondary \ --database my-database \ --grace-period 1 # minutes before forced failover ```
Step 10: Set Up Replication Lag Alerts
```bash # Create alert for high replication lag az monitor metrics alert create \ --name sql-replica-lag \ --resource-group my-rg \ --scopes /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.Sql/servers/my-server/databases/my-database \ --condition "avg replication_lag_seconds > 60" \ --window-size 5m
Replication Lag Thresholds
| Lag | Status | Action |
|---|---|---|
| < 30s | Normal | No action needed |
| 30-60s | Elevated | Monitor |
| 60-300s | High | Investigate and optimize |
| > 300s | Critical | Reduce write load or upgrade secondary |
Verification
```sql -- After reducing write load or upgrading secondary sqlcmd -S my-server.database.windows.net -d master -U admin -P password -Q " SELECT partner_server, state_desc, synchronization_state_desc, replication_lag_sec FROM sys.dm_geo_replication_link_status "
-- Should show: -- state_desc: ONLINE -- synchronization_state_desc: SYNCHRONIZED -- replication_lag_sec: < 30
-- Monitor for 24 hours -- Query hourly lag SELECT DATEPART(hour, modify_date) as hour, AVG(replication_lag_sec) as avg_lag FROM sys.dm_geo_replication_link_status_history GROUP BY DATEPART(hour, modify_date) ORDER BY hour; ```
Prevention
To prevent Azure SQL database geo-replica lag issues from recurring, implement these proactive measures:
1. Monitor Replication Lag
groups:
- name: azure-sql-replication
rules:
- alert: AzureSQLGeoReplicaLag
expr: |
azure_sql_geo_replica_lag_seconds > 30
for: 5m
labels:
severity: warning
annotations:
summary: "Azure SQL geo-replica lag exceeds 30 seconds"2. Size Secondary Appropriately
```bash # Ensure secondary has same or higher compute tier az sql db create --name my-db --partner-server my-secondary-server --resource-group my-rg --server my-server --service-objective S3
# Monitor DTU usage on primary az monitor metrics list --resource /subscriptions/.../servers/my-server/databases/my-db --metric dtu_consumption_percent
# Scale secondary if needed az sql db update --name my-db --server my-secondary-server --resource-group my-rg --service-objective S4 ```
3. Optimize Write Workload
```sql -- Reduce write volume on primary -- Use bulk operations instead of individual inserts BULK INSERT orders FROM 'orders.csv' WITH (ROWTERMINATOR = '\n');
-- Batch updates UPDATE orders SET status = 'processed' WHERE id BETWEEN 1 AND 1000; UPDATE orders SET status = 'processed' WHERE id BETWEEN 1001 AND 2000; ```
Best Practices Checklist
- [ ] Monitor replication lag
- [ ] Size secondary appropriately
- [ ] Optimize write workload
- [ ] Use same compute tier on primary and secondary
- [ ] Monitor network latency
- [ ] Test failover regularly
Related Issues
- [Fix Azure SQL Database DTU Limit](/articles/fix-azure-sql-database-dtu-limit)
- [Fix Azure SQL Transaction Log Full](/articles/fix-azure-sql-transaction-log-full)
- [Fix Azure SQL Failover Group Not Syncing](/articles/fix-azure-sql-failover-group-not-syncing)
Related Articles
- [Technical troubleshooting: Fix Azure Aks Pod Crashloopbackoff Issue in Azure](azure-aks-pod-crashloopbackoff)
- [Technical troubleshooting: Fix Azure Api Management Policy Expression Runtime](azure-api-management-policy-expression-runtime-error)
- [Technical troubleshooting: Fix Azure App Configuration Feature Flag Not Refre](azure-app-configuration-feature-flag-not-refreshing)
- [Technical troubleshooting: Fix Azure App Service 503 Always On Disabled Issue](azure-app-service-503-always-on-disabled)
- [Technical troubleshooting: Fix Azure Application Gateway Err SSL Unrecognized](azure-application-gateway-err-ssl-unrecognized-name-alert)
<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "TechArticle", "headline": "Fix Azure SQL Database Geo-Replica Lag", "description": "Troubleshoot Azure SQL geo-replica lag. Optimize write workload, increase secondary compute, and check network connectivity.", "url": "https://www.fixwikihub.com/fix-azure-sql-database-geo-replica-lag", "publisher": { "@type": "Organization", "name": "FixWikiHub", "url": "https://www.fixwikihub.com" }, "author": { "@type": "Person", "name": "FixWikiHub Editorial Team" }, "datePublished": "2026-04-03T18:01:18.143Z", "dateModified": "2026-04-03T18:01:18.143Z" } </script>