Introduction
Cosmos DB distributes data across physical partitions by hashing the partition key, and the provisioned throughput is divided evenly among those partitions. When a partition key has low cardinality or uneven access patterns, most traffic lands on a single physical partition, which is throttled (HTTP 429) even though the container has sufficient total RU/s.
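The arithmetic behind "throttled despite sufficient total RU/s" can be sketched as follows. All numbers are assumptions chosen for illustration (10,000 RU/s, 5 physical partitions, a 4,000 RU/s workload with 70% aimed at one partition); they are not measurements from any real account.

```csharp
using System;

// Assumed, illustrative numbers: 10,000 RU/s spread over 5 physical partitions.
int provisionedRu = 10_000;
int physicalPartitions = 5;

// Provisioned throughput is divided evenly, so every partition gets the same budget.
int perPartitionBudget = provisionedRu / physicalPartitions;

// Assumed workload: 4,000 RU/s overall, with 70% aimed at one partition.
int workloadRu = 4_000;
int hotSharePercent = 70;
int hotPartitionLoad = workloadRu * hotSharePercent / 100;

// The account as a whole has headroom, but the hot partition exceeds its own budget.
bool accountHasHeadroom = workloadRu < provisionedRu;
bool hotPartitionThrottled = hotPartitionLoad > perPartitionBudget;

Console.WriteLine($"Per-partition budget: {perPartitionBudget} RU/s");
Console.WriteLine($"Hot partition load:   {hotPartitionLoad} RU/s");
Console.WriteLine($"Account has headroom: {accountHasHeadroom}");
Console.WriteLine($"Hot partition throttled: {hotPartitionThrottled}");
```

With these assumed figures the workload uses well under half of the provisioned throughput, yet the hot partition still exceeds its share and returns 429s.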
Symptoms
Hot partition throttling:
```bash
# 429 errors despite having enough total RU/s
# Errors only on specific partition key values

# In metrics, see:
PartitionKeyRangeId: 0 -> 100% utilization (hot)
PartitionKeyRangeId: 1 -> 10% utilization (cold)
PartitionKeyRangeId: 2 -> 5% utilization (cold)
```
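A pattern like this can also be flagged programmatically once the per-range utilization has been exported. The sketch below is only an illustration: `FindHotRanges`, its 90% threshold, and its 2x skew factor are hypothetical choices, not part of any Azure SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Utilization (%) per PartitionKeyRangeId, using the example readings from this section.
var utilization = new Dictionary<string, double>
{
    ["0"] = 100,
    ["1"] = 10,
    ["2"] = 5,
};

// Hypothetical helper: a range counts as hot when it is at or above the threshold
// AND at least skewFactor times the average of all ranges (so a uniformly busy
// container is not reported as a hotspot). Expects at least one reading.
static List<string> FindHotRanges(
    IReadOnlyDictionary<string, double> readings,
    double threshold = 90,
    double skewFactor = 2)
{
    double average = readings.Values.Average();
    return readings
        .Where(r => r.Value >= threshold && r.Value >= skewFactor * average)
        .Select(r => r.Key)
        .OrderBy(id => id)
        .ToList();
}

var hot = FindHotRanges(utilization);
Console.WriteLine($"Hot ranges: {string.Join(", ", hot)}");
```

The skew check matters because high utilization on every range points to under-provisioning rather than a partition key problem.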
Uneven partition distribution:
```sql
-- Query to check partition distribution
SELECT c.partitionKey, COUNT(1) AS count
FROM c
GROUP BY c.partitionKey

-- Shows most documents in few partition keys
```
Query performance variance:
```sql
-- Queries that hit the hot partition are slow
-- Cross-partition queries can outperform single-partition queries on the hot partition

SELECT * FROM c WHERE c.status = 'active'  -- Slow (all in one partition)
SELECT * FROM c WHERE c.id = 'xyz'         -- Fast (different partition)
```
Common Causes
1. Low cardinality partition key - Few distinct values (e.g., status field)
2. Temporal access pattern - All writes to current date partition
3. Tenant hotspot - Large tenant in single partition
4. Coarse, monotonically increasing keys - Values such as a date or hour bucket send every current write to the same logical partition (unique sequential IDs are hashed and distribute well)
5. Incorrect partition key choice - Not aligned with query patterns
6. Missing synthetic key - Not combining fields for better distribution
Step-by-Step Fix
The steps below identify the hot partition, analyze the current key distribution, design and create a container with a better partition key, migrate the data, and then monitor and verify the result.
Step 1: Identify Hot Partition
```bash
# Check partition metrics in Azure Portal
# Metrics > Normalized RU Consumption, split by PartitionKeyRangeId

# Or via CLI
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "NormalizedRUConsumption" \
  --aggregation Maximum \
  --filter "PartitionKeyRangeId eq '*'" \
  --query 'value[0].timeseries'

# Look for one partition range running far hotter than the others
```
Step 2: Analyze Current Partition Key Distribution
```sql
-- Check document distribution (sort the rows by count client-side)
SELECT c.partitionKey, COUNT(1) AS count, MAX(c._ts) AS lastWrite
FROM c
GROUP BY c.partitionKey

-- Check cardinality: list the distinct values, then count them client-side
SELECT DISTINCT VALUE c.partitionKey FROM c

-- Low cardinality = hotspot risk
```
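Once the grouped rows are available client-side, the cardinality and skew can be summarized in a few lines. This is a minimal sketch: the counts below are made-up sample values, and `Summarize` is a hypothetical helper, not an SDK call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up (partitionKey, count) rows, shaped like the output of the GROUP BY query.
var counts = new Dictionary<string, long>
{
    ["active"] = 940_000,
    ["inactive"] = 50_000,
    ["archived"] = 10_000,
};

// Hypothetical helper: number of distinct keys, the largest key, and its share of all documents.
// Expects at least one row.
static (int DistinctKeys, string TopKey, double TopShare) Summarize(
    IReadOnlyDictionary<string, long> rows)
{
    long total = rows.Values.Sum();
    var top = rows.OrderByDescending(r => r.Value).First();
    return (rows.Count, top.Key, (double)top.Value / total);
}

var summary = Summarize(counts);
Console.WriteLine($"Distinct keys: {summary.DistinctKeys}");
Console.WriteLine($"Largest key: {summary.TopKey} ({summary.TopShare:P1} of documents)");
```

A handful of distinct keys, or one key holding most of the documents, is the signal that the partition key needs to be redesigned.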
Step 3: Design Better Partition Key
```text
Bad partition keys:
- "status" (only a few values: active, inactive)
- "tenantId" for multi-tenant (one large tenant = hotspot)
- "date" for time-series (current date = hotspot)

Good partition keys:
- "userId" (high cardinality, even distribution)
- "orderId" (unique per document)
- Synthetic key: "tenantId-userId" (compound for distribution)
```
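The effect of cardinality on distribution can be simulated before touching any data. The sketch below rests on loud assumptions: it uses an FNV-1a-style hash as a stand-in (Cosmos DB uses its own internal hash), five simulated partitions, and 10,000 synthetic documents with invented field values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in hash (FNV-1a style over UTF-16 chars). NOT the hash Cosmos DB uses internally.
static uint StandInHash(string text)
{
    unchecked
    {
        uint hash = 2166136261;
        foreach (char c in text)
        {
            hash ^= c;
            hash *= 16777619;
        }
        return hash;
    }
}

// Counts how many documents land in each simulated partition.
static int[] Distribute(IEnumerable<string> keys, int partitions)
{
    var counts = new int[partitions];
    foreach (var key in keys)
    {
        counts[StandInHash(key) % (uint)partitions]++;
    }
    return counts;
}

// 10,000 synthetic documents: 2 status values, 50 tenants, 10,000 distinct users.
var docs = Enumerable.Range(0, 10_000)
    .Select(i => (Status: i % 10 == 0 ? "inactive" : "active",
                  TenantId: $"tenant{i % 50}",
                  UserId: $"user{i}"))
    .ToList();

int[] statusSpread = Distribute(docs.Select(d => d.Status), 5);
int[] userSpread = Distribute(docs.Select(d => d.UserId), 5);
int[] compositeSpread = Distribute(docs.Select(d => $"{d.TenantId}-{d.UserId}"), 5);

Console.WriteLine($"status          -> largest partition holds {statusSpread.Max()} docs");
Console.WriteLine($"userId          -> largest partition holds {userSpread.Max()} docs");
Console.WriteLine($"tenantId-userId -> largest partition holds {compositeSpread.Max()} docs");
```

With only two distinct `status` values, at most two simulated partitions ever receive data, however many exist. The high-cardinality keys have enough distinct values to spread across all of them.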
Step 4: Create New Container with Better Partition Key
```bash
# Create new container with improved partition key
az cosmosdb sql container create \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container-new \
  --partition-key-path "/userId" \
  --throughput 1000

# Or with synthetic key
az cosmosdb sql container create \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container-new \
  --partition-key-path "/compositeKey" \
  --throughput 1000
```
Step 5: Migrate Data with New Partition Key
```csharp
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;

// Migrate data with new partition key
var sourceContainer = cosmosClient.GetContainer("my-db", "my-container");
var targetContainer = cosmosClient.GetContainer("my-db", "my-container-new");

var query = sourceContainer.GetItemQueryIterator<JObject>("SELECT * FROM c");

while (query.HasMoreResults)
{
    var page = await query.ReadNextAsync();

    foreach (var item in page)
    {
        // Build the synthetic partition key from existing fields
        var compositeKey = $"{item["tenantId"]}-{item["userId"]}";
        item["compositeKey"] = compositeKey;

        // Upsert so the migration can be re-run without duplicate-id conflicts
        await targetContainer.UpsertItemAsync(item, new PartitionKey(compositeKey));
    }
}
```
Step 6: Use Hierarchical Partition Keys (v2)
```bash
# Cosmos DB supports hierarchical partition keys (subpartitioning)
# Better for multi-tenant scenarios
# Hierarchical keys can only be set when a container is created
# Confirm the multi-path syntax with `az cosmosdb sql container create --help`

az cosmosdb sql container create \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container \
  --partition-key-path '["/tenantId", "/userId"]'

# Queries can specify both levels or just the first level
# Better distribution for multi-tenant apps
```
Step 7: Implement Random Suffix for Time-Series
```csharp
// For time-series data, add random suffix to avoid hot partition
public string GetPartitionKey(DateTime timestamp)
{
    // Add random bucket (0-9) to date
    var bucket = Random.Shared.Next(0, 10);
    return $"{timestamp:yyyy-MM-dd}-{bucket}";
}

// Query across all buckets for a given date (fans out across partitions)
var dateStr = DateTime.UtcNow.ToString("yyyy-MM-dd");
var query = new QueryDefinition(
        "SELECT * FROM c WHERE STARTSWITH(c.partitionKey, @date)")
    .WithParameter("@date", dateStr);
```
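Because the bucket suffix is random, a reader has to consider every bucket for a given day. One option, sketched below, is to enumerate the possible key values and issue one targeted query per key instead of a single `STARTSWITH` scan. `BucketKeysFor` is a hypothetical helper, and the bucket count of 10 is assumed to match the writer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: lists every partition key value that may hold data for the given day.
// bucketCount must match the range used when writing (0-9 means 10 buckets).
static IReadOnlyList<string> BucketKeysFor(DateTime day, int bucketCount)
{
    string datePart = day.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    return Enumerable.Range(0, bucketCount)
        .Select(bucket => $"{datePart}-{bucket}")
        .ToList();
}

var keys = BucketKeysFor(new DateTime(2024, 1, 15), 10);
Console.WriteLine(string.Join(", ", keys));
```

Each returned value can be supplied as the partition key of a single-partition query. The trade-off is explicit: writes spread over ten logical partitions, and reads for one day cost ten targeted queries.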
Step 8: Monitor Partition Distribution
```bash
# After migration, monitor new distribution
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "NormalizedRUConsumption" \
  --aggregation Maximum \
  --filter "PartitionKeyRangeId eq '*'"

# Confirm the partition key definition on the container
az cosmosdb sql container show \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container \
  --query 'resource.partitionKey'
```
Step 9: Update Query Patterns
```csharp
// Queries must include partition key for efficiency
// Old query (slow, cross-partition):
var oldQuery = container.GetItemQueryIterator<dynamic>(
    "SELECT * FROM c WHERE c.userId = 'user123'");

// New query (fast, single partition):
var newQuery = container.GetItemQueryIterator<dynamic>(
    new QueryDefinition("SELECT * FROM c WHERE c.userId = @userId")
        .WithParameter("@userId", "user123"),
    continuationToken: null,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey("user123")
    });
```
Step 10: Set Up Alerts for Hot Partitions
```bash
# Create alert for partition imbalance
# NormalizedRUConsumption reports the most utilized partition range (0-100%)
az monitor metrics alert create \
  --name cosmos-hot-partition-alert \
  --resource-group my-rg \
  --scopes /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --condition "max NormalizedRUConsumption > 80" \
  --window-size 5m

# Monitor specific partition metrics
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "PhysicalPartitionThroughputInfo"
```
Partition Key Selection Guidelines
| Scenario | Good Partition Key | Avoid |
|---|---|---|
| User data | userId | status, country |
| Multi-tenant | tenantId-userId | tenantId alone |
| Time-series | date-randomSuffix | date alone |
| Orders | orderId | status |
| IoT devices | deviceId | deviceType |
Verification
```bash
# Check RU distribution across partitions
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "NormalizedRUConsumption" \
  --aggregation Maximum \
  --filter "PartitionKeyRangeId eq '*'"

# Should show even distribution across partitions

# Query performance should improve
# Throttling on hot partition should stop
```
Related Issues
- [Fix Azure Cosmos DB Throttling](/articles/fix-azure-cosmos-db-throttling)
- [Fix Azure Cosmos DB SQL Query Slow](/articles/fix-azure-cosmos-db-sql-query-slow)
- [Fix Azure Cosmos DB Change Feed Lag](/articles/fix-azure-cosmos-db-change-feed-lag)