Introduction

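Cosmos DB distributes data across physical partitions based on the partition key, and provisioned throughput is divided evenly among those physical partitions. When a partition key has low cardinality or uneven access patterns, a disproportionate share of traffic lands on a single physical partition. That partition is throttled (HTTP 429) once it exceeds its own share of RU/s, even when the container's total RU/s is sufficient.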
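
To make the arithmetic concrete, here is a minimal Python sketch. It assumes provisioned throughput is split evenly across physical partitions; the function names and the sample numbers are illustrative and not taken from a real account.

```python
def per_partition_budget(total_rus: float, physical_partitions: int) -> float:
    """RU/s available to each physical partition under an even split."""
    if physical_partitions <= 0:
        raise ValueError("physical_partitions must be positive")
    return total_rus / physical_partitions


def throttled_partitions(total_rus: float, demand_by_partition: dict[str, float]) -> list[str]:
    """Return the partitions whose demand exceeds their even share of total_rus."""
    budget = per_partition_budget(total_rus, len(demand_by_partition))
    return sorted(p for p, demand in demand_by_partition.items() if demand > budget)


# Illustrative numbers: 30,000 RU/s provisioned over 3 physical partitions.
demand = {"0": 14_000, "1": 1_500, "2": 700}   # total demand: 16,200 RU/s
print(per_partition_budget(30_000, 3))          # 10000.0
print(throttled_partitions(30_000, demand))     # ['0']
```

Total demand (16,200 RU/s) is well under the 30,000 RU/s provisioned, yet partition 0 exceeds its 10,000 RU/s share and would be throttled.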

Symptoms

Hot partition throttling:

```bash
# 429 errors despite having enough total RU/s
# Errors only on specific partition key values

# In metrics, see:
# PartitionKeyRangeId: 0 -> 100% utilization (hot)
# PartitionKeyRangeId: 1 -> 10% utilization (cold)
# PartitionKeyRangeId: 2 -> 5% utilization (cold)
```
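
A small Python sketch can turn per-partition utilization like the figures above into a skew summary. The 80% hot threshold and the `skew_report` name are assumptions chosen for illustration, not values defined by Cosmos DB.

```python
from statistics import mean


def skew_report(utilization: dict[str, float], hot_threshold: float = 80.0) -> dict:
    """Summarize per-partition utilization (percent of each partition's RU budget)."""
    hottest = max(utilization, key=utilization.get)
    avg = mean(utilization.values())
    return {
        "hottest": hottest,
        "hot": sorted(p for p, u in utilization.items() if u >= hot_threshold),
        # Ratio of the busiest partition to the average; about 1.0 means balanced.
        "skew_ratio": round(utilization[hottest] / avg, 2) if avg else 0.0,
    }


report = skew_report({"0": 100.0, "1": 10.0, "2": 5.0})
print(report)  # {'hottest': '0', 'hot': ['0'], 'skew_ratio': 2.61}
```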

Uneven partition distribution:

```sql
-- Query to check partition distribution
-- (replace c.partitionKey with your container's partition key property)
SELECT c.partitionKey, COUNT(1) AS docCount
FROM c
GROUP BY c.partitionKey

-- Shows most documents in few partition keys
```

Query performance variance:

```sql
-- Queries on the hot partition are slow
-- Cross-partition queries can be faster than single-partition queries on the hot key

SELECT * FROM c WHERE c.status = 'active'  -- Slow (all in one partition)
SELECT * FROM c WHERE c.id = 'xyz'         -- Fast (different partition)
```

Common Causes

  1. Low cardinality partition key - Few distinct values (e.g., status field)
  2. Temporal access pattern - All writes go to the current date's partition
  3. Tenant hotspot - Large tenant in a single partition
  4. Monotonically increasing keys - Sequential IDs cause all writes to go to the same partition
  5. Incorrect partition key choice - Not aligned with query patterns
  6. Missing synthetic key - Not combining fields for better distribution
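
The effect of cardinality can be sketched with a simple simulation. This uses SHA-256 as a stand-in hash and a fixed bucket count; Cosmos DB's actual hashing and partition layout are internal to the service, so treat the numbers as illustrative only.

```python
import hashlib
from collections import Counter


def bucket_for(key: str, partitions: int) -> int:
    """Map a key to a bucket with SHA-256 (a stand-in for the service's own hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_per_partition(keys: list[str], partitions: int) -> list[int]:
    """Count how many writes land in each bucket."""
    counts = Counter(bucket_for(k, partitions) for k in keys)
    return [counts.get(i, 0) for i in range(partitions)]


writes = 10_000
status_keys = ["active" if i % 10 else "inactive" for i in range(writes)]  # 2 distinct values
user_keys = [f"user-{i}" for i in range(writes)]                           # 10,000 distinct values

print(load_per_partition(status_keys, 8))  # at most 2 non-empty buckets
print(load_per_partition(user_keys, 8))    # roughly 1,250 per bucket
```

With only two distinct key values, no more than two buckets can ever receive writes, regardless of how many partitions exist.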

Step-by-Step Fix

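  1. Identify the hot partition from per-partition metrics
  2. Analyze how documents are distributed across partition key values
  3. Design a higher-cardinality partition key (plain, synthetic, or hierarchical)
  4. Create a new container with that key and migrate the data
  5. Update queries to supply the partition key
  6. Monitor the new distribution and alert on imbalance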

Step 1: Identify Hot Partition

```bash
# Check partition metrics in Azure Portal
# Metrics > Partition Key Range Id breakdown

# Or via CLI: normalized RU consumption, split by partition key range
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "NormalizedRUConsumption" \
  --dimension PartitionKeyRangeId \
  --aggregation Maximum \
  --query 'value[0].timeseries'

# Look for one partition consuming most RUs
```
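
Once per-partition metric data has been exported as JSON (for example, NormalizedRUConsumption split by PartitionKeyRangeId), a short script can rank the partitions. The sketch below assumes a list of series where each entry carries its dimension in `metadatavalues` and its points in `data`. Verify that shape against your own CLI output. The sample values are invented.

```python
import json


def peak_by_partition(timeseries: list[dict], aggregation: str = "maximum") -> dict[str, float]:
    """Return the highest observed value per PartitionKeyRangeId."""
    peaks: dict[str, float] = {}
    for series in timeseries:
        # Each series is assumed to carry its dimension values in `metadatavalues`.
        dims = {m["name"]["value"].lower(): m["value"] for m in series.get("metadatavalues", [])}
        partition = dims.get("partitionkeyrangeid", "unknown")
        values = [p[aggregation] for p in series.get("data", []) if p.get(aggregation) is not None]
        if values:
            peaks[partition] = max(peaks.get(partition, 0.0), max(values))
    return peaks


# Hypothetical exported output (values invented for illustration).
sample = json.loads("""
[
  {"metadatavalues": [{"name": {"value": "PartitionKeyRangeId"}, "value": "0"}],
   "data": [{"timeStamp": "2024-01-01T00:00:00Z", "maximum": 100.0},
            {"timeStamp": "2024-01-01T00:05:00Z", "maximum": 97.0}]},
  {"metadatavalues": [{"name": {"value": "PartitionKeyRangeId"}, "value": "1"}],
   "data": [{"timeStamp": "2024-01-01T00:00:00Z", "maximum": 10.0},
            {"timeStamp": "2024-01-01T00:05:00Z", "maximum": null}]}
]
""")

for partition, peak in sorted(peak_by_partition(sample).items(), key=lambda kv: -kv[1]):
    print(f"PartitionKeyRangeId {partition}: peak {peak}%")
```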

Step 2: Analyze Current Partition Key Distribution

```sql
-- Check document distribution
-- (replace c.partitionKey with your container's partition key property)
SELECT c.partitionKey, COUNT(1) AS docCount, MAX(c._ts) AS lastWriteTs
FROM c
GROUP BY c.partitionKey
-- Sort by docCount client-side; ORDER BY may be rejected in GROUP BY queries

-- Check cardinality
SELECT VALUE COUNT(1)
FROM (SELECT DISTINCT c.partitionKey FROM c)

-- Low cardinality = hotspot risk
```

Step 3: Design Better Partition Key

```text
Bad partition keys:
- "status" (only a few values: active, inactive)
- "tenantId" for multi-tenant (one large tenant = hotspot)
- "date" for time-series (current date = hotspot)

Good partition keys:
- "userId" (high cardinality, even distribution)
- "orderId" (unique per document)
- Synthetic key: "tenantId-userId" (compound for distribution)
```
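
A synthetic key such as "tenantId-userId" is just a deterministic concatenation of existing fields. The following Python sketch shows one way to build it; the `synthetic_key` helper, the sample document, and the `compositeKey` property name are illustrative assumptions.

```python
def synthetic_key(doc: dict, fields: tuple[str, ...] = ("tenantId", "userId"), sep: str = "-") -> str:
    """Join the given fields into one partition key value, e.g. 'tenantId-userId'."""
    missing = [f for f in fields if doc.get(f) in (None, "")]
    if missing:
        raise KeyError(f"document is missing partition key fields: {missing}")
    return sep.join(str(doc[f]) for f in fields)


doc = {"id": "42", "tenantId": "contoso", "userId": "u-1001", "status": "active"}
doc["compositeKey"] = synthetic_key(doc)
print(doc["compositeKey"])  # contoso-u-1001
```

If field values can themselves contain the separator, two different field combinations can produce the same key, so pick a separator that does not occur in the data.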

Step 4: Create New Container with Better Partition Key

```bash
# Create new container with improved partition key
az cosmosdb sql container create \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container-new \
  --partition-key-path "/userId" \
  --throughput 1000

# Or with synthetic key
az cosmosdb sql container create \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container-new \
  --partition-key-path "/compositeKey" \
  --throughput 1000
```

Step 5: Migrate Data with New Partition Key

```csharp
using Microsoft.Azure.Cosmos;

// Migrate data with new partition key
// Assumes an existing CosmosClient named cosmosClient and that every
// document has tenantId and userId properties.
var sourceContainer = cosmosClient.GetContainer("my-db", "my-container");
var targetContainer = cosmosClient.GetContainer("my-db", "my-container-new");

var query = sourceContainer.GetItemQueryIterator<dynamic>("SELECT * FROM c");

while (query.HasMoreResults)
{
    var page = await query.ReadNextAsync();

    foreach (var item in page)
    {
        // Add new partition key
        string compositeKey = $"{item.tenantId}-{item.userId}";
        item.compositeKey = compositeKey;

        await targetContainer.CreateItemAsync(item, new PartitionKey(compositeKey));
    }
}
```
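
A bulk copy can itself be throttled, especially against a container provisioned at 1000 RU/s. The .NET SDK retries rate-limited requests on its own up to a configurable limit. The Python sketch below only illustrates the underlying idea of waiting for the server-suggested interval before retrying. `ThrottledError`, `write_with_retry`, and the fake writer are hypothetical names, not part of any Cosmos DB SDK.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK exception that carries HTTP 429."""

    def __init__(self, retry_after_s: float):
        super().__init__("429 Too Many Requests")
        self.retry_after_s = retry_after_s


def write_with_retry(write, item, max_attempts: int = 5, sleep=time.sleep):
    """Call write(item), waiting the suggested interval after each 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write(item)
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(err.retry_after_s)


# Fake writer that is throttled twice before succeeding (demonstration only).
calls = {"n": 0}


def flaky_write(item):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ThrottledError(retry_after_s=0.01)
    return {"status": 201, "id": item["id"]}


print(write_with_retry(flaky_write, {"id": "42"}))  # {'status': 201, 'id': '42'}
```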

Step 6: Use Hierarchical Partition Keys (v2)

```bash
# Cosmos DB now supports hierarchical partition keys
# Better for multi-tenant scenarios
# Check `az cosmosdb sql container create --help` for the multi-path
# syntax your CLI version accepts

az cosmosdb sql container create \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container \
  --partition-key-path '["/tenantId", "/userId"]'

# Queries can specify both or just first level
# Better distribution for multi-tenant apps
```
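
How narrowly a query can be routed depends on how many leading levels of the hierarchical key it supplies. The Python sketch below is a conceptual model of that rule only. It does not call Cosmos DB, and `query_scope` is a hypothetical helper.

```python
def query_scope(key_paths: list[str], filters: dict[str, str]) -> str:
    """Classify how narrowly a query can be routed for a hierarchical partition key."""
    matched = 0
    for path in key_paths:  # levels must be supplied in order, without gaps
        if path.lstrip("/") in filters:
            matched += 1
        else:
            break
    if matched == len(key_paths):
        return "single logical partition"
    if matched > 0:
        prefix = "/".join(filters[p.lstrip("/")] for p in key_paths[:matched])
        return "prefix: only partitions holding " + prefix
    return "full fan-out across all partitions"


paths = ["/tenantId", "/userId"]
print(query_scope(paths, {"tenantId": "contoso", "userId": "u-1001"}))  # single logical partition
print(query_scope(paths, {"tenantId": "contoso"}))  # prefix: only partitions holding contoso
print(query_scope(paths, {"userId": "u-1001"}))     # full fan-out across all partitions
```

Filtering on the second level alone does not narrow routing, which is why the most frequently filtered property should come first in the key path.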

Step 7: Implement Random Suffix for Time-Series

```csharp
// For time-series data, add random suffix to avoid hot partition
public string GetPartitionKey(DateTime timestamp)
{
    // Add random bucket (0-9) to date
    var bucket = Random.Shared.Next(0, 10);
    return $"{timestamp:yyyy-MM-dd}-{bucket}";
}

// Query across all buckets for date range (fans out across partitions)
var dateStr = DateTime.UtcNow.ToString("yyyy-MM-dd");
var queryDefinition = new QueryDefinition(
        "SELECT * FROM c WHERE STARTSWITH(c.partitionKey, @date)")
    .WithParameter("@date", dateStr);
```
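
A random suffix spreads writes, but a reader cannot tell which bucket holds a given item. One variation, sketched here in Python as an assumption rather than as the article's method, derives the suffix from a hash of the item id so it can be recomputed on reads. It also enumerates every key for a day so a reader can issue one single-partition query per bucket. The names `bucket_key` and `all_keys_for_day` are illustrative.

```python
import hashlib
from datetime import date

BUCKETS = 10  # same bucket count as the random-suffix example (0-9)


def bucket_key(day: date, item_id: str, buckets: int = BUCKETS) -> str:
    """Derive the suffix from the item id so the key can be recomputed on reads."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return f"{day:%Y-%m-%d}-{int.from_bytes(digest[:4], 'big') % buckets}"


def all_keys_for_day(day: date, buckets: int = BUCKETS) -> list[str]:
    """Every partition key value that can hold data for the given day."""
    return [f"{day:%Y-%m-%d}-{b}" for b in range(buckets)]


day = date(2024, 5, 17)
print(bucket_key(day, "sensor-7/reading-123"))  # stable for a given id
print(all_keys_for_day(day)[:3])                # ['2024-05-17-0', '2024-05-17-1', '2024-05-17-2']
```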

Step 8: Monitor Partition Distribution

```bash
# After migration, monitor new distribution
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "NormalizedRUConsumption" \
  --dimension PartitionKeyRangeId

# Confirm the container's partition key definition
az cosmosdb sql container show \
  --account-name my-cosmos \
  --resource-group my-rg \
  --database-name my-db \
  --name my-container \
  --query 'resource.partitionKey'
```

Step 9: Update Query Patterns

```csharp
// Queries must include partition key for efficiency
// Old query (slow, cross-partition):
var oldQuery = container.GetItemQueryIterator<dynamic>(
    "SELECT * FROM c WHERE c.userId = 'user123'");

// New query (fast, single partition):
var newQuery = container.GetItemQueryIterator<dynamic>(
    new QueryDefinition("SELECT * FROM c WHERE c.userId = @userId")
        .WithParameter("@userId", "user123"),
    requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey("user123") });
```

Step 10: Set Up Alerts for Hot Partitions

```bash
# Create alert for partition imbalance
# (fires when the busiest partition key range exceeds 80% of its RU budget)
az monitor metrics alert create \
  --name cosmos-hot-partition-alert \
  --resource-group my-rg \
  --scopes /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --condition "max NormalizedRUConsumption > 80" \
  --window-size 5m

# Monitor specific partition metrics
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "PhysicalPartitionThroughputInfo"
```
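
A single spike above 80% is often harmless, while a sustained run usually points to a real hotspot. The Python sketch below shows that distinction on a list of utilization samples. The `min_consecutive` value of 3 and the sample data are assumptions for illustration.

```python
def sustained_breach(samples: list[float], threshold: float = 80.0, min_consecutive: int = 3) -> bool:
    """True if samples stay above threshold for min_consecutive points in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical utilization samples (percent) for two partitions.
print(sustained_breach([95, 40, 92, 30, 91]))  # False: spikes, but never 3 in a row
print(sustained_breach([70, 85, 88, 93, 60]))  # True: 85, 88, 93 are consecutive
```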

Partition Key Selection Guidelines

| Scenario | Good Partition Key | Avoid |
| --- | --- | --- |
| User data | userId | status, country |
| Multi-tenant | tenantId-userId | tenantId alone |
| Time-series | date-randomSuffix | date alone |
| Orders | orderId | status |
| IoT devices | deviceId | deviceType |
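
Candidate keys can be compared on a sample of documents before committing to one. The Python sketch below reports the number of distinct values and the share held by the most common value for each candidate. The `key_stats` helper and the tiny sample are invented for illustration; in practice, use a representative export of real documents.

```python
from collections import Counter


def key_stats(docs: list[dict], field: str) -> dict:
    """Cardinality and largest single-value share for one candidate partition key."""
    counts = Counter(str(d.get(field)) for d in docs)
    top_value, top_count = counts.most_common(1)[0]
    return {
        "field": field,
        "distinct": len(counts),
        "top_value": top_value,
        "top_share": round(top_count / len(docs), 3),
    }


# Tiny invented sample of 10 documents.
docs = [
    {"userId": f"u{i}", "status": "active" if i % 5 else "inactive", "country": "US" if i < 7 else "DE"}
    for i in range(10)
]
for field in ("userId", "status", "country"):
    print(key_stats(docs, field))
```

In this sample, `status` and `country` each have two distinct values with a top share of 0.8 and 0.7, while `userId` has ten distinct values and a top share of 0.1, matching the guidance in the table.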

Verification

```bash
# Check RU distribution across partitions
az monitor metrics list \
  --resource /subscriptions/SUB/resourceGroups/my-rg/providers/Microsoft.DocumentDB/databaseAccounts/my-cosmos \
  --metric "NormalizedRUConsumption" \
  --dimension PartitionKeyRangeId

# Should show even distribution across partitions

# Query performance should improve
# Throttling on hot partition should stop
```

  • [Fix Azure Cosmos DB Throttling](/articles/fix-azure-cosmos-db-throttling)
  • [Fix Azure Cosmos DB SQL Query Slow](/articles/fix-azure-cosmos-db-sql-query-slow)
  • [Fix Azure Cosmos DB Change Feed Lag](/articles/fix-azure-cosmos-db-change-feed-lag)
