# How to Fix Python MemoryError When Processing Large Files
MemoryError occurs when Python runs out of available RAM while processing large files. This guide shows memory-efficient techniques to handle large datasets.
Introduction
This article covers troubleshooting steps and solutions for Python's MemoryError when processing large files. The error often shows up in production environments and can cause service disruptions if not addressed promptly.
Symptoms
Common error messages include:

```
Traceback (most recent call last):
  File "script.py", line 15, in <module>
    data = f.read()
MemoryError
```

Array libraries such as NumPy also report the size of the allocation that failed:

```
MemoryError: Unable to allocate 763. MiB for an array with shape (10000, 10000) and data type float64
```

A typical trigger looks like this:

```python
# DON'T: Load entire file into memory
with open('large_file.txt', 'r') as f:
    content = f.read()  # MemoryError for large files
    lines = content.split('\n')
```

Common Causes

1. Loading the entire file into memory: using `f.read()` on large files
2. Reading all lines at once: `f.readlines()` creates huge lists
3. Creating large data structures: lists and dicts that exceed available RAM
4. Processing large datasets: loading entire CSV/JSON files
5. Image/video processing: loading full media files into memory
6. Database query results: fetching all rows without pagination

For example, accumulating every result in a single list (cause 3):

```python
# DON'T: Keep every result in memory at once
results = []
for i in range(100_000_000):
    results.append(complex_calculation(i))  # MemoryError
```
Step-by-Step Fix
Solution 1: Process Line by Line

```python
# DO: Process one line at a time
with open('large_file.txt', 'r') as f:
    for line in f:
        process(line)
```

Solution 2: Use Chunked Reading

```python
def read_in_chunks(file_path, chunk_size=8192):
    """Read file in chunks of bytes."""
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_in_chunks('large_file.bin'):
    process_chunk(chunk)
```
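As one concrete use of chunked reading, the sketch below computes a SHA-256 checksum of a file without ever holding more than one chunk in memory. The function name `sha256_of_file` and the 1 MiB default chunk size are illustrative choices, not something prescribed by this guide.

```python
import hashlib

def sha256_of_file(file_path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(file_path, 'rb') as f:
        # iter() with a sentinel calls f.read(chunk_size) until it returns b'' (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use is bounded by `chunk_size` regardless of how large the file is, because each chunk is discarded once it has been fed to the hash object.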
Solution 3: Use Generators
```python
# DON'T: Return list
def get_all_records(file_path):
    records = []
    with open(file_path, 'r') as f:
        for line in f:
            records.append(parse_record(line))
    return records  # All in memory

# DO: Use generator
def get_records(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield parse_record(line)  # One at a time

for record in get_records('large_file.csv'):
    process(record)
```
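To make the difference tangible, here is a small sketch that compares the size of a list with the size of an equivalent generator object. The helper names `squares_list` and `squares_gen` are made up for this illustration. Note that `sys.getsizeof` reports only the container itself, not the objects it refers to, so the real gap is even larger than the numbers suggest.

```python
import sys

def squares_list(n):
    """Build every value up front and keep them all in a list."""
    return [i * i for i in range(n)]

def squares_gen(n):
    """Produce the same values lazily, one at a time."""
    for i in range(n):
        yield i * i

n = 100_000
list_size = sys.getsizeof(squares_list(n))  # grows with n (one pointer per element)
gen_size = sys.getsizeof(squares_gen(n))    # small and independent of n
total_from_gen = sum(squares_gen(n))        # the generator can still be fully consumed

print(f"list object: {list_size} bytes, generator object: {gen_size} bytes")
```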
Solution 4: Process CSV with Pandas Chunks
```python
import pandas as pd

# Process CSV in chunks
chunk_size = 10000
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    process_chunk(chunk)
```
Solution 5: Use Memory-Efficient Data Types
```python
import pandas as pd

# Specify dtypes to save memory
dtypes = {
    'id': 'int32',          # Instead of int64
    'price': 'float32',     # Instead of float64
    'category': 'category'  # For strings with few unique values
}

df = pd.read_csv('large_file.csv', dtype=dtypes)
```
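The saving from narrower numeric types is simple arithmetic: rows times bytes per value. The sketch below works it through for the `id` and `price` columns above, assuming a hypothetical file with 10 million rows. It counts raw column data only and ignores the index and any string columns.

```python
# Bytes per value, fixed by the definition of each dtype
ITEM_SIZE = {'int32': 4, 'int64': 8, 'float32': 4, 'float64': 8}

def column_bytes(rows, dtype):
    """Raw storage needed for one numeric column."""
    return rows * ITEM_SIZE[dtype]

def to_mib(num_bytes):
    return num_bytes / 1024 ** 2

rows = 10_000_000  # assumed row count, for illustration only
before = column_bytes(rows, 'int64') + column_bytes(rows, 'float64')
after = column_bytes(rows, 'int32') + column_bytes(rows, 'float32')

print(f"default dtypes: {to_mib(before):.1f} MiB")  # 152.6 MiB
print(f"narrow dtypes:  {to_mib(after):.1f} MiB")   # 76.3 MiB
```

Halving the width of each column halves its footprint, but check first that the values fit: `int32` tops out at about 2.1 billion, and `float32` keeps roughly seven significant decimal digits.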
Solution 6: Filter Early
```python
import pandas as pd

# Only load needed columns
df = pd.read_csv('large_file.csv', usecols=['id', 'name', 'value'])

# Skip rows during read; the callable receives the row index
# (0 is the header line), not the row's contents
df = pd.read_csv('large_file.csv',
                 usecols=['id', 'status'],
                 skiprows=lambda x: x > 0 and should_skip(x))
```
Solution 7: Use Memory-Mapped Files
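```python
import mmap

# Open in binary mode; the mapping exposes the file's raw bytes
with open('large_file.txt', 'rb') as f:
    # Create memory-mapped file (the OS pages data in on demand)
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Process without loading entire file; each line is a bytes object
        for line in iter(mm.readline, b''):
            process(line)
```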
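Memory mapping is also handy for searching, since `mmap` objects support `find()` much like `bytes` do. The following is a minimal sketch; the function name `find_all` is illustrative. It assumes a non-empty file, because mapping an empty file with length 0 raises `ValueError`.

```python
import mmap

def find_all(file_path, pattern):
    """Yield the byte offset of every occurrence of `pattern` (bytes) in the file."""
    with open(file_path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos != -1:
                yield pos
                # Resume one byte later so overlapping matches are found too
                pos = mm.find(pattern, pos + 1)
```

A call such as `list(find_all('large_file.txt', b'ERROR'))` returns the offsets without reading the whole file into a Python object.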
Solution 8: Process JSON Files Efficiently
```python
import json
import ijson  # pip install ijson

# DON'T: Load entire JSON
with open('large.json', 'r') as f:
    data = json.load(f)  # MemoryError

# DO: Stream JSON with ijson
# The prefix 'items.item' matches each element of a top-level "items" array
with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item'):
        process(item)
```
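If you control how the data is produced, a different technique that needs only the standard library is the JSON Lines layout, where each line holds one complete JSON document. This is not a replacement for ijson on a single large JSON document; it only works when the file is already written one object per line, which is the assumption in this sketch. The function name `iter_json_lines` is illustrative.

```python
import json

def iter_json_lines(file_path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

Only one line is parsed at a time, so memory use depends on the largest single record rather than on the file size.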
Solution 9: Use Dask for Large Datasets
```python
import dask.dataframe as dd

# Dask handles larger-than-memory datasets
ddf = dd.read_csv('large_file.csv')
result = ddf.groupby('category').value.sum().compute()
```
Solution 10: Write Output Incrementally
```python
# DON'T: Build output in memory
output = []
for item in items:
    output.append(transform(item))
with open('output.txt', 'w') as f:
    f.write('\n'.join(output))

# DO: Write incrementally
with open('output.txt', 'w') as f:
    for item in items:
        f.write(transform(item) + '\n')
```
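The same streaming idea applies to database query results, which appear among the common causes but have no dedicated solution above. The sketch below uses the standard library's `sqlite3` module and `cursor.fetchmany()` to pull rows in batches. The `events` table, its columns, and the helper name `iter_rows` are invented for the example.

```python
import sqlite3

def iter_rows(conn, query, params=(), batch_size=1000):
    """Yield rows one at a time while fetching them from the cursor in batches."""
    cursor = conn.execute(query, params)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows

# Demo with an in-memory database (hypothetical table and data)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (id INTEGER PRIMARY KEY, value INTEGER)')
conn.executemany('INSERT INTO events (value) VALUES (?)', ((i,) for i in range(5000)))

total = sum(value for _id, value in
            iter_rows(conn, 'SELECT id, value FROM events', batch_size=500))
print(total)  # 12497500
conn.close()
```

At most `batch_size` rows are held in Python at any time, in contrast to `cursor.fetchall()`, which materializes the full result set.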
Memory Profiling
Check Memory Usage
```python
import psutil
import os

process = psutil.Process(os.getpid())
print(f"Memory: {process.memory_info().rss / 1024 / 1024:.2f} MB")
```
Use Memory Profiler
```bash
pip install memory-profiler
python -m memory_profiler script.py
```

```python
from memory_profiler import profile

@profile
def my_function():
    # Your code here
    pass
```
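If installing a third-party profiler is not an option, the standard library's `tracemalloc` module can report peak allocation. The sketch below wraps it in a hypothetical helper, `peak_memory_bytes`, and compares a list-based sum with a generator-based one. Keep in mind that `tracemalloc` tracks only memory allocated through Python's allocator, so its figures will differ from the RSS values psutil reports.

```python
import tracemalloc

def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def total_with_list(n):
    return sum([i * i for i in range(n)])  # builds the whole list first

def total_with_generator(n):
    return sum(i * i for i in range(n))    # one value at a time

n = 200_000
list_total, list_peak = peak_memory_bytes(total_with_list, n)
gen_total, gen_peak = peak_memory_bytes(total_with_generator, n)
print(f"list peak: {list_peak / 1024:.0f} KiB, generator peak: {gen_peak / 1024:.0f} KiB")
```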
Specialized Libraries
For Large Text Files
```python
# Use fileinput for multiple files
import fileinput

for line in fileinput.input(['file1.txt', 'file2.txt']):
    process(line)
```
For Large CSV Files
```python
import csv

# Use csv module instead of loading all into memory
# (newline='' lets the csv module handle line endings itself)
with open('large.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        process(row)
```
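Streaming rows does not rule out aggregation, as long as the running state stays small. Here is a minimal sketch that totals one column per group while reading row by row. The column names `category` and `value` and the function name `sum_by_category` are assumptions for the example.

```python
import csv
from collections import defaultdict

def sum_by_category(file_path, key_column='category', value_column='value'):
    """Total `value_column` per `key_column`, holding only one row at a time."""
    totals = defaultdict(float)
    with open(file_path, 'r', newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            totals[row[key_column]] += float(row[value_column])
    return dict(totals)
```

Memory use scales with the number of distinct categories, not with the number of rows in the file.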
For Large XML Files
```python
import xml.etree.ElementTree as ET

# Use iterparse for streaming
for event, elem in ET.iterparse('large.xml', events=('end',)):
    if elem.tag == 'record':
        process(elem)
        elem.clear()  # Free memory once the record has been processed
```
Verification
After applying the fixes, verify that the MemoryError is resolved by testing memory-efficient processing:
```python
import psutil
import os

def verify_memory_usage(file_path, max_mb=500):
    """Verify that file processing stays within memory limits."""
    process = psutil.Process(os.getpid())
    initial_memory = process.memory_info().rss / 1024 / 1024  # MB

    # Process file using streaming (the fixed approach)
    with open(file_path, 'r') as f:
        for line in f:
            process_line(line)

    final_memory = process.memory_info().rss / 1024 / 1024  # MB
    memory_increase = final_memory - initial_memory

    print(f"Initial memory: {initial_memory:.2f} MB")
    print(f"Final memory: {final_memory:.2f} MB")
    print(f"Memory increase: {memory_increase:.2f} MB")

    if memory_increase < max_mb:
        print(f"✓ Memory usage is within limits (< {max_mb} MB)")
        return True
    else:
        print("✗ Memory usage exceeded limit")
        return False

# Example usage
verify_memory_usage('large_file.txt')
```
Prevention
1. Never load entire large files; always stream or chunk
2. Use generators instead of lists for large sequences
3. Profile memory usage before deploying to production
4. Consider databases for very large datasets (SQLite, PostgreSQL)
5. Use appropriate data types (int32 vs int64, float32 vs float64)
6. Delete unused variables:

```python
del large_variable

# Force garbage collection (only needed to reclaim reference cycles)
import gc
gc.collect()
```