Claude Agent Skill · by Wshobson

Python Performance Optimization

This walks you through the essential Python profiling toolkit, from cProfile for CPU bottlenecks to memory_profiler for tracking allocations and py-spy for prod

Install
Terminal · npx
$ npx skills add https://github.com/wshobson/agents --skill python-performance-optimization
Works with Paperclip

How Python Performance Optimization fits into a Paperclip company.

Python Performance Optimization drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file
SKILL.md · 437 lines
---
name: python-performance-optimization
description: Profile and optimize Python code using cProfile, memory profilers, and performance best practices. Use when debugging slow Python code, optimizing bottlenecks, or improving application performance.
---

# Python Performance Optimization

Comprehensive guide to profiling, analyzing, and optimizing Python code for better performance, including CPU profiling, memory optimization, and implementation best practices.

## When to Use This Skill

- Identifying performance bottlenecks in Python applications
- Reducing application latency and response times
- Optimizing CPU-intensive operations
- Reducing memory consumption and memory leaks
- Improving database query performance
- Optimizing I/O operations
- Speeding up data processing pipelines
- Implementing high-performance algorithms
- Profiling production applications

## Core Concepts

### 1. Profiling Types

- **CPU Profiling**: Identify time-consuming functions
- **Memory Profiling**: Track memory allocation and leaks
- **Line Profiling**: Profile at line-by-line granularity
- **Call Graph**: Visualize function call relationships

### 2. Performance Metrics

- **Execution Time**: How long operations take
- **Memory Usage**: Peak and average memory consumption
- **CPU Utilization**: Processor usage patterns
- **I/O Wait**: Time spent on I/O operations

### 3. Optimization Strategies

- **Algorithmic**: Better algorithms and data structures
- **Implementation**: More efficient code patterns
- **Parallelization**: Multi-threading/processing
- **Caching**: Avoid redundant computation
- **Native Extensions**: C/Rust for critical paths

## Quick Start

### Basic Timing

```python
import time

def measure_time():
    """Simple timing measurement."""
    # perf_counter is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()

    # Your code here
    result = sum(range(1000000))

    elapsed = time.perf_counter() - start
    print(f"Execution time: {elapsed:.4f} seconds")
    return result

# Better: use timeit for accurate measurements
import timeit

execution_time = timeit.timeit(
    "sum(range(1000000))",
    number=100
)
print(f"Average time: {execution_time/100:.6f} seconds")
```

## Profiling Tools

### Pattern 1: cProfile - CPU Profiling

```python
import cProfile
import pstats
from pstats import SortKey

def slow_function():
    """Function to profile."""
    total = 0
    for i in range(1000000):
        total += i
    return total

def another_function():
    """Another function."""
    return [i**2 for i in range(100000)]

def main():
    """Main function to profile."""
    result1 = slow_function()
    result2 = another_function()
    return result1, result2

# Profile the code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()

    main()

    profiler.disable()

    # Print stats
    stats = pstats.Stats(profiler)
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # Top 10 functions

    # Save to file for later analysis
    stats.dump_stats("profile_output.prof")
```

**Command-line profiling:**

```bash
# Profile a script
python -m cProfile -o output.prof script.py

# View results
python -m pstats output.prof
# In pstats:
# sort cumtime
# stats 10
```

### Pattern 2: line_profiler - Line-by-Line Profiling

```python
# Install: pip install line-profiler

# Add the @profile decorator (kernprof injects it into builtins
# when the script runs under it, so no import is needed)
@profile
def process_data(data):
    """Process data with line profiling."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

# Run with:
# kernprof -l -v script.py
```

**Manual line profiling:**

```python
from line_profiler import LineProfiler

def process_data(data):
    """Function to profile."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

if __name__ == "__main__":
    lp = LineProfiler()
    lp.add_function(process_data)

    data = list(range(100000))

    lp_wrapper = lp(process_data)
    lp_wrapper(data)

    lp.print_stats()
```

### Pattern 3: memory_profiler - Memory Usage

```python
# Install: pip install memory-profiler

from memory_profiler import profile

@profile
def memory_intensive():
    """Function that uses lots of memory."""
    # Create large list
    big_list = [i for i in range(1000000)]

    # Create large dict
    big_dict = {i: i**2 for i in range(100000)}

    # Process data
    result = sum(big_list)

    return result

if __name__ == "__main__":
    memory_intensive()

# Run with:
# python -m memory_profiler script.py
```

### Pattern 4: py-spy - Production Profiling

```bash
# Install: pip install py-spy

# Profile a running Python process
py-spy top --pid 12345

# Generate flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script
py-spy record -o profile.svg -- python script.py

# Dump current call stack
py-spy dump --pid 12345
```

## Optimization Patterns

### Pattern 5: List Comprehensions vs Loops

```python
import timeit

# Slow: Traditional loop
def slow_squares(n):
    """Create list of squares using loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

# Fast: List comprehension
def fast_squares(n):
    """Create list of squares using comprehension."""
    return [i**2 for i in range(n)]

# Benchmark
n = 100000

slow_time = timeit.timeit(lambda: slow_squares(n), number=100)
fast_time = timeit.timeit(lambda: fast_squares(n), number=100)

print(f"Loop: {slow_time:.4f}s")
print(f"Comprehension: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")

# Alternative: map. It mainly helps when the mapped callable is a built-in;
# with a lambda it is usually no faster than the comprehension, so benchmark it
def map_squares(n):
    """Create list of squares using map with a lambda."""
    return list(map(lambda x: x**2, range(n)))
```

### Pattern 6: Generator Expressions for Memory

```python
import sys

def list_approach():
    """Memory-intensive list."""
    data = [i**2 for i in range(1000000)]
    return sum(data)

def generator_approach():
    """Memory-efficient generator."""
    data = (i**2 for i in range(1000000))
    return sum(data)

# Memory comparison
# Note: sys.getsizeof reports only the container itself,
# not the integer objects it references
list_data = [i for i in range(1000000)]
gen_data = (i for i in range(1000000))

print(f"List size: {sys.getsizeof(list_data)} bytes")
print(f"Generator size: {sys.getsizeof(gen_data)} bytes")

# A generator object has a small fixed size no matter how many items it yields
```

### Pattern 7: String Concatenation

```python
import timeit

def slow_concat(items):
    """Slow string concatenation."""
    result = ""
    for item in items:
        result += str(item)
    return result

def fast_concat(items):
    """Fast string concatenation with join."""
    return "".join(str(item) for item in items)

def faster_concat(items):
    """Even faster with list."""
    parts = [str(item) for item in items]
    return "".join(parts)

items = list(range(10000))

# Benchmark
slow = timeit.timeit(lambda: slow_concat(items), number=100)
fast = timeit.timeit(lambda: fast_concat(items), number=100)
faster = timeit.timeit(lambda: faster_concat(items), number=100)

print(f"Concatenation (+): {slow:.4f}s")
print(f"Join (generator): {fast:.4f}s")
print(f"Join (list): {faster:.4f}s")
```

### Pattern 8: Dictionary Lookups vs List Searches

```python
import timeit

# Create test data
size = 10000
items = list(range(size))
lookup_dict = {i: i for i in range(size)}

def list_search(items, target):
    """O(n) search in list."""
    return target in items

def dict_search(lookup_dict, target):
    """Average-case O(1) search in dict."""
    return target in lookup_dict

target = size - 1  # Worst case for list

# Benchmark
list_time = timeit.timeit(
    lambda: list_search(items, target),
    number=1000
)
dict_time = timeit.timeit(
    lambda: dict_search(lookup_dict, target),
    number=1000
)

print(f"List search: {list_time:.6f}s")
print(f"Dict search: {dict_time:.6f}s")
print(f"Speedup: {list_time/dict_time:.0f}x")
```

### Pattern 9: Local Variable Access

```python
import timeit

# Global variable (slow)
GLOBAL_VALUE = 100

def use_global():
    """Access global variable."""
    total = 0
    for i in range(10000):
        total += GLOBAL_VALUE
    return total

def use_local():
    """Use local variable."""
    local_value = 100
    total = 0
    for i in range(10000):
        total += local_value
    return total

# Local access is typically faster in CPython
global_time = timeit.timeit(use_global, number=1000)
local_time = timeit.timeit(use_local, number=1000)

print(f"Global access: {global_time:.4f}s")
print(f"Local access: {local_time:.4f}s")
print(f"Speedup: {global_time/local_time:.2f}x")
```

### Pattern 10: Function Call Overhead

```python
import timeit

def calculate_inline():
    """Inline calculation."""
    total = 0
    for i in range(10000):
        total += i * 2 + 1
    return total

def helper_function(x):
    """Helper function."""
    return x * 2 + 1

def calculate_with_function():
    """Calculation with function calls."""
    total = 0
    for i in range(10000):
        total += helper_function(i)
    return total

# Inline is faster due to no call overhead
inline_time = timeit.timeit(calculate_inline, number=1000)
function_time = timeit.timeit(calculate_with_function, number=1000)

print(f"Inline: {inline_time:.4f}s")
print(f"Function calls: {function_time:.4f}s")
```

For advanced optimization techniques including NumPy vectorization, caching, memory management, parallelization, async I/O, database optimization, and benchmarking tools, see [references/advanced-patterns.md](references/advanced-patterns.md).

## Best Practices

1. **Profile before optimizing** - Measure to find real bottlenecks
2. **Focus on hot paths** - Optimize code that runs most frequently
3. **Use appropriate data structures** - Dict for lookups, set for membership
4. **Avoid premature optimization** - Clarity first, then optimize
5. **Use built-in functions** - They're implemented in C
6. **Cache expensive computations** - Use `functools.lru_cache`
7. **Batch I/O operations** - Reduce system calls
8. **Use generators** - Stream large datasets instead of materializing them
9. **Consider NumPy** - Vectorize numerical operations
10. **Profile production code** - Use py-spy for live systems

## Common Pitfalls

- Optimizing without profiling
- Using global variables unnecessarily
- Not using appropriate data structures
- Creating unnecessary copies of data
- Not using connection pooling for databases
- Ignoring algorithmic complexity
- Over-optimizing rare code paths
- Not considering memory usage
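
Best practice 6 recommends caching expensive computations with `lru_cache`, but none of the patterns above demonstrates it. The following is a minimal sketch, assuming the expensive work is a pure function of hashable arguments; `fib` is a hypothetical stand-in chosen only because its uncached cost grows exponentially.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; set maxsize to cap memory use
def fib(n: int) -> int:
    """Naive recursive Fibonacci (illustrative stand-in for expensive work)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

result = fib(30)
info = fib.cache_info()  # named tuple: hits, misses, maxsize, currsize

print(f"fib(30) = {result}")
print(f"Cache hits: {info.hits}, misses: {info.misses}")
```

Each distinct argument is computed once and later calls are served from the cache. The trade-off is memory: with `maxsize=None` the cache never evicts entries, so a bounded `maxsize` may be the safer choice for functions called with many distinct arguments.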