npx skills add https://github.com/wshobson/agents --skill python-performance-optimization
---
name: python-performance-optimization
description: Profile and optimize Python code using cProfile, memory profilers, and performance best practices. Use when debugging slow Python code, optimizing bottlenecks, or improving application performance.
---

# Python Performance Optimization

Comprehensive guide to profiling, analyzing, and optimizing Python code for better performance, including CPU profiling, memory optimization, and implementation best practices.

## When to Use This Skill

- Identifying performance bottlenecks in Python applications
- Reducing application latency and response times
- Optimizing CPU-intensive operations
- Reducing memory consumption and memory leaks
- Improving database query performance
- Optimizing I/O operations
- Speeding up data processing pipelines
- Implementing high-performance algorithms
- Profiling production applications

## Core Concepts

### 1. Profiling Types

- **CPU Profiling**: Identify time-consuming functions
- **Memory Profiling**: Track memory allocation and leaks
- **Line Profiling**: Profile at line-by-line granularity
- **Call Graph**: Visualize function call relationships

### 2. Performance Metrics

- **Execution Time**: How long operations take
- **Memory Usage**: Peak and average memory consumption
- **CPU Utilization**: Processor usage patterns
- **I/O Wait**: Time spent on I/O operations

### 3. Optimization Strategies

- **Algorithmic**: Better algorithms and data structures
- **Implementation**: More efficient code patterns
- **Parallelization**: Multi-threading/processing
- **Caching**: Avoid redundant computation
- **Native Extensions**: C/Rust for critical paths

## Quick Start

### Basic Timing

```python
import time

def measure_time():
    """Simple timing measurement."""
    # perf_counter() is monotonic and high-resolution, so it is a better
    # fit for measuring elapsed time than the wall-clock time.time().
    start = time.perf_counter()

    # Your code here
    result = sum(range(1000000))

    elapsed = time.perf_counter() - start
    print(f"Execution time: {elapsed:.4f} seconds")
    return result

# Better: use timeit for accurate measurements
import timeit

execution_time = timeit.timeit(
    "sum(range(1000000))",
    number=100
)
print(f"Average time: {execution_time/100:.6f} seconds")
```

## Profiling Tools

### Pattern 1: cProfile - CPU Profiling

```python
import cProfile
import pstats
from pstats import SortKey

def slow_function():
    """Function to profile."""
    total = 0
    for i in range(1000000):
        total += i
    return total

def another_function():
    """Another function."""
    return [i**2 for i in range(100000)]

def main():
    """Main function to profile."""
    result1 = slow_function()
    result2 = another_function()
    return result1, result2

# Profile the code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()

    main()

    profiler.disable()

    # Print stats
    stats = pstats.Stats(profiler)
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # Top 10 functions

    # Save to file for later analysis
    stats.dump_stats("profile_output.prof")
```

**Command-line profiling:**

```bash
# Profile a script
python -m cProfile -o output.prof script.py

# View results
python -m pstats output.prof
# In pstats:
# sort cumtime
# stats 10
```

### Pattern 2: line_profiler - Line-by-Line Profiling

```python
# Install: pip install line-profiler

# Add @profile decorator (kernprof injects this name at run time;
# running the script with plain `python` raises NameError)
@profile
def process_data(data):
    """Process data with line profiling."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

# Run with:
# kernprof -l -v script.py
```

**Manual line profiling:**

```python
from line_profiler import LineProfiler

def process_data(data):
    """Function to profile."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

if __name__ == "__main__":
    lp = LineProfiler()
    lp.add_function(process_data)

    data = list(range(100000))

    lp_wrapper = lp(process_data)
    lp_wrapper(data)

    lp.print_stats()
```

### Pattern 3: memory_profiler - Memory Usage

```python
# Install: pip install memory-profiler

from memory_profiler import profile

@profile
def memory_intensive():
    """Function that uses lots of memory."""
    # Create large list
    big_list = [i for i in range(1000000)]

    # Create large dict (unused below; allocated so it shows up in the report)
    big_dict = {i: i**2 for i in range(100000)}

    # Process data
    result = sum(big_list)

    return result

if __name__ == "__main__":
    memory_intensive()

# Run with:
# python -m memory_profiler script.py
```

### Pattern 4: py-spy - Production Profiling

```bash
# Install: pip install py-spy

# Profile a running Python process
py-spy top --pid 12345

# Generate flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script
py-spy record -o profile.svg -- python script.py

# Dump current call stack
py-spy dump --pid 12345
```

## Optimization Patterns

### Pattern 5: List Comprehensions vs Loops

```python
import timeit

# Slow: Traditional loop
def slow_squares(n):
    """Create list of squares using loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

# Fast: List comprehension
def fast_squares(n):
    """Create list of squares using comprehension."""
    return [i**2 for i in range(n)]

# Benchmark
n = 100000

slow_time = timeit.timeit(lambda: slow_squares(n), number=100)
fast_time = timeit.timeit(lambda: fast_squares(n), number=100)

print(f"Loop: {slow_time:.4f}s")
print(f"Comprehension: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")

# Alternative: map. It tends to help only when the callable is a built-in;
# with a lambda the per-call overhead often makes it no faster than the
# comprehension, so benchmark before switching.
def map_squares(n):
    """Create list of squares using map."""
    return list(map(lambda x: x**2, range(n)))
```

### Pattern 6: Generator Expressions for Memory

```python
import sys

def list_approach():
    """Memory-intensive list."""
    data = [i**2 for i in range(1000000)]
    return sum(data)

def generator_approach():
    """Memory-efficient generator."""
    data = (i**2 for i in range(1000000))
    return sum(data)

# Memory comparison
# Note: sys.getsizeof reports the container only (the list's pointer array),
# not the integers it references, so the list's real footprint is larger.
list_data = [i for i in range(1000000)]
gen_data = (i for i in range(1000000))

print(f"List size: {sys.getsizeof(list_data)} bytes")
print(f"Generator size: {sys.getsizeof(gen_data)} bytes")

# Generators use constant memory regardless of size
```

### Pattern 7: String Concatenation

```python
import timeit

def slow_concat(items):
    """Slow string concatenation."""
    result = ""
    for item in items:
        result += str(item)
    return result

def fast_concat(items):
    """Fast string concatenation with join."""
    return "".join(str(item) for item in items)

def faster_concat(items):
    """Even faster with list."""
    parts = [str(item) for item in items]
    return "".join(parts)

items = list(range(10000))

# Benchmark
slow = timeit.timeit(lambda: slow_concat(items), number=100)
fast = timeit.timeit(lambda: fast_concat(items), number=100)
faster = timeit.timeit(lambda: faster_concat(items), number=100)

print(f"Concatenation (+): {slow:.4f}s")
print(f"Join (generator): {fast:.4f}s")
print(f"Join (list): {faster:.4f}s")
```

### Pattern 8: Dictionary Lookups vs List Searches

```python
import timeit

# Create test data
size = 10000
items = list(range(size))
lookup_dict = {i: i for i in range(size)}

def list_search(items, target):
    """O(n) search in list."""
    return target in items

def dict_search(lookup_dict, target):
    """O(1) average-case search in dict."""
    return target in lookup_dict

target = size - 1  # Worst case for list

# Benchmark
list_time = timeit.timeit(
    lambda: list_search(items, target),
    number=1000
)
dict_time = timeit.timeit(
    lambda: dict_search(lookup_dict, target),
    number=1000
)

print(f"List search: {list_time:.6f}s")
print(f"Dict search: {dict_time:.6f}s")
print(f"Speedup: {list_time/dict_time:.0f}x")
```

### Pattern 9: Local Variable Access

```python
import timeit

# Global variable (slow)
GLOBAL_VALUE = 100

def use_global():
    """Access global variable."""
    total = 0
    for i in range(10000):
        total += GLOBAL_VALUE
    return total

def use_local():
    """Use local variable."""
    local_value = 100
    total = 0
    for i in range(10000):
        total += local_value
    return total

# Local is faster
global_time = timeit.timeit(use_global, number=1000)
local_time = timeit.timeit(use_local, number=1000)

print(f"Global access: {global_time:.4f}s")
print(f"Local access: {local_time:.4f}s")
print(f"Speedup: {global_time/local_time:.2f}x")
```

### Pattern 10: Function Call Overhead

```python
import timeit

def calculate_inline():
    """Inline calculation."""
    total = 0
    for i in range(10000):
        total += i * 2 + 1
    return total

def helper_function(x):
    """Helper function."""
    return x * 2 + 1

def calculate_with_function():
    """Calculation with function calls."""
    total = 0
    for i in range(10000):
        total += helper_function(i)
    return total

# Inline is faster due to no call overhead
inline_time = timeit.timeit(calculate_inline, number=1000)
function_time = timeit.timeit(calculate_with_function, number=1000)

print(f"Inline: {inline_time:.4f}s")
print(f"Function calls: {function_time:.4f}s")
```

For advanced optimization techniques including NumPy vectorization, caching, memory management, parallelization, async I/O, database optimization, and benchmarking tools, see [references/advanced-patterns.md](references/advanced-patterns.md).

## Best Practices

1. **Profile before optimizing** - Measure to find real bottlenecks
2. **Focus on hot paths** - Optimize code that runs most frequently
3. **Use appropriate data structures** - Dict for lookups, set for membership
4. **Avoid premature optimization** - Clarity first, then optimize
5. **Use built-in functions** - They're implemented in C
6. **Cache expensive computations** - Use lru_cache
7. **Batch I/O operations** - Reduce system calls
8. **Use generators** for large datasets
9. **Consider NumPy** for numerical operations
10. **Profile production code** - Use py-spy for live systems

## Common Pitfalls

- Optimizing without profiling
- Using global variables unnecessarily
- Not using appropriate data structures
- Creating unnecessary copies of data
- Not using connection pooling for databases
- Ignoring algorithmic complexity
- Over-optimizing rare code paths
- Not considering memory usage
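
Best practice 6 names `lru_cache` without showing it. The following is a minimal sketch of the idea, assuming a pure function whose arguments are hashable; `fib_plain` and `fib_cached` are hypothetical names chosen for illustration and are not part of the skill's reference files.

```python
from functools import lru_cache
import timeit


def fib_plain(n: int) -> int:
    """Naive recursion: the same subproblems are recomputed many times."""
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)  # unbounded cache; pass an int to cap memory use
def fib_cached(n: int) -> int:
    """Same recursion, but each distinct n is computed only once."""
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


if __name__ == "__main__":
    plain_time = timeit.timeit(lambda: fib_plain(20), number=10)
    fib_cached.cache_clear()  # start the cached run from an empty cache
    cached_time = timeit.timeit(lambda: fib_cached(20), number=10)

    print(f"Plain:  {plain_time:.4f}s")
    print(f"Cached: {cached_time:.4f}s")
    print(fib_cached.cache_info())  # hits, misses, maxsize, currsize
```

Caching only pays off when the function is deterministic and called repeatedly with the same arguments; `cache_info()` is a quick way to check that the hit rate justifies the extra memory.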