Claude Agent Skill · by Wshobson

Sql Optimization Patterns

This walks you through the core SQL optimization techniques that actually move the needle on query performance. It covers reading EXPLAIN plans, building the ri

Install
Terminal · npx
$npx skills add https://github.com/wshobson/agents --skill sql-optimization-patterns
Works with Paperclip

How Sql Optimization Patterns fits into a Paperclip company.

Sql Optimization Patterns drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

S
SaaS FactoryPaired

Pre-configured AI company — 18 agents, 18 skills, one-time purchase.

$27$59
Explore pack
Source file
SKILL.md499 lines
Expand
---name: sql-optimization-patternsdescription: Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.--- # SQL Optimization Patterns Transform slow database queries into lightning-fast operations through systematic optimization, proper indexing, and query plan analysis. ## When to Use This Skill - Debugging slow-running queries- Designing performant database schemas- Optimizing application response times- Reducing database load and costs- Improving scalability for growing datasets- Analyzing EXPLAIN query plans- Implementing efficient indexes- Resolving N+1 query problems ## Core Concepts ### 1. Query Execution Plans (EXPLAIN) Understanding EXPLAIN output is fundamental to optimization. **PostgreSQL EXPLAIN:** ```sql-- Basic explainEXPLAIN SELECT * FROM users WHERE email = 'user@example.com'; -- With actual execution statsEXPLAIN ANALYZESELECT * FROM users WHERE email = 'user@example.com'; -- Verbose output with more detailsEXPLAIN (ANALYZE, BUFFERS, VERBOSE)SELECT u.*, o.order_totalFROM users uJOIN orders o ON u.id = o.user_idWHERE u.created_at > NOW() - INTERVAL '30 days';``` **Key Metrics to Watch:** - **Seq Scan**: Full table scan (usually slow for large tables)- **Index Scan**: Using index (good)- **Index Only Scan**: Using index without touching table (best)- **Nested Loop**: Join method (okay for small datasets)- **Hash Join**: Join method (good for larger datasets)- **Merge Join**: Join method (good for sorted data)- **Cost**: Estimated query cost (lower is better)- **Rows**: Estimated rows returned- **Actual Time**: Real execution time ### 2. Index Strategies Indexes are the most powerful optimization tool. **Index Types:** - **B-Tree**: Default, good for equality and range queries- **Hash**: Only for equality (=) comparisons- **GIN**: Full-text search, array queries, JSONB- **GiST**: Geometric data, full-text search- **BRIN**: Block Range INdex for very large tables with correlation ```sql-- Standard B-Tree indexCREATE INDEX idx_users_email ON users(email); -- Composite index (order matters!)CREATE INDEX idx_orders_user_status ON orders(user_id, status); -- Partial index (index subset of rows)CREATE INDEX idx_active_users ON users(email)WHERE status = 'active'; -- Expression indexCREATE INDEX idx_users_lower_email ON users(LOWER(email)); -- Covering index (include additional columns)CREATE INDEX idx_users_email_covering ON users(email)INCLUDE (name, created_at); -- Full-text search indexCREATE INDEX idx_posts_search ON postsUSING GIN(to_tsvector('english', title || ' ' || body)); -- JSONB indexCREATE INDEX idx_metadata ON events USING GIN(metadata);``` ### 3. Query Optimization Patterns **Avoid SELECT \*:** ```sql-- Bad: Fetches unnecessary columnsSELECT * FROM users WHERE id = 123; -- Good: Fetch only what you needSELECT id, email, name FROM users WHERE id = 123;``` **Use WHERE Clause Efficiently:** ```sql-- Bad: Function prevents index usageSELECT * FROM users WHERE LOWER(email) = 'user@example.com'; -- Good: Create functional index or use exact matchCREATE INDEX idx_users_email_lower ON users(LOWER(email));-- Then:SELECT * FROM users WHERE LOWER(email) = 'user@example.com'; -- Or store normalized dataSELECT * FROM users WHERE email = 'user@example.com';``` **Optimize JOINs:** ```sql-- Bad: Cartesian product then filterSELECT u.name, o.totalFROM users u, orders oWHERE u.id = o.user_id AND u.created_at > '2024-01-01'; -- Good: Filter before joinSELECT u.name, o.totalFROM users uJOIN orders o ON u.id = o.user_idWHERE u.created_at > '2024-01-01'; -- Better: Filter both tablesSELECT u.name, o.totalFROM (SELECT * FROM users WHERE created_at > '2024-01-01') uJOIN orders o ON u.id = o.user_id;``` ## Optimization Patterns ### Pattern 1: Eliminate N+1 Queries **Problem: N+1 Query Anti-Pattern** ```python# Bad: Executes N+1 queriesusers = db.query("SELECT * FROM users LIMIT 10")for user in users:    orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)    # Process orders``` **Solution: Use JOINs or Batch Loading** ```sql-- Solution 1: JOINSELECT    u.id, u.name,    o.id as order_id, o.totalFROM users uLEFT JOIN orders o ON u.id = o.user_idWHERE u.id IN (1, 2, 3, 4, 5); -- Solution 2: Batch querySELECT * FROM ordersWHERE user_id IN (1, 2, 3, 4, 5);``` ```python# Good: Single query with JOIN or batch load# Using JOINresults = db.query("""    SELECT u.id, u.name, o.id as order_id, o.total    FROM users u    LEFT JOIN orders o ON u.id = o.user_id    WHERE u.id IN (1, 2, 3, 4, 5)""") # Or batch loadusers = db.query("SELECT * FROM users LIMIT 10")user_ids = [u.id for u in users]orders = db.query(    "SELECT * FROM orders WHERE user_id IN (?)",    user_ids)# Group orders by user_idorders_by_user = {}for order in orders:    orders_by_user.setdefault(order.user_id, []).append(order)``` ### Pattern 2: Optimize Pagination **Bad: OFFSET on Large Tables** ```sql-- Slow for large offsetsSELECT * FROM usersORDER BY created_at DESCLIMIT 20 OFFSET 100000;  -- Very slow!``` **Good: Cursor-Based Pagination** ```sql-- Much faster: Use cursor (last seen ID)SELECT * FROM usersWHERE created_at < '2024-01-15 10:30:00'  -- Last cursorORDER BY created_at DESCLIMIT 20; -- With composite sortingSELECT * FROM usersWHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)ORDER BY created_at DESC, id DESCLIMIT 20; -- Requires indexCREATE INDEX idx_users_cursor ON users(created_at DESC, id DESC);``` ### Pattern 3: Aggregate Efficiently **Optimize COUNT Queries:** ```sql-- Bad: Counts all rowsSELECT COUNT(*) FROM orders;  -- Slow on large tables -- Good: Use estimates for approximate countsSELECT reltuples::bigint AS estimateFROM pg_classWHERE relname = 'orders'; -- Good: Filter before countingSELECT COUNT(*) FROM ordersWHERE created_at > NOW() - INTERVAL '7 days'; -- Better: Use index-only scanCREATE INDEX idx_orders_created ON orders(created_at);SELECT COUNT(*) FROM ordersWHERE created_at > NOW() - INTERVAL '7 days';``` **Optimize GROUP BY:** ```sql-- Bad: Group by then filterSELECT user_id, COUNT(*) as order_countFROM ordersGROUP BY user_idHAVING COUNT(*) > 10; -- Better: Filter first, then group (if possible)SELECT user_id, COUNT(*) as order_countFROM ordersWHERE status = 'completed'GROUP BY user_idHAVING COUNT(*) > 10; -- Best: Use covering indexCREATE INDEX idx_orders_user_status ON orders(user_id, status);``` ### Pattern 4: Subquery Optimization **Transform Correlated Subqueries:** ```sql-- Bad: Correlated subquery (runs for each row)SELECT u.name, u.email,    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_countFROM users u; -- Good: JOIN with aggregationSELECT u.name, u.email, COUNT(o.id) as order_countFROM users uLEFT JOIN orders o ON o.user_id = u.idGROUP BY u.id, u.name, u.email; -- Better: Use window functionsSELECT DISTINCT ON (u.id)    u.name, u.email,    COUNT(o.id) OVER (PARTITION BY u.id) as order_countFROM users uLEFT JOIN orders o ON o.user_id = u.id;``` **Use CTEs for Clarity:** ```sql-- Using Common Table ExpressionsWITH recent_users AS (    SELECT id, name, email    FROM users    WHERE created_at > NOW() - INTERVAL '30 days'),user_order_counts AS (    SELECT user_id, COUNT(*) as order_count    FROM orders    WHERE created_at > NOW() - INTERVAL '30 days'    GROUP BY user_id)SELECT ru.name, ru.email, COALESCE(uoc.order_count, 0) as ordersFROM recent_users ruLEFT JOIN user_order_counts uoc ON ru.id = uoc.user_id;``` ### Pattern 5: Batch Operations **Batch INSERT:** ```sql-- Bad: Multiple individual insertsINSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');INSERT INTO users (name, email) VALUES ('Carol', 'carol@example.com'); -- Good: Batch insertINSERT INTO users (name, email) VALUES    ('Alice', 'alice@example.com'),    ('Bob', 'bob@example.com'),    ('Carol', 'carol@example.com'); -- Better: Use COPY for bulk inserts (PostgreSQL)COPY users (name, email) FROM '/tmp/users.csv' CSV HEADER;``` **Batch UPDATE:** ```sql-- Bad: Update in loopUPDATE users SET status = 'active' WHERE id = 1;UPDATE users SET status = 'active' WHERE id = 2;-- ... repeat for many IDs -- Good: Single UPDATE with IN clauseUPDATE usersSET status = 'active'WHERE id IN (1, 2, 3, 4, 5, ...); -- Better: Use temporary table for large batchesCREATE TEMP TABLE temp_user_updates (id INT, new_status VARCHAR);INSERT INTO temp_user_updates VALUES (1, 'active'), (2, 'active'), ...; UPDATE users uSET status = t.new_statusFROM temp_user_updates tWHERE u.id = t.id;``` ## Advanced Techniques ### Materialized Views Pre-compute expensive queries. ```sql-- Create materialized viewCREATE MATERIALIZED VIEW user_order_summary ASSELECT    u.id,    u.name,    COUNT(o.id) as total_orders,    SUM(o.total) as total_spent,    MAX(o.created_at) as last_order_dateFROM users uLEFT JOIN orders o ON u.id = o.user_idGROUP BY u.id, u.name; -- Add index to materialized viewCREATE INDEX idx_user_summary_spent ON user_order_summary(total_spent DESC); -- Refresh materialized viewREFRESH MATERIALIZED VIEW user_order_summary; -- Concurrent refresh (PostgreSQL)REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_summary; -- Query materialized view (very fast)SELECT * FROM user_order_summaryWHERE total_spent > 1000ORDER BY total_spent DESC;``` ### Partitioning Split large tables for better performance. ```sql-- Range partitioning by date (PostgreSQL)CREATE TABLE orders (    id SERIAL,    user_id INT,    total DECIMAL,    created_at TIMESTAMP) PARTITION BY RANGE (created_at); -- Create partitionsCREATE TABLE orders_2024_q1 PARTITION OF orders    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); CREATE TABLE orders_2024_q2 PARTITION OF orders    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); -- Queries automatically use appropriate partitionSELECT * FROM ordersWHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';-- Only scans orders_2024_q1 partition``` ### Query Hints and Optimization ```sql-- Force index usage (MySQL)SELECT * FROM usersUSE INDEX (idx_users_email)WHERE email = 'user@example.com'; -- Parallel query (PostgreSQL)SET max_parallel_workers_per_gather = 4;SELECT * FROM large_table WHERE condition; -- Join hints (PostgreSQL)SET enable_nestloop = OFF;  -- Force hash or merge join``` ## Best Practices 1. **Index Selectively**: Too many indexes slow down writes2. **Monitor Query Performance**: Use slow query logs3. **Keep Statistics Updated**: Run ANALYZE regularly4. **Use Appropriate Data Types**: Smaller types = better performance5. **Normalize Thoughtfully**: Balance normalization vs performance6. **Cache Frequently Accessed Data**: Use application-level caching7. **Connection Pooling**: Reuse database connections8. **Regular Maintenance**: VACUUM, ANALYZE, rebuild indexes ```sql-- Update statisticsANALYZE users;ANALYZE VERBOSE orders; -- Vacuum (PostgreSQL)VACUUM ANALYZE users;VACUUM FULL users;  -- Reclaim space (locks table) -- ReindexREINDEX INDEX idx_users_email;REINDEX TABLE users;``` ## Common Pitfalls - **Over-Indexing**: Each index slows down INSERT/UPDATE/DELETE- **Unused Indexes**: Waste space and slow writes- **Missing Indexes**: Slow queries, full table scans- **Implicit Type Conversion**: Prevents index usage- **OR Conditions**: Can't use indexes efficiently- **LIKE with Leading Wildcard**: `LIKE '%abc'` can't use index- **Function in WHERE**: Prevents index usage unless functional index exists ## Monitoring Queries ```sql-- Find slow queries (PostgreSQL)SELECT query, calls, total_time, mean_timeFROM pg_stat_statementsORDER BY mean_time DESCLIMIT 10; -- Find missing indexes (PostgreSQL)SELECT    schemaname,    tablename,    seq_scan,    seq_tup_read,    idx_scan,    seq_tup_read / seq_scan AS avg_seq_tup_readFROM pg_stat_user_tablesWHERE seq_scan > 0ORDER BY seq_tup_read DESCLIMIT 10; -- Find unused indexes (PostgreSQL)SELECT    schemaname,    tablename,    indexname,    idx_scan,    idx_tup_read,    idx_tup_fetchFROM pg_stat_user_indexesWHERE idx_scan = 0ORDER BY pg_relation_size(indexrelid) DESC;```