Claude Agent Skill · by Wshobson

Vector Index Tuning

Benchmarks HNSW parameters (M, efConstruction, efSearch) against your actual data to find the sweet spot between recall and latency. Implements scalar quantization.

Install
Terminal · npx
$ npx skills add https://github.com/wshobson/agents --skill vector-index-tuning
Works with Paperclip

How Vector Index Tuning fits into a Paperclip company.

Vector Index Tuning drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file
SKILL.md (517 lines)
---
name: vector-index-tuning
description: Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.
---

# Vector Index Tuning

Guide to optimizing vector indexes for production performance.

## When to Use This Skill

- Tuning HNSW parameters
- Implementing quantization
- Optimizing memory usage
- Reducing search latency
- Balancing recall vs speed
- Scaling to billions of vectors

## Core Concepts

### 1. Index Type Selection

```
Data Size           Recommended Index
────────────────────────────────────────
< 10K vectors  →    Flat (exact search)
10K - 1M       →    HNSW
1M - 100M      →    HNSW + Quantization
> 100M         →    IVF + PQ or DiskANN
```

### 2. HNSW Parameters

| Parameter          | Default | Effect                                               |
| ------------------ | ------- | ---------------------------------------------------- |
| **M**              | 16      | Connections per node, ↑ = better recall, more memory |
| **efConstruction** | 100     | Build quality, ↑ = better index, slower build        |
| **efSearch**       | 50      | Search quality, ↑ = better recall, slower search     |

### 3. Quantization Types

```
Full Precision (FP32): 4 bytes × dimensions
Half Precision (FP16): 2 bytes × dimensions
INT8 Scalar:           1 byte × dimensions
Product Quantization:  ~32-64 bytes total
Binary:                dimensions/8 bytes
```

## Templates

### Template 1: HNSW Parameter Tuning

```python
import numpy as np
from typing import List
import time


def benchmark_hnsw_parameters(
    vectors: np.ndarray,
    queries: np.ndarray,
    ground_truth: np.ndarray,
    m_values: List[int] = [8, 16, 32, 64],
    ef_construction_values: List[int] = [64, 128, 256],
    ef_search_values: List[int] = [32, 64, 128, 256]
) -> List[dict]:
    """Benchmark different HNSW configurations."""
    import hnswlib

    results = []
    dim = vectors.shape[1]
    n = vectors.shape[0]

    for m in m_values:
        for ef_construction in ef_construction_values:
            # Build index
            index = hnswlib.Index(space='cosine', dim=dim)
            index.init_index(max_elements=n, M=m, ef_construction=ef_construction)

            build_start = time.time()
            index.add_items(vectors)
            build_time = time.time() - build_start

            # Get memory usage
            memory_bytes = index.element_count * (
                dim * 4 +  # Vector storage
                m * 2 * 4  # Graph edges (approximate)
            )

            for ef_search in ef_search_values:
                index.set_ef(ef_search)

                # Measure search
                search_start = time.time()
                labels, distances = index.knn_query(queries, k=10)
                search_time = time.time() - search_start

                # Calculate recall
                recall = calculate_recall(labels, ground_truth, k=10)

                results.append({
                    "M": m,
                    "ef_construction": ef_construction,
                    "ef_search": ef_search,
                    "build_time_s": build_time,
                    "search_time_ms": search_time * 1000 / len(queries),
                    "recall@10": recall,
                    "memory_mb": memory_bytes / 1024 / 1024
                })

    return results


def calculate_recall(predictions: np.ndarray, ground_truth: np.ndarray, k: int) -> float:
    """Calculate recall@k."""
    correct = 0
    for pred, truth in zip(predictions, ground_truth):
        correct += len(set(pred[:k]) & set(truth[:k]))
    return correct / (len(predictions) * k)


def recommend_hnsw_params(
    num_vectors: int,
    target_recall: float = 0.95,
    max_latency_ms: float = 10,
    available_memory_gb: float = 8
) -> dict:
    """Recommend HNSW parameters based on requirements."""

    # Base recommendations
    if num_vectors < 100_000:
        m = 16
        ef_construction = 100
    elif num_vectors < 1_000_000:
        m = 32
        ef_construction = 200
    else:
        m = 48
        ef_construction = 256

    # Adjust ef_search based on recall target
    if target_recall >= 0.99:
        ef_search = 256
    elif target_recall >= 0.95:
        ef_search = 128
    else:
        ef_search = 64

    return {
        "M": m,
        "ef_construction": ef_construction,
        "ef_search": ef_search,
        "notes": f"Estimated for {num_vectors:,} vectors, {target_recall:.0%} recall"
    }
```

### Template 2: Quantization Strategies

```python
import numpy as np
from typing import Optional, Tuple


class VectorQuantizer:
    """Quantization strategies for vector compression."""

    @staticmethod
    def scalar_quantize_int8(
        vectors: np.ndarray,
        min_val: Optional[float] = None,
        max_val: Optional[float] = None
    ) -> Tuple[np.ndarray, dict]:
        """Scalar quantization to INT8."""
        if min_val is None:
            min_val = vectors.min()
        if max_val is None:
            max_val = vectors.max()

        # Scale to 0-255 range
        scale = 255.0 / (max_val - min_val)
        quantized = np.clip(
            np.round((vectors - min_val) * scale),
            0, 255
        ).astype(np.uint8)

        params = {"min_val": min_val, "max_val": max_val, "scale": scale}
        return quantized, params

    @staticmethod
    def dequantize_int8(
        quantized: np.ndarray,
        params: dict
    ) -> np.ndarray:
        """Dequantize INT8 vectors."""
        return quantized.astype(np.float32) / params["scale"] + params["min_val"]

    @staticmethod
    def product_quantize(
        vectors: np.ndarray,
        n_subvectors: int = 8,
        n_centroids: int = 256
    ) -> Tuple[np.ndarray, dict]:
        """Product quantization for aggressive compression."""
        from sklearn.cluster import KMeans

        n, dim = vectors.shape
        assert dim % n_subvectors == 0
        subvector_dim = dim // n_subvectors

        codebooks = []
        codes = np.zeros((n, n_subvectors), dtype=np.uint8)

        for i in range(n_subvectors):
            start = i * subvector_dim
            end = (i + 1) * subvector_dim
            subvectors = vectors[:, start:end]

            kmeans = KMeans(n_clusters=n_centroids, random_state=42)
            codes[:, i] = kmeans.fit_predict(subvectors)
            codebooks.append(kmeans.cluster_centers_)

        params = {
            "codebooks": codebooks,
            "n_subvectors": n_subvectors,
            "subvector_dim": subvector_dim
        }
        return codes, params

    @staticmethod
    def binary_quantize(vectors: np.ndarray) -> np.ndarray:
        """Binary quantization (sign of each dimension)."""
        # Convert to binary: positive = 1, negative = 0
        binary = (vectors > 0).astype(np.uint8)

        # Pack bits into bytes
        n, dim = vectors.shape
        packed_dim = (dim + 7) // 8

        packed = np.zeros((n, packed_dim), dtype=np.uint8)
        for i in range(dim):
            byte_idx = i // 8
            bit_idx = i % 8
            packed[:, byte_idx] |= (binary[:, i] << bit_idx)

        return packed


def estimate_memory_usage(
    num_vectors: int,
    dimensions: int,
    quantization: str = "fp32",
    index_type: str = "hnsw",
    hnsw_m: int = 16
) -> dict:
    """Estimate memory usage for different configurations."""

    # Vector storage
    bytes_per_dimension = {
        "fp32": 4,
        "fp16": 2,
        "int8": 1,
        "pq": 0.05,  # Approximate
        "binary": 0.125
    }

    vector_bytes = num_vectors * dimensions * bytes_per_dimension[quantization]

    # Index overhead
    if index_type == "hnsw":
        # Each node has ~M*2 edges, each edge is 4 bytes (int32)
        index_bytes = num_vectors * hnsw_m * 2 * 4
    elif index_type == "ivf":
        # Inverted lists + centroids
        index_bytes = num_vectors * 8 + 65536 * dimensions * 4
    else:
        index_bytes = 0

    total_bytes = vector_bytes + index_bytes

    return {
        "vector_storage_mb": vector_bytes / 1024 / 1024,
        "index_overhead_mb": index_bytes / 1024 / 1024,
        "total_mb": total_bytes / 1024 / 1024,
        "total_gb": total_bytes / 1024 / 1024 / 1024
    }
```

### Template 3: Qdrant Index Configuration

```python
from qdrant_client import QdrantClient
from qdrant_client.http import models


def create_optimized_collection(
    client: QdrantClient,
    collection_name: str,
    vector_size: int,
    num_vectors: int,
    optimize_for: str = "balanced"  # "recall", "speed", "memory"
) -> None:
    """Create collection with optimized settings."""

    # HNSW configuration based on optimization target
    hnsw_configs = {
        "recall": models.HnswConfigDiff(m=32, ef_construct=256),
        "speed": models.HnswConfigDiff(m=16, ef_construct=64),
        "balanced": models.HnswConfigDiff(m=16, ef_construct=128),
        "memory": models.HnswConfigDiff(m=8, ef_construct=64)
    }

    # Quantization configuration
    quantization_configs = {
        "recall": None,  # No quantization for max recall
        "speed": models.ScalarQuantization(
            scalar=models.ScalarQuantizationConfig(
                type=models.ScalarType.INT8,
                quantile=0.99,
                always_ram=True
            )
        ),
        "balanced": models.ScalarQuantization(
            scalar=models.ScalarQuantizationConfig(
                type=models.ScalarType.INT8,
                quantile=0.99,
                always_ram=False
            )
        ),
        "memory": models.ProductQuantization(
            product=models.ProductQuantizationConfig(
                compression=models.CompressionRatio.X16,
                always_ram=False
            )
        )
    }

    # Optimizer configuration
    optimizer_configs = {
        "recall": models.OptimizersConfigDiff(
            indexing_threshold=10000,
            memmap_threshold=50000
        ),
        "speed": models.OptimizersConfigDiff(
            indexing_threshold=5000,
            memmap_threshold=20000
        ),
        "balanced": models.OptimizersConfigDiff(
            indexing_threshold=20000,
            memmap_threshold=50000
        ),
        "memory": models.OptimizersConfigDiff(
            indexing_threshold=50000,
            memmap_threshold=10000  # Use disk sooner
        )
    }

    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=vector_size,
            distance=models.Distance.COSINE
        ),
        hnsw_config=hnsw_configs[optimize_for],
        quantization_config=quantization_configs[optimize_for],
        optimizers_config=optimizer_configs[optimize_for]
    )


def tune_search_parameters(
    client: QdrantClient,
    collection_name: str,
    target_recall: float = 0.95
) -> models.SearchParams:
    """Tune search parameters for target recall."""

    # Search parameter recommendations
    if target_recall >= 0.99:
        search_params = models.SearchParams(
            hnsw_ef=256,
            exact=False,
            quantization=models.QuantizationSearchParams(
                ignore=True,  # Don't use quantization for search
                rescore=True
            )
        )
    elif target_recall >= 0.95:
        search_params = models.SearchParams(
            hnsw_ef=128,
            exact=False,
            quantization=models.QuantizationSearchParams(
                ignore=False,
                rescore=True,
                oversampling=2.0
            )
        )
    else:
        search_params = models.SearchParams(
            hnsw_ef=64,
            exact=False,
            quantization=models.QuantizationSearchParams(
                ignore=False,
                rescore=False
            )
        )

    return search_params
```

### Template 4: Performance Monitoring

```python
import time
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class SearchMetrics:
    latency_p50_ms: float
    latency_p95_ms: float
    latency_p99_ms: float
    recall: float
    qps: float


class VectorSearchMonitor:
    """Monitor vector search performance."""

    def __init__(self, ground_truth_fn=None):
        self.latencies = []
        self.recalls = []
        self.ground_truth_fn = ground_truth_fn

    def measure_search(
        self,
        search_fn,
        query_vectors: np.ndarray,
        k: int = 10,
        num_iterations: int = 100
    ) -> SearchMetrics:
        """Benchmark search performance."""
        latencies = []

        for _ in range(num_iterations):
            for query in query_vectors:
                start = time.perf_counter()
                results = search_fn(query, k=k)
                latency = (time.perf_counter() - start) * 1000
                latencies.append(latency)

        latencies = np.array(latencies)
        total_queries = num_iterations * len(query_vectors)
        total_time = sum(latencies) / 1000  # seconds

        return SearchMetrics(
            latency_p50_ms=np.percentile(latencies, 50),
            latency_p95_ms=np.percentile(latencies, 95),
            latency_p99_ms=np.percentile(latencies, 99),
            recall=self._calculate_recall(search_fn, query_vectors, k) if self.ground_truth_fn else 0,
            qps=total_queries / total_time
        )

    def _calculate_recall(self, search_fn, queries: np.ndarray, k: int) -> float:
        """Calculate recall against ground truth."""
        if not self.ground_truth_fn:
            return 0

        correct = 0
        total = 0

        for query in queries:
            predicted = set(search_fn(query, k=k))
            actual = set(self.ground_truth_fn(query, k=k))
            correct += len(predicted & actual)
            total += k

        return correct / total


def profile_index_build(
    build_fn,
    vectors: np.ndarray,
    batch_sizes: List[int] = [1000, 10000, 50000]
) -> dict:
    """Profile index build performance."""
    results = {}

    for batch_size in batch_sizes:
        times = []
        for i in range(0, len(vectors), batch_size):
            batch = vectors[i:i + batch_size]
            start = time.perf_counter()
            build_fn(batch)
            times.append(time.perf_counter() - start)

        results[batch_size] = {
            "avg_batch_time_s": np.mean(times),
            "vectors_per_second": batch_size / np.mean(times)
        }

    return results
```

## Best Practices

### Do's

- **Benchmark with real queries** - Synthetic may not represent production
- **Monitor recall continuously** - Can degrade with data drift
- **Start with defaults** - Tune only when needed
- **Use quantization** - Significant memory savings
- **Consider tiered storage** - Hot/cold data separation

### Don'ts

- **Don't over-optimize early** - Profile first
- **Don't ignore build time** - Index updates have cost
- **Don't forget reindexing** - Plan for maintenance
- **Don't skip warming** - Cold indexes are slow
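To make the quantization byte-size table and the HNSW graph-overhead approximation concrete, here is a back-of-the-envelope calculation for 1M 768-dimensional vectors. It is a minimal standalone sketch that follows the same approximations as the `estimate_memory_usage` template (M*2 edges per node, 4 bytes per edge); the variable names are illustrative only.

```python
num_vectors, dim, m = 1_000_000, 768, 16

# Vector storage per the quantization table (bytes per dimension)
fp32_bytes = num_vectors * dim * 4       # full precision
int8_bytes = num_vectors * dim * 1      # scalar quantization: 4x smaller
binary_bytes = num_vectors * dim // 8   # binary: 32x smaller

# HNSW graph overhead: ~M*2 edges per node, 4 bytes (int32) each
graph_bytes = num_vectors * m * 2 * 4

for name, b in [("fp32", fp32_bytes), ("int8", int8_bytes), ("binary", binary_bytes)]:
    print(f"{name}: vectors {b / 2**30:.2f} GiB + graph {graph_bytes / 2**30:.2f} GiB")
```

Note that the graph overhead is fixed regardless of quantization, so at binary precision the HNSW edges can dominate total memory.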
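As an end-to-end illustration of the recall measurement and the INT8 scheme from Template 2, the sketch below brute-force searches random vectors at full precision, repeats the search after an INT8 quantize/dequantize round trip, and compares recall@k. It is a standalone toy using only NumPy (no ANN index), so the recall gap here reflects quantization error alone; helper names like `topk_l2` and `recall_at_k` are illustrative, not part of the skill.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n, n_queries, k = 64, 2000, 20, 10
base = rng.standard_normal((n, dim)).astype(np.float32)
queries = rng.standard_normal((n_queries, dim)).astype(np.float32)

def topk_l2(xs, q, k):
    """Exact k-NN by squared L2 distance (brute force)."""
    return np.argsort(((xs - q) ** 2).sum(axis=1))[:k]

# INT8 scalar quantization, same scheme as Template 2
lo, hi = base.min(), base.max()
scale = 255.0 / (hi - lo)
q8 = np.clip(np.round((base - lo) * scale), 0, 255).astype(np.uint8)
deq = q8.astype(np.float32) / scale + lo  # dequantize for search

def recall_at_k(pred, truth, k):
    return len(set(pred[:k]) & set(truth[:k])) / k

recalls = [recall_at_k(topk_l2(deq, q, k), topk_l2(base, q, k), k)
           for q in queries]
print(f"mean recall@{k} after INT8 round-trip: {np.mean(recalls):.3f}")
print(f"memory: fp32={base.nbytes} B, int8={q8.nbytes} B")
```

On well-spread data like this, recall typically stays near 1.0 for a 4x memory reduction, which is why the skill recommends scalar quantization as the first lever before reaching for PQ or binary.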