Storing 100 million dense vectors (like OpenAI's 1536-dimensional float32 arrays) requires hundreds of gigabytes of expensive RAM; at 4 bytes per dimension, that is roughly 614 GB before any index overhead. Product Quantization (PQ) is the standard answer to this infrastructure bottleneck. Instead of storing the exact 1536-dimensional point, PQ splits each vector into smaller sub-vectors, then runs a clustering algorithm (typically k-means) on each sub-vector position to learn a fixed set of 'centroids' (a codebook). Each sub-vector is then replaced by the short integer ID of its nearest centroid, so a whole vector becomes a compact sequence of IDs. This reduces the memory footprint of a vector by up to 97%, allowing massive vector indexes to fit entirely in memory for millisecond retrieval times, at the cost of a slight drop in recall accuracy.
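To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. The choice of 192 sub-vectors and 256 centroids per codebook is illustrative, not prescribed by PQ itself:

```python
# Back-of-the-envelope memory math for 100M 1536-dim float32 embeddings.
NUM_VECTORS = 100_000_000
DIM = 1536
BYTES_PER_FLOAT32 = 4

raw_bytes = NUM_VECTORS * DIM * BYTES_PER_FLOAT32    # uncompressed

# PQ with 192 sub-vectors and 256 centroids per codebook:
# each 8-dim sub-vector collapses to a single 1-byte centroid ID.
M_SUBVECTORS = 192
pq_bytes = NUM_VECTORS * M_SUBVECTORS

print(f"raw: {raw_bytes / 1e9:.1f} GB")              # raw: 614.4 GB
print(f"pq:  {pq_bytes / 1e9:.1f} GB")               # pq:  19.2 GB
print(f"reduction: {1 - pq_bytes / raw_bytes:.1%}")  # reduction: 96.9%
```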
How It Works
- Vector Splitting: Each high-dimensional vector is divided into m equal-sized sub-vectors (e.g., a 1536-dimensional vector becomes 192 sub-vectors of 8 dimensions each).
- Codebook Generation: For each sub-vector position, the system trains a separate k-means model to produce a 'codebook' of representative centroids, commonly 256 per codebook so that an ID fits in a single byte.
- Quantization: The original floating-point values are discarded; each sub-vector is replaced by the integer ID of its nearest centroid in the corresponding codebook.
- Asymmetric Distance Computation (ADC): During a search, the uncompressed query vector is compared against the compressed document vectors. The query is split the same way, a small table of distances from each query sub-vector to every centroid is precomputed once, and each document's distance is then just a sum of table lookups, ensuring rapid execution. Sketches of the training/encoding steps and of ADC follow this list.
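The first three steps fit in a short NumPy sketch. This is a minimal illustration rather than a production implementation; the function names `train_codebooks` and `encode` are invented for this example, and real systems rely on optimized libraries such as FAISS:

```python
import numpy as np

def train_codebooks(X, m=8, k=256, iters=20, seed=0):
    """Train one k-means codebook per sub-vector position.

    X: (n, d) float32 training vectors; d must be divisible by m.
    Returns an array of shape (m, k, d // m).
    """
    n, d = X.shape
    sub = d // m
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, k, sub), dtype=np.float32)
    for j in range(m):  # step 1: split into m sub-spaces
        S = X[:, j * sub:(j + 1) * sub]
        C = S[rng.choice(n, size=k, replace=False)]  # init from data points
        for _ in range(iters):  # step 2: plain Lloyd's k-means
            # assign every sub-vector to its nearest centroid
            d2 = ((S[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            nearest = d2.argmin(axis=1)
            for c in range(k):  # move each centroid to its cluster mean
                members = S[nearest == c]
                if len(members):
                    C[c] = members.mean(axis=0)
        codebooks[j] = C
    return codebooks

def encode(X, codebooks):
    """Step 3: replace each sub-vector with its nearest centroid ID."""
    n, d = X.shape
    m, k, sub = codebooks.shape
    codes = np.empty((n, m), dtype=np.uint8)  # k <= 256 fits in one byte
    for j in range(m):
        S = X[:, j * sub:(j + 1) * sub]
        d2 = ((S[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes
```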
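At query time, ADC reuses those codebooks: the query stays uncompressed, and only a tiny per-query lookup table is built. This continues the sketch above under the same assumptions (squared Euclidean distance; `adc_search` is an illustrative name):

```python
def adc_search(query, codes, codebooks, top=10):
    """Rank encoded vectors against an uncompressed query via ADC."""
    m, k, sub = codebooks.shape
    q = query.reshape(m, sub)  # split the query the same way
    # (m, k) table: squared distance from each query sub-vector
    # to every centroid, computed once per query
    table = ((q[:, None, :] - codebooks) ** 2).sum(-1)
    # each document's distance is just a sum of m table lookups
    dists = table[np.arange(m), codes].sum(axis=1)  # codes: (n, m)
    return np.argsort(dists)[:top]

# Usage on random data: the query vector should rank itself first.
X = np.random.rand(5000, 64).astype(np.float32)
cb = train_codebooks(X, m=8, k=256)
codes = encode(X, cb)            # 256 bytes of floats -> 8 bytes of IDs
print(adc_search(X[0], codes, cb)[:3])
```

The payoff is that scoring a document costs m lookups and additions instead of a full d-dimensional distance computation, which is what makes scanning compressed indexes at billion scale tractable.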
Common Use Cases
- Scaling vector databases from millions to billions of documents without proportionally scaling RAM costs.
- Deploying semantic search capabilities on edge devices or mobile phones.