Storing 100 million dense vectors (like OpenAI's 1536-dimensional float32 arrays) requires hundreds of gigabytes of expensive RAM; at 4 bytes per dimension, that is roughly 614 GB before any index overhead. Product Quantization (PQ) is the standard answer to this infrastructure bottleneck. Instead of storing the exact 1536-dimensional point, PQ splits each vector into smaller sub-vectors, then runs a clustering algorithm (typically k-means) on each sub-vector position to learn a fixed set of 'centroids' (a codebook). Each sub-vector is then replaced by the short integer ID of its nearest centroid, so a whole vector becomes a compact sequence of IDs. This reduces the memory footprint of a vector by up to 97%, allowing massive vector indexes to fit entirely in memory for millisecond retrieval times, at the cost of a slight drop in recall accuracy.
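To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. The choice of 192 sub-vectors and 256 centroids per codebook is illustrative, not prescribed by PQ itself:

```python
# Back-of-the-envelope memory math for 100M 1536-dim float32 embeddings.
NUM_VECTORS = 100_000_000
DIM = 1536
BYTES_PER_FLOAT32 = 4

raw_bytes = NUM_VECTORS * DIM * BYTES_PER_FLOAT32    # uncompressed

# PQ with 192 sub-vectors and 256 centroids per codebook:
# each 8-dim sub-vector collapses to a single 1-byte centroid ID.
M_SUBVECTORS = 192
pq_bytes = NUM_VECTORS * M_SUBVECTORS

print(f"raw: {raw_bytes / 1e9:.1f} GB")              # raw: 614.4 GB
print(f"pq:  {pq_bytes / 1e9:.1f} GB")               # pq:  19.2 GB
print(f"reduction: {1 - pq_bytes / raw_bytes:.1%}")  # reduction: 96.9%
```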
How It Works
- Vector Splitting: Each high-dimensional vector is divided into m equal-sized sub-vectors (e.g., a 1536-dimensional vector becomes 192 sub-vectors of 8 dimensions each).
- Codebook Generation: For each sub-vector position, the system trains a separate k-means model to produce a 'codebook' of representative centroids, commonly 256 per codebook so that an ID fits in a single byte.
- Quantization: The original floating-point values are discarded; each sub-vector is replaced by the integer ID of its nearest centroid in the corresponding codebook.
- Asymmetric Distance Computation (ADC): During a search, the uncompressed query vector is compared against the compressed document vectors. The query is split the same way, a small table of distances from each query sub-vector to every centroid is precomputed once, and each document's distance is then just a sum of table lookups, ensuring rapid execution. Sketches of the training/encoding steps and of ADC follow this list.
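The first three steps fit in a short NumPy sketch. This is a minimal illustration rather than a production implementation; the function names `train_codebooks` and `encode` are invented for this example, and real systems rely on optimized libraries such as FAISS:

```python
import numpy as np

def train_codebooks(X, m=8, k=256, iters=20, seed=0):
    """Train one k-means codebook per sub-vector position.

    X: (n, d) float32 training vectors; d must be divisible by m.
    Returns an array of shape (m, k, d // m).
    """
    n, d = X.shape
    sub = d // m
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, k, sub), dtype=np.float32)
    for j in range(m):  # step 1: split into m sub-spaces
        S = X[:, j * sub:(j + 1) * sub]
        C = S[rng.choice(n, size=k, replace=False)]  # init from data points
        for _ in range(iters):  # step 2: plain Lloyd's k-means
            # assign every sub-vector to its nearest centroid
            d2 = ((S[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            nearest = d2.argmin(axis=1)
            for c in range(k):  # move each centroid to its cluster mean
                members = S[nearest == c]
                if len(members):
                    C[c] = members.mean(axis=0)
        codebooks[j] = C
    return codebooks

def encode(X, codebooks):
    """Step 3: replace each sub-vector with its nearest centroid ID."""
    n, d = X.shape
    m, k, sub = codebooks.shape
    codes = np.empty((n, m), dtype=np.uint8)  # k <= 256 fits in one byte
    for j in range(m):
        S = X[:, j * sub:(j + 1) * sub]
        d2 = ((S[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes
```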
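At query time, ADC reuses those codebooks: the query stays uncompressed, and only a tiny per-query lookup table is built. This continues the sketch above under the same assumptions (squared Euclidean distance; `adc_search` is an illustrative name):

```python
def adc_search(query, codes, codebooks, top=10):
    """Rank encoded vectors against an uncompressed query via ADC."""
    m, k, sub = codebooks.shape
    q = query.reshape(m, sub)  # split the query the same way
    # (m, k) table: squared distance from each query sub-vector
    # to every centroid, computed once per query
    table = ((q[:, None, :] - codebooks) ** 2).sum(-1)
    # each document's distance is just a sum of m table lookups
    dists = table[np.arange(m), codes].sum(axis=1)  # codes: (n, m)
    return np.argsort(dists)[:top]

# Usage on random data: the query vector should rank itself first.
X = np.random.rand(5000, 64).astype(np.float32)
cb = train_codebooks(X, m=8, k=256)
codes = encode(X, cb)            # 256 bytes of floats -> 8 bytes of IDs
print(adc_search(X[0], codes, cb)[:3])
```

The payoff is that scoring a document costs m lookups and additions instead of a full d-dimensional distance computation, which is what makes scanning compressed indexes at billion scale tractable.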
Common Use Cases
- Scaling vector databases from millions to billions of documents without proportionally scaling RAM costs.
- Deploying semantic search capabilities on edge devices or mobile phones.