The foundation of modern semantic search is the Bi-Encoder. Unlike Cross-Encoders, which process two texts together, a Bi-Encoder processes each text in isolation. You pass a Wikipedia article through the model and it outputs a vector; months later, a user types a query, and that query is passed through the same model to produce a second vector. The system then calculates the cosine similarity between the two vectors. Because document vectors can be pre-computed and stored offline in a vector database (such as Pinecone or Milvus), Bi-Encoders can search through millions of documents in milliseconds. The trade-off is accuracy: because the query and document never 'see' each other during encoding, deep contextual relationships between them are lost.
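The core contract is simple: the same encoder maps any text to a fixed-size vector, and relevance is reduced to a similarity score between two independently computed vectors. Here is a minimal sketch, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (both illustrative choices, not prescribed here):

```python
# Minimal Bi-Encoder sketch: each text is encoded independently, then compared.
# Assumes the sentence-transformers library and the "all-MiniLM-L6-v2" model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# The document is encoded on its own -- in production this happens offline, ahead of time.
doc_vec = model.encode("The Eiffel Tower is a wrought-iron lattice tower in Paris.")

# Later, the query is encoded independently by the same model.
query_vec = model.encode("Who designed the tower in Paris?")

# Relevance collapses to a single cosine-similarity score between the two vectors.
print(float(util.cos_sim(query_vec, doc_vec)))
```

Note that the document embedding is computed without any knowledge of the query, which is exactly what makes pre-computation possible and deep query-document interaction impossible.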
How It Works
- Offline Indexing: Millions of documents are individually passed through the Transformer model to generate dense vectors. These are stored in a vector index.
- Online Querying: A user query is passed through the same Transformer model to generate a single vector.
- Distance Calculation: The database performs Approximate Nearest Neighbor (ANN) search to find the document vectors with the highest cosine similarity to the query vector (an end-to-end sketch of these three steps follows this list).
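The sketch below walks through the three steps, assuming sentence-transformers for embeddings and a local FAISS HNSW index as a stand-in for a hosted vector database; the corpus, model name, and index parameters are illustrative assumptions:

```python
# End-to-end Bi-Encoder retrieval sketch: offline indexing, online querying, ANN search.
# Assumes sentence-transformers plus FAISS as a local stand-in for a vector database.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "The Eiffel Tower is located in Paris.",
    "Photosynthesis converts sunlight into chemical energy.",
    "Mount Everest is the highest mountain above sea level.",
]

# 1. Offline indexing: every document is encoded once and stored in the index.
#    Normalized vectors make inner product equal to cosine similarity.
doc_vecs = np.asarray(model.encode(corpus, normalize_embeddings=True), dtype="float32")
index = faiss.IndexHNSWFlat(doc_vecs.shape[1], 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_vecs)

# 2. Online querying: the query is passed through the same model.
query_vec = np.asarray(
    model.encode(["Where is the Eiffel Tower?"], normalize_embeddings=True), dtype="float32"
)

# 3. Distance calculation: ANN search returns the most similar document vectors.
scores, ids = index.search(query_vec, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {corpus[i]}")
```

In a production system the indexing step runs as a batch job over millions of documents, while the query path stays exactly this short: one forward pass plus one ANN lookup.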
Common Use Cases
- The primary retrieval mechanism (Stage 1) in almost all Retrieval-Augmented Generation (RAG) applications.
- Semantic deduplication of massive datasets.