Vector similarity is terrible at exact matches. If a user asks, 'Find me marketing reports from 2024 about AI,' a pure vector search might return a highly relevant AI report from 2022, because '2024' is just one token among many and carries little semantic weight. Self-Querying Retrieval solves this by placing an LLM in front of the database: the LLM reads the user's intent and translates it into a structured query. It strips out 'from 2024' to build a deterministic metadata filter (e.g., `WHERE year = 2024`) and uses the remaining phrase ('marketing reports about AI') for the semantic vector search. Results must pass both the hard filter and the semantic similarity threshold.
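To make the decomposition concrete, here is a minimal sketch of the split a self-querying step aims to produce. The `decompose` function is hypothetical, and a toy stand-in for the LLM: it only recognizes a 4-digit year via regex, whereas a real system would let the model extract arbitrary constraints.

```python
import re

def decompose(query: str) -> tuple[dict, str]:
    """Split a natural-language query into a metadata filter and a
    semantic remainder. A real system would use an LLM; this sketch
    only recognizes a 'from <year>' pattern, for illustration."""
    filters = {}
    match = re.search(r"\bfrom\s+(\d{4})\b", query)
    if match:
        filters["year"] = int(match.group(1))
        query = query.replace(match.group(0), "").strip()
    # Collapse the doubled space left behind by the removal.
    semantic = re.sub(r"\s{2,}", " ", query)
    return filters, semantic

filters, semantic = decompose("Find me marketing reports from 2024 about AI")
# filters -> {'year': 2024}
# semantic -> 'Find me marketing reports about AI'
```

The key point is that the year becomes a hard, deterministic predicate rather than one more token competing for semantic weight.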
How It Works
- Schema Injection: The LLM is provided with the exact schema of the vector database's metadata (e.g., `year: int`, `department: string`).
- Parsing: The LLM analyzes the user prompt and extracts relevant constraints based on the schema.
- Query Construction: The system builds a composite database query that combines the structured filters with the dense vector representation of the semantic intent.
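The query-construction step above can be sketched end to end against a toy in-memory store. Everything here is illustrative: the documents, the hand-made 2-D 'embeddings', and the `composite_search` function are assumptions, not a real vector database API.

```python
from math import sqrt

# Toy corpus: each document carries metadata plus a pre-computed embedding.
# The 2-D vectors are hand-made; a real system would use an embedding model.
DOCS = [
    {"id": "a", "year": 2024, "vec": (0.9, 0.1)},    # AI-heavy, 2024
    {"id": "b", "year": 2022, "vec": (0.95, 0.05)},  # AI-heavy, 2022
    {"id": "c", "year": 2024, "vec": (0.1, 0.9)},    # off-topic, 2024
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def composite_search(query_vec, filters, min_sim=0.5):
    """Hard metadata filter first, then semantic ranking: results must
    satisfy the filter AND clear the similarity threshold."""
    survivors = [d for d in DOCS
                 if all(d.get(k) == v for k, v in filters.items())]
    scored = [(cosine(query_vec, d["vec"]), d) for d in survivors]
    return [d["id"]
            for sim, d in sorted(scored, key=lambda x: x[0], reverse=True)
            if sim >= min_sim]

# 'marketing reports about AI' embeds near (1, 0) in this toy space.
print(composite_search((1.0, 0.0), {"year": 2024}))  # prints ['a']
```

Note that document `b`, the highly similar 2022 report, is never even scored: the deterministic filter removes it before the semantic ranking runs, which is exactly the failure mode of pure vector search this technique avoids.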
Common Use Cases
- E-commerce search where users mix semantic desires ('a comfortable shoe') with hard constraints ('under $100', 'size 10').
- Enterprise document retrieval requiring strict access control and date-range filtering.
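The e-commerce case can be sketched the same way. The patterns and the `parse_shopping_query` function below are hypothetical stand-ins for the LLM parsing step, handling only the price and size phrases from the example.

```python
import re

def parse_shopping_query(q: str) -> dict:
    """Pull hard constraints out of a mixed query; the leftover text
    becomes the semantic part. Illustrative regex patterns stand in
    for an LLM working from the metadata schema."""
    filters = {}
    if m := re.search(r"under \$(\d+)", q):
        filters["price_max"] = int(m.group(1))
        q = q.replace(m.group(0), "")
    if m := re.search(r"size (\d+)", q):
        filters["size"] = int(m.group(1))
        q = q.replace(m.group(0), "")
    semantic = re.sub(r"[\s,]+", " ", q).strip(" ,")
    return {"filter": filters, "semantic": semantic}

print(parse_shopping_query("a comfortable shoe under $100, size 10"))
```

'Comfortable' stays in the semantic part, where vector similarity is genuinely useful, while price and size become exact filters that no result can sneak past.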