Historically, the capability of an AI model was determined almost entirely by 'Training Compute': the amount of processing power used to train the base model. Test-Time Compute (TTC), popularized by models such as OpenAI's o1, shifts this paradigm by scaling compute dynamically at inference time, when a question is actually asked. The model is allowed to generate thousands of internal 'Chain of Thought' tokens, simulate different strategies, backtrack from dead ends, and verify its own logic *before* returning a response to the user. As a result, TTC dramatically improves performance on hard logic, math, and coding problems without requiring a larger base model.
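One common way to spend extra compute at inference time is best-of-N sampling with a verifier: draw many candidate answers and return the first one that passes a verification check. The sketch below illustrates the idea only; `propose_answer` and `verify` are hypothetical stubs standing in for real model calls, not an actual API.

```python
import random

random.seed(0)  # deterministic for the toy example

def propose_answer(question):
    """Stand-in for sampling one chain-of-thought + answer from a model."""
    return random.randint(0, 20)  # pretend each sample is a guess

def verify(question, answer):
    """Stand-in for a self-verification pass over the reasoning."""
    return answer == 12  # pretend 12 is the verifiably correct answer

def answer_with_ttc(question, budget=1000):
    # More compute (a larger budget) -> more samples -> a higher chance
    # that at least one chain of thought reaches a verified answer.
    for _ in range(budget):
        candidate = propose_answer(question)
        if verify(question, candidate):
            return candidate
    return None  # budget exhausted without a verified answer
```

The key property is that accuracy scales with the `budget` parameter rather than with model size: a small budget may return `None`, while a large one almost certainly finds the verified answer.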

How It Works

TTC is typically implemented through:
  • Hidden Chain of Thought: The model generates a long, internal monologue of reasoning steps that the end-user does not see.
  • Tree of Thoughts (ToT): The model branches out into multiple possible solution paths, evaluates the viability of each, and prunes the paths that lead to dead ends.
  • Dynamic Compute Allocation: Easy questions bypass TTC and return instantly, while hard questions trigger massive internal token generation.
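The Tree of Thoughts step above can be sketched as a beam search over partial solutions: branch into candidate next steps, score each partial path, and prune all but the most promising. The toy problem and heuristic `score` function here are illustrative assumptions; real systems score partial reasoning with the model itself.

```python
# Toy Tree-of-Thoughts-style search: build a 3-digit sequence whose
# digits sum to a target, branching and pruning at each depth.

TARGET = 15
BEAM_WIDTH = 2  # how many branches survive pruning at each depth

def score(path):
    """Hypothetical value function: partial sums closer to TARGET rank higher."""
    return -abs(TARGET - sum(path))

def expand(path):
    """Branch: every possible next 'thought' (digit) from this state."""
    return [path + [d] for d in range(10)]

def tree_of_thoughts(depth=3):
    frontier = [[]]  # start from the empty path
    for _ in range(depth):
        candidates = [c for p in frontier for c in expand(p)]
        # Prune: keep only the most promising partial paths (dead ends drop out).
        frontier = sorted(candidates, key=score, reverse=True)[:BEAM_WIDTH]
    return frontier[0]
```

Raising `BEAM_WIDTH` spends more compute per question in exchange for exploring more of the tree, which mirrors the dynamic allocation trade-off described above.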

Common Use Cases

  • Solving advanced mathematical proofs and competitive programming challenges.
  • Medical diagnosis and legal analysis, where step-by-step logical deduction is required.

Related Terms