Every ML model, regardless of how it was trained or what framework built it, eventually does the same thing: it takes input and produces output. In Python, that operation is model.predict(). It looks simple. It is simple—until it isn't.
The same method name appears across scikit-learn, Keras, TensorFlow, PyTorch, XGBoost, LightGBM, and most other ML frameworks. But "predict" means slightly different things in each. The return shapes differ. The input expectations differ. The performance characteristics differ. And the ways it can fail differ too.
This guide covers what predict() actually does, how it behaves across the major frameworks, and the practical issues you'll hit when running it in production.
What model.predict() Actually Does
predict() runs the model in inference mode. It passes your input data through the forward pass and returns the model's predictions. Unlike fit() or train(), it does not update any weights. It is a pure computation.
At a high level:
- The input is preprocessed and formatted to match what the model expects
- The model runs its forward pass
- Raw outputs (logits, probabilities, regression values) are returned
The critical thing to understand: whether predict() returns probabilities or raw scores (logits) depends on the framework and on how the model's output layer was built. This is where most confusion starts.
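Raw scores relate to probabilities through the sigmoid (binary) and softmax (multiclass) functions. A minimal NumPy sketch of the two conversions, using made-up logits:

```python
import numpy as np

def sigmoid(z):
    # Binary case: maps any real logit into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    # Multiclass case: subtract the row max first for numerical stability
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5]])  # raw scores from a hypothetical 3-class model
probs = softmax(logits)
print(probs.sum(axis=1))                   # rows sum to 1 (up to float error)
print(float(sigmoid(np.array([0.0]))[0]))  # 0.5 at the sigmoid midpoint
```

Keep these two functions in mind when reading the framework sections below: the only question is which side of this conversion predict() hands you.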
scikit-learn: The Baseline
scikit-learn has the most consistent and predictable predict() behavior. It is the reference implementation that most other frameworks loosely follow.
Classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions.shape) # (200,)
print(predictions[:10]) # array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])
predict() returns a NumPy array of class labels. For binary classification, these are 0 or 1. For multiclass, these are integer indices.
Getting Probabilities in sklearn
If you want probabilities, you need predict_proba():
probabilities = model.predict_proba(X_test)
print(probabilities.shape) # (200, 2) — two classes
print(probabilities[:3])
# [[0.85, 0.15],
# [0.12, 0.88],
# [0.73, 0.27]]
Note that predict_proba() returns the probability for each class. The order matches model.classes_. If you need just the positive class probability in binary classification, use predict_proba(X_test)[:, 1].
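For most sklearn classifiers, predict() is equivalent to taking the argmax of predict_proba() and mapping the result through model.classes_. A sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical predict_proba() output for three samples, with model.classes_ = [0, 1]
classes = np.array([0, 1])
proba = np.array([[0.85, 0.15],
                  [0.12, 0.88],
                  [0.73, 0.27]])

# predict() picks the highest-probability column and maps it through classes_
labels = classes[np.argmax(proba, axis=1)]
print(labels)        # [0 1 0]
print(proba[:, 1])   # positive-class probabilities
```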
Regression
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions.shape) # (200,)
print(predictions[:5]) # array([2.34, -0.87, 1.56, 3.21, 0.12])
For regressors, predict() returns floating-point values directly. No separate method for raw scores vs. final output—the returned value is always the final prediction.
sklearn Summary
| Call | Return Shape | Return Type |
|---|---|---|
| predict() (classifier) | (n_samples,) | Integer labels |
| predict() (regressor) | (n_samples,) | Float values |
| predict_proba() | (n_samples, n_classes) | Float probabilities |
Keras / TensorFlow: Classification Requires Sigmoid or Softmax
Keras is where most developers hit their first predict() surprise. predict() returns whatever the output layer produces: probabilities if the final layer applies sigmoid or softmax, raw logits if the final layer is linear.
Binary Classification
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Binary classification model
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid')  # sigmoid on output layer
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=10, verbose=0)
# Because the final layer applies sigmoid, predict() returns probabilities here
predictions = model.predict(X_test, verbose=0)
print(predictions.shape) # (n_samples, 1)
print(predictions[:5].flatten())
# [0.87, 0.12, 0.65, 0.91, 0.34]
Here's the gotcha: if you build a binary classification model without an activation function on the final layer (i.e., you plan to apply sigmoid manually), then predict() returns raw logits. If you use activation='sigmoid' on the final layer, predict() returns probabilities.
# Without sigmoid on final layer — returns logits
model_logits = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1)  # linear output — raw logits
])
model_logits.compile(optimizer='adam', loss='binary_crossentropy')
model_logits.fit(X_train, y_train, epochs=10, verbose=0)
raw_output = model_logits.predict(X_test, verbose=0)
# raw_output is logits, not probabilities
# Apply sigmoid manually to convert: 1 / (1 + np.exp(-raw_output))
Multiclass Classification
from tensorflow.keras.utils import to_categorical
# One-hot encode labels for multiclass (this example assumes y contains three classes)
y_train_cat = to_categorical(y_train, num_classes=3)
y_test_cat = to_categorical(y_test, num_classes=3)
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(3, activation='softmax')  # softmax for multiclass
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train_cat, epochs=10, verbose=0)
predictions = model.predict(X_test, verbose=0)
print(predictions.shape) # (n_samples, 3)
print(predictions[:3])
# [[0.05, 0.12, 0.83],
# [0.71, 0.22, 0.07],
# [0.33, 0.45, 0.22]]
With softmax on the final layer, predict() returns probabilities that sum to 1.0 across each row.
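A quick sanity check you can run on any softmax output, using the example probabilities above:

```python
import numpy as np

probs = np.array([[0.05, 0.12, 0.83],
                  [0.71, 0.22, 0.07],
                  [0.33, 0.45, 0.22]])

# Softmax rows sum to 1 (up to float error); argmax gives the predicted class index
assert np.allclose(probs.sum(axis=1), 1.0)
print(probs.argmax(axis=1))  # [2 0 1]
```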
Using predict() with Models Without Output Activation
If you are doing custom training loops or using logits directly, you need to know how to handle raw outputs:
# Raw logits from a model without softmax
logits = model_logits.predict(X_test, verbose=0)
# Convert to probabilities (subtract the row max first for numerical stability)
probabilities = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities /= probabilities.sum(axis=1, keepdims=True)
# Or simply:
from scipy.special import softmax
probabilities = softmax(logits, axis=1)
predict() vs predict_on_batch()
Keras predict() is designed to handle large datasets by processing in batches internally. For small datasets, this overhead can actually slow things down. Use predict_on_batch() when you know your input size and want to avoid the batch-scheduling overhead:
# Standard predict — handles batching internally
predictions = model.predict(X_test, batch_size=32, verbose=1)
# Manual batch processing for small data — collect each batch's output
batch_preds = []
for i in range(0, len(X_test), 32):
    batch = X_test[i:i+32]
    batch_preds.append(model.predict_on_batch(batch))
predictions = np.concatenate(batch_preds)
predict() with verbose=1 shows a progress bar, which is useful for large datasets. predict_on_batch() has no progress output—it is a direct computation call.
PyTorch: No Built-In predict() Method
PyTorch does not have a model.predict() method. This trips up developers coming from sklearn or Keras.
Instead, you put the model in evaluation mode and call the model directly:
import torch
import torch.nn as nn
class SimpleClassifier(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x
model = SimpleClassifier(input_dim=20)
model.eval() # Critical: set to evaluation mode
# Inference
with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    predictions = model(X_test_tensor)
print(predictions.shape) # (200, 1)
print(predictions[:5].numpy().flatten())
The eval() Mode Matters
Dropout layers, batch normalization, and other stochastic layers behave differently in training vs. evaluation. Always call model.eval() before inference:
model.eval() # Disables dropout, uses running stats for BatchNorm
with torch.no_grad():  # Disables gradient computation
    predictions = model(X_test_tensor)
Common PyTorch Inference Patterns
# Batch inference
def predict_batch(model, X, batch_size=64):
    model.eval()
    predictions = []
    with torch.no_grad():
        for i in range(0, len(X), batch_size):
            batch = torch.tensor(X[i:i+batch_size], dtype=torch.float32)
            preds = model(batch)
            predictions.append(preds.numpy())
    return np.concatenate(predictions)
# CPU vs GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
    predictions = model(X_test_tensor).cpu().numpy()
PyTorch with torch.compile (PyTorch 2.0+)
PyTorch 2.0 introduced torch.compile(), which JIT-compiles the model for faster inference:
model = SimpleClassifier(input_dim=20)
model.eval()
# Compile for faster inference (the speedup varies by model and hardware)
compiled_model = torch.compile(model)
with torch.no_grad():
    predictions = compiled_model(X_test_tensor)
XGBoost and LightGBM: Native Gradient Boosting
XGBoost and LightGBM have their own predict() methods that behave similarly to sklearn but with important differences.
XGBoost
import xgboost as xgb
model = xgb.XGBClassifier(n_estimators=100, eval_metric='logloss')  # use_label_encoder was deprecated and later removed
model.fit(X_train, y_train)
# Default: returns class predictions
predictions = model.predict(X_test)
print(predictions.shape) # (200,)
# Probabilities
probabilities = model.predict_proba(X_test)
print(probabilities.shape) # (200, 2)
XGBoost Raw Margins and Leaf Indices
XGBoost exposes additional prediction types:
# Raw margin scores (before the global link function)
raw_scores = model.predict(X_test, output_margin=True)
# Leaf indices (useful for tree interpretation) — the sklearn wrapper exposes apply()
leaf_indices = model.apply(X_test)
print(leaf_indices.shape) # (200, n_trees) — which leaf each tree puts the sample in
LightGBM
import lightgbm as lgb
model = lgb.LGBMClassifier(n_estimators=100)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
# Raw scores
raw_scores = model.predict(X_test, raw_score=True)
# Leaf indices
leaf_preds = model.predict(X_test, pred_leaf=True)
Common Pitfalls Across All Frameworks
1. Input Shape Mismatches
The single most common error. model.predict() almost always expects 2D input (n_samples, n_features), even if you are predicting a single sample.
# Wrong — 1D array
single_sample = X_test[0]
predictions = model.predict(single_sample) # Shape mismatch error
# Correct — 2D array
single_sample = X_test[0:1] # Shape (1, n_features)
predictions = model.predict(single_sample)
This is especially tricky because NumPy happily broadcasts 1D arrays in many other contexts, but predict() validates its input shape strictly.
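A small guard at the inference boundary sidesteps the problem entirely. ensure_2d here is a hypothetical helper, not part of any framework:

```python
import numpy as np

def ensure_2d(X):
    """Promote a single 1D sample to shape (1, n_features); pass 2D input through."""
    X = np.asarray(X)
    return X.reshape(1, -1) if X.ndim == 1 else X

single = np.arange(20.0)                   # shape (20,)
print(ensure_2d(single).shape)             # (1, 20)
print(ensure_2d(np.zeros((5, 20))).shape)  # (5, 20)
```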
2. Not Setting the Model to Evaluation Mode (PyTorch)
Dropout being active during inference will randomly zero out neurons, producing different outputs every call. Always:
model.eval() # Before inference
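To see why, here is a NumPy simulation of inverted dropout, the scheme PyTorch's Dropout layer uses: random masking with rescaling in training mode, plain identity in eval mode. This is a sketch of the behavior, not PyTorch itself:

```python
import numpy as np

def dropout_forward(x, p=0.5, training=True, rng=None):
    if training:
        # Training mode: zero each unit with probability p, rescale the survivors
        if rng is None:
            rng = np.random.default_rng()
        mask = rng.random(x.shape) >= p
        return x * mask / (1.0 - p)
    # Eval mode: identity, so repeated calls give identical outputs
    return x

x = np.ones(8)
print(dropout_forward(x, training=True))   # varies run to run: zeros and 2.0s
print(dropout_forward(x, training=False))  # always [1. 1. 1. 1. 1. 1. 1. 1.]
```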
3. Forgetting That predict() Returns Indices, Not Probabilities (sklearn)
# Wrong — sklearn predict() already returns labels, and using an array in `if` raises an error
if model.predict(X_test) > 0.5: # ValueError: truth value of an array is ambiguous
    ...
# Correct — for binary classification
proba = model.predict_proba(X_test)[:, 1]
predictions = (proba > 0.5).astype(int)
4. Keras predict() Batching Overhead for Small Inputs
For small test sets, Keras predict() can be slower than expected due to internal batch scheduling:
# Slow for small data — batch scheduling overhead
predictions = model.predict(X_small, verbose=0)
# Faster for small data
predictions = model.predict_on_batch(X_small)
5. Ignoring the Dtype of Your Input
# If your training data was float32 but inference is float64
X_test_wrong = np.array(X_test, dtype=np.float64)
predictions = model.predict(X_test_wrong) # May work or may cast unexpectedly
# Ensure matching dtype
X_test_correct = np.array(X_test, dtype=np.float32)
predictions = model.predict(X_test_correct)
6. XGBoost/LightGBM Using Wrong Input Type After sklearn
sklearn models accept pandas DataFrames. XGBoost and LightGBM often work better with their native data structures for large datasets:
import xgboost as xgb
# DMatrix is XGBoost's native data structure — faster for large data
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)
params = {'objective': 'binary:logistic', 'eval_metric': 'logloss'}
model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest) # Different API — model is a Booster, and predict() returns probabilities for binary:logistic
Batch Prediction Performance
When you need to predict on large datasets, how you batch matters:
def batch_predict(model, X, framework='sklearn', batch_size=1000):
    n_samples = len(X)
    predictions = []
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        batch = X[start:end]
        if framework == 'sklearn':
            preds = model.predict(batch)
        elif framework == 'keras':
            preds = model.predict(batch, verbose=0)
        elif framework == 'pytorch':
            with torch.no_grad():
                batch_tensor = torch.tensor(batch, dtype=torch.float32)
                preds = model(batch_tensor).numpy()
        predictions.append(preds)
    return np.concatenate(predictions)
Key points:
- sklearn: internal batching is usually sufficient, pass the whole array
- Keras: batch_size parameter in predict() controls internal batching; set it based on your memory constraints
- PyTorch: manual batching gives you full control
What About predict_proba() and Other Variants?
Most frameworks provide variant methods:
| Method | Returns | Available In |
|---|---|---|
| predict() | Class labels (sklearn) or output-layer values (Keras) | All |
| predict_proba() | Class membership probabilities | sklearn, XGBoost, LightGBM |
| predict_log_proba() | Log probabilities | sklearn |
| predict_on_batch() | Same as predict(), one explicit batch | Keras |
| Async variants (e.g., executeAsync()) | Promise-based inference | TensorFlow.js |
Use predict_proba() when you need the uncertainty of a prediction, not just the label. This is essential for:
- Threshold tuning (choosing your own classification threshold)
- Calibrated probabilities
- Ensemble methods that weight predictions by confidence
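Threshold tuning is straightforward once you have probabilities. A sketch with made-up scores and labels, sweeping a few candidate thresholds by accuracy:

```python
import numpy as np

# Hypothetical positive-class probabilities and true labels for eight samples
proba = np.array([0.9, 0.8, 0.65, 0.4, 0.35, 0.2, 0.1, 0.05])
y_true = np.array([1,   1,   0,    1,   0,    0,   0,   0])

best_t, best_acc = 0.5, 0.0
for t in [0.3, 0.5, 0.7]:  # candidate thresholds
    acc = np.mean((proba >= t).astype(int) == y_true)
    if acc > best_acc:
        best_t, best_acc = t, acc
print(best_t, best_acc)  # 0.7 0.875
```

In practice you would sweep a finer grid and optimize the metric that matters for your problem (precision, recall, F1) rather than accuracy.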
Putting It Together: A Framework-Agnostic predict() Wrapper
If you are working with multiple frameworks in the same codebase, a thin wrapper can smooth over the differences:
import numpy as np
def predict(model, X, framework='sklearn', proba=False):
    X = np.asarray(X)
    if X.ndim == 1:
        X = X.reshape(1, -1)  # Ensure 2D
    if framework == 'sklearn':
        if proba:
            return model.predict_proba(X)
        return model.predict(X)
    elif framework == 'keras':
        preds = model.predict(X, verbose=0)
        if proba:
            return preds
        return (preds > 0.5).astype(int).flatten()
    elif framework == 'pytorch':
        model.eval()
        with torch.no_grad():
            X_tensor = torch.tensor(X, dtype=torch.float32)
            preds = model(X_tensor).numpy()
        if proba:
            return preds
        return (preds > 0.5).astype(int).flatten()
    elif framework in ('xgboost', 'lightgbm'):
        if proba:
            return model.predict_proba(X)
        return model.predict(X)
    else:
        raise ValueError(f"Unknown framework: {framework}")
The Core Principle
model.predict() is a framework-specific inference call that:
- Takes your preprocessed input data
- Runs the forward pass without updating weights
- Returns predictions in framework-specific format (labels, probabilities, or raw scores)
The surface similarity across frameworks masks important differences in return types, input shape requirements, and behavior in training vs. evaluation mode. Understanding these differences is what separates code that works in a notebook from code that works in production.