Every ML model, regardless of how it was trained or what framework built it, eventually does the same thing: it takes input and produces output. In Python, that operation is model.predict(). It looks simple. It is simple—until it isn't.

The same method name appears across scikit-learn, Keras, TensorFlow, PyTorch, XGBoost, LightGBM, and most other ML frameworks. But "predict" means slightly different things in each. The return shapes differ. The input expectations differ. The performance characteristics differ. And the ways it can fail differ too.

This guide covers what predict() actually does, how it behaves across the major frameworks, and the practical issues you'll hit when running it in production.

What model.predict() Actually Does

predict() runs the model in inference mode. It passes your input data through the forward pass and returns the model's predictions. Unlike fit() or train(), it does not update any weights. It is a pure computation.

At a high level:

  1. The input is preprocessed and formatted to match what the model expects
  2. The model runs its forward pass
  3. Raw outputs (logits, probabilities, regression values) are returned

The critical thing to understand: some frameworks apply the final activation function for you inside predict(), and others return raw scores and leave that step to you. Whether you get probabilities or logits depends on the framework and, in deep learning frameworks, on how you built your output layer. This is where most confusion starts.
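To make the distinction concrete, here is a framework-free NumPy sketch (with made-up logit values) of converting raw logits into probabilities, the step some frameworks do for you and others leave to you:

```python
import numpy as np

# Hypothetical raw logits, as a model without an output activation would return
logits = np.array([[2.0, -1.0, 0.5],
                   [0.1, 0.2, -0.3]])

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1.0
```

If a framework's predict() already returns rows that sum to 1.0, the activation was applied for you; if not, you are looking at logits.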

scikit-learn: The Baseline

scikit-learn has the most consistent and predictable predict() behavior. It is the reference implementation that most other frameworks loosely follow.

Classification

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions.shape)  # (200,)
print(predictions[:10])  # array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])

predict() returns a NumPy array of class labels. For binary classification, these are 0 or 1. For multiclass, these are integer indices.

Getting Probabilities in sklearn

If you want probabilities, you need predict_proba():

probabilities = model.predict_proba(X_test)
print(probabilities.shape)  # (200, 2) — two classes
print(probabilities[:3])
# [[0.85, 0.15],
#  [0.12, 0.88],
#  [0.73, 0.27]]

Note that predict_proba() returns the probability for each class. The order matches model.classes_. If you need just the positive class probability in binary classification, use predict_proba(X_test)[:, 1].
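To make the column ordering concrete, here is a small sketch (toy dataset, deliberately non-contiguous labels) showing how to look up a specific class's probability column via model.classes_:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset; labels are deliberately non-contiguous
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([3, 3, 7, 7])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)

# Column order follows clf.classes_ (sorted), not the order labels appear in y
print(clf.classes_)  # [3 7]

# Probability of a specific label, looked up by its position in classes_
idx = list(clf.classes_).index(7)
p_of_7 = proba[:, idx]
```

This lookup pattern avoids silently reading the wrong column when labels are not 0 and 1.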

Regression

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression

X_reg, y_reg = make_regression(n_samples=1000, n_features=20, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.2)

model = GradientBoostingRegressor(n_estimators=100, random_state=42)
model.fit(Xr_train, yr_train)

predictions = model.predict(Xr_test)
print(predictions.shape)  # (200,)
print(predictions[:5])    # continuous float values

For regressors, predict() returns floating-point values directly. No separate method for raw scores vs. final output—the returned value is always the final prediction.

sklearn Summary

| Model Type | Return Shape | Return Type |
|---|---|---|
| Classifier | (n_samples,) | Integer labels |
| Regressor | (n_samples,) | Float values |
| predict_proba() | (n_samples, n_classes) | Float probabilities |

Keras / TensorFlow: Output Depends on the Final Activation

Keras is where most developers hit their first predict() surprise. What predict() returns depends entirely on your model's final layer: with a sigmoid or softmax activation you get probabilities; with a linear output layer you get raw logits.

Binary Classification

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Binary classification model
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid')  # sigmoid on output layer
])

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=10, verbose=0)

# The final layer applies sigmoid, so predict() returns probabilities
predictions = model.predict(X_test, verbose=0)
print(predictions.shape)  # (n_samples, 1)
print(predictions[:5].flatten())
# [0.87, 0.12, 0.65, 0.91, 0.34]

Here's the gotcha: if you build a binary classification model without an activation function on the final layer (i.e., you plan to apply sigmoid manually), then predict() returns raw logits. If you use activation='sigmoid' on the final layer, predict() returns probabilities.

# Without sigmoid on final layer — returns logits
model_logits = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1)  # linear output — raw logits
])
model_logits.compile(optimizer='adam', loss='binary_crossentropy')
model_logits.fit(X_train, y_train, epochs=10, verbose=0)

raw_output = model_logits.predict(X_test, verbose=0)
# raw_output is logits, not probabilities
# Apply sigmoid manually to convert: 1 / (1 + np.exp(-raw_output))
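The manual conversion in the comment above can be written with SciPy's numerically stable sigmoid, expit. A sketch with made-up logit values:

```python
import numpy as np
from scipy.special import expit  # stable sigmoid: 1 / (1 + exp(-x))

# Hypothetical raw logits from a model with a linear output layer
raw_output = np.array([[-2.0], [0.0], [3.0]])

probabilities = expit(raw_output)
labels = (probabilities > 0.5).astype(int)
print(labels.flatten())  # [0 0 1]
```

Using expit avoids overflow warnings that the naive 1 / (1 + np.exp(-x)) can produce for large negative logits.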

Multiclass Classification

from tensorflow.keras.utils import to_categorical

# One-hot encode labels for multiclass (assumes y_train/y_test contain 3 classes,
# unlike the binary data from the earlier example)
y_train_cat = to_categorical(y_train, num_classes=3)
y_test_cat = to_categorical(y_test, num_classes=3)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(3, activation='softmax')  # softmax for multiclass
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train_cat, epochs=10, verbose=0)

predictions = model.predict(X_test, verbose=0)
print(predictions.shape)  # (n_samples, 3)
print(predictions[:3])
# [[0.05, 0.12, 0.83],
#  [0.71, 0.22, 0.07],
#  [0.33, 0.45, 0.22]]

With softmax on the final layer, predict() returns probabilities that sum to 1.0 across each row.
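Since Keras returns probabilities rather than labels, a final argmax is usually needed. A quick sketch using the probabilities shown above:

```python
import numpy as np

# Softmax output for 3 samples over 3 classes (values from the example above)
probs = np.array([[0.05, 0.12, 0.83],
                  [0.71, 0.22, 0.07],
                  [0.33, 0.45, 0.22]])

class_indices = probs.argmax(axis=1)  # predicted class index per sample
confidences = probs.max(axis=1)       # probability of the chosen class
print(class_indices)  # [2 0 1]
```

Keeping the confidence alongside the index is handy when you later want to filter out low-confidence predictions.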

Using predict() with Models Without Output Activation

If you are doing custom training loops or using logits directly, you need to know how to handle raw outputs:

# Raw logits from a model without softmax
logits = model_logits.predict(X_test, verbose=0)

# Convert to probabilities
# Note: exponentiating raw logits like this can overflow for large values;
# subtract the row max first, or use scipy's implementation below
probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
# Or simply:
from scipy.special import softmax
probabilities = softmax(logits, axis=1)

predict() vs predict_on_batch()

Keras predict() is designed to handle large datasets by processing in batches internally. For small datasets, this overhead can actually slow things down. Use predict_on_batch() when you know your input size and want to avoid the batch-scheduling overhead:

# Standard predict — handles batching internally
predictions = model.predict(X_test, batch_size=32, verbose=1)

# Manual batch processing for small data
for i in range(0, len(X_test), 32):
    batch = X_test[i:i+32]
    batch_preds = model.predict_on_batch(batch)

predict() with verbose=1 shows a progress bar, which is useful for large datasets. predict_on_batch() has no progress output—it is a direct computation call.
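For one-off small inputs there is a third option worth knowing: calling the model directly, which skips predict()'s batching machinery entirely and returns a tensor rather than a NumPy array. A minimal sketch with a toy model and random input:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(1, activation='sigmoid'),
])

x = np.random.rand(8, 4).astype('float32')

# Direct call: no batch scheduling; training=False ensures inference behavior
out = model(x, training=False)
preds = np.asarray(out)
```

The Keras documentation itself recommends this direct-call path for small arrays that fit in one batch.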

PyTorch: No Built-In predict() Method

PyTorch does not have a model.predict() method. This trips up developers coming from sklearn or Keras.

Instead, you put the model in evaluation mode and call the model directly:

import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

model = SimpleClassifier(input_dim=20)
model.eval()  # Critical: set to evaluation mode

# Inference
with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    predictions = model(X_test_tensor)

print(predictions.shape)  # (200, 1)
print(predictions[:5].numpy().flatten())

The eval() Mode Matters

Dropout layers, batch normalization, and other stochastic layers behave differently in training vs. evaluation. Always call model.eval() before inference:

model.eval()  # Disables dropout, uses running stats for BatchNorm

with torch.no_grad():  # Disables gradient computation
    predictions = model(X_test_tensor)
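On newer PyTorch (1.9+), torch.inference_mode() is a drop-in and often faster alternative to no_grad() for pure inference. A minimal sketch with a toy model:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
net.eval()

x = torch.randn(5, 4)

# inference_mode is stricter than no_grad: tensors created inside it can
# never be used in autograd later, which enables extra optimizations
with torch.inference_mode():
    out = net(x)
```

Prefer no_grad() only if you need the outputs to participate in a later gradient computation.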

Common PyTorch Inference Patterns

# Batch inference
def predict_batch(model, X, batch_size=64):
    model.eval()
    predictions = []
    with torch.no_grad():
        for i in range(0, len(X), batch_size):
            batch = torch.tensor(X[i:i+batch_size], dtype=torch.float32)
            preds = model(batch)
            predictions.append(preds.numpy())
    return np.concatenate(predictions)

# CPU vs GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
    predictions = model(X_test_tensor).cpu().numpy()

PyTorch with torch.compile (PyTorch 2.0+)

PyTorch 2.0 introduced torch.compile(), which JIT-compiles the model for faster inference:

model = SimpleClassifier(input_dim=20)
model.eval()

# Compile once up front; the first call triggers compilation, and
# subsequent calls are typically faster (gains vary by model and hardware)
compiled_model = torch.compile(model)

with torch.no_grad():
    predictions = compiled_model(X_test_tensor)

XGBoost and LightGBM: Native Gradient Boosting

XGBoost and LightGBM have their own predict() methods that behave similarly to sklearn but with important differences.

XGBoost

import xgboost as xgb

model = xgb.XGBClassifier(n_estimators=100, eval_metric='logloss')
# Note: use_label_encoder was deprecated and later removed from XGBoost
model.fit(X_train, y_train)

# Default: returns class predictions
predictions = model.predict(X_test)
print(predictions.shape)  # (200,)

# Probabilities
probabilities = model.predict_proba(X_test)
print(probabilities.shape)  # (200, 2)

XGBoost with raw_score and pred_leaf

XGBoost exposes additional prediction types:

# Raw margin scores (before the global link function)
raw_scores = model.predict(X_test, output_margin=True)

# Leaf indices (useful for tree interpretation); the sklearn wrapper
# exposes these via apply(), not a pred_leaf argument to predict()
leaf_indices = model.apply(X_test)
print(leaf_indices.shape)  # (200, n_trees) — which leaf each tree puts the sample in

LightGBM

import lightgbm as lgb

model = lgb.LGBMClassifier(n_estimators=100)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

# Raw scores
raw_scores = model.predict(X_test, raw_score=True)

# Leaf indices
leaf_preds = model.predict(X_test, pred_leaf=True)

Common Pitfalls Across All Frameworks

1. Input Shape Mismatches

The single most common error. model.predict() almost always expects 2D input (n_samples, n_features), even if you are predicting a single sample.

# Wrong — 1D array
single_sample = X_test[0]
predictions = model.predict(single_sample)  # Shape mismatch error

# Correct — 2D array
single_sample = X_test[0:1]  # Shape (1, n_features)
predictions = model.predict(single_sample)

This is especially easy to miss because NumPy happily broadcasts 1D arrays in many other operations, but estimators validate input shape strictly and raise instead.
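A small defensive helper avoids the error for single samples (a sketch; ensure_2d is our own name, not a library function):

```python
import numpy as np

def ensure_2d(x):
    # Promote a single sample (n_features,) to (1, n_features);
    # pass batched input through unchanged
    x = np.asarray(x)
    return x.reshape(1, -1) if x.ndim == 1 else x

print(ensure_2d(np.array([0.1, 0.2, 0.3])).shape)  # (1, 3)
print(ensure_2d(np.zeros((10, 3))).shape)          # (10, 3)
```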

2. Not Setting the Model to Evaluation Mode (PyTorch)

Dropout being active during inference will randomly zero out neurons, producing different outputs every call. Always:

model.eval()  # Before inference

3. Forgetting That predict() Returns Indices, Not Probabilities (sklearn)

# Wrong on two counts: predict() returns labels, not probabilities,
# and using an array in `if` raises "truth value of an array is ambiguous"
if model.predict(X_test) > 0.5:
    ...

# Correct — for binary classification
proba = model.predict_proba(X_test)[:, 1]
predictions = (proba > 0.5).astype(int)

4. Keras predict() Batching Overhead for Small Inputs

For small test sets, Keras predict() can be slower than expected due to internal batch scheduling:

# Slow for small data — batch scheduling overhead
predictions = model.predict(X_small, verbose=0)

# Faster for small data
predictions = model.predict_on_batch(X_small)

5. Ignoring the Dtype of Your Input

# If your training data was float32 but inference is float64
X_test_wrong = np.array(X_test, dtype=np.float64)
predictions = model.predict(X_test_wrong)  # May work or may cast unexpectedly

# Ensure matching dtype
X_test_correct = np.array(X_test, dtype=np.float32)
predictions = model.predict(X_test_correct)
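A cheap guard is to cast exactly once at the inference boundary (a sketch; TRAIN_DTYPE and to_model_dtype are our own names, assuming a float32 model):

```python
import numpy as np

TRAIN_DTYPE = np.float32  # whatever dtype the model was trained with

def to_model_dtype(X):
    # Cast once at the boundary instead of relying on each
    # framework's implicit (and inconsistent) casting rules
    X = np.asarray(X)
    return X if X.dtype == TRAIN_DTYPE else X.astype(TRAIN_DTYPE)

X = np.random.rand(4, 3)        # float64 by default
print(to_model_dtype(X).dtype)  # float32
```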

6. XGBoost/LightGBM Using Wrong Input Type After sklearn

sklearn models accept pandas DataFrames. XGBoost and LightGBM often work better with their native data structures for large datasets:

import xgboost as xgb

# DMatrix is XGBoost's native data structure — faster for large data
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)

params = {'objective': 'binary:logistic', 'eval_metric': 'logloss'}
model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)  # Note: different API — model is a Booster, and predict() returns probabilities here, not labels

Batch Prediction Performance

When you need to predict on large datasets, how you batch matters:

def batch_predict(model, X, framework='sklearn', batch_size=1000):
    n_samples = len(X)
    predictions = []

    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        batch = X[start:end]

        if framework == 'sklearn':
            preds = model.predict(batch)
        elif framework == 'keras':
            preds = model.predict(batch, verbose=0)
        elif framework == 'pytorch':
            model.eval()  # idempotent, safe to call every batch
            with torch.no_grad():
                batch_tensor = torch.tensor(batch, dtype=torch.float32)
                preds = model(batch_tensor).numpy()

        predictions.append(preds)

    return np.concatenate(predictions)

Key points:

  • sklearn: internal batching is usually sufficient; pass the whole array
  • Keras: the batch_size parameter of predict() controls internal batching; set it based on your memory constraints
  • PyTorch: manual batching gives you full control

What About predict_proba() and Other Variants?

Most frameworks provide variant methods:

| Method | Returns | Available In |
|---|---|---|
| predict() | Class labels (sklearn) or final-layer outputs (Keras) | All |
| predict_proba() | Class membership probabilities | sklearn, XGBoost, LightGBM (and sklearn-style Keras wrappers) |
| predict_log_proba() | Log probabilities | sklearn |
| predict_on_batch() | Same as predict(), one explicit batch | Keras |

Use predict_proba() when you need the uncertainty of a prediction, not just the label. This is essential for:

  • Threshold tuning (choosing your own classification threshold)
  • Calibrated probabilities
  • Ensemble methods that weight predictions by confidence
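As an example of threshold tuning, here is a sketch (synthetic data, variable names are ours) that sweeps candidate thresholds over predict_proba() output and picks the one maximizing F1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # positive-class probability

# Sweep all candidate thresholds and pick the one maximizing F1
precision, recall, thresholds = precision_recall_curve(y_te, proba)
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
best = thresholds[np.argmax(f1[:-1])]  # last precision/recall point has no threshold

labels = (proba >= best).astype(int)
```

With imbalanced classes, the F1-optimal threshold often lands far from the default 0.5, which is exactly why predict_proba() matters.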

Putting It Together: A Framework-Agnostic predict() Wrapper

If you are working with multiple frameworks in the same codebase, a thin wrapper can smooth over the differences:

import numpy as np
import torch  # required for the PyTorch branch

def predict(model, X, framework='sklearn', proba=False):
    X = np.asarray(X)

    if X.ndim == 1:
        X = X.reshape(1, -1)  # Ensure 2D

    if framework == 'sklearn':
        if proba:
            return model.predict_proba(X)
        return model.predict(X)

    elif framework == 'keras':
        preds = model.predict(X, verbose=0)
        if proba:
            return preds
        # Assumes a binary model with sigmoid output; use argmax for multiclass
        return (preds > 0.5).astype(int).flatten()

    elif framework == 'pytorch':
        model.eval()
        with torch.no_grad():
            X_tensor = torch.tensor(X, dtype=torch.float32)
            preds = model(X_tensor).numpy()
        if proba:
            return preds
        # Assumes a binary model with sigmoid output
        return (preds > 0.5).astype(int).flatten()

    elif framework in ('xgboost', 'lightgbm'):
        if proba:
            return model.predict_proba(X)
        return model.predict(X)

    else:
        raise ValueError(f"Unknown framework: {framework}")

The Core Principle

model.predict() is a framework-specific inference call that:

  • Takes your preprocessed input data
  • Runs the forward pass without updating weights
  • Returns predictions in framework-specific format (labels, probabilities, or raw scores)

The surface similarity across frameworks masks important differences in return types, input shape requirements, and behavior in training vs. evaluation mode. Understanding these differences is what separates code that works in a notebook from code that works in production.