Model Inference#

Run inference on new data using trained models via CLI or programmatic API.

Programmatic API#

scXpand provides a unified inference API that supports local models, registry models, and models downloaded from a direct URL.

Unified Inference API#

Use the high-level scxpand.run_inference function for all model types:

import scxpand
import anndata as ad

# Local model inference (file-based)
results = scxpand.run_inference(
    model_path='results/autoencoder',
    data_path='test_data.h5ad',
    save_path='predictions/',
    batch_size=1024
)

# Registry model inference (pre-trained models)
results = scxpand.run_inference(
    model_name='pan_cancer_autoencoder',
    data_path='test_data.h5ad',
    save_path='predictions/',
    batch_size=1024
)

# Direct URL model inference (any external model)
results = scxpand.run_inference(
    model_url='https://your-platform.com/model.zip',
    data_path='test_data.h5ad',
    save_path='predictions/',
    batch_size=1024
)

# In-memory inference (any model type)
adata = ad.read_h5ad('test_data.h5ad')
results = scxpand.run_inference(
    model_path='results/autoencoder',
    adata=adata,  # In-memory AnnData object
    save_path='predictions/',
    batch_size=1024
)

# Inference without saving results (save_path=None)
results = scxpand.run_inference(
    model_name='pan_cancer_autoencoder',
    data_path='test_data.h5ad',
    save_path=None  # Results returned but not saved to disk
)

Note

Unified Inference API Features:

scxpand.run_inference is a single entry point for all model sources (local, registry, URL)
Automatic model type detection: the model type (model_type) is detected automatically and does not need to be specified
Multiple model sources: Local models (model_path), registry models (model_name), and direct URLs (model_url)
Flexible data handling: Both data_path (file-based) and adata (in-memory) are supported
Automatic device detection: The compute device is selected automatically
Automatic caching: Pre-trained models are cached automatically using Pooch
Model sharing: Any URL that points to a model ZIP file can be passed as model_url
Flexible saving: Set save_path=None to return results without saving to disk
All model types (autoencoder, mlp, lightgbm, logistic, svm) use the same unified API
Results include evaluation metrics (when ground truth is available) and, if save_path is set, saved prediction files

Handling Data Format Mismatches#

scXpand automatically handles common data format mismatches between training and inference data:

Gene Set Flexibility:

Missing genes: Automatically filled with zeros
Extra genes: Ignored during inference; only the genes used in training are kept
Reordered genes: Automatically reordered to match the training format
Mixed scenarios: Handles any combination of missing, extra, and reordered genes
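The gene handling listed above can be illustrated with a small standalone sketch. This is not scXpand's implementation; align_to_training_genes and the toy gene names are hypothetical and only show the idea of reordering columns, zero-filling missing genes, and dropping extra ones.

```python
def align_to_training_genes(matrix, data_genes, training_genes):
    """Re-express each row in the training gene order.

    matrix: list of rows (one per cell); columns follow data_genes.
    Genes missing from the data become 0.0; genes not used in training are dropped.
    """
    column_of = {gene: j for j, gene in enumerate(data_genes)}
    aligned = []
    for row in matrix:
        aligned.append([
            row[column_of[gene]] if gene in column_of else 0.0
            for gene in training_genes
        ])
    return aligned


# Toy example: GZMB and CD3E are reordered, ACTB is extra, CD8A is missing.
training_genes = ["CD3E", "CD8A", "GZMB"]
data_genes = ["GZMB", "ACTB", "CD3E"]
cells = [[5.0, 9.0, 1.0], [0.0, 7.0, 2.0]]

aligned = align_to_training_genes(cells, data_genes, training_genes)
print(aligned)  # [[1.0, 0.0, 5.0], [2.0, 0.0, 0.0]]
```

In practice scxpand.run_inference performs this alignment internally, so no manual step is needed before calling it.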

Data Format Consistency:

Test data is automatically transformed to match the training data format
The same preprocessing pipeline as in training is applied (row normalization → log transform → z-score)
This works consistently across all model types
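To make the three preprocessing steps concrete, here is a minimal sketch for a single cell. The details are assumptions, not confirmed by this page: the target_sum value, the use of log1p for the log step, and z-scoring with per-gene means and standard deviations taken from the training data. The function preprocess_cell is hypothetical.

```python
import math


def preprocess_cell(counts, gene_means, gene_stds, target_sum=10_000.0, eps=1e-8):
    """Row-normalize, log-transform, then z-score one cell's gene counts.

    gene_means / gene_stds are assumed to be per-gene statistics from training.
    """
    total = sum(counts)
    # Step 1: row normalization, scaling the cell's counts to a fixed total.
    scale = target_sum / total if total > 0 else 0.0
    normalized = [c * scale for c in counts]
    # Step 2: log transform (log1p keeps zero counts at zero).
    logged = [math.log1p(v) for v in normalized]
    # Step 3: z-score each gene with the training statistics.
    return [(v - m) / (s + eps) for v, m, s in zip(logged, gene_means, gene_stds)]


# With zero means and unit stds the z-score step leaves the logged values
# essentially unchanged, which makes the first two steps easy to inspect.
z = preprocess_cell([1, 1, 2], gene_means=[0.0, 0.0, 0.0],
                    gene_stds=[1.0, 1.0, 1.0], target_sum=4.0)
print(z)
```

The exact constants and statistics used by scXpand are described in Data Pipeline & Normalization.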

For detailed technical information about the data transformation pipeline, see Data Pipeline & Normalization.

Inference API Reference#

Run predictions on new data using trained models:

Main public inference API for scXpand models.

This module provides the primary public interface for running inference with any type of scXpand model (local, registry, or URL-based).

scxpand.core.inference.run_inference(data_path=None, adata=None, model_path=None, model_name=None, model_url=None, save_path=None, batch_size=1024, num_workers=4, eval_row_inds=None)#

Main public API for running inference with scXpand models.

This is the primary entry point for running inference with any type of scXpand model. It automatically detects the model source and routes to the appropriate inference pipeline. Supports local models, registry models, and external models via URL. Metrics are automatically computed when ground truth labels are available in the data.

Parameters:

data_path (str | Path | None (default: None)) – Path to input data file (h5ad format). Alternative to adata.
adata (AnnData | None (default: None)) – In-memory AnnData object. Alternative to data_path.
model_path (str | Path | None (default: None)) – Path to local trained model directory (for local models).
model_name (str | None (default: None)) – Name of pre-trained model from registry (for registry models).
model_url (str | None (default: None)) – Direct URL to model ZIP file (for any external model).
save_path (str | Path | None (default: None)) – Directory to save prediction results (None to skip saving, just return results).
batch_size (int (default: 1024)) – Batch size for inference.
num_workers (int (default: 4)) – Number of workers for data loading.
eval_row_inds (default: None) – Specific cell indices to evaluate (None for all cells, only supported for local models).

Return type:

InferenceResults

Returns:

Structured results containing predictions, metrics (if available), and model info.

Raises:

ValueError – If model source is not specified or multiple sources are specified.
ValueError – If neither data_path nor adata is provided.
FileNotFoundError – If specified files do not exist.
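The argument rules behind the two ValueError cases can be sketched as follows. This is an illustrative stand-in, not the library's code: validate_inference_args is a hypothetical helper and the error messages are invented.

```python
def validate_inference_args(data_path=None, adata=None,
                            model_path=None, model_name=None, model_url=None):
    """Return the name of the single model source, or raise ValueError."""
    sources = {"model_path": model_path, "model_name": model_name, "model_url": model_url}
    provided = [name for name, value in sources.items() if value is not None]
    # Exactly one model source must be given.
    if len(provided) != 1:
        raise ValueError(
            f"Specify exactly one of model_path, model_name, model_url; got {provided}"
        )
    # At least one data source must be given. Whether passing both is an
    # error is not stated in the reference, so it is not checked here.
    if data_path is None and adata is None:
        raise ValueError("Provide either data_path or adata")
    return provided[0]


source = validate_inference_args(data_path="my_data.h5ad",
                                 model_name="pan_cancer_autoencoder")
print(source)  # model_name
```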

Examples

>>> import scxpand
>>> # Local model inference
>>> results = scxpand.run_inference(
...     data_path="my_data.h5ad", model_path="results/mlp"
... )
>>> print(f"Generated {len(results.predictions)} predictions")
>>> if results.has_metrics:
...     print(f"AUROC: {results.get_auroc():.3f}")
>>> # Registry model inference
>>> results = scxpand.run_inference(
...     data_path="my_data.h5ad", model_name="pan_cancer_autoencoder"
... )
>>> if results.has_model_info:
...     print(f"Model type: {results.model_info.model_type}")
>>> # Direct URL inference (seamless model sharing!)
>>> results = scxpand.run_inference(
...     data_path="my_data.h5ad",
...     model_url="https://your-platform.com/model.zip",
... )
>>> # In-memory inference with any model type (no saving)
>>> import scanpy as sc
>>> adata = sc.read_h5ad("my_data.h5ad")
>>> results = scxpand.run_inference(
...     adata=adata, model_name="pan_cancer_autoencoder", save_path=None
... )
>>> # Results are returned but not saved to disk

Command-line interface for inference:

scxpand.main.inference(data_path, model_path=None, model_name=None, model_url=None, save_path=None, batch_size=1024, num_workers=4, eval_row_inds=None)#

Command-line interface for running inference with scXpand models.

This is a convenience wrapper around run_inference() for command-line usage. For programmatic usage, use scxpand.run_inference() directly.

Parameters:

data_path (str) – Path to input data file (h5ad format).
model_path (str | None (default: None)) – Path to directory containing the trained model (for local models).
model_name (str | None (default: None)) – Name of pre-trained model from registry (for pre-trained models).
model_url (str | None (default: None)) – Direct URL to model ZIP file (for any external model).
save_path (str | None (default: None)) – Directory to save prediction results.
batch_size (int (default: 1024)) – Batch size for inference.
num_workers (int (default: 4)) – Number of workers for data loading.
eval_row_inds (str | None (default: None)) – Path to file containing cell indices to evaluate (one per line), or None for all cells.
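As an illustration of the eval_row_inds file format (one cell index per line), the sketch below writes and reads back such a file. It assumes the indices are integer row positions; read_row_indices and the file name eval_rows.txt are hypothetical and not part of scXpand.

```python
import tempfile
from pathlib import Path


def read_row_indices(path):
    """Parse a text file with one integer cell index per line, skipping blank lines."""
    indices = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            indices.append(int(line))
    return indices


with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "eval_rows.txt"
    index_file.write_text("0\n5\n42\n")
    row_indices = read_row_indices(index_file)

print(row_indices)  # [0, 5, 42]
```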

Examples

# Local model inference
$ python -m scxpand.main inference --data_path my_data.h5ad --model_path results/mlp

# Registry model inference
$ python -m scxpand.main inference --data_path my_data.h5ad --model_name pan_cancer_autoencoder

# Direct URL inference (any external model)
$ python -m scxpand.main inference --data_path my_data.h5ad --model_url "https://your-platform.com/model.zip"

Return type:

None

Returns:

None.

For detailed CLI examples and usage, see the scxpand.main module documentation.
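When the CLI needs to be launched from a Python script (for example in a batch job), the command can be assembled as an argument list. The sketch below assumes that every CLI flag mirrors the parameter name with a leading "--", as the --data_path, --model_path, --model_name, and --model_url examples suggest; build_inference_command is a hypothetical helper.

```python
import shlex
import sys


def build_inference_command(data_path, **options):
    """Build the argv list for `python -m scxpand.main inference`.

    Options set to None are omitted. Flag names are assumed to match
    the parameter names of the CLI function.
    """
    command = [sys.executable, "-m", "scxpand.main", "inference",
               "--data_path", str(data_path)]
    for name, value in options.items():
        if value is not None:
            command += [f"--{name}", str(value)]
    return command


command = build_inference_command(
    "my_data.h5ad", model_name="pan_cancer_autoencoder", batch_size=1024
)
print(shlex.join(command[1:]))

# To actually run it (requires scxpand to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

For purely programmatic use, calling scxpand.run_inference() directly is simpler and returns the results object.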

Pre-trained Models#

Download and use pre-trained models:

Download manager for pre-trained models using Pooch.

This module handles downloading pre-trained models using the Pooch library, which provides robust caching, integrity checking, and progress tracking.

scxpand.pretrained.download_manager.download_model(model_name, cache_dir=None)#

Download a pre-trained model by name from the registry.

This is a convenience function that wraps the download_pretrained_model functionality to make it easier to download models using just the model name.

Parameters:

model_name (str) – Name of the pre-trained model to download
cache_dir (Path | None (default: None)) – Custom cache directory (uses .scxpand_cache in current dir if None)

Return type:

Path

Returns:

Path to the downloaded model directory

Raises:

ValueError – If model_name is not found in registry

Examples

>>> from pathlib import Path
>>> from scxpand.pretrained.download_manager import download_model
>>>
>>> # Download autoencoder model
>>> model_path = download_model("pan_cancer_autoencoder")
>>>
>>> # Download with custom cache directory
>>> model_path = download_model(
...     "pan_cancer_mlp", cache_dir=Path("/my/cache")
... )

scxpand.pretrained.download_manager.download_pretrained_model(model_name=None, model_url=None, cache_dir=None)#

Download a pre-trained model and return the path to the extracted model.

Uses Pooch for robust caching, automatic hash verification, and extraction. Pooch automatically computes SHA256 hashes on first download and verifies them on subsequent accesses for integrity checking. When a model is updated (different hash), Pooch automatically downloads the new version to a fresh cache directory, ensuring version updates work seamlessly. Supports both registry models and direct URLs, including DOI URLs.

By default, downloads to a .scxpand_cache directory in the current working directory, making it easy for users to manage and clean up downloaded models.
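The hash-based integrity check described above can be illustrated with the standard library alone. This sketch is not Pooch's internal code; sha256_of and is_cache_valid are hypothetical helpers that show the idea of recording a SHA256 hash on first download and comparing it on later accesses.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cache_valid(path, known_hash):
    """A cached file is reusable only if it exists and its hash still matches."""
    return Path(path).exists() and sha256_of(path) == known_hash


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "model.zip"
    archive.write_bytes(b"pretend model weights")
    recorded = sha256_of(archive)            # hash stored after the first download
    valid_before = is_cache_valid(archive, recorded)
    archive.write_bytes(b"corrupted")        # simulate a damaged or changed file
    valid_after = is_cache_valid(archive, recorded)

print(valid_before, valid_after)  # True False
```

A failed check is the signal to download the file again rather than reuse the cached copy.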

Parameters:

model_name (str | None (default: None)) – Name of pre-trained model from registry (alternative to model_url)
model_url (str | None (default: None)) – Direct URL to model file (alternative to model_name). Supports HTTP/HTTPS URLs for direct downloads.
cache_dir (Path | None (default: None)) – Custom cache directory (uses .scxpand_cache in current dir if None)

Return type:

Path

Returns:

Path to the extracted model directory or file

Raises:

ValueError – If neither model_name nor model_url is provided, or if both are provided

Examples

>>> from pathlib import Path
>>> from scxpand.pretrained.download_manager import download_pretrained_model
>>>
>>> # Registry model (downloads to ./.scxpand_cache/)
>>> model_path = download_pretrained_model(
...     model_name="pan_cancer_autoencoder"
... )
>>>
>>> # Direct URL (downloads to ./.scxpand_cache/)
>>> model_path = download_pretrained_model(
...     model_url="https://your-platform.com/model.zip"
... )
>>>
>>> # Custom cache directory
>>> model_path = download_pretrained_model(
...     model_url="https://figshare.com/ndownloader/files/model.zip",
...     cache_dir=Path("/my/custom/cache"),
... )

Utility functions for managing the model registry.

scxpand.util.model_registry.list_pretrained_models()#

List all available pre-trained models with their information.

Return type:

None
