scxpand
scXpand: Pan-cancer detection of T-cell clonal expansion from single-cell RNA sequencing.
A framework for predicting T-cell clonal expansion from single-cell RNA sequencing data using multiple machine learning approaches including autoencoders, MLPs, LightGBM, and linear models.
- class scxpand.ModelType(*values)
Enumeration of supported model types.
- AUTOENCODER = 'autoencoder'
- LIGHTGBM = 'lightgbm'
- LOGISTIC = 'logistic'
- MLP = 'mlp'
- SVM = 'svm'
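Because several functions in this module accept a model type either as an enum member or as a plain string, it can help to see how a string-valued enumeration is normally looked up by value. The following is a minimal sketch, not the scXpand implementation: `ModelTypeSketch` and `parse_model_type` are hypothetical names, and whether the real `ModelType` also subclasses `str` is an assumption. Only the member names and values are taken from the listing above.

```python
from enum import Enum


class ModelTypeSketch(str, Enum):
    """Hypothetical stand-in that mirrors the members listed above."""

    AUTOENCODER = "autoencoder"
    LIGHTGBM = "lightgbm"
    LOGISTIC = "logistic"
    MLP = "mlp"
    SVM = "svm"


def parse_model_type(value):
    """Normalize a string or an enum member to an enum member."""
    if isinstance(value, ModelTypeSketch):
        return value
    try:
        # Enum lookup by value, after trimming whitespace and lowercasing.
        return ModelTypeSketch(str(value).strip().lower())
    except ValueError:
        valid = ", ".join(member.value for member in ModelTypeSketch)
        raise ValueError(
            f"Unknown model type {value!r}; expected one of: {valid}"
        ) from None


print(parse_model_type("mlp"))
print([member.value for member in ModelTypeSketch])
```

With the real package, the equivalent lookup would presumably be `scxpand.ModelType("mlp")`, which is standard `Enum` behaviour for lookup by value.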
- scxpand.download_pretrained_model(model_name=None, model_url=None, cache_dir=None)
Download a pre-trained model and return the path to the extracted model.
Uses Pooch for robust caching, automatic hash verification, and extraction. Pooch automatically computes SHA256 hashes on first download and verifies them on subsequent accesses for integrity checking. When a model is updated (different hash), Pooch automatically downloads the new version to a fresh cache directory, ensuring version updates work seamlessly. Supports both registry models and direct URLs, including DOI URLs.
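The integrity check described above can be illustrated with the standard library alone. This is a sketch of the general record-then-verify idea, not Pooch's or scXpand's actual code; `sha256_of_file` and `verify_file` are hypothetical helper names, and the file contents are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_sha256):
    """Return True if the file's current hash matches the recorded one."""
    return sha256_of_file(path) == expected_sha256.lower()


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "model.zip"  # placeholder file, not a real model
    archive.write_bytes(b"placeholder model bytes")

    # "First download": record the hash.
    recorded = sha256_of_file(archive)
    ok_before = verify_file(archive, recorded)

    # Simulate corruption or a changed upstream file.
    archive.write_bytes(b"different bytes")
    ok_after = verify_file(archive, recorded)

print(ok_before, ok_after)
```

A mismatch on a later access is the signal that the cached file no longer corresponds to the recorded version.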
By default, downloads to a .scxpand_cache directory in the current working directory, making it easy for users to manage and clean up downloaded models.
- Parameters:
model_name (str | None (default: None)) – Name of a pre-trained model from the registry (alternative to model_url).
model_url (str | None (default: None)) – Direct URL to a model file (alternative to model_name). Supports HTTP/HTTPS URLs for direct downloads.
cache_dir (Path | None (default: None)) – Custom cache directory (uses .scxpand_cache in the current directory if None).
- Return type:
- Returns:
Path to the extracted model directory or file
- Raises:
ValueError – If neither model_name nor model_url is provided, or if both are provided
Examples
>>> from pathlib import Path
>>> from scxpand import download_pretrained_model
>>>
>>> # Registry model (downloads to ./.scxpand_cache/)
>>> model_path = download_pretrained_model(
...     model_name="pan_cancer_autoencoder"
... )
>>>
>>> # Direct URL (downloads to ./.scxpand_cache/)
>>> model_path = download_pretrained_model(
...     model_url="https://your-platform.com/model.zip"
... )
>>>
>>> # Custom cache directory
>>> model_path = download_pretrained_model(
...     model_url="https://figshare.com/ndownloader/files/model.zip",
...     cache_dir=Path("/my/custom/cache"),
... )
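Since downloads default to a .scxpand_cache directory under the current working directory, cleaning up is a matter of removing that directory. The helpers below are a hedged sketch of that housekeeping. `resolve_cache_dir` and `clear_cache` are hypothetical names that are not part of scXpand, and the `base_dir` parameter exists only so the demo can run in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

DEFAULT_CACHE_NAME = ".scxpand_cache"  # directory name documented above


def resolve_cache_dir(cache_dir=None, base_dir=None):
    """Return cache_dir if given, else <base_dir>/.scxpand_cache (base defaults to cwd)."""
    if cache_dir is not None:
        return Path(cache_dir)
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / DEFAULT_CACHE_NAME


def clear_cache(cache_dir=None, base_dir=None):
    """Delete the cache directory if it exists; return True if something was removed."""
    target = resolve_cache_dir(cache_dir, base_dir)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demo in a throwaway directory so no real cache is touched.
with tempfile.TemporaryDirectory() as tmp:
    cache = resolve_cache_dir(base_dir=tmp)
    cache.mkdir()
    (cache / "model.zip").write_bytes(b"placeholder")
    removed_first = clear_cache(base_dir=tmp)
    removed_second = clear_cache(base_dir=tmp)

print(removed_first, removed_second)
```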
- scxpand.get_pretrained_model_info(model_name)
Get information about a pre-trained model.
- Parameters:
model_name (str) – Name of the pre-trained model.
- Return type:
- Returns:
PretrainedModelInfo object containing model metadata
- Raises:
ValueError – If model_name is not found in registry
- scxpand.list_pretrained_models()
List all available pre-trained models with their information.
- Return type:
- scxpand.run_inference(data_path=None, adata=None, model_path=None, model_name=None, model_url=None, save_path=None, batch_size=1024, num_workers=4, eval_row_inds=None)
Main public API for running inference with scXpand models.
This is the primary entry point for running inference with any type of scXpand model. It automatically detects the model source and routes to the appropriate inference pipeline. Supports local models, registry models, and external models via URL. Metrics are automatically computed when ground truth labels are available in the data.
- Parameters:
data_path (str | Path | None (default: None)) – Path to input data file (h5ad format). Alternative to adata.
adata (AnnData | None (default: None)) – In-memory AnnData object. Alternative to data_path.
model_path (str | Path | None (default: None)) – Path to local trained model directory (for local models).
model_name (str | None (default: None)) – Name of pre-trained model from registry (for registry models).
model_url (str | None (default: None)) – Direct URL to model ZIP file (for any external model).
save_path (str | Path | None (default: None)) – Directory to save prediction results (None to skip saving and only return results).
batch_size (int (default: 1024)) – Batch size for inference.
num_workers (int (default: 4)) – Number of workers for data loading.
eval_row_inds (default: None) – Specific cell indices to evaluate (None for all cells; only supported for local models).
- Return type:
- Returns:
Structured results containing predictions, metrics (if available), and model info.
- Raises:
ValueError – If model source is not specified or multiple sources are specified.
ValueError – If neither data_path nor adata is provided.
FileNotFoundError – If specified files do not exist.
Examples
>>> import scxpand
>>> # Local model inference
>>> results = scxpand.run_inference(
...     data_path="my_data.h5ad", model_path="results/mlp"
... )
>>> print(f"Generated {len(results.predictions)} predictions")
>>> if results.has_metrics:
...     print(f"AUROC: {results.get_auroc():.3f}")
>>> # Registry model inference
>>> results = scxpand.run_inference(
...     data_path="my_data.h5ad", model_name="pan_cancer_autoencoder"
... )
>>> if results.has_model_info:
...     print(f"Model type: {results.model_info.model_type}")
>>> # Direct URL inference (seamless model sharing!)
>>> results = scxpand.run_inference(
...     data_path="my_data.h5ad",
...     model_url="https://your-platform.com/model.zip",
... )
>>> # In-memory inference with any model type (no saving)
>>> import scanpy as sc
>>> adata = sc.read_h5ad("my_data.h5ad")
>>> results = scxpand.run_inference(
...     adata=adata, model_name="pan_cancer_autoencoder", save_path=None
... )
>>> # Results are returned but not saved to disk
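The Raises section above implies two argument checks: exactly one model source must be given, and at least one data source must be given. The sketch below shows one way such checks could be written. It is an illustration under those stated rules, not the library's internal code; `pick_model_source` and `check_data_source` are hypothetical names and the error messages are invented.

```python
def pick_model_source(model_path=None, model_name=None, model_url=None):
    """Return (kind, value) for the single model source that was supplied."""
    candidates = {
        "model_path": model_path,
        "model_name": model_name,
        "model_url": model_url,
    }
    given = {kind: value for kind, value in candidates.items() if value is not None}
    if not given:
        raise ValueError("Specify one of model_path, model_name or model_url.")
    if len(given) > 1:
        raise ValueError(f"Specify only one model source, got: {sorted(given)}")
    return next(iter(given.items()))


def check_data_source(data_path=None, adata=None):
    """Raise if neither an on-disk file nor an in-memory object was supplied."""
    if data_path is None and adata is None:
        raise ValueError("Provide either data_path or adata.")


kind, value = pick_model_source(model_name="pan_cancer_autoencoder")
check_data_source(data_path="my_data.h5ad")
print(kind, value)
```

Running the same checks before calling `run_inference` in a batch script makes misconfigured jobs fail early, before any data is loaded.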
- scxpand.run_prediction_pipeline(model_path, model_type=None, adata=None, data_path=None, save_path=None, batch_size=1024, num_workers=0, eval_row_inds=None)
Complete prediction pipeline from model loading to evaluation.
This is the main orchestration function that coordinates the entire prediction workflow. It follows the dependency inversion principle by depending on abstractions (interfaces) rather than concrete implementations.
- Parameters:
model_path (str | Path) – Path to directory containing the trained model.
model_type (ModelType | str | None (default: None)) – Type of model to use for prediction. If None, automatically detected from the model_type.txt file in model_path.
adata (AnnData | None (default: None)) – In-memory AnnData object (alternative to data_path).
data_path (str | Path | None (default: None)) – Path to data file (alternative to adata).
save_path (str | Path | None (default: None)) – Directory to save prediction results.
batch_size (int (default: 1024)) – Batch size for inference.
num_workers (int (default: 0)) – Number of workers for data loading.
eval_row_inds (ndarray | None (default: None)) – Specific cell indices to evaluate (None for all).
- Return type:
- Returns:
Structured results containing predictions and metrics (if available).
- Raises:
ValueError – If neither adata nor data_path is provided.
FileNotFoundError – If model_path doesn’t exist.
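The automatic detection of model_type from model_type.txt can be pictured roughly as follows. This is a sketch under assumptions: the file is assumed to hold a single model-type value such as `mlp` as plain text, and `detect_model_type` is a hypothetical helper, not a scXpand function. Only the file name and the set of valid values come from this page.

```python
import tempfile
from pathlib import Path

# Values of the ModelType enumeration documented on this page.
KNOWN_MODEL_TYPES = {"autoencoder", "lightgbm", "logistic", "mlp", "svm"}


def detect_model_type(model_path, model_type=None):
    """Return the model type as a lowercase string, reading model_type.txt if needed."""
    model_dir = Path(model_path)
    if not model_dir.exists():
        raise FileNotFoundError(f"Model directory not found: {model_dir}")
    if model_type is None:
        marker = model_dir / "model_type.txt"
        if not marker.is_file():
            raise FileNotFoundError(f"Missing marker file: {marker}")
        # Assumed format: the file contains just the model type value.
        model_type = marker.read_text(encoding="utf-8")
    # Accept enum members (via .value) as well as plain strings.
    raw = getattr(model_type, "value", model_type)
    value = str(raw).strip().lower()
    if value not in KNOWN_MODEL_TYPES:
        raise ValueError(f"Unsupported model type: {value!r}")
    return value


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model_type.txt").write_text("mlp\n", encoding="utf-8")
    detected = detect_model_type(tmp)
    overridden = detect_model_type(tmp, model_type="LightGBM")

print(detected, overridden)
```

Passing model_type explicitly skips the file lookup, which mirrors the "If None, automatically detected" wording in the parameter description.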
Modules
- Data utilities for scXpand.
- Single entry point for all scXpand operations.
- Pre-trained model management for scXpand.