scxpand.main
Single entry point for all scXpand operations.
This module provides the main command-line interface for scXpand, including training models, hyperparameter optimization, and running inference.
Available commands:
- train: Train a single model
- optimize: Run hyperparameter optimization for a specified model type
- optimize-all: Run hyperparameter optimization for all supported model types
- inference: Run inference with trained models
- list-models: List available pre-trained models
See individual function docstrings for detailed usage examples.
Functions
Command-line interface for running inference with scXpand models.
Main entry point for the scxpand CLI.
Run hyperparameter optimization for a specified model type.
Run hyperparameter optimization for all supported model types or a specified subset.
Train a single model.
- scxpand.main.inference(data_path, model_path=None, model_name=None, model_url=None, save_path=None, batch_size=1024, num_workers=4, eval_row_inds=None)
Command-line interface for running inference with scXpand models.
This is a convenience wrapper around run_inference() for command-line usage. For programmatic usage, use scxpand.run_inference() directly.
- Parameters:
data_path (str) – Path to input data file (h5ad format).
model_path (str | None, default: None) – Path to directory containing the trained model (for local models).
model_name (str | None, default: None) – Name of pre-trained model from registry (for pre-trained models).
model_url (str | None, default: None) – Direct URL to model ZIP file (for any external model).
save_path (str | None, default: None) – Directory to save prediction results.
batch_size (int, default: 1024) – Batch size for inference.
num_workers (int, default: 4) – Number of workers for data loading.
eval_row_inds (str | None, default: None) – Path to file containing cell indices to evaluate (one per line), or None for all cells.
Examples
>>> # Local model inference
>>> python -m scxpand.main inference --data_path my_data.h5ad --model_path results/mlp
>>>
>>> # Registry model inference
>>> python -m scxpand.main inference --data_path my_data.h5ad --model_name pan_cancer_autoencoder
>>>
>>> # Direct URL inference (any external model)
>>> python -m scxpand.main inference --data_path my_data.h5ad --model_url "https://your-platform.com/model.zip"
- Return type:
None
- Returns:
None.
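The eval_row_inds parameter expects a text file with one cell index per line. The following is a minimal sketch of preparing such a file and assembling the matching command. It assumes zero-based integer row indices (the base is not stated above) and that the flag is spelled --eval_row_inds, mirroring --data_path and --model_path in the examples. The helper names write_row_indices and build_inference_command are hypothetical and not part of scXpand. The command is only built here, not executed.

```python
import sys
import tempfile
from pathlib import Path


def write_row_indices(indices, path):
    """Write one cell index per line (hypothetical helper, not part of scXpand)."""
    path = Path(path)
    path.write_text("".join(f"{int(i)}\n" for i in indices))
    return path


def build_inference_command(data_path, model_path, eval_row_inds=None):
    """Assemble the argv list for the documented `inference` command (local model).

    Assumption: each CLI flag has the same name as the function parameter.
    """
    cmd = [
        sys.executable, "-m", "scxpand.main", "inference",
        "--data_path", str(data_path),
        "--model_path", str(model_path),
    ]
    if eval_row_inds is not None:
        cmd += ["--eval_row_inds", str(eval_row_inds)]
    return cmd


with tempfile.TemporaryDirectory() as tmp:
    inds_file = write_row_indices([0, 5, 42], Path(tmp) / "cells.txt")
    written = inds_file.read_text()
    cmd = build_inference_command("my_data.h5ad", "results/mlp", inds_file)

print(written, end="")
print(" ".join(cmd[1:]))
```

With scXpand installed and real input files, the list could be passed to subprocess.run(cmd, check=True). Building the argument list separately keeps the sketch runnable without the package.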
- scxpand.main.optimize(model_type, data_path='data/example_data.h5ad', n_trials=100, study_name=None, storage_path='results/optuna_studies', score_metric='harmonic_avg/AUROC', resume=True, seed_base=42, num_workers=4, config_path=None, fail_fast=False, **kwargs)
Run hyperparameter optimization for a specified model type.
- Parameters:
model_type (ModelType | str) – Type of model to optimize (autoencoder, mlp, lightgbm, logistic, svm).
data_path (str, default: 'data/example_data.h5ad') – Path to the input data file (h5ad format).
n_trials (int, default: 100) – Number of optimization trials to run.
study_name (str | None, default: None) – Name of the optimization study (defaults to model_type).
storage_path (str, default: 'results/optuna_studies') – Directory to store optimization results.
score_metric (str, default: 'harmonic_avg/AUROC') – Metric to optimize (e.g., “harmonic_avg/AUROC”, “AUROC”, “AUPRC”).
resume (bool, default: True) – Whether to resume from existing study (False = start fresh).
seed_base (int, default: 42) – Base seed for reproducibility across trials.
num_workers (int, default: 4) – Number of workers for parallel processing.
config_path (str | None, default: None) – Path to configuration file for base parameters.
fail_fast (bool, default: False) – Whether to fail immediately on any exception (for testing).
**kwargs (Any) – Additional parameters to override config.
- Raises:
ValueError – If model_type is not supported for optimization.
FileNotFoundError – If data_path does not exist.
ValueError – If study already exists and resume=False (with instructions to delete manually).
- Return type:
None
- Returns:
None.
Examples
>>> # Single model optimization
>>> python -m scxpand.main optimize --model_type autoencoder --n_trials 100 --data_path data/example_data.h5ad
>>> python -m scxpand.main optimize --model_type mlp --n_trials 100 --data_path data/example_data.h5ad --n_epochs 10
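The first two entries under Raises, together with the study_name default, can be checked before launching a long optimization run. The sketch below is a hypothetical pre-flight helper, not scXpand's own validation code. It assumes model types are compared as plain lowercase strings and takes the supported names from the model_type description above.

```python
import tempfile
from pathlib import Path

# Model types listed in the model_type parameter description.
SUPPORTED_MODEL_TYPES = ("autoencoder", "mlp", "lightgbm", "logistic", "svm")


def preflight_optimize(model_type, data_path, study_name=None):
    """Hypothetical pre-check mirroring the documented Raises section.

    Returns the study name that would apply (model_type when none is given).
    """
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Unsupported model_type {model_type!r}; expected one of {SUPPORTED_MODEL_TYPES}"
        )
    if not Path(data_path).is_file():
        raise FileNotFoundError(f"data_path does not exist: {data_path}")
    return study_name if study_name is not None else model_type


# Demonstrate with a throwaway file standing in for a real h5ad dataset.
with tempfile.TemporaryDirectory() as tmp:
    fake_data = Path(tmp) / "example_data.h5ad"
    fake_data.write_bytes(b"")
    study = preflight_optimize("mlp", fake_data)

print(study)  # mlp
```

The third documented error (an existing study combined with resume=False) depends on how studies are stored, so it is left out of this sketch.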
- scxpand.main.optimize_all(data_path='data/example_data.h5ad', n_trials=100, storage_path='results/optuna_studies', score_metric='harmonic_avg/AUROC', resume=True, num_workers=4, model_types=None, **kwargs)
Run hyperparameter optimization for all supported model types or a specified subset.
- Parameters:
data_path (str, default: 'data/example_data.h5ad') – Path to the input data file (h5ad format).
n_trials (int, default: 100) – Number of optimization trials per model type.
storage_path (str, default: 'results/optuna_studies') – Directory to store optimization results.
score_metric (str, default: 'harmonic_avg/AUROC') – Metric to optimize (e.g., “harmonic_avg/AUROC”, “AUROC”, “AUPRC”).
resume (bool, default: True) – Whether to resume existing studies (False = start fresh for all models).
num_workers (int, default: 4) – Number of workers for parallel processing.
model_types (list[ModelType] | None, default: None) – List of model types to optimize in order. If None, optimizes all supported models. Supported types: [“autoencoder”, “mlp”, “lightgbm”, “logistic”, “svm”].
**kwargs (Any) – Additional parameters to override config for all models.
- Return type:
None
- Returns:
None.
Examples
>>> # Optimize all models (parallel processing)
>>> python -m scxpand.main optimize-all --n_trials 10 --data_path data/example_data.h5ad --num_workers 6
>>>
>>> # Optimize specific model types only
>>> python -m scxpand.main optimize-all --n_trials 100 --data_path data/example_data.h5ad --model_types mlp,autoencoder
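The model_types argument is documented as an ordered list, while the command-line example passes it as a comma-separated string. One way such a value could be normalized is sketched below. The function resolve_model_types is hypothetical, and how scXpand actually parses the flag is not shown in this reference.

```python
# Supported types as listed in the model_types parameter description.
SUPPORTED_MODEL_TYPES = ["autoencoder", "mlp", "lightgbm", "logistic", "svm"]


def resolve_model_types(model_types=None):
    """Return the ordered list of model types to optimize (hypothetical helper).

    None selects every supported type; a comma-separated string is split;
    the caller's order is preserved because the types are optimized in order.
    """
    if model_types is None:
        return list(SUPPORTED_MODEL_TYPES)
    if isinstance(model_types, str):
        model_types = [name.strip() for name in model_types.split(",") if name.strip()]
    unknown = [name for name in model_types if name not in SUPPORTED_MODEL_TYPES]
    if unknown:
        raise ValueError(f"Unsupported model types: {unknown}")
    return list(model_types)


print(resolve_model_types("mlp,autoencoder"))  # ['mlp', 'autoencoder']
print(resolve_model_types())  # all five supported types
```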
- scxpand.main.train(model_type, data_path='data/example_data.h5ad', save_dir=None, config_path=None, resume=False, num_workers=4, **kwargs)
Train a single model.
- Parameters:
model_type (ModelType | str) – Type of model to train (autoencoder, mlp, lightgbm, logistic, svm).
data_path (str, default: 'data/example_data.h5ad') – Path to input data file.
save_dir (str | None, default: None) – Directory to save results (if None, uses default for model type).
config_path (str | None, default: None) – Path to configuration file.
resume (bool, default: False) – Whether to resume from existing checkpoint.
num_workers (int, default: 4) – Number of workers for data loading.
**kwargs (Any) – Additional parameters to override config.
- Return type:
None
- Returns:
None.
Examples
>>> # Autoencoder training
>>> python -m scxpand.main train --model_type autoencoder --data_path data/example_data.h5ad --n_epochs 100
>>>
>>> # MLP training
>>> python -m scxpand.main train --model_type mlp --data_path data/example_data.h5ad --n_epochs 50
>>>
>>> # LightGBM training (no epochs needed)
>>> python -m scxpand.main train --model_type lightgbm --data_path data/example_data.h5ad
>>>
>>> # Logistic regression (linear model) training
>>> python -m scxpand.main train --model_type logistic --data_path data/example_data.h5ad
>>>
>>> # SVM training with custom config
>>> python -m scxpand.main train --model_type svm --data_path data/example_data.h5ad --config_path config/svm_config.json
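Extra keyword arguments such as n_epochs appear on the command line as additional flags. To queue several training runs from a script, the commands can be assembled programmatically, as in the sketch below. It assumes every override maps to a --name value pair, as --n_epochs does in the examples above. The helper build_train_command is hypothetical, and the commands are printed rather than executed.

```python
import shlex
import sys


def build_train_command(model_type, data_path, **overrides):
    """Assemble argv for the documented `train` command (hypothetical helper).

    Assumption: each config override is passed as `--<name> <value>`.
    """
    cmd = [
        sys.executable, "-m", "scxpand.main", "train",
        "--model_type", model_type,
        "--data_path", data_path,
    ]
    for name, value in overrides.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Model types and epoch counts taken from the examples above.
runs = [
    ("autoencoder", {"n_epochs": 100}),
    ("mlp", {"n_epochs": 50}),
    ("lightgbm", {}),
]
commands = [build_train_command(m, "data/example_data.h5ad", **kw) for m, kw in runs]

for cmd in commands:
    # Skip the interpreter path so the output is machine independent.
    print("python " + shlex.join(cmd[1:]))
```

Each list in commands can be passed to subprocess.run(cmd, check=True) once scXpand and the data file are available.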