scxpand.main

Single entry point for all scXpand operations.

This module provides the main command-line interface for scXpand: training models, running hyperparameter optimization, running inference, and listing available pre-trained models.

Available commands:
  • train: Train a single model

  • optimize: Run hyperparameter optimization for a specified model type

  • optimize-all: Run hyperparameter optimization for all supported model types

  • inference: Run inference with trained models

  • list-models: List available pre-trained models

See individual function docstrings for detailed usage examples.
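Every command is run as `python -m scxpand.main <command>` followed by `--name value` flags, so a Python script can launch it with `subprocess`. The sketch below uses a hypothetical helper, `build_cli_args` (not part of scXpand), that turns keyword arguments into an argument list shaped like the command lines in the examples. It only builds the list and does not run anything; the comma-joining of lists mirrors the `--model_types mlp,autoencoder` example.

```python
import sys
from typing import Any


def build_cli_args(command: str, **options: Any) -> list[str]:
    """Build an argv list for `python -m scxpand.main <command> --key value ...`.

    Hypothetical helper for illustration only; it is not part of scXpand.
    Options set to None are skipped so that the CLI's own defaults apply.
    """
    args = [sys.executable, "-m", "scxpand.main", command]
    for key, value in options.items():
        if value is None:
            continue  # leave the parameter at its documented default
        if isinstance(value, (list, tuple)):
            # Lists are passed comma-separated, as in `--model_types mlp,autoencoder`
            value = ",".join(str(v) for v in value)
        args.extend([f"--{key}", str(value)])
    return args


args = build_cli_args("train", model_type="mlp", data_path="data/example_data.h5ad", n_epochs=50)
```

Passing the resulting list to `subprocess.run(args, check=True)` would then launch the command with the interpreter that is running the script.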

Functions

inference(data_path[, model_path, ...])

Command-line interface for running inference with scXpand models.

main()

Main entry point for the scxpand CLI.

optimize(model_type[, data_path, n_trials, ...])

Run hyperparameter optimization for a specified model type.

optimize_all([data_path, n_trials, ...])

Run hyperparameter optimization for all supported model types or a specified subset.

train(model_type[, data_path, save_dir, ...])

Train a single model.

scxpand.main.inference(data_path, model_path=None, model_name=None, model_url=None, save_path=None, batch_size=1024, num_workers=4, eval_row_inds=None)

Command-line interface for running inference with scXpand models.

This is a convenience wrapper around run_inference() for command-line usage. For programmatic usage, use scxpand.run_inference() directly.

Parameters:
  • data_path (str) – Path to input data file (h5ad format).

  • model_path (str | None (default: None)) – Path to directory containing the trained model (for local models).

  • model_name (str | None (default: None)) – Name of pre-trained model from registry (for pre-trained models).

  • model_url (str | None (default: None)) – Direct URL to model ZIP file (for any external model).

  • save_path (str | None (default: None)) – Directory to save prediction results.

  • batch_size (int (default: 1024)) – Batch size for inference.

  • num_workers (int (default: 4)) – Number of workers for data loading.

  • eval_row_inds (str | None (default: None)) – Path to file containing cell indices to evaluate (one per line), or None for all cells.
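`eval_row_inds` points to a plain-text file with one cell index per line. A minimal sketch for writing and reading such a file, assuming the indices are zero-based integer row positions in the h5ad file (the indexing convention is not stated here). `write_row_indices` and `read_row_indices` are hypothetical helpers, not scXpand functions.

```python
from collections.abc import Iterable
from pathlib import Path
from typing import Union


def write_row_indices(path: Union[str, Path], indices: Iterable[int]) -> None:
    """Write one integer cell index per line (hypothetical helper)."""
    lines = [str(int(i)) for i in indices]
    Path(path).write_text("\n".join(lines) + "\n")


def read_row_indices(path: Union[str, Path]) -> list[int]:
    """Read the indices back, ignoring blank lines."""
    text = Path(path).read_text()
    return [int(line) for line in text.splitlines() if line.strip()]
```

A file written with `write_row_indices("eval_cells.txt", range(1000))` could then be passed as `--eval_row_inds eval_cells.txt`, assuming the zero-based convention above holds.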

Examples

# Local model inference
$ python -m scxpand.main inference --data_path my_data.h5ad --model_path results/mlp

# Registry model inference
$ python -m scxpand.main inference --data_path my_data.h5ad --model_name pan_cancer_autoencoder

# Direct URL inference (any external model)
$ python -m scxpand.main inference --data_path my_data.h5ad --model_url "https://your-platform.com/model.zip"

Return type:

None

Returns:

None.
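Each of the three examples passes exactly one model source. How `inference` behaves when several are given is not described here, so the sketch below only illustrates a hypothetical pre-flight check a calling script might run: `resolve_model_source` is not part of scXpand, and treating "exactly one source" as a requirement is an assumption.

```python
from typing import Optional


def resolve_model_source(
    model_path: Optional[str] = None,
    model_name: Optional[str] = None,
    model_url: Optional[str] = None,
) -> tuple[str, str]:
    """Return (flag, value) for the single model source that was given.

    Hypothetical helper; requiring exactly one source is an assumption.
    """
    given = {
        "--model_path": model_path,  # local trained-model directory
        "--model_name": model_name,  # pre-trained model from the registry
        "--model_url": model_url,  # direct URL to a model ZIP file
    }
    chosen = [(flag, value) for flag, value in given.items() if value is not None]
    if len(chosen) != 1:
        raise ValueError(
            f"Expected exactly one of model_path, model_name, model_url; got {len(chosen)}"
        )
    return chosen[0]
```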

scxpand.main.main()

Main entry point for the scxpand CLI.

Return type:

None

Returns:

None

scxpand.main.optimize(model_type, data_path='data/example_data.h5ad', n_trials=100, study_name=None, storage_path='results/optuna_studies', score_metric='harmonic_avg/AUROC', resume=True, seed_base=42, num_workers=4, config_path=None, fail_fast=False, **kwargs)

Run hyperparameter optimization for a specified model type.

Parameters:
  • model_type (ModelType | str) – Type of model to optimize (autoencoder, mlp, lightgbm, logistic, svm).

  • data_path (str (default: 'data/example_data.h5ad')) – Path to the input data file (h5ad format).

  • n_trials (int (default: 100)) – Number of optimization trials to run.

  • study_name (str | None (default: None)) – Name of the optimization study (defaults to model_type).

  • storage_path (str (default: 'results/optuna_studies')) – Directory to store optimization results.

  • score_metric (str (default: 'harmonic_avg/AUROC')) – Metric to optimize (e.g., "harmonic_avg/AUROC", "AUROC", "AUPRC").

  • resume (bool (default: True)) – Whether to resume an existing study. If False, a new study is started, and a ValueError is raised if a study with the same name already exists.

  • seed_base (int (default: 42)) – Base seed for reproducibility across trials.

  • num_workers (int (default: 4)) – Number of workers for parallel processing.

  • config_path (str | None (default: None)) – Path to configuration file for base parameters.

  • fail_fast (bool (default: False)) – Whether to fail immediately on any exception (for testing).

  • **kwargs (Any) – Additional parameters that override values from the configuration (e.g., --n_epochs 10).
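The default `score_metric`, `harmonic_avg/AUROC`, suggests a harmonic mean of several AUROC values. Which groups are averaged is not documented here, so the sketch below only shows the arithmetic under that assumption, with hypothetical per-group scores. A harmonic mean is pulled toward the weakest score, so a configuration cannot score well by excelling on one group while doing poorly on another.

```python
from statistics import harmonic_mean

# Hypothetical per-group AUROC values; the actual groups used by scXpand are not documented here.
group_auroc = {"group_a": 0.90, "group_b": 0.60}

scores = list(group_auroc.values())
arithmetic = sum(scores) / len(scores)  # ≈ 0.75
harmonic = harmonic_mean(scores)  # 2 / (1/0.90 + 1/0.60) ≈ 0.72, below the arithmetic mean
```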

Raises:
  • ValueError – If model_type is not supported for optimization.

  • FileNotFoundError – If data_path does not exist.

  • ValueError – If the study already exists and resume=False; the error message explains how to delete the existing study manually.
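The `resume` flag and the second `ValueError` combine into a simple decision. The sketch below models it with a plain dict standing in for the study storage under `storage_path`; `open_study` is hypothetical and does not reflect how scXpand or its storage backend is implemented. Creating a new study when none exists, regardless of `resume`, is an assumption.

```python
def open_study(studies: dict[str, list], study_name: str, resume: bool) -> list:
    """Return the trial list for study_name following the documented resume rules.

    Hypothetical illustration: `studies` stands in for the on-disk study storage.
    """
    if study_name in studies:
        if resume:
            return studies[study_name]  # keep adding trials to the existing study
        raise ValueError(
            f"Study '{study_name}' already exists; delete it manually or pass resume=True"
        )
    studies[study_name] = []  # assumption: no existing study means a fresh one is created
    return studies[study_name]
```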

Return type:

None

Returns:

None.

Examples

# Single model optimization
$ python -m scxpand.main optimize --model_type autoencoder --n_trials 100 --data_path data/example_data.h5ad
$ python -m scxpand.main optimize --model_type mlp --n_trials 100 --data_path data/example_data.h5ad --n_epochs 10

scxpand.main.optimize_all(data_path='data/example_data.h5ad', n_trials=100, storage_path='results/optuna_studies', score_metric='harmonic_avg/AUROC', resume=True, num_workers=4, model_types=None, **kwargs)

Run hyperparameter optimization for all supported model types or a specified subset.

Parameters:
  • data_path (str (default: 'data/example_data.h5ad')) – Path to the input data file (h5ad format).

  • n_trials (int (default: 100)) – Number of optimization trials per model type.

  • storage_path (str (default: 'results/optuna_studies')) – Directory to store optimization results.

  • score_metric (str (default: 'harmonic_avg/AUROC')) – Metric to optimize (e.g., "harmonic_avg/AUROC", "AUROC", "AUPRC").

  • resume (bool (default: True)) – Whether to resume existing studies. If False, every model's study starts fresh.

  • num_workers (int (default: 4)) – Number of workers for parallel processing.

  • model_types (list[ModelType] | None (default: None)) – List of model types to optimize, in order. If None, all supported models are optimized. Supported types: ["autoencoder", "mlp", "lightgbm", "logistic", "svm"].

  • **kwargs (Any) – Additional parameters that override the configuration of every model.
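On the command line, `model_types` is written as a comma-separated string (`--model_types mlp,autoencoder`). The sketch below shows one way to turn that string into an ordered list validated against the documented supported types. `parse_model_types` is a hypothetical helper, plain strings stand in for the `ModelType` enum, and dropping duplicates is a design choice of this sketch, not documented behavior.

```python
from typing import Optional

SUPPORTED_MODEL_TYPES = ["autoencoder", "mlp", "lightgbm", "logistic", "svm"]


def parse_model_types(value: Optional[str]) -> list[str]:
    """Parse 'mlp,autoencoder' into ['mlp', 'autoencoder'], keeping the given order."""
    if value is None:
        return list(SUPPORTED_MODEL_TYPES)  # None means all supported models
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in SUPPORTED_MODEL_TYPES]
    if unknown:
        raise ValueError(
            f"Unsupported model types: {unknown}; expected a subset of {SUPPORTED_MODEL_TYPES}"
        )
    # dict.fromkeys drops duplicates while preserving the requested order
    return list(dict.fromkeys(names))
```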

Return type:

None

Returns:

None.

Examples

# Optimize all models (parallel processing)
$ python -m scxpand.main optimize-all --n_trials 10 --data_path data/example_data.h5ad --num_workers 6

# Optimize specific model types only
$ python -m scxpand.main optimize-all --n_trials 100 --data_path data/example_data.h5ad --model_types mlp,autoencoder

scxpand.main.train(model_type, data_path='data/example_data.h5ad', save_dir=None, config_path=None, resume=False, num_workers=4, **kwargs)

Train a single model.

Parameters:
  • model_type (ModelType | str) – Type of model to train (autoencoder, mlp, lightgbm, logistic, svm).

  • data_path (str (default: 'data/example_data.h5ad')) – Path to the input data file (h5ad format).

  • save_dir (str | None (default: None)) – Directory to save results. If None, a default directory for the model type is used.

  • config_path (str | None (default: None)) – Path to configuration file.

  • resume (bool (default: False)) – Whether to resume from an existing checkpoint.

  • num_workers (int (default: 4)) – Number of workers for data loading.

  • **kwargs (Any) – Additional parameters that override values from the configuration (e.g., --n_epochs 50).
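`config_path` supplies base parameters and `**kwargs` override them. A minimal sketch of that precedence, assuming a flat JSON configuration file like the `config/svm_config.json` used in the examples. `load_params` and the parameter keys in the usage are hypothetical; scXpand's actual config format and merge rules may differ.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_params(config_path: Optional[str], defaults: dict[str, Any], **overrides: Any) -> dict[str, Any]:
    """Merge defaults, then the config file, then keyword overrides (later wins).

    Hypothetical helper; assumes the config file is a flat JSON object.
    """
    params = dict(defaults)  # copy so the caller's defaults are not mutated
    if config_path is not None:
        params.update(json.loads(Path(config_path).read_text()))
    params.update(overrides)  # **kwargs take precedence over the config file
    return params
```

For example, a config file containing `{"n_epochs": 100}` combined with an override of `n_epochs=50` yields `n_epochs == 50`, which matches the "override config" behavior described for `**kwargs`.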

Return type:

None

Returns:

None.

Examples

# Autoencoder training
$ python -m scxpand.main train --model_type autoencoder --data_path data/example_data.h5ad --n_epochs 100

# MLP training
$ python -m scxpand.main train --model_type mlp --data_path data/example_data.h5ad --n_epochs 50

# LightGBM training (no epochs needed)
$ python -m scxpand.main train --model_type lightgbm --data_path data/example_data.h5ad

# Linear (logistic) model training
$ python -m scxpand.main train --model_type logistic --data_path data/example_data.h5ad

# SVM training with custom config
$ python -m scxpand.main train --model_type svm --data_path data/example_data.h5ad --config_path config/svm_config.json