scxpand.pretrained.download_manager


Download manager for pre-trained models using Pooch.

This module handles downloading pre-trained models using the Pooch library, which provides robust caching, integrity checking, and progress tracking.
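The caching idea can be sketched in plain Python, without Pooch: a file is fetched only when it is not already in the cache directory. This is a minimal sketch under stated assumptions. `cached_fetch` and the injected `fetch` callable are hypothetical stand-ins and are not part of scxpand or Pooch.

```python
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local copy of ``url``, calling ``fetch`` only on a cache miss.

    ``fetch`` is a hypothetical placeholder for the real network download.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cached file after the last path component of the URL.
    target = cache_dir / url.rstrip("/").rsplit("/", 1)[-1]
    if not target.exists():
        # Cache miss: download once and store the bytes on disk.
        target.write_bytes(fetch(url))
    return target
```

A second call with the same URL and cache directory returns the existing file without calling `fetch` again.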

Functions

download_model(model_name[, cache_dir])

Download a pre-trained model by name from the registry.

download_pretrained_model([model_name, ...])

Download a pre-trained model and return the path to the extracted model.

scxpand.pretrained.download_manager.download_model(model_name, cache_dir=None)

Download a pre-trained model by name from the registry.

Convenience wrapper around download_pretrained_model() that downloads a registry model given only its name.

Parameters:
  • model_name (str) – Name of the pre-trained model to download

  • cache_dir (Path | None (default: None)) – Custom cache directory (if None, uses .scxpand_cache in the current working directory)

Return type:

Path

Returns:

Path to the downloaded model directory

Raises:

ValueError – If model_name is not found in registry
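The name-to-URL lookup and the ValueError for unknown names might look like the following minimal sketch. The `MODEL_REGISTRY` contents and the example.org URLs are hypothetical placeholders; only the two model names come from the examples on this page, and scxpand's actual registry structure is not shown here.

```python
# Hypothetical registry: the real scxpand registry and its URLs are not shown here.
MODEL_REGISTRY: dict[str, str] = {
    "pan_cancer_autoencoder": "https://example.org/pan_cancer_autoencoder.zip",
    "pan_cancer_mlp": "https://example.org/pan_cancer_mlp.zip",
}


def resolve_model_url(model_name: str) -> str:
    """Look up a registry model's URL, raising ValueError for unknown names."""
    try:
        return MODEL_REGISTRY[model_name]
    except KeyError:
        # List the valid names so the error message is actionable.
        available = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(
            f"Unknown model {model_name!r}; available models: {available}"
        ) from None
```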

Examples

>>> from pathlib import Path
>>> from scxpand.pretrained.download_manager import download_model
>>>
>>> # Download the autoencoder model
>>> model_path = download_model("pan_cancer_autoencoder")
>>>
>>> # Download with a custom cache directory
>>> model_path = download_model(
...     "pan_cancer_mlp", cache_dir=Path("/my/cache")
... )

scxpand.pretrained.download_manager.download_pretrained_model(model_name=None, model_url=None, cache_dir=None)

Download a pre-trained model and return the path to the extracted model.

Uses Pooch for caching, hash verification, and archive extraction. On the first download, Pooch computes the file's SHA256 hash; on later accesses, it verifies the cached file against that hash. When a model is updated (its hash changes), Pooch downloads the new version into a fresh cache directory, so updates take effect without manual cleanup. Both registry models and direct URLs, including DOI URLs, are supported.
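The hash check and the "fresh directory per version" behaviour can be illustrated with the standard library. This is a minimal sketch of the concepts, not Pooch's internals. The `versioned_cache_dir` layout (model name, then the first 12 hex characters of the hash) is a hypothetical choice made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest, reading in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's hash differs from the recorded one."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"Hash mismatch for {path}: {actual} != {expected_sha256}")


def versioned_cache_dir(cache_root: Path, model_name: str, sha256: str) -> Path:
    """Hypothetical layout: one subdirectory per content hash, so an updated
    model never overwrites the previously cached version."""
    return cache_root / model_name / sha256[:12]
```

Keying the directory on the content hash means two versions of the same model can coexist, and switching versions never requires deleting files first.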

By default, downloads to a .scxpand_cache directory in the current working directory, making it easy for users to manage and clean up downloaded models.
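Resolving the default cache location, and clearing it, can be sketched as follows. `resolve_cache_dir` and `clear_cache` are hypothetical helpers written for illustration. Only the `.scxpand_cache` name and the current-working-directory default come from the text above.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_cache_dir(cache_dir: Optional[Path] = None) -> Path:
    """Return cache_dir, or ./.scxpand_cache under the current working directory."""
    if cache_dir is None:
        cache_dir = Path.cwd() / ".scxpand_cache"
    return Path(cache_dir)


def clear_cache(cache_dir: Optional[Path] = None) -> None:
    """Hypothetical cleanup helper: delete the whole cache directory.

    Models are simply downloaded again the next time they are requested.
    """
    shutil.rmtree(resolve_cache_dir(cache_dir), ignore_errors=True)
```

Because the default lives in the working directory, deleting that one folder is enough to reclaim the disk space.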

Parameters:
  • model_name (str | None (default: None)) – Name of pre-trained model from registry (alternative to model_url)

  • model_url (str | None (default: None)) – Direct URL to the model file (alternative to model_name). Supports HTTP/HTTPS URLs for direct downloads.

  • cache_dir (Path | None (default: None)) – Custom cache directory (if None, uses .scxpand_cache in the current working directory)
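An up-front check of the URL scheme might look like this minimal sketch. It assumes DOI URLs use the `doi:` prefix form (for example `doi:10.xxxx/.../model.zip`) that Pooch accepts. `check_model_url` and `SUPPORTED_SCHEMES` are hypothetical and are not scxpand functions.

```python
from urllib.parse import urlparse

# Assumed set of accepted schemes: plain web downloads plus Pooch-style DOI URLs.
SUPPORTED_SCHEMES = {"http", "https", "doi"}


def check_model_url(model_url: str) -> str:
    """Return the URL's scheme, raising ValueError for unsupported ones."""
    scheme = urlparse(model_url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"Unsupported URL scheme {scheme!r} in {model_url!r}; "
            f"expected one of {sorted(SUPPORTED_SCHEMES)}"
        )
    return scheme
```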

Return type:

Path

Returns:

Path to the extracted model directory or file
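The "directory or file" distinction in the return value can be sketched as follows: zip archives are extracted and the directory is returned, and any other file is returned as-is. This is an illustrative sketch only. The `<archive>.unzip` target name is an assumption, similar in spirit to Pooch's Unzip processor default, and is not confirmed to be scxpand's layout.

```python
import zipfile
from pathlib import Path


def extract_if_archive(downloaded: Path) -> Path:
    """Return an extracted directory for zip archives, else the file itself.

    The '<name>.unzip' sibling directory is an assumed layout for illustration.
    """
    if not zipfile.is_zipfile(downloaded):
        return downloaded  # Plain model file: nothing to extract.
    target = downloaded.parent / (downloaded.name + ".unzip")
    if not target.exists():
        # Extract only once; later calls reuse the extracted directory.
        with zipfile.ZipFile(downloaded) as zf:
            zf.extractall(target)
    return target
```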

Raises:

ValueError – If neither model_name nor model_url is provided, or if both are provided
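The "exactly one of model_name or model_url" rule can be expressed as a small validation step. `validate_source` is a hypothetical helper; the two error conditions mirror the Raises entry above.

```python
from typing import Optional


def validate_source(model_name: Optional[str], model_url: Optional[str]) -> str:
    """Return "registry" or "url"; raise ValueError unless exactly one is given."""
    if model_name is None and model_url is None:
        raise ValueError("Provide either model_name or model_url.")
    if model_name is not None and model_url is not None:
        raise ValueError("Provide only one of model_name or model_url, not both.")
    return "registry" if model_name is not None else "url"
```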

Examples

>>> from pathlib import Path
>>> from scxpand.pretrained.download_manager import download_pretrained_model
>>>
>>> # Registry model (downloads to ./.scxpand_cache/)
>>> model_path = download_pretrained_model(
...     model_name="pan_cancer_autoencoder"
... )
>>>
>>> # Direct URL (downloads to ./.scxpand_cache/)
>>> model_path = download_pretrained_model(
...     model_url="https://your-platform.com/model.zip"
... )
>>>
>>> # Custom cache directory
>>> model_path = download_pretrained_model(
...     model_url="https://figshare.com/ndownloader/files/model.zip",
...     cache_dir=Path("/my/custom/cache"),
... )