scxpand.util.io

File I/O operations for scXpand.

This module handles all file input/output operations, including:

- Saving predictions to CSV files
- Loading evaluation indices
- File path validation and creation
- Multiprocessing-safe AnnData file operations with retry mechanisms

This module has no dependencies on model-specific code.

Functions

`close_adata_file_safely`(adata_obj)	Safely close AnnData file handle without raising exceptions.
`ensure_directory_exists`(path)	Ensure a directory exists, creating it if necessary.
`exponential_backoff_delay`(attempt[, ...])	Calculate exponential backoff delay for retry attempts with optional jitter.
`is_hdf5_error`(error)	Check if an error is HDF5-related and should be retried.
`load_eval_indices`(eval_row_inds_path)	Load evaluation indices from a file.
`open_adata_file_with_retry`(data_path)	Open AnnData file with retry mechanism.
`open_adata_multiprocessing_safe`(data_path[, ...])	Context manager for multiprocessing-safe AnnData file opening with retry mechanism.
`read_adata_slice_chunked`(adata_obj, indices)	Chunked reading - more robust for large slices.
`read_adata_slice_direct`(adata_obj, indices)	Direct slice reading - fastest when it works.
`read_adata_slice_sequential`(adata_obj, indices)	Sequential reading - slowest but most robust.
`retry_hdf5_operation`(operation[, ...])	Retry an operation that may fail due to HDF5 multiprocessing issues.
`safe_read_adata_slice`(adata_obj, indices)	Safely read a slice from AnnData with retry mechanism and fallback strategies.
`save_predictions_to_csv`(predictions, obs_df, ...)	Save predictions to a CSV file.

Classes

`IOSettings`()	Settings for I/O operations, following AnnData patterns.

class scxpand.util.io.IOSettings

Settings for I/O operations, following AnnData patterns.

__init__()

scxpand.util.io.close_adata_file_safely(adata_obj)

Safely close AnnData file handle without raising exceptions.

Return type: None

scxpand.util.io.ensure_directory_exists(path)

Ensure a directory exists, creating it if necessary.

Parameters: path (Path) – Directory path to create
Return type: None
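
A minimal sketch of the behaviour described above, assuming it amounts to a thin wrapper around `Path.mkdir`. The name `ensure_dir_sketch` is hypothetical and this is not the scXpand source.

```python
from pathlib import Path
import tempfile


def ensure_dir_sketch(path: Path) -> None:
    """Create `path` and any missing parents; do nothing if it already exists."""
    Path(path).mkdir(parents=True, exist_ok=True)


# Usage: calling it twice is harmless because exist_ok=True.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results" / "run_01"
    ensure_dir_sketch(target)
    ensure_dir_sketch(target)
    created = target.is_dir()
```

Making the call idempotent means callers do not need to check for the directory first.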

scxpand.util.io.exponential_backoff_delay(attempt, initial_delay=0.1, backoff_factor=2.0, max_delay=2.0, jitter=True)

Calculate exponential backoff delay for retry attempts with optional jitter.

Parameters:

attempt (int) – Current attempt number (0-indexed)
initial_delay (float (default: 0.1)) – Base delay for first retry
backoff_factor (float (default: 2.0)) – Multiplier for exponential growth
max_delay (float (default: 2.0)) – Maximum delay cap
jitter (bool (default: True)) – If True, adds randomization to prevent thundering herd

Return type:

float

Returns:

Delay in seconds before next retry attempt
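
The parameters above suggest a capped exponential schedule. The sketch below illustrates that idea under stated assumptions: the function name `backoff_delay_sketch` is hypothetical, and the jitter scheme (scaling by a random factor between 0.5 and 1.0) is a guess, since the documentation does not say how jitter is applied.

```python
import random


def backoff_delay_sketch(attempt, initial_delay=0.1, backoff_factor=2.0,
                         max_delay=2.0, jitter=True):
    """Return a capped exponential delay in seconds for a 0-indexed attempt."""
    delay = min(initial_delay * backoff_factor ** attempt, max_delay)
    if jitter:
        # Assumption: scale by a random factor in [0.5, 1.0]; the real scheme may differ.
        delay *= random.uniform(0.5, 1.0)
    return delay


# With jitter disabled the schedule is deterministic.
delays = [backoff_delay_sketch(a, jitter=False) for a in range(6)]
```

With the default arguments and jitter disabled, the delays grow as 0.1, 0.2, 0.4, 0.8, 1.6 and are then capped at 2.0. Jitter spreads retries from concurrent workers so they do not all hit the file at the same moment.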

scxpand.util.io.is_hdf5_error(error)

Check if an error is HDF5-related and should be retried.

Return type: bool
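
The criteria this function uses are not documented here. As a purely illustrative heuristic, one could treat an `OSError` whose message mentions an HDF5-style failure as retryable. The name `looks_like_hdf5_error` and the marker strings are assumptions; only the "Can't synchronously read data" message is taken from this module's documentation.

```python
# Hypothetical marker strings; the first is the error message quoted in this module's docs.
HDF5_ERROR_MARKERS = ("can't synchronously read data", "hdf5", "h5py")


def looks_like_hdf5_error(error: BaseException) -> bool:
    """Heuristic: an OSError whose message mentions an HDF5-style failure."""
    if not isinstance(error, OSError):
        return False
    message = str(error).lower()
    return any(marker in message for marker in HDF5_ERROR_MARKERS)


retryable = looks_like_hdf5_error(OSError("Can't synchronously read data"))
not_retryable = looks_like_hdf5_error(ValueError("bad shape"))
```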

scxpand.util.io.load_eval_indices(eval_row_inds_path)

Load evaluation indices from a file.

Parameters:

eval_row_inds_path (str | Path) – Path to file containing cell indices (one per line)

Return type:

ndarray

Returns:

Array of evaluation indices

Raises:

FileNotFoundError – If the file doesn’t exist
ValueError – If the file contains invalid data
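
The documented contract (one index per line, `FileNotFoundError` for a missing file, `ValueError` for invalid data) can be sketched with the standard library alone. The name `load_indices_sketch` is hypothetical, skipping blank lines is an assumption, and the sketch returns a plain list where the documented function returns an ndarray.

```python
from pathlib import Path
import tempfile


def load_indices_sketch(path):
    """Read one integer index per line; blank lines are skipped (an assumption)."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Indices file not found: {path}")
    indices = []
    for line_no, line in enumerate(path.read_text().splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        try:
            indices.append(int(text))
        except ValueError as exc:
            raise ValueError(f"Invalid index on line {line_no}: {text!r}") from exc
    return indices  # the documented function returns an ndarray instead of a list


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "eval_indices.txt"
    sample.write_text("0\n5\n12\n")
    loaded = load_indices_sketch(sample)
```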

scxpand.util.io.open_adata_file_with_retry(data_path)

Open AnnData file with retry mechanism.

Return type: AnnData

scxpand.util.io.open_adata_multiprocessing_safe(data_path, adata=None, indices=None)

Context manager for multiprocessing-safe AnnData file opening with retry mechanism.

This function ensures that each worker process opens its own file handle to avoid the common “OSError: Can’t synchronously read data” error when using PyTorch DataLoader with num_workers > 0. Includes robust retry mechanisms for HDF5-related errors.

Parameters:

data_path (str | Path | None) – Path to H5AD file. Required unless adata is provided.
adata (AnnData | None (default: None)) – In-memory AnnData object. Alternative to data_path.
indices (ndarray | None (default: None)) – Optional indices for the yielded data (passed through unchanged).

Yields:

Tuple of (AnnData object, indices) for batch access.

Example

>>> import numpy as np
>>> cell_indices = np.array([0, 1, 2])
>>> with open_adata_multiprocessing_safe("data.h5ad", indices=cell_indices) as (adata, indices):
...     X_data = safe_read_adata_slice(adata, indices)

Note

If adata is provided, it’s used directly (no file operations)
Each call opens a fresh file handle (safer for multiprocessing)
File handles are properly closed when the context exits
Includes retry mechanism for common HDF5 multiprocessing errors
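
The notes above describe a common pattern: open a fresh handle per call, yield it together with the pass-through indices, and close it when the context exits. The sketch below illustrates that pattern only. The names `open_per_call_sketch`, `opener` and `FakeHandle` are hypothetical, and a stand-in handle replaces the real file-backed AnnData object so the example needs no external libraries.

```python
from contextlib import contextmanager


@contextmanager
def open_per_call_sketch(data_path=None, in_memory=None, indices=None, opener=None):
    """Yield (data, indices); open and close a fresh handle unless data is in memory."""
    if in_memory is not None:
        # In-memory data: no file operations, nothing to close.
        yield in_memory, indices
        return
    if data_path is None:
        raise ValueError("data_path is required when no in-memory object is given")
    handle = opener(data_path)  # fresh handle for this call, hence for this worker
    try:
        yield handle, indices
    finally:
        handle.close()  # runs even if the body of the with-block raises


class FakeHandle:
    """Stand-in for a file-backed object, used only to demonstrate the flow."""

    def __init__(self, path):
        self.path, self.closed = path, False

    def close(self):
        self.closed = True


with open_per_call_sketch("data.h5ad", opener=FakeHandle, indices=[0, 1]) as (handle, idx):
    was_open_inside = not handle.closed
closed_after = handle.closed
```

Because no handle is shared between calls, each DataLoader worker ends up with its own handle, which is the property the documentation relies on.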

scxpand.util.io.read_adata_slice_chunked(adata_obj, indices, chunk_size=None)

Chunked reading - more robust for large slices.

Return type: ndarray
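
Chunked reading can be pictured as splitting the requested indices into consecutive groups and reading each group separately. The sketch below shows that idea with plain lists. The name `read_rows_chunked_sketch`, the `read_rows` callback and the default `chunk_size=2` are all hypothetical, and the documented function returns an ndarray rather than a list.

```python
def read_rows_chunked_sketch(read_rows, indices, chunk_size=2):
    """Fetch `indices` in consecutive chunks and concatenate the results in order."""
    rows = []
    for start in range(0, len(indices), chunk_size):
        chunk = indices[start:start + chunk_size]
        rows.extend(read_rows(chunk))  # each call touches only a small slice
    return rows


# Stand-in data source: a list of rows instead of a file-backed matrix.
matrix = [[10, 11], [20, 21], [30, 31], [40, 41], [50, 51]]
calls = []


def read_rows(chunk):
    calls.append(list(chunk))  # record which indices each read requested
    return [matrix[i] for i in chunk]


result = read_rows_chunked_sketch(read_rows, [4, 0, 2], chunk_size=2)
```

Smaller reads keep each request small, so a failure affects one chunk instead of the whole slice.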

scxpand.util.io.read_adata_slice_direct(adata_obj, indices)

Direct slice reading - fastest when it works.

Return type: ndarray

scxpand.util.io.read_adata_slice_sequential(adata_obj, indices)

Sequential reading - slowest but most robust.

Return type: ndarray

scxpand.util.io.retry_hdf5_operation(operation, max_retries=None, operation_name='operation')

Retry an operation that may fail due to HDF5 multiprocessing issues.

Parameters:

operation (Callable[[], Any]) – Function to retry (should take no arguments)
max_retries (int | None (default: None)) – Maximum number of retry attempts (uses settings default if None)
operation_name (str (default: 'operation')) – Name of operation for logging

Return type:

Any

Returns:

Result of the operation if successful

Raises:

Exception – The last exception if all retries fail
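
A retry wrapper with this signature might look roughly like the sketch below. It assumes that `max_retries` counts retries after the first call, that only `OSError` is retried, and that delays double on each attempt; none of these details are confirmed by the documentation. The names `retry_sketch`, `should_retry` and `base_delay` are hypothetical.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")


def retry_sketch(operation, max_retries=3, operation_name="operation",
                 should_retry=lambda exc: isinstance(exc, OSError), base_delay=0.01):
    """Call `operation()` until it succeeds or the retries are exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_retries or not should_retry(exc):
                raise  # re-raise the last (or a non-retryable) exception
            delay = base_delay * 2 ** attempt
            logger.warning("%s failed (%r); retry %d in %.2fs",
                           operation_name, exc, attempt + 1, delay)
            time.sleep(delay)


attempts = {"count": 0}


def flaky_read():
    """Fail twice with a simulated HDF5 error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("Can't synchronously read data")
    return "ok"


result = retry_sketch(flaky_read, operation_name="read slice")
```

Passing the operation as a zero-argument callable lets callers bind their own arguments with a lambda or `functools.partial`.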

scxpand.util.io.safe_read_adata_slice(adata_obj, indices)

Safely read a slice from AnnData with retry mechanism and fallback strategies.

Parameters:

adata_obj – AnnData object to read from
indices (ndarray) – Row indices to read

Return type:

ndarray

Returns:

Dense numpy array with the requested data

Raises:

OSError – If all retry attempts and fallback strategies fail
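
The summaries of the three reading functions ("fastest when it works", "more robust", "slowest but most robust") suggest a fallback chain that tries the fastest strategy first. That ordering is an assumption, and the sketch below only illustrates the chaining itself. The names `read_with_fallbacks_sketch`, `direct` and `sequential` are hypothetical stand-ins, and plain lists replace the AnnData object.

```python
def read_with_fallbacks_sketch(strategies, source, indices):
    """Try each (name, reader) pair in order; return the first successful result."""
    last_error = None
    for name, reader in strategies:
        try:
            return name, reader(source, indices)
        except OSError as exc:
            last_error = exc  # remember the failure and move to the next strategy
    raise OSError("all read strategies failed") from last_error


def direct(source, indices):
    raise OSError("simulated HDF5 failure")  # pretend the fast path breaks


def sequential(source, indices):
    return [source[i] for i in indices]  # one row at a time


rows = ["r0", "r1", "r2", "r3"]
used, data = read_with_fallbacks_sketch(
    [("direct", direct), ("sequential", sequential)], rows, [3, 1]
)
```

If every strategy fails, the sketch raises `OSError` chained to the last failure, which matches the documented exception type.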

scxpand.util.io.save_predictions_to_csv(predictions, obs_df, model_type, save_path)

Save predictions to a CSV file.

Parameters:

predictions (ndarray) – Model predictions (probabilities)
obs_df (DataFrame) – DataFrame with cell metadata
model_type (ModelType | str) – Type of model used for predictions
save_path (Path | None) – Directory to save predictions (None to skip saving)

Raises:

ValueError – If predictions and obs_df have mismatched lengths

Return type:

None
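
The documented behaviour (length check, skip when `save_path` is None, write into a directory) can be sketched with the standard `csv` module in place of a DataFrame. Everything specific here is an assumption made for illustration: the name `save_predictions_sketch`, the output file name, the column names, and the `"mlp"` model type string.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions_sketch(predictions, cell_ids, model_type, save_dir):
    """Write one row per cell; return the file path, or None when saving is skipped."""
    if len(predictions) != len(cell_ids):
        raise ValueError(
            f"Got {len(predictions)} predictions for {len(cell_ids)} cells"
        )
    if save_dir is None:
        return None  # mirrors the documented "None to skip saving"
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    out_path = save_dir / f"predictions_{model_type}.csv"  # hypothetical file name
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell_id", "prediction"])  # hypothetical column names
        writer.writerows(zip(cell_ids, predictions))
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    path = save_predictions_sketch([0.91, 0.08], ["cell_a", "cell_b"], "mlp", tmp)
    lines = path.read_text().splitlines()
```

Checking the lengths before touching the file system means a mismatch never leaves a partial file behind.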
