scxpand.util.io

File I/O operations for scXpand.

This module handles all file input/output operations, including:

  • Saving predictions to CSV files

  • Loading evaluation indices

  • File path validation and creation

  • Multiprocessing-safe AnnData file operations with retry mechanisms

This module has no dependencies on model-specific code.

Functions

close_adata_file_safely(adata_obj)

Safely close AnnData file handle without raising exceptions.

ensure_directory_exists(path)

Ensure a directory exists, creating it if necessary.

exponential_backoff_delay(attempt[, ...])

Calculate exponential backoff delay for retry attempts with optional jitter.

is_hdf5_error(error)

Check if an error is HDF5-related and should be retried.

load_eval_indices(eval_row_inds_path)

Load evaluation indices from a file.

open_adata_file_with_retry(data_path)

Open AnnData file with retry mechanism.

open_adata_multiprocessing_safe(data_path[, ...])

Context manager for multiprocessing-safe AnnData file opening with retry mechanism.

read_adata_slice_chunked(adata_obj, indices)

Chunked reading - more robust for large slices.

read_adata_slice_direct(adata_obj, indices)

Direct slice reading - fastest when it works.

read_adata_slice_sequential(adata_obj, indices)

Sequential reading - slowest but most robust.

retry_hdf5_operation(operation[, ...])

Retry an operation that may fail due to HDF5 multiprocessing issues.

safe_read_adata_slice(adata_obj, indices)

Safely read a slice from AnnData with retry mechanism and fallback strategies.

save_predictions_to_csv(predictions, obs_df, ...)

Save predictions to a CSV file.

Classes

IOSettings()

Settings for I/O operations, following AnnData patterns.

class scxpand.util.io.IOSettings

Settings for I/O operations, following AnnData patterns.

__init__()

scxpand.util.io.close_adata_file_safely(adata_obj)

Safely close AnnData file handle without raising exceptions.

Return type:

None

scxpand.util.io.ensure_directory_exists(path)

Ensure a directory exists, creating it if necessary.

Parameters:

path (Path) – Directory path to create

Return type:

None
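
Behavior like this is commonly built on pathlib. The following is a minimal sketch under that assumption, not the scXpand implementation; the helper name `ensure_dir_sketch` and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_dir_sketch(path: Path) -> None:
    """Create `path` and any missing parents; do nothing if it already exists."""
    Path(path).mkdir(parents=True, exist_ok=True)


# Usage: calling the helper twice on the same path is safe.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results" / "run_01"  # hypothetical output directory
    ensure_dir_sketch(target)
    ensure_dir_sketch(target)  # second call is a no-op
    created = target.is_dir()
```

Passing `exist_ok=True` is what makes the call idempotent, which matters when several worker processes may try to create the same output directory.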

scxpand.util.io.exponential_backoff_delay(attempt, initial_delay=0.1, backoff_factor=2.0, max_delay=2.0, jitter=True)

Calculate exponential backoff delay for retry attempts with optional jitter.

Parameters:
  • attempt (int) – Current attempt number (0-indexed)

  • initial_delay (float (default: 0.1)) – Base delay for first retry

  • backoff_factor (float (default: 2.0)) – Multiplier for exponential growth

  • max_delay (float (default: 2.0)) – Maximum delay cap

  • jitter (bool (default: True)) – If True, adds randomization to prevent thundering herd

Return type:

float

Returns:

Delay in seconds before next retry attempt
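
The parameters above suggest a conventional capped exponential schedule. The sketch below illustrates one such formulation using the documented defaults. The exact formula and the jitter scheme used by scXpand are not stated on this page, so both are assumptions here, and `backoff_delay_sketch` is a hypothetical name.

```python
import random


def backoff_delay_sketch(attempt, initial_delay=0.1, backoff_factor=2.0,
                         max_delay=2.0, jitter=True):
    """Illustrative capped exponential backoff (not the library's code)."""
    # Grow geometrically with the attempt number, then apply the cap.
    delay = min(initial_delay * backoff_factor ** attempt, max_delay)
    if jitter:
        # Assumed jitter scheme: scale by a random factor in [0.5, 1.0]
        # so that concurrent workers do not all retry at the same moment.
        delay *= random.uniform(0.5, 1.0)
    return delay


# Without jitter the schedule is deterministic.
schedule = [backoff_delay_sketch(a, jitter=False) for a in range(6)]
jittered = backoff_delay_sketch(0, jitter=True)
```

In this sketch, with the default arguments and jitter disabled, the delay doubles from 0.1 s on attempt 0 and reaches the 2.0 s cap at attempt 5.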

scxpand.util.io.is_hdf5_error(error)

Check if an error is HDF5-related and should be retried.

Return type:

bool

scxpand.util.io.load_eval_indices(eval_row_inds_path)

Load evaluation indices from a file.

Parameters:

eval_row_inds_path (str | Path) – Path to file containing cell indices (one per line)

Return type:

ndarray

Returns:

Array of evaluation indices

Raises:

scxpand.util.io.open_adata_file_with_retry(data_path)

Open AnnData file with retry mechanism.

Return type:

AnnData

scxpand.util.io.open_adata_multiprocessing_safe(data_path, adata=None, indices=None)

Context manager for multiprocessing-safe AnnData file opening with retry mechanism.

This function ensures that each worker process opens its own file handle to avoid the common “OSError: Can’t synchronously read data” error when using PyTorch DataLoader with num_workers > 0. Includes robust retry mechanisms for HDF5-related errors.

Parameters:
  • data_path (str | Path | None) – Path to H5AD file. Required unless adata is provided.

  • adata (AnnData | None (default: None)) – In-memory AnnData object. Alternative to data_path.

  • indices (ndarray | None (default: None)) – Optional indices for the yielded data (passed through unchanged).

Yields:

Tuple of (AnnData object, indices) for batch access.

Example

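>>> import numpy as np
>>> from scxpand.util.io import open_adata_multiprocessing_safe, safe_read_adata_slice
>>> with open_adata_multiprocessing_safe("data.h5ad", indices=np.arange(100)) as (adata, indices):
...     X_data = safe_read_adata_slice(adata, indices)  # first 100 cells (illustrative)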

Note

  • If adata is provided, it’s used directly (no file operations)

  • Each call opens a fresh file handle (safer for multiprocessing)

  • File handles are properly closed when the context exits

  • Includes retry mechanism for common HDF5 multiprocessing errors
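
The notes above describe a familiar context-manager pattern: open a fresh handle per call, pass `indices` through, and always close on exit. The sketch below shows that pattern with a plain binary file standing in for an H5AD file. It is an illustration under those assumptions, not the scXpand implementation, and `open_per_call_sketch` is a hypothetical name.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_per_call_sketch(path=None, obj=None, indices=None):
    """Yield (resource, indices); open a fresh handle unless `obj` is given."""
    if obj is not None:
        # In-memory object supplied: use it directly, no file operations.
        yield obj, indices
        return
    if path is None:
        raise ValueError("either path or obj must be provided")
    handle = open(path, "rb")  # fresh handle for this call
    try:
        yield handle, indices  # indices are passed through unchanged
    finally:
        handle.close()  # released even if the body raises


# Usage with a throwaway file standing in for an H5AD file.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
with open_per_call_sketch(tmp_path, indices=[0, 2]) as (fh, idx):
    was_open_inside = not fh.closed
closed_after = fh.closed
os.remove(tmp_path)
```

Opening inside the function, rather than sharing a handle created in the parent process, is what gives each DataLoader worker its own handle.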

scxpand.util.io.read_adata_slice_chunked(adata_obj, indices, chunk_size=None)

Chunked reading - more robust for large slices.

Return type:

ndarray

scxpand.util.io.read_adata_slice_direct(adata_obj, indices)

Direct slice reading - fastest when it works.

Return type:

ndarray

scxpand.util.io.read_adata_slice_sequential(adata_obj, indices)

Sequential reading - slowest but most robust.

Return type:

ndarray

scxpand.util.io.retry_hdf5_operation(operation, max_retries=None, operation_name='operation')

Retry an operation that may fail due to HDF5 multiprocessing issues.

Parameters:
  • operation (Callable[[], Any]) – Function to retry (should take no arguments)

  • max_retries (int | None (default: None)) – Maximum number of retry attempts (uses settings default if None)

  • operation_name (str (default: 'operation')) – Name of operation for logging

Return type:

Any

Returns:

Result of the operation if successful

Raises:

Exception – The last exception if all retries fail
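
The contract described above (a zero-argument callable, a bounded number of retries, and re-raising the last exception) can be sketched as follows. This is a generic illustration, not the scXpand code. The names `retry_sketch`, `retry_on`, and `base_delay` are hypothetical, and treating `max_retries` as the number of attempts after the first is an assumption.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry_sketch(operation, max_retries=3, operation_name="operation",
                 retry_on=(OSError,), base_delay=0.0):
    """Call `operation()` until it succeeds or the retries are exhausted."""
    last_error = None
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        try:
            return operation()
        except retry_on as err:
            last_error = err
            logger.warning("%s failed on attempt %d: %s",
                           operation_name, attempt + 1, err)
            if attempt < max_retries:
                time.sleep(base_delay * 2 ** attempt)  # simple backoff
    raise last_error  # all attempts failed: surface the last exception


# Usage: an operation that fails twice, then succeeds.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("Can't synchronously read data")
    return "ok"


result = retry_sketch(flaky_read, max_retries=5, operation_name="flaky_read")
```

Because the callable takes no arguments, call sites typically wrap the real work in a `lambda` or `functools.partial`.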

scxpand.util.io.safe_read_adata_slice(adata_obj, indices)

Safely read a slice from AnnData with retry mechanism and fallback strategies.

Parameters:
  • adata_obj – AnnData object to read from

  • indices (ndarray) – Row indices to read

Return type:

ndarray

Returns:

Dense numpy array with the requested data

Raises:

OSError – If all retry attempts and fallback strategies fail
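
One plausible reading of "fallback strategies" is that the three readers documented on this page are tried from fastest to most robust. The sketch below illustrates that idea on a plain list of rows. The ordering, the helper names, and the simulated failure are all assumptions made for illustration, not the scXpand implementation.

```python
def failing_direct(rows, indices):
    """Stand-in for a direct read that hits an HDF5 error."""
    raise OSError("simulated HDF5 read failure")


def read_chunked_sketch(rows, indices, chunk_size=2):
    """Read a few indices at a time and stitch the pieces together."""
    out = []
    for start in range(0, len(indices), chunk_size):
        out.extend(rows[i] for i in indices[start:start + chunk_size])
    return out


def read_sequential_sketch(rows, indices):
    """Read one row per request: slowest, but the smallest unit of failure."""
    return [rows[i] for i in indices]


def read_with_fallback(rows, indices, strategies):
    """Try each strategy in order; raise OSError only if every one fails."""
    errors = []
    for strategy in strategies:
        try:
            return strategy(rows, indices)
        except OSError as err:
            errors.append(f"{strategy.__name__}: {err}")
    raise OSError("all strategies failed: " + "; ".join(errors))


rows = [[1, 0], [0, 2], [3, 3], [4, 0]]
data = read_with_fallback(
    rows, [0, 2, 3],
    [failing_direct, read_chunked_sketch, read_sequential_sketch],
)
```

Here the first strategy fails, so the chunked reader supplies the result and the sequential reader is never called.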

scxpand.util.io.save_predictions_to_csv(predictions, obs_df, model_type, save_path)

Save predictions to a CSV file.

Parameters:
  • predictions (ndarray) – Model predictions (probabilities)

  • obs_df (DataFrame) – DataFrame with cell metadata

  • model_type (ModelType | str) – Type of model used for predictions

  • save_path (Path | None) – Directory to save predictions (None to skip saving)

Raises:

ValueError – If predictions and obs_df have mismatched lengths

Return type:

None
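
The documented contract (a length check, and skipping the write when `save_path` is None) can be sketched with the standard library alone. Everything specific here is an assumption: the output file name, the `prediction` column name, the `"mlp"` model type, the order of the checks, and the use of a list of dicts in place of a DataFrame.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions_sketch(predictions, obs_rows, model_type, save_path):
    """Write one CSV row per cell: metadata columns plus a prediction."""
    if len(predictions) != len(obs_rows):
        raise ValueError(
            f"{len(predictions)} predictions for {len(obs_rows)} cells"
        )
    if save_path is None:
        return  # None means "skip saving"
    save_dir = Path(save_path)
    save_dir.mkdir(parents=True, exist_ok=True)
    out_file = save_dir / f"predictions_{model_type}.csv"  # hypothetical name
    meta_cols = list(obs_rows[0].keys()) if obs_rows else []
    with out_file.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=meta_cols + ["prediction"])
        writer.writeheader()
        for row, pred in zip(obs_rows, predictions):
            writer.writerow({**row, "prediction": pred})


# Usage with two cells and a hypothetical model type.
with tempfile.TemporaryDirectory() as tmp:
    obs = [{"cell_id": "c1"}, {"cell_id": "c2"}]
    save_predictions_sketch([0.9, 0.1], obs, "mlp", tmp)
    written = (Path(tmp) / "predictions_mlp.csv").read_text().splitlines()
```

Validating lengths before touching the file system means a mismatch never leaves a partially written file behind.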