scxpand.util.io
File I/O operations for scXpand.
This module handles all file input/output operations, including:
- Saving predictions to CSV files
- Loading evaluation indices
- File path validation and creation
- Multiprocessing-safe AnnData file operations with retry mechanisms
This module has no dependencies on model-specific code.
Functions
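| Function | Description |
|---|---|
| close_adata_file_safely(adata_obj) | Safely close AnnData file handle without raising exceptions. |
| ensure_directory_exists(path) | Ensure a directory exists, creating it if necessary. |
| exponential_backoff_delay(attempt[, ...]) | Calculate exponential backoff delay for retry attempts with optional jitter. |
| is_hdf5_error(error) | Check if an error is HDF5-related and should be retried. |
| load_eval_indices(eval_row_inds_path) | Load evaluation indices from a file. |
| open_adata_file_with_retry(data_path) | Open AnnData file with retry mechanism. |
| open_adata_multiprocessing_safe(data_path[, ...]) | Context manager for multiprocessing-safe AnnData file opening with retry mechanism. |
| read_adata_slice_chunked(adata_obj, indices[, ...]) | Chunked reading - more robust for large slices. |
| read_adata_slice_direct(adata_obj, indices) | Direct slice reading - fastest when it works. |
| read_adata_slice_sequential(adata_obj, indices) | Sequential reading - slowest but most robust. |
| retry_hdf5_operation(operation[, ...]) | Retry an operation that may fail due to HDF5 multiprocessing issues. |
| safe_read_adata_slice(adata_obj, indices) | Safely read a slice from AnnData with retry mechanism and fallback strategies. |
| save_predictions_to_csv(predictions, obs_df, model_type, save_path) | Save predictions to a CSV file. |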
Classes
| Class | Description |
|---|---|
| IOSettings | Settings for I/O operations, following AnnData patterns. |
- class scxpand.util.io.IOSettings
Settings for I/O operations, following AnnData patterns.
- __init__()
- scxpand.util.io.close_adata_file_safely(adata_obj)
Safely close AnnData file handle without raising exceptions.
- Return type:
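A minimal sketch of the best-effort close pattern, assuming the handle of a backed AnnData object is reachable as `adata_obj.file` with a `close()` method. The name `close_adata_file_safely_sketch` is hypothetical; the real function may inspect other attributes or log failures.

```python
from typing import Any


def close_adata_file_safely_sketch(adata_obj: Any) -> None:
    """Best-effort close of a backed AnnData file handle (sketch)."""
    # Assumption: backed AnnData objects expose their file manager as `.file`.
    file_manager = getattr(adata_obj, "file", None)
    if file_manager is None:
        return  # no file manager attached: nothing to close
    try:
        file_manager.close()
    except Exception:
        # Cleanup must never mask the error that triggered it, so swallow everything.
        pass
```

Swallowing every exception is deliberate here: the function is typically called from `finally` blocks, where a second error would hide the first.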
- scxpand.util.io.ensure_directory_exists(path)
Ensure a directory exists, creating it if necessary.
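The usual standard-library idiom for this behavior is `Path.mkdir(parents=True, exist_ok=True)`. The sketch below assumes that idiom; returning the `Path` is a convenience added for illustration, since the documented return type is not shown.

```python
from pathlib import Path
from typing import Union


def ensure_directory_exists_sketch(path: Union[str, Path]) -> Path:
    """Create `path` and any missing parents if needed (sketch)."""
    directory = Path(path)
    # exist_ok=True makes the call idempotent and tolerant of several
    # processes racing to create the same directory.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```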
- scxpand.util.io.exponential_backoff_delay(attempt, initial_delay=0.1, backoff_factor=2.0, max_delay=2.0, jitter=True)
Calculate exponential backoff delay for retry attempts with optional jitter.
- Parameters:
attempt (int) – Current attempt number (0-indexed)
initial_delay (float, default: 0.1) – Base delay for the first retry
backoff_factor (float, default: 2.0) – Multiplier for exponential growth
max_delay (float, default: 2.0) – Maximum delay cap
jitter (bool, default: True) – If True, adds randomization to prevent a thundering herd
- Return type:
float
- Returns:
Delay in seconds before next retry attempt
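With the default parameters, the un-jittered delay is `min(initial_delay * backoff_factor**attempt, max_delay)`, which gives 0.1, 0.2, 0.4, 0.8, 1.6, then 2.0 s for every later attempt. The sketch below implements that formula; the jitter scheme (scaling the delay into 50 to 100 % of its base value) is an assumption, since the exact randomization used by scXpand is not documented here.

```python
import random


def exponential_backoff_delay_sketch(
    attempt: int,
    initial_delay: float = 0.1,
    backoff_factor: float = 2.0,
    max_delay: float = 2.0,
    jitter: bool = True,
) -> float:
    """Return the delay in seconds before retry `attempt` (0-indexed)."""
    # Deterministic exponential growth, capped at max_delay.
    delay = min(initial_delay * backoff_factor**attempt, max_delay)
    if jitter:
        # Assumed jitter scheme: keep 50-100% of the base delay so that workers
        # failing at the same moment do not all retry in lockstep.
        delay *= random.uniform(0.5, 1.0)
    return delay
```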
- scxpand.util.io.is_hdf5_error(error)
Check if an error is HDF5-related and should be retried.
- Return type:
bool
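One plausible way to classify errors is to match the exception type and message against known HDF5 failure strings. In the sketch below, the marker list and the restriction to `OSError`/`RuntimeError` are assumptions; only the "Can't synchronously read data" message comes from this page.

```python
# Hypothetical markers: the real function's matching rules are not documented here.
_HDF5_ERROR_MARKERS = (
    "can't synchronously read data",
    "can't read data",
    "hdf5",
    "h5py",
)


def is_hdf5_error_sketch(error: BaseException) -> bool:
    """Return True if `error` looks like a transient HDF5 failure worth retrying."""
    # Assumption: HDF5 read failures surface as OSError or RuntimeError.
    if not isinstance(error, (OSError, RuntimeError)):
        return False
    message = str(error).lower()
    return any(marker in message for marker in _HDF5_ERROR_MARKERS)
```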
- scxpand.util.io.load_eval_indices(eval_row_inds_path)
Load evaluation indices from a file.
- Parameters:
eval_row_inds_path (str | Path) – Path to a file containing cell indices (one per line)
- Return type:
- Returns:
Array of evaluation indices
- Raises:
FileNotFoundError – If the file doesn’t exist
ValueError – If the file contains invalid data
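A dependency-free sketch of the documented contract: one integer index per line, `FileNotFoundError` for a missing file, and `ValueError` for unparsable content. It returns a plain `list[int]` for simplicity, whereas the documented function returns an array; skipping blank lines is an assumption.

```python
from pathlib import Path
from typing import List, Union


def load_eval_indices_sketch(eval_row_inds_path: Union[str, Path]) -> List[int]:
    """Read one integer cell index per line (sketch)."""
    path = Path(eval_row_inds_path)
    if not path.is_file():
        raise FileNotFoundError(f"Evaluation indices file not found: {path}")
    indices: List[int] = []
    for line_no, line in enumerate(path.read_text().splitlines(), start=1):
        text = line.strip()
        if not text:
            continue  # assumption: blank lines (e.g. a trailing newline) are ignored
        try:
            indices.append(int(text))
        except ValueError as exc:
            raise ValueError(f"{path}, line {line_no}: not an integer index: {text!r}") from exc
    return indices
```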
- scxpand.util.io.open_adata_file_with_retry(data_path)
Open AnnData file with retry mechanism.
- Return type:
- scxpand.util.io.open_adata_multiprocessing_safe(data_path, adata=None, indices=None)
Context manager for multiprocessing-safe AnnData file opening with retry mechanism.
This function ensures that each worker process opens its own file handle to avoid the common “OSError: Can’t synchronously read data” error when using PyTorch DataLoader with num_workers > 0. Includes robust retry mechanisms for HDF5-related errors.
- Parameters:
- Yields:
Tuple of (AnnData object, indices) for batch access.
Example
>>> with open_adata_multiprocessing_safe("data.h5ad") as (adata, indices):
...     X_data = safe_read_adata_slice(adata, cell_indices)
Note
If adata is provided, it’s used directly (no file operations)
Each call opens a fresh file handle (safer for multiprocessing)
File handles are properly closed when the context exits
Includes retry mechanism for common HDF5 multiprocessing errors
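The behavior described in the notes (reuse a provided object, otherwise open a fresh handle per call and always close it on exit) can be sketched with `contextlib.contextmanager`. The `opener` parameter is hypothetical and stands in for something like `lambda p: anndata.read_h5ad(p, backed="r")`; the retry loop is omitted to keep the focus on handle lifetime.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple


@contextmanager
def open_per_worker_sketch(
    data_path: str,
    opener: Callable[[str], Any],
    adata: Optional[Any] = None,
    indices: Optional[Sequence[int]] = None,
) -> Iterator[Tuple[Any, Optional[Sequence[int]]]]:
    """Yield (data object, indices), opening a fresh handle only when needed (sketch)."""
    if adata is not None:
        # Caller already holds an object: use it directly, no file operations.
        yield adata, indices
        return
    # Each call (and so each DataLoader worker) opens its own handle,
    # so no HDF5 handle is ever shared across processes.
    handle = opener(data_path)
    try:
        yield handle, indices
    finally:
        # Release the handle when the `with` block exits, even on error.
        close = getattr(getattr(handle, "file", None), "close", None)
        if close is not None:
            try:
                close()
            except Exception:
                pass
```

Opening inside the context manager rather than at dataset construction time is what keeps forked workers from inheriting a parent's HDF5 handle.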
- scxpand.util.io.read_adata_slice_chunked(adata_obj, indices, chunk_size=None)
Chunked reading - more robust for large slices.
- Return type:
- scxpand.util.io.read_adata_slice_direct(adata_obj, indices)
Direct slice reading - fastest when it works.
- Return type:
- scxpand.util.io.read_adata_slice_sequential(adata_obj, indices)
Sequential reading - slowest but most robust.
- Return type:
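The three read strategies trade speed for robustness by changing how many rows each request touches. The sketch abstracts the AnnData access behind a hypothetical `read_rows(list_of_indices)` callable so it runs without AnnData; the fixed `chunk_size=256` default is illustrative, whereas the documented `chunk_size=None` presumably selects a size internally.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical stand-in for row access on a backed AnnData matrix.
RowReader = Callable[[Sequence[int]], List[Any]]


def read_direct(read_rows: RowReader, indices: Sequence[int]) -> List[Any]:
    # One request for all rows: fastest, but the most likely to fail on large slices.
    return read_rows(list(indices))


def read_chunked(read_rows: RowReader, indices: Sequence[int], chunk_size: int = 256) -> List[Any]:
    # Several bounded requests, concatenated in the original order.
    rows: List[Any] = []
    for start in range(0, len(indices), chunk_size):
        rows.extend(read_rows(list(indices[start:start + chunk_size])))
    return rows


def read_sequential(read_rows: RowReader, indices: Sequence[int]) -> List[Any]:
    # One row per request: slowest, but each read touches the least data.
    rows: List[Any] = []
    for i in indices:
        rows.extend(read_rows([i]))
    return rows
```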
- scxpand.util.io.retry_hdf5_operation(operation, max_retries=None, operation_name='operation')
Retry an operation that may fail due to HDF5 multiprocessing issues.
- Parameters:
- Return type:
- Returns:
Result of the operation if successful
- Raises:
Exception – The last exception if all retries fail
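A sketch of the documented retry contract: return the operation's result on success, re-raise the last exception once retries run out. The default of 3 retries, the `is_retryable` predicate (a hypothetical parameter standing in for an HDF5-error check), and the inline backoff constants are assumptions.

```python
import logging
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_hdf5_operation_sketch(
    operation: Callable[[], T],
    max_retries: Optional[int] = None,
    operation_name: str = "operation",
    is_retryable: Callable[[BaseException], bool] = lambda exc: isinstance(exc, OSError),
) -> T:
    """Run `operation`, retrying retryable failures with exponential backoff (sketch)."""
    retries = 3 if max_retries is None else max_retries  # assumed default
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc) or attempt == retries:
                raise  # non-retryable, or out of attempts: re-raise the last exception
            logger.warning(
                "%s failed (attempt %d of %d): %s; retrying",
                operation_name, attempt + 1, retries + 1, exc,
            )
            # Backoff with jitter: 0.1 s, 0.2 s, 0.4 s, ... capped at 2 s, scaled by 50-100%.
            time.sleep(min(0.1 * 2**attempt, 2.0) * random.uniform(0.5, 1.0))
    raise ValueError(f"{operation_name}: max_retries must be non-negative")
```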
- scxpand.util.io.safe_read_adata_slice(adata_obj, indices)
Safely read a slice from AnnData with retry mechanism and fallback strategies.
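The fallback idea can be sketched as trying each strategy in turn and returning the first success. The ordering (direct, then chunked, then sequential) is an assumption inferred from the "fastest" and "most robust" descriptions above, and per-strategy retries are left out; the helper name is hypothetical.

```python
import logging
from typing import Any, Callable, List, Sequence, Tuple

logger = logging.getLogger(__name__)

# (strategy name, reader) pairs, tried in order.
Strategy = Tuple[str, Callable[[Sequence[int]], Any]]


def read_with_fallbacks_sketch(strategies: List[Strategy], indices: Sequence[int]) -> Any:
    """Return the result of the first strategy that succeeds (sketch)."""
    errors: List[str] = []
    for name, reader in strategies:
        try:
            return reader(indices)
        except Exception as exc:
            # Record the failure and fall through to the next, more conservative strategy.
            logger.warning("%s read failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All read strategies failed; " + "; ".join(errors))
```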
- scxpand.util.io.save_predictions_to_csv(predictions, obs_df, model_type, save_path)
Save predictions to a CSV file.
- Parameters:
- Raises:
ValueError – If predictions and obs_df have mismatched lengths
- Return type:
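A dependency-free sketch of the documented length check followed by a CSV write. To avoid requiring pandas, it takes the cell IDs (the `obs_df` index) as a plain sequence instead of a DataFrame; the column names `cell_id`, `model_type`, and `prediction` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Sequence, Union


def save_predictions_to_csv_sketch(
    predictions: Sequence[float],
    cell_ids: Sequence[str],
    model_type: str,
    save_path: Union[str, Path],
) -> None:
    """Write one row per cell: cell ID, model type, prediction (sketch)."""
    if len(predictions) != len(cell_ids):
        raise ValueError(
            f"Length mismatch: {len(predictions)} predictions vs {len(cell_ids)} cells"
        )
    path = Path(save_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cell_id", "model_type", "prediction"])  # hypothetical columns
        for cell_id, pred in zip(cell_ids, predictions):
            writer.writerow([cell_id, model_type, pred])
```

Checking lengths before opening the file means a mismatch never leaves a partially written CSV behind.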