scxpand.data_util.statistics
Statistical computation utilities for expression data.
This module provides efficient batch-based statistical computations for estimating preprocessing parameters and for sparse matrix operations.
Functions
| compute_preprocessed_genes_means_stds | Compute per-gene means and standard deviations after preprocessing. |
- scxpand.data_util.statistics.compute_preprocessed_genes_means_stds(data_path, row_inds, batch_size=200000, target_sum=10000.0, use_log_transform=False)
Compute per-gene means and standard deviations after preprocessing.
This function computes statistics on large datasets efficiently by processing rows in batches and applying the same preprocessing steps used during training.
- Parameters:
  - row_inds (ndarray) – Indices of rows to process (preferably sorted for speed)
  - batch_size (int, default: 200000) – Size of each batch for processing
  - target_sum (float, default: 10000.0) – Target sum for per-cell row normalization
  - use_log_transform (bool, default: False) – Whether to apply the log1p transformation
- Returns:
  Tuple of (means, stds) arrays, each with shape [n_genes]
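
The batch-based computation described above can be illustrated with a small NumPy sketch. This is not the scxpand implementation: the helper name `batched_means_stds`, the in-memory `matrix` argument (used here in place of `data_path`), the handling of all-zero rows, and the choice of population standard deviation (ddof=0) are all assumptions made for illustration. It accumulates per-gene sums and sums of squares one batch at a time, so only a single batch is ever normalized in memory.

```python
import numpy as np


def batched_means_stds(matrix, row_inds, batch_size=200000,
                       target_sum=10000.0, use_log_transform=False):
    """Hypothetical sketch: per-gene mean/std after row normalization.

    `matrix` is a dense (n_cells, n_genes) array standing in for the
    on-disk data that the real function reads from `data_path`.
    """
    n_genes = matrix.shape[1]
    total = np.zeros(n_genes, dtype=np.float64)     # running sum per gene
    total_sq = np.zeros(n_genes, dtype=np.float64)  # running sum of squares
    n_rows = 0

    for start in range(0, len(row_inds), batch_size):
        idx = row_inds[start:start + batch_size]
        batch = np.asarray(matrix[idx], dtype=np.float64)

        # Scale each cell (row) so its counts sum to target_sum.
        row_sums = batch.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0] = 1.0  # assumption: leave empty rows as zeros
        batch = batch / row_sums * target_sum

        if use_log_transform:
            batch = np.log1p(batch)

        total += batch.sum(axis=0)
        total_sq += (batch ** 2).sum(axis=0)
        n_rows += batch.shape[0]

    means = total / n_rows
    # Var = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    variances = np.maximum(total_sq / n_rows - means ** 2, 0.0)
    return means, np.sqrt(variances)
```

Passing sorted `row_inds` keeps each batch's reads close to sequential, which is presumably why the parameter description recommends sorted indices. The sum-of-squares formula is the simplest way to combine batches; a streaming method such as Welford's algorithm would be more numerically robust if gene values span many orders of magnitude.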