scxpand.data_util.statistics

Statistical computation utilities for expression data.

This module provides efficient batch-based statistical computations for preprocessing parameter estimation and sparse matrix operations.

Functions

compute_preprocessed_genes_means_stds(...[, ...])

Compute per-gene means and standard deviations after preprocessing.

scxpand.data_util.statistics.compute_preprocessed_genes_means_stds(data_path, row_inds, batch_size=200000, target_sum=10000.0, use_log_transform=False)

Compute per-gene means and standard deviations after preprocessing.

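To handle large datasets, this function processes the selected rows in batches and applies the same preprocessing steps used during training (per-cell normalization to target_sum and, optionally, a log1p transform) before computing the per-gene statistics.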
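The batching idea can be illustrated with a minimal sketch. It is not the scxpand implementation: the helper name batched_gene_means_stds is hypothetical, the data is assumed to be a dense in-memory NumPy array rather than an AnnData file, and the population standard deviation (ddof=0) is an assumption. It accumulates per-gene sums and sums of squares batch by batch, so only one batch is preprocessed at a time.

```python
import numpy as np


def batched_gene_means_stds(X, row_inds, batch_size=200_000,
                            target_sum=1e4, use_log_transform=False):
    """Hypothetical in-memory stand-in; NOT the scxpand implementation."""
    n_genes = X.shape[1]
    total = np.zeros(n_genes, dtype=np.float64)      # running sum per gene
    total_sq = np.zeros(n_genes, dtype=np.float64)   # running sum of squares per gene
    n_rows = 0

    for start in range(0, len(row_inds), batch_size):
        batch = X[row_inds[start:start + batch_size]].astype(np.float64)

        # Scale each row (cell) so that it sums to target_sum.
        row_sums = batch.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0] = 1.0  # leave all-zero rows unchanged
        batch = batch / row_sums * target_sum

        if use_log_transform:
            batch = np.log1p(batch)

        total += batch.sum(axis=0)
        total_sq += (batch ** 2).sum(axis=0)
        n_rows += batch.shape[0]

    means = total / n_rows
    # Var = E[x^2] - E[x]^2; clip tiny negative values caused by rounding.
    variances = np.maximum(total_sq / n_rows - means ** 2, 0.0)
    return means, np.sqrt(variances)


# Small synthetic example: 50 cells, 6 genes, every other row selected.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(50, 6)).astype(np.float64) + 1.0
row_inds = np.arange(0, 50, 2)
means, stds = batched_gene_means_stds(X, row_inds, batch_size=7,
                                      use_log_transform=True)
```

The sum-of-squares accumulator keeps the sketch short. A streaming variance update such as Welford's algorithm would be more robust numerically when values are large relative to their spread.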

Parameters:
  • data_path (str | Path) – Path to the AnnData file

  • row_inds (ndarray) – Indices of rows to process (preferably sorted for speed)

  • batch_size (int (default: 200000)) – Size of each batch for processing

  • target_sum (float (default: 10000.0)) – Total count each cell (row) is scaled to during normalization

  • use_log_transform (bool (default: False)) – Whether to apply log1p transformation

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple of (means, stds) arrays, each with shape [n_genes]
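One plausible use of the returned arrays is per-gene standardization (z-scoring) of preprocessed expression batches. The page does not say how scxpand consumes these statistics, so the sketch below is an assumption: the standardize helper and the eps threshold are hypothetical, and the small means and stds arrays stand in for the function's output.

```python
import numpy as np


def standardize(batch, means, stds, eps=1e-8):
    """Hypothetical helper: z-score each gene column of a preprocessed batch."""
    # Genes with (near-)zero spread would cause division by zero; divide by 1 instead.
    safe_stds = np.where(stds > eps, stds, 1.0)
    return (batch - means) / safe_stds


# Stand-ins for the (means, stds) tuple, each of shape [n_genes] with n_genes = 3.
means = np.array([1.0, 2.0, 0.0])
stds = np.array([0.5, 2.0, 0.0])

batch = np.array([[1.5, 0.0, 0.0],
                  [0.5, 4.0, 0.0]])
z = standardize(batch, means, stds)
```

Broadcasting applies the [n_genes] vectors across every row, so the batch must use the same gene order as the data the statistics were computed from.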