scxpand.data_util.dataloaders

Functions

create_eval_dataloader(dataset, batch_size)

Create a DataLoader optimized for evaluation and inference.

create_train_dataloader(train_dataset, ...)

Create a DataLoader for training.

Classes

BalancedLabelsBatchSampler(dataset, batch_size)

BalancedTypesBatchSampler(dataset, batch_size)

class scxpand.data_util.dataloaders.BalancedLabelsBatchSampler(dataset, batch_size, seed=1)
__init__(dataset, batch_size, seed=1)

Balanced batch sampler that ensures each batch has roughly equal numbers of positive and negative examples.

Parameters:
  • dataset (CellsDataset) – Dataset to sample from.

  • batch_size (int) – Number of samples per batch.

  • seed (int (default: 1)) – Random seed for reproducibility.
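
The balancing idea can be illustrated with a minimal, standard-library sketch. This is not the scxpand implementation: the function name `balanced_label_batches`, the binary 0/1 label encoding, the number of batches per pass, and the choice to sample with replacement are all assumptions made for illustration.

```python
import random
from typing import Iterator, List, Sequence


def balanced_label_batches(
    labels: Sequence[int], batch_size: int, seed: int = 1
) -> Iterator[List[int]]:
    """Yield index batches with (nearly) equal positive and negative counts.

    Hypothetical helper, not part of scxpand. Assumes binary labels where
    1 marks a positive example and anything else a negative one.
    """
    rng = random.Random(seed)  # local RNG so the seed fully determines output
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    if not pos or not neg:
        raise ValueError("both classes must be present to balance batches")

    n_pos = batch_size // 2        # half the batch is positive ...
    n_neg = batch_size - n_pos     # ... the rest (one more if odd) is negative
    n_batches = len(labels) // batch_size  # assumed length of one pass

    for _ in range(n_batches):
        # Sampling with replacement (an assumption) keeps the minority
        # class from running out before the pass is finished.
        batch = rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_neg)
        rng.shuffle(batch)  # avoid a fixed positive-then-negative layout
        yield batch
```

With 5 positive and 95 negative labels and `batch_size=8`, every batch in this sketch holds 4 positive and 4 negative indices, and repeating the call with the same seed yields the same batches.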

class scxpand.data_util.dataloaders.BalancedTypesBatchSampler(dataset, batch_size, seed=1)
__init__(dataset, batch_size, seed=1)

Balanced batch sampler that equalizes the proportions of each stratum defined by the combinations of the categorical features ["tissue_type", "imputed_labels"]. A composite group that contains both positive and negative labels is split into two strata, one per label. Each batch contains an equal (or nearly equal) number of samples from each stratum.

Parameters:
  • dataset (CellsDataset) – Dataset to sample from.

  • batch_size (int) – Number of samples per batch; must be at least as large as the number of strata.

  • seed (int (default: 1)) – Random seed for reproducibility.
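
The stratification described above can be sketched in plain Python. The helpers `build_strata` and `balanced_strata_batches` are hypothetical and are not scxpand functions. The sketch assumes that keying each cell on (tissue_type, imputed_label, label) reproduces the described split, since a group with mixed labels then naturally falls into two strata. Sampling with replacement and the `n_batches` argument are also assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

StratumKey = Tuple[str, str, int]


def build_strata(
    tissue_types: Sequence[str],
    imputed_labels: Sequence[str],
    labels: Sequence[int],
) -> Dict[StratumKey, List[int]]:
    """Group sample indices by (tissue_type, imputed_label, label)."""
    strata: Dict[StratumKey, List[int]] = defaultdict(list)
    for i, key in enumerate(zip(tissue_types, imputed_labels, labels)):
        strata[key].append(i)
    return dict(strata)


def balanced_strata_batches(
    strata: Dict[StratumKey, List[int]],
    batch_size: int,
    n_batches: int,
    seed: int = 1,
) -> Iterator[List[int]]:
    """Yield batches that draw (nearly) equally from every stratum."""
    if batch_size < len(strata):
        raise ValueError("batch_size must be at least the number of strata")
    rng = random.Random(seed)
    keys = sorted(strata)  # fixed order keeps results reproducible
    base, extra = divmod(batch_size, len(keys))
    for _ in range(n_batches):
        # When batch_size is not a multiple of the number of strata,
        # a random subset of strata contributes one extra sample.
        lucky = set(rng.sample(keys, extra))
        batch: List[int] = []
        for key in keys:
            k = base + (1 if key in lucky else 0)
            batch.extend(rng.choices(strata[key], k=k))  # with replacement
        rng.shuffle(batch)
        yield batch
```

The `batch_size` check mirrors the documented constraint: with fewer slots than strata, at least one stratum would be missing from every batch.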

scxpand.data_util.dataloaders.create_eval_dataloader(dataset, batch_size, num_workers=0)

Create a DataLoader optimized for evaluation and inference.

Sets up a DataLoader with deterministic behavior (no shuffling) suitable for inference tasks. Automatically configures worker processes and memory settings based on the dataset.

Parameters:
  • dataset (CellsDataset) – CellsDataset configured for evaluation (is_train=False).

  • batch_size (int) – Number of cells per batch during inference.

  • num_workers (int (default: 0)) – Number of parallel data loading processes.

Return type:

DataLoader

Returns:

DataLoader ready for inference with consistent ordering.
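
Why consistent ordering matters for inference can be shown with a small, library-independent sketch. `sequential_batches` is a hypothetical helper, not part of scxpand or PyTorch, and it assumes the final partial batch is kept rather than dropped.

```python
from typing import Iterator, List


def sequential_batches(n_items: int, batch_size: int) -> Iterator[List[int]]:
    """Yield index batches in dataset order, without shuffling."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        # The last batch may be shorter than batch_size; it is kept so
        # that every item is visited exactly once.
        yield list(range(start, min(start + batch_size, n_items)))


# Concatenating the batches recovers the original row order, so the
# i-th prediction lines up with the i-th cell in the dataset.
order = [i for batch in sequential_batches(5, 2) for i in batch]
```

Because the batches follow dataset order, predictions collected batch by batch can be joined back to the input rows by position, with no extra index bookkeeping.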

scxpand.data_util.dataloaders.create_train_dataloader(train_dataset, loader_params, num_workers=0)

Create a DataLoader for training.

Parameters:
  • train_dataset (CellsDataset) – Dataset used for training.

  • loader_params (DataLoaderParams) – Parameters that configure the loader.

  • num_workers (int (default: 0)) – Number of worker processes for parallel data loading.

Return type:

DataLoader

Returns:

DataLoader for training.