scxpand.data_util.dataloaders
Functions
create_eval_dataloader | Create a DataLoader optimized for evaluation and inference.
create_train_dataloader | Create a DataLoader for training.
Classes
BalancedLabelsBatchSampler
BalancedTypesBatchSampler
- class scxpand.data_util.dataloaders.BalancedLabelsBatchSampler(dataset, batch_size, seed=1)
- __init__(dataset, batch_size, seed=1)
Balanced batch sampler that ensures each batch has a roughly equal number of positive and negative examples.
- Parameters:
dataset (CellsDataset) – CellsDataset instance.
batch_size (int) – Batch size.
seed (int, default: 1) – Random seed for reproducibility.
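The balancing idea can be illustrated with a minimal, standalone sketch. This is not scxpand's implementation: the class name `BalancedBinaryBatchSampler`, the plain list of 0/1 labels, and the choice to sample with replacement are all assumptions made for illustration. It only shows how a seeded sampler can yield index batches that are half positive and half negative.

```python
import random
from typing import Iterator, List, Sequence


class BalancedBinaryBatchSampler:
    """Hypothetical stand-in: yields index batches with balanced 0/1 labels."""

    def __init__(self, labels: Sequence[int], batch_size: int, seed: int = 1) -> None:
        if batch_size < 2:
            raise ValueError("batch_size must be at least 2")
        self.pos = [i for i, y in enumerate(labels) if y == 1]
        self.neg = [i for i, y in enumerate(labels) if y == 0]
        if not self.pos or not self.neg:
            raise ValueError("both classes must be present")
        self.batch_size = batch_size
        self.seed = seed

    def __len__(self) -> int:
        # Assumption: one pass yields about as many samples as the dataset holds.
        return max(1, (len(self.pos) + len(self.neg)) // self.batch_size)

    def __iter__(self) -> Iterator[List[int]]:
        # A fresh generator per pass makes every pass reproducible for a given seed.
        rng = random.Random(self.seed)
        n_pos = self.batch_size // 2
        n_neg = self.batch_size - n_pos
        for _ in range(len(self)):
            # Sampling with replacement keeps the minority class from running out.
            batch = rng.choices(self.pos, k=n_pos) + rng.choices(self.neg, k=n_neg)
            rng.shuffle(batch)
            yield batch
```

Because the minority class is drawn with replacement, the same positive cells can appear several times per pass. That is the usual trade-off of this kind of oversampling.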
- class scxpand.data_util.dataloaders.BalancedTypesBatchSampler(dataset, batch_size, seed=1)
- __init__(dataset, batch_size, seed=1)
Balanced batch sampler that equalizes the proportions of the strata defined by the combinations of the categorical features ["tissue_type", "imputed_labels"]. A composite group that contains both positive and negative labels is split into two strata. Each batch contains an equal (or nearly equal) number of samples from each stratum.
- Parameters:
dataset (CellsDataset) – CellsDataset instance.
batch_size (int) – Batch size; must be at least as large as the number of strata.
seed (int, default: 1) – Random seed for reproducibility.
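The stratification described above can be sketched as follows. This is an illustrative approximation, not the library's code: `build_strata` and `stratified_batches` are hypothetical helpers, the group keys are assumed to be (tissue_type, imputed_labels) tuples, and labels are assumed to be binary. It also shows why batch_size must be at least the number of strata: every stratum needs at least one slot per batch.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple

Stratum = Tuple[Hashable, int]  # (group key, binary label)


def build_strata(groups: Sequence[Hashable], labels: Sequence[int]) -> Dict[Stratum, List[int]]:
    """Group sample indices by (group, label); a mixed-label group yields two strata."""
    strata: Dict[Stratum, List[int]] = defaultdict(list)
    for idx, key in enumerate(zip(groups, labels)):
        strata[key].append(idx)
    return dict(strata)


def stratified_batches(
    strata: Dict[Stratum, List[int]], batch_size: int, n_batches: int, seed: int = 1
) -> Iterator[List[int]]:
    """Yield batches holding an equal (or nearly equal) share of every stratum."""
    keys = sorted(strata)
    if batch_size < len(keys):
        raise ValueError("batch_size must be at least the number of strata")
    rng = random.Random(seed)
    base, extra = divmod(batch_size, len(keys))
    for _ in range(n_batches):
        # Leftover slots go to randomly chosen strata so no stratum is favoured.
        bonus = set(rng.sample(keys, extra))
        batch: List[int] = []
        for key in keys:
            k = base + (1 if key in bonus else 0)
            # Sample with replacement so small strata can still fill their share.
            batch.extend(rng.choices(strata[key], k=k))
        rng.shuffle(batch)
        yield batch
```

With three strata and batch_size=7, each batch in this sketch holds two samples from two of the strata and three from the remaining one.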
- scxpand.data_util.dataloaders.create_eval_dataloader(dataset, batch_size, num_workers=0)
Create a DataLoader optimized for evaluation and inference.
Sets up a DataLoader with deterministic behavior (no shuffling) suitable for inference tasks. Automatically configures worker processes and memory settings based on the dataset.
- Parameters:
dataset (CellsDataset) – CellsDataset configured for evaluation (is_train=False).
batch_size (int) – Number of cells per batch during inference.
num_workers (int, default: 0) – Number of parallel data loading processes.
- Return type:
DataLoader
- Returns:
DataLoader ready for inference with consistent ordering.
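Consistent ordering matters because predictions are usually written back to the cells by position. The sketch below is a library-independent illustration of unshuffled batching, not the function's actual code; `sequential_batches` is a hypothetical helper. It shows that concatenating the batches reproduces the dataset order exactly, with a smaller final batch when the sizes do not divide evenly.

```python
from typing import Iterator, List


def sequential_batches(n_samples: int, batch_size: int) -> Iterator[List[int]]:
    """Yield index batches in dataset order; the final batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield list(range(start, min(start + batch_size, n_samples)))


# Flattening the batches recovers the original row order, so the i-th
# prediction always belongs to the i-th cell.
flat = [i for batch in sequential_batches(10, 4) for i in batch]
```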
- scxpand.data_util.dataloaders.create_train_dataloader(train_dataset, loader_params, num_workers=0)
Create a DataLoader for training.
- Parameters:
train_dataset (CellsDataset) – Dataset for training.
loader_params (DataLoaderParams) – Parameters for the loader.
num_workers (int, default: 0) – Number of worker processes for parallel data loading.
- Return type:
DataLoader
- Returns:
DataLoader for training.