Output Format Specification

This document describes the output files generated by scXpand during training, optimization, and inference.

Overview

Training writes its outputs to the results directory. Every model produces a core set of common files; additional files depend on the model type and the training configuration.

Common Files (All Models)

These files are generated for every model type:

File	Description
`parameters.json`	Model configuration and hyperparameters
`model_type.txt`	Model type identifier (e.g., `mlp`, `autoencoder`)
`data_format.json`	Input data preprocessing configuration
`results_dev.txt`	Evaluation metrics in text format
`results_table_dev.csv`	Detailed per-cell predictions and metadata
`train_patient_ids.csv`	Patient IDs used for training
`dev_patient_ids.csv`	Patient IDs used for validation
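
A quick way to confirm that a run finished cleanly is to check the results directory for the common files listed above. The following is a minimal sketch using only the standard library; `missing_common_files` is a hypothetical helper written for illustration and is not part of scXpand.

```python
from pathlib import Path

# File names copied from the common-files table above.
COMMON_FILES = (
    "parameters.json",
    "model_type.txt",
    "data_format.json",
    "results_dev.txt",
    "results_table_dev.csv",
    "train_patient_ids.csv",
    "dev_patient_ids.csv",
)


def missing_common_files(results_dir):
    """Return the common output files that are absent from results_dir.

    Hypothetical helper: it only checks for file existence and does not
    validate the contents of any file.
    """
    root = Path(results_dir)
    return [name for name in COMMON_FILES if not (root / name).is_file()]
```

An empty list means every common file is present; any names returned point to outputs that were not written.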

Model Files (Model-Specific)

Different model types save their trained models in different formats:

Model Type	File	Description
PyTorch Models (autoencoder, mlp)	`model.pt`	Trained model state dictionary
Scikit-learn Models (logistic, svm, lightgbm)	`model.joblib`	Serialized model using joblib
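
The mapping in the table above can be expressed as a small lookup. This is only a sketch: `MODEL_FILE_BY_TYPE` and `model_file_path` are hypothetical names introduced here, not part of the scXpand API.

```python
from pathlib import Path

# Model type -> saved model file, as listed in the table above.
MODEL_FILE_BY_TYPE = {
    "autoencoder": "model.pt",
    "mlp": "model.pt",
    "logistic": "model.joblib",
    "svm": "model.joblib",
    "lightgbm": "model.joblib",
}


def model_file_path(results_dir, model_type):
    """Return the expected path of the saved model for a given model type."""
    try:
        filename = MODEL_FILE_BY_TYPE[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type!r}") from None
    return Path(results_dir) / filename
```

The resulting path could then be passed to `torch.load` for `model.pt` or `joblib.load` for `model.joblib`. Because `model.pt` holds a state dictionary rather than a full model object, it still has to be loaded into a model instance built from `parameters.json`.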

Visualization Files

File	Location	Description
`roc_curve_dev.png`	`plots/` subdirectory	ROC curve visualization for validation set

PyTorch Model Files (autoencoder, mlp)

The PyTorch models (autoencoder and mlp) use the `TrainLogger`, which creates additional training-monitoring files:

File	Description
`run_info.json`	Training run metadata and configuration
`summary_info.json`	Training summary including final metrics
`best_model_info.json`	Configuration and metrics of best performing model

Autoencoder and MLP-Specific Files

Only the autoencoder and MLP models create checkpoint files for resuming training, along with a detailed metrics file for the best model:

File	Description
`best_ckpt.pt`	Best model checkpoint during training
`last_ckpt.pt`	Final model checkpoint
`best_model_dev_set_metrics.json`	Detailed metrics for the best model

Linear Model Files (logistic, svm)

Logistic regression and SVM models use a simplified logger and create:

File	Description
`best_model_info.json`	Best hyperparameters and performance metrics
`summary_info.json`	Training summary including final metrics

Hyperparameter Optimization Files

When using the `optimize` or `optimize-all` commands, additional files are created in the study directories:

File	Location	Description
`optuna.db`	`results/optuna_studies/{study_name}/`	SQLite database storing all trial data
`info.json`	`results/optuna_studies/{study_name}/`	Study summary with best trial results
`trials.log`	`results/optuna_studies/{study_name}/`	Detailed log of all trial executions
`trial_{N}/`	`results/optuna_studies/{study_name}/`	Individual trial directories with full model outputs

Each `trial_{N}/` directory contains the complete output files for that trial (model files, metrics, plots, etc.) as described in the sections above.
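
To walk a study directory programmatically, the trial folders can be discovered by name. The sketch below relies only on the `trial_{N}/` naming and the `info.json` file described above. `list_trial_dirs` and `load_study_summary` are hypothetical helpers, and the keys inside `info.json` are not specified in this document, so the summary is returned as an uninterpreted dictionary.

```python
import json
import re
from pathlib import Path

# Matches directory names of the form trial_0, trial_1, ...
TRIAL_DIR_PATTERN = re.compile(r"trial_(\d+)")


def list_trial_dirs(study_dir):
    """Return (trial_number, path) pairs for each trial_{N}/ directory, sorted by N."""
    trials = []
    for child in Path(study_dir).iterdir():
        match = TRIAL_DIR_PATTERN.fullmatch(child.name)
        if child.is_dir() and match:
            trials.append((int(match.group(1)), child))
    return sorted(trials)


def load_study_summary(study_dir):
    """Load info.json as a plain dict; its keys are not assumed here."""
    with open(Path(study_dir) / "info.json", encoding="utf-8") as handle:
        return json.load(handle)
```

Sorting on the parsed integer keeps `trial_10` after `trial_2`, which a plain alphabetical sort of directory names would not.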

Model Type File

The `model_type.txt` file is automatically generated during training and contains a simple text identifier for the model type. This enables automatic model type detection during inference.

File Format:

The file contains just the model type as plain text:

mlp

Supported Values:

autoencoder - Autoencoder models
mlp - Multi-layer perceptron models
lightgbm - LightGBM gradient boosting models
logistic - Logistic regression models
svm - Support vector machine models

Usage:

The inference pipeline reads this file to detect the model type, so the type does not need to be specified manually. If the file is missing, create it by hand or pass the model type explicitly in your inference code.
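
The detection step described above could look roughly like the following. This is an illustrative sketch, not scXpand's actual implementation: `read_model_type` and its `fallback` parameter are hypothetical, and stripping surrounding whitespace is an assumption about how the file is written.

```python
from pathlib import Path

# Supported values as listed in this section.
SUPPORTED_MODEL_TYPES = {"autoencoder", "mlp", "lightgbm", "logistic", "svm"}


def read_model_type(results_dir, fallback=None):
    """Read model_type.txt, or use an explicit fallback if the file is missing."""
    type_file = Path(results_dir) / "model_type.txt"
    if type_file.is_file():
        # Assumption: the file holds only the identifier, possibly with a newline.
        model_type = type_file.read_text(encoding="utf-8").strip()
    elif fallback is not None:
        model_type = fallback
    else:
        raise FileNotFoundError(f"{type_file} not found and no fallback given")
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model type: {model_type!r}")
    return model_type
```

Validating against the supported values catches a hand-written file with a typo before any model loading is attempted.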

Directory Structure Example

results/
├── autoencoder_v1/                    # Main model results
│   ├── model.pt                       # Trained model
│   ├── parameters.json                # Model configuration
│   ├── model_type.txt                 # Model type identifier
│   ├── data_format.json               # Data preprocessing config
│   ├── results_dev.txt                # Evaluation metrics
│   ├── plots/
│   │   └── roc_curve_dev.png          # ROC curve
│   └── ...
└── optuna_studies/                    # Hyperparameter optimization
    └── autoencoder_study/
        ├── optuna.db                  # Trial database
        ├── info.json                  # Study summary
        ├── trials.log                 # Trial logs
        └── trial_0/, trial_1/, ...    # Individual trials
