scxpand.util.general_util#

Functions

compute_error_rate(label, y_pred[, threshold])

compute_false_negative_rate(label, prob_out)

compute_false_positive_rate(label, prob_out)

compute_row_sums(X)

Compute row sums for any array-like object, returning a NumPy array.

compute_scaling_factors(row_sums, target_sum)

Compute row scaling factors for normalization.

convert_enums_to_values(obj)

Recursively convert enum objects to their string values for JSON serialization and logging.

copy_array_like(x[, copy])

Create a copy of an array-like object if requested.

decisions_to_probabilities(decisions)

Convert raw decision_function scores to probability estimates.

ensure_numpy_array(x)

Ensure the input is converted to a NumPy array.

flatten_dict(d[, parent_key, sep])

Flatten a nested dictionary, preserving the hierarchy in the keys.

flatten_nested_dict(nested_dict[, parent_key])

Convert a nested dictionary to a flattened dictionary with keys in the format 'key1/key2/...'.

floats_to_str(a[, precision])

Convert a numeric float value to a string, with a given precision.

format_float(x[, precision, threshold])

Format a float number using fixed-point notation unless it is very small (but nonzero).

get_device()

Automatically detect and return the best available device for PyTorch.

get_elapsed_time_str(t0, t1)

get_last_git_commit_link()

Get the last git commit link from the remote repository.

get_local_time()

get_new_version_path(save_path[, start_index])

Create a versioned directory path to avoid overwriting existing results.

get_utc_time()

load_and_override_params(param_class[, ...])

Load parameters from config file or use defaults, then apply overrides.

load_params(save_path)

Load model parameters from a saved JSON file.

log_inference_progress(current_iteration, ...)

Log progress during inference or processing operations.

log_nested_metrics(metrics, logger_func[, ...])

Log nested metrics with hierarchical display and highlighted score metric.

metrics_dict_to_dataframes(metrics[, precision])

Convert nested metrics dictionary to pandas DataFrames for nice display in notebooks.

metrics_dict_to_table(metrics[, title, ...])

Convert nested metrics dictionary to a formatted table string using pandas.

nested_dict_to_flat_str(nested_scalars[, ...])

Flatten a nested dictionary into a string of comma-separated values.

nested_dict_to_multiline_str(nested_scalars)

Recursively format a nested dictionary of scalars as a hierarchically indented multi-line string.

num2str(v)

Convert a number to a string, with a fixed number of decimal places.

save_json_data(data, save_path)

Save arbitrary dictionary data to a JSON file.

save_params(params, save_dir)

Save parameters to a JSON file, along with the model type taken from the parameter object.

set_seed(seed)

sigmoid(x)

time_seconds_to_str(seconds)

time_to_str(t[, fmt])

Convert a datetime object to a string.

to_np(x)

Convert tensor to numpy array.

scxpand.util.general_util.compute_error_rate(label, y_pred, threshold=0.5)#
Return type:

float

scxpand.util.general_util.compute_false_negative_rate(label, prob_out, threshold=0.5)#
Return type:

float

scxpand.util.general_util.compute_false_positive_rate(label, prob_out, threshold=0.5)#
Return type:

float
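
The three rate functions above carry no description in this reference. The sketch below assumes the standard binary-classification definitions (error rate = fraction of wrong predictions, FNR = missed positives / all positives, FPR = false alarms / all negatives) and assumes that a score equal to the threshold counts as positive. The name rates_sketch is hypothetical and is not part of scxpand.

```python
import numpy as np


def rates_sketch(label, prob_out, threshold=0.5):
    """Return (error_rate, false_negative_rate, false_positive_rate)."""
    label = np.asarray(label).astype(bool)
    # Assumption: scores at or above the threshold are predicted positive.
    pred = np.asarray(prob_out) >= threshold

    error_rate = float(np.mean(pred != label))

    n_pos = int(label.sum())
    n_neg = int((~label).sum())
    # Assumption: a rate is reported as 0.0 when its denominator is empty.
    fnr = float(np.sum(~pred & label) / n_pos) if n_pos else 0.0
    fpr = float(np.sum(pred & ~label) / n_neg) if n_neg else 0.0
    return error_rate, fnr, fpr


labels = [1, 1, 0, 0]
probs = [0.9, 0.2, 0.7, 0.1]
error_rate, fnr, fpr = rates_sketch(labels, probs)
```

Here one of the two positives is missed and one of the two negatives is flagged, so all three rates come out as 0.5.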

scxpand.util.general_util.compute_row_sums(X)#

Compute row sums for any array-like object, returning a NumPy array.

Parameters:

X – Array-like object (numpy array, torch tensor, sparse matrix)

Return type:

ndarray

Returns:

NumPy array of row sums

Raises:

TypeError – If the input type doesn’t support row sum computation
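
As a rough illustration of the contract described above (a flat NumPy array out, TypeError for unsupported input), the following sketch relies on the input exposing a NumPy-style sum(axis=1) method. The name row_sums_sketch is hypothetical, and the real function may dispatch on input types differently.

```python
import numpy as np


def row_sums_sketch(X):
    """Sum each row of X and return a 1-D NumPy array."""
    try:
        # Works for NumPy arrays and for other objects with a NumPy-style
        # sum(axis=1); treating that as the common interface is an assumption.
        sums = X.sum(axis=1)
    except (AttributeError, TypeError) as exc:
        raise TypeError(f"Cannot compute row sums for {type(X).__name__}") from exc
    # np.asarray(...).ravel() flattens 2-D results (such as an (n, 1) matrix).
    return np.asarray(sums).ravel()


dense = np.array([[1, 2, 3], [4, 5, 6]])
dense_sums = row_sums_sketch(dense)
```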

scxpand.util.general_util.compute_scaling_factors(row_sums, target_sum, dtype=<class 'numpy.float32'>)#

Compute row scaling factors for normalization.

Parameters:
  • row_sums (ndarray) – Sum of each row

  • target_sum (float) – Target sum for normalization

  • dtype (type (default: <class 'numpy.float32'>)) – Data type for the output array

Return type:

ndarray

Returns:

Array of scaling factors
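
One plausible reading of "scaling factors" is target_sum divided by each row sum, so that multiplying a row by its factor brings its total to target_sum. The sketch below follows that reading. How all-zero rows are treated is an assumption (they keep a factor of 1 here), and scaling_factors_sketch is a hypothetical name.

```python
import numpy as np


def scaling_factors_sketch(row_sums, target_sum, dtype=np.float32):
    """Return one multiplicative factor per row."""
    row_sums = np.asarray(row_sums, dtype=np.float64)
    # Assumption: rows that sum to zero are left unscaled (factor 1).
    factors = np.ones_like(row_sums)
    nonzero = row_sums > 0
    factors[nonzero] = target_sum / row_sums[nonzero]
    return factors.astype(dtype)


X = np.array([[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]])
factors = scaling_factors_sketch(X.sum(axis=1), target_sum=10.0)
# Broadcast the per-row factors across the columns.
X_norm = X * factors[:, None]
```

After scaling, every non-empty row of X_norm sums to 10, and the empty row stays at zero.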

scxpand.util.general_util.convert_enums_to_values(obj)#

Recursively convert enum objects to their string values for JSON serialization and logging.

Parameters:

obj (Any) – Any object that might contain enums (dict, list, tuple, or individual values)

Return type:

Any

Returns:

Object with all enums converted to their .value strings
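
The recursion described above can be sketched with the standard library alone. ModelType is a made-up enum used only for illustration, and enums_to_values_sketch is a hypothetical name; the real function may handle more container types.

```python
import json
from enum import Enum


class ModelType(Enum):  # hypothetical enum, for illustration only
    MLP = "mlp"
    SVM = "svm"


def enums_to_values_sketch(obj):
    """Replace every Enum found in obj (recursively) with its .value."""
    if isinstance(obj, Enum):
        return obj.value
    if isinstance(obj, dict):
        return {
            enums_to_values_sketch(k): enums_to_values_sketch(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type (plain lists and tuples only).
        return type(obj)(enums_to_values_sketch(v) for v in obj)
    return obj


config = {"model": ModelType.MLP, "candidates": [ModelType.SVM, 3]}
clean = enums_to_values_sketch(config)
as_json = json.dumps(clean)  # would raise TypeError on the raw config
```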

scxpand.util.general_util.copy_array_like(x, copy=True)#

Create a copy of an array-like object if requested.

Parameters:
  • x – Array-like object (numpy array, torch tensor, sparse matrix)

  • copy (bool (default: True)) – Whether to create a copy

Returns:

Copy or reference to the original array

scxpand.util.general_util.decisions_to_probabilities(decisions)#

Convert raw decision_function scores to probability estimates.

If decisions is 1-D (binary classifier), a sigmoid is applied. If it is 2-D (multi-class), a numerically stable softmax is applied and the probability of the positive (class-1) column is returned, or of column 0 if it is the only one. This mirrors the logic used during training.

Return type:

ndarray
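
The two branches described above can be sketched as follows. This is an illustration written from the description, not the library's code. The name decisions_to_probabilities_sketch is hypothetical, and the softmax is stabilized by subtracting each row's maximum before exponentiating.

```python
import numpy as np


def decisions_to_probabilities_sketch(decisions):
    """Map decision scores to positive-class probabilities."""
    decisions = np.asarray(decisions, dtype=np.float64)

    if decisions.ndim == 1:
        # Binary case: plain logistic sigmoid.
        return 1.0 / (1.0 + np.exp(-decisions))

    # Multi-class case: subtracting the row maximum keeps exp() from overflowing
    # without changing the softmax result.
    shifted = decisions - decisions.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)

    # Return the class-1 column, or column 0 if there is only one column.
    column = 1 if probs.shape[1] > 1 else 0
    return probs[:, column]


binary = decisions_to_probabilities_sketch([0.0, 2.0])
multi = decisions_to_probabilities_sketch([[1000.0, 1001.0], [0.0, 0.0]])
```

The first row of the 2-D input would overflow a naive softmax; with the shift it evaluates to the same value as sigmoid(1).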

scxpand.util.general_util.ensure_numpy_array(x)#

Ensure the input is converted to a NumPy array.

Handles PyTorch tensors, sparse matrices, and other array-like objects.

Parameters:

x – Array-like object (numpy array, torch tensor, sparse matrix, etc.)

Return type:

ndarray

Returns:

NumPy array

scxpand.util.general_util.flatten_dict(d, parent_key='', sep='/')#

Flatten a nested dictionary, preserving the hierarchy in the keys.

scxpand.util.general_util.flatten_nested_dict(nested_dict, parent_key='')#

Convert a nested dictionary to a flattened dictionary with keys in the format ‘key1/key2/…’.

Parameters:
  • nested_dict (dict) – The dictionary to flatten.

  • parent_key (str) – A prefix for the keys (used in recursion). Defaults to an empty string.

Returns:

A flattened dictionary where nested keys are concatenated by ‘/’.

Return type:

dict
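
A minimal sketch of this kind of flattening, assuming that only dict values are descended into and that every other value is kept as a leaf. The name flatten_sketch is hypothetical.

```python
def flatten_sketch(nested, parent_key="", sep="/"):
    """Flatten nested dicts into a single dict with 'a/b/c'-style keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the prefix.
            flat.update(flatten_sketch(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


metrics = {"val": {"auc": 0.9, "per_class": {"a": 1}}, "epoch": 3}
flat = flatten_sketch(metrics)
```

For the input above, flat holds the keys 'val/auc', 'val/per_class/a', and 'epoch'.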

scxpand.util.general_util.floats_to_str(a, precision=5)#

Convert a numeric float value to a string, with a given precision.

If the input is a data structure, convert all float elements in it to strings.

scxpand.util.general_util.format_float(x, precision=4, threshold=0.001)#

Format a float using fixed-point notation unless it is very small (but nonzero), in which case scientific notation is used.

For scientific notation, trailing zeros in the significand and unnecessary zeros in the exponent are removed. For example, 5.000e-5 is formatted as 5e-5.

Parameters:
  • x (float) – The float to format.

  • precision (int) – Number of digits for formatting.

  • threshold (float) – If abs(x) < threshold and x != 0, use scientific notation.

Returns:

The formatted float as a string.

Return type:

str
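
The formatting rule above can be approximated with Python's format specification mini-language. In this sketch, precision is assumed to mean the number of digits after the decimal point in both notations, which the reference does not state explicitly. The name format_float_sketch is hypothetical.

```python
def format_float_sketch(x, precision=4, threshold=1e-3):
    """Fixed-point for ordinary values, compact scientific for tiny ones."""
    if x != 0 and abs(x) < threshold:
        # e.g. "5.0000e-05" -> mantissa "5.0000", exponent "-05"
        mantissa, exponent = f"{x:.{precision}e}".split("e")
        # Drop trailing zeros (and a dangling decimal point) from the mantissa.
        mantissa = mantissa.rstrip("0").rstrip(".")
        # int() removes the leading zeros and the "+" sign from the exponent.
        return f"{mantissa}e{int(exponent)}"
    return f"{x:.{precision}f}"


small = format_float_sketch(5e-5)
regular = format_float_sketch(0.5)
zero = format_float_sketch(0.0)
```

With the defaults, 5e-5 is rendered as '5e-5', 0.5 as '0.5000', and zero stays in fixed-point form as '0.0000'.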

scxpand.util.general_util.get_device()#

Automatically detect and return the best available device for PyTorch.

Checks for available hardware acceleration in order of preference: CUDA (NVIDIA) > MPS (Apple Silicon) > XPU (Intel) > CPU.

Returns:

Device string: ‘cuda’, ‘mps’, ‘xpu’, or ‘cpu’.

Return type:

str

Example

>>> device = get_device()
>>> print(f"Using device: {device}")
>>> model = model.to(device)

scxpand.util.general_util.get_elapsed_time_str(t0, t1)#

scxpand.util.general_util.get_last_git_commit_link()#

Get the last git commit link from the remote repository.

Return type:

str

scxpand.util.general_util.get_local_time()#
Return type:

datetime

scxpand.util.general_util.get_new_version_path(save_path, start_index=1)#

Create a versioned directory path to avoid overwriting existing results.

If the target path already exists and contains files, creates a new versioned directory (e.g., ‘results_v_1’, ‘results_v_2’) to preserve existing data.

Parameters:
  • save_path (Path | str) – Desired save directory path.

  • start_index (int (default: 1)) – Starting index for version numbering.

Return type:

Path

Returns:

Path to use for saving (original path or new versioned path).
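
The versioning rule can be sketched with pathlib. This sketch only computes the path and does not create any directory; whether the real function also creates it is not covered here. The helper names are hypothetical, and the '_v_N' suffix is taken from the example names above.

```python
import tempfile
from pathlib import Path


def _has_files(path):
    """True if path is an existing directory with at least one entry."""
    return path.is_dir() and any(path.iterdir())


def versioned_path_sketch(save_path, start_index=1):
    """Return save_path itself if unused, else the first free '<name>_v_N'."""
    path = Path(save_path)
    if not _has_files(path):
        return path
    index = start_index
    while True:
        candidate = path.with_name(f"{path.name}_v_{index}")
        if not _has_files(candidate):
            return candidate
        index += 1


with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    first = versioned_path_sketch(results)    # nothing there yet: reuse path
    results.mkdir()
    (results / "model.pt").write_text("stub")
    second = versioned_path_sketch(results)   # occupied: versioned sibling
    first_name, second_name = first.name, second.name
```

In this run, first_name is 'results' and second_name is 'results_v_1'.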

scxpand.util.general_util.get_utc_time()#
Return type:

datetime

scxpand.util.general_util.load_and_override_params(param_class, config_path=None, logger=None, **kwargs)#

Load parameters from config file or use defaults, then apply overrides.

Parameters:
  • param_class (type[TypeVar(T, bound= BaseParams)]) – The parameter dataclass to instantiate

  • config_path (str | None (default: None)) – Optional path to JSON config file

  • logger (BoundLogger | None (default: None)) – Logger instance for logging changes

  • **kwargs (Any) – Parameter overrides to apply

Return type:

TypeVar(T, bound= BaseParams)

Returns:

The parameter object with overrides applied
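
The load-then-override order described above can be illustrated with a plain dataclass. DemoParams is a made-up stand-in for a BaseParams subclass, load_and_override_sketch is a hypothetical name, and the handling of unknown keys (config keys ignored, unknown overrides rejected) is an assumption. Logging is omitted.

```python
import json
import tempfile
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class DemoParams:  # hypothetical stand-in for a BaseParams subclass
    init_learning_rate: float = 1e-3
    n_epochs: int = 10


def load_and_override_sketch(param_class, config_path=None, **overrides):
    """Defaults < values from the JSON config < keyword overrides."""
    known = {f.name for f in fields(param_class)}
    unknown = set(overrides) - known
    if unknown:
        # Assumption: misspelled overrides should fail loudly.
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")

    values = {}
    if config_path is not None:
        loaded = json.loads(Path(config_path).read_text(encoding="utf-8"))
        values = {k: v for k, v in loaded.items() if k in known}
    values.update(overrides)
    return param_class(**values)


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.json"
    config.write_text(json.dumps({"n_epochs": 50}), encoding="utf-8")
    params = load_and_override_sketch(DemoParams, config, init_learning_rate=0.01)
```

Here n_epochs comes from the config file and init_learning_rate from the keyword override.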

scxpand.util.general_util.load_params(save_path)#

Load model parameters from a saved JSON file.

Loads hyperparameters and configuration from a training results directory.

Parameters:

save_path (Path | str) – Path to directory containing ‘parameters.json’ file.

Return type:

dict

Returns:

Dictionary containing all saved parameters.

Example

>>> params = load_params("results/model_001")
>>> print(f"Learning rate: {params['init_learning_rate']}")

scxpand.util.general_util.log_inference_progress(current_iteration, total_iterations, start_time, log_interval=20, logger_instance=None)#

Log progress during inference or processing operations.

Parameters:
  • current_iteration (int) – Current iteration number (0-indexed)

  • total_iterations (int) – Total number of iterations

  • start_time (float) – Start time of the process (from time.time())

  • log_interval (int (default: 20)) – Log every N iterations

  • logger_instance (BoundLogger | None (default: None)) – Logger object (default: module logger)

Return type:

None

scxpand.util.general_util.log_nested_metrics(metrics, logger_func, prefix='', group='validation', score_metric=None, epoch=None, use_table_format=True)#

Log nested metrics with hierarchical display and highlighted score metric.

Parameters:
  • metrics (dict) – Nested dictionary of metrics to log

  • logger_func – Logger function (e.g., logger.info)

  • prefix (str (default: '')) – Prefix for log messages

  • group (str (default: 'validation')) – Group name for the metrics (e.g., “validation”, “test”)

  • score_metric (str | None (default: None)) – Key of the main score metric to highlight

  • epoch (int | None (default: None)) – Optional epoch number to include in messages

  • use_table_format (bool (default: True)) – If True, display metrics in table format instead of hierarchical

Return type:

None

scxpand.util.general_util.metrics_dict_to_dataframes(metrics, precision=4)#

Convert nested metrics dictionary to pandas DataFrames for nice display in notebooks.

Parameters:
  • metrics (dict) – Nested dictionary containing metrics data

  • precision (int (default: 4)) – Number of decimal places for float values

Returns:

Tuple of (overall_df, category_df), where:

  • overall_df: DataFrame with overall metrics (Metric, Value columns)

  • category_df: DataFrame with category-specific metrics (categories as rows, metrics as columns)

Either DataFrame can be None if no data exists for that category.

Return type:

tuple

scxpand.util.general_util.metrics_dict_to_table(metrics, title='Metrics', precision=4)#

Convert nested metrics dictionary to a formatted table string using pandas.

Parameters:
  • metrics (dict) – Nested dictionary containing metrics data

  • title (str (default: 'Metrics')) – Title for the table

  • precision (int (default: 4)) – Number of decimal places for float values

Return type:

str

Returns:

Formatted table as a string

scxpand.util.general_util.nested_dict_to_flat_str(nested_scalars, omit_keys=None)#

Flatten a nested dictionary into a string of comma-separated values.

In case of a float, display it with 4 decimal places.

Return type:

str
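
The exact output layout is not specified in this reference. The sketch below assumes 'path/to/key=value' items joined by ', ', with floats shown to 4 decimal places as stated above, and assumes omit_keys matches key names at any nesting level. The name flat_str_sketch is hypothetical.

```python
def flat_str_sketch(nested_scalars, omit_keys=None, _prefix=""):
    """Render nested scalars as one 'a/b=1.0000, c=2' style line."""
    omit = set(omit_keys or ())
    parts = []
    for key, value in nested_scalars.items():
        if key in omit:
            continue
        name = f"{_prefix}/{key}" if _prefix else str(key)
        if isinstance(value, dict):
            inner = flat_str_sketch(value, omit_keys, name)
            if inner:  # skip sub-dicts that end up empty
                parts.append(inner)
        elif isinstance(value, float):
            parts.append(f"{name}={value:.4f}")
        else:
            parts.append(f"{name}={value}")
    return ", ".join(parts)


scalars = {"loss": 0.123456, "val": {"auc": 0.9, "n": 10}}
line = flat_str_sketch(scalars)
without_loss = flat_str_sketch(scalars, omit_keys=["loss"])
```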

scxpand.util.general_util.nested_dict_to_multiline_str(nested_scalars, indent=0, oneline_last_level=True)#

Recursively format a nested dictionary of scalars as a hierarchically indented multi-line string.

Float values are displayed with improved formatting.

Parameters:
  • nested_scalars (dict) – The nested dictionary containing scalar values.

  • indent (int) – The current indentation level (used during recursion). Defaults to 0.

  • oneline_last_level (bool) – Whether to format the last level on a single line. Defaults to True.

Returns:

The formatted multi-line string.

Return type:

str
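
To make the role of indent and oneline_last_level concrete, here is a sketch under assumed conventions: two spaces per indent level, 'key: value' items, and floats shown to 4 decimal places. None of these details are confirmed by this reference, and multiline_sketch is a hypothetical name.

```python
def _fmt(value):
    """Format floats to 4 decimals; leave other scalars as they are."""
    return f"{value:.4f}" if isinstance(value, float) else str(value)


def multiline_sketch(nested_scalars, indent=0, oneline_last_level=True):
    """Render nested scalars as an indented, human-readable block."""
    pad = "  " * indent
    is_last_level = not any(isinstance(v, dict) for v in nested_scalars.values())
    if is_last_level and oneline_last_level:
        # A dict holding only scalars is collapsed onto a single line.
        return pad + ", ".join(f"{k}: {_fmt(v)}" for k, v in nested_scalars.items())

    lines = []
    for key, value in nested_scalars.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(multiline_sketch(value, indent + 1, oneline_last_level))
        else:
            lines.append(f"{pad}{key}: {_fmt(value)}")
    return "\n".join(lines)


report = {"val": {"auc": 0.9, "n": 3}, "epoch": 2}
compact = multiline_sketch(report)
expanded = multiline_sketch(report, oneline_last_level=False)
```

With oneline_last_level=True the 'val' metrics share one indented line; with False each metric gets its own line.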

scxpand.util.general_util.num2str(v)#

Convert a number to a string, with a fixed number of decimal places.

For floats, if their absolute value is small but nonzero, use scientific notation.

Return type:

str

scxpand.util.general_util.save_json_data(data, save_path)#

Save arbitrary dictionary data to a JSON file.

Parameters:
  • data (dict[str, Any]) – Dictionary of data to save

  • save_path (Path | str) – Full path to the JSON file to create
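
A minimal sketch of such a helper using the standard json module. Creating missing parent directories and writing with an indent of 2 are assumptions, and save_json_sketch is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def save_json_sketch(data, save_path):
    """Write a dictionary to save_path as UTF-8 encoded JSON."""
    path = Path(save_path)
    # Assumption: missing parent directories are created on demand.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "run" / "summary.json"
    save_json_sketch({"auc": 0.91, "n_cells": 1200}, target)
    round_trip = json.loads(target.read_text(encoding="utf-8"))
```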

scxpand.util.general_util.save_params(params, save_dir)#

Save parameters to a JSON file, along with the model type taken from the parameter object.

Parameters:
  • params (BaseParams) – Parameter object (must inherit from BaseParams and have get_model_type method)

  • save_dir (Path | str) – Directory in which to save the parameters

scxpand.util.general_util.set_seed(seed)#
Return type:

None

scxpand.util.general_util.sigmoid(x)#
Return type:

ndarray

scxpand.util.general_util.time_seconds_to_str(seconds)#
Return type:

str

scxpand.util.general_util.time_to_str(t, fmt='%Y-%m-%d %H:%M:%S')#

Convert a datetime object to a string.
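
The default fmt string follows the strftime directive syntax, so the conversion presumably amounts to a strftime call. The sketch below shows what the default pattern and a custom one produce; time_to_str_sketch is a hypothetical name.

```python
from datetime import datetime


def time_to_str_sketch(t, fmt="%Y-%m-%d %H:%M:%S"):
    """Format a datetime with a strftime pattern."""
    return t.strftime(fmt)


moment = datetime(2024, 1, 2, 3, 4, 5)
default_stamp = time_to_str_sketch(moment)
# A pattern without spaces or colons, convenient for file names.
file_stamp = time_to_str_sketch(moment, fmt="%Y%m%d_%H%M%S")
```

For the fixed datetime above, the default pattern gives '2024-01-02 03:04:05' and the custom one gives '20240102_030405'.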

scxpand.util.general_util.to_np(x)#

Convert tensor to numpy array.