src.superphot_plus.format_data_ztf

This module provides functions for importing, preprocessing, and manipulating ZTF lightcurve data.

Module Contents

Functions

import_labels_only(input_csvs, allowed_types[, ...])

Filters CSVs for rows where label is in allowed_types and returns names, labels, and redshifts.

generate_K_fold(features, classes, num_folds)

Generates a set of K test sets and corresponding training sets.

tally_each_class(labels)

Prints the number of samples with each class label.

oversample_using_posteriors(lc_names, labels, ...[, ...])

Oversamples, drawing from posteriors of a certain fit.

normalize_features(features[, mean, std])

Normalizes the features for feeding into the neural network.

oversample_smote(features, labels)

Uses SMOTE to oversample data from rarer classes.

import_labels_only(input_csvs, allowed_types, fits_dir=None, needs_posteriors=True, sampler=None)[source]

Filters CSVs for rows where label is in allowed_types and returns names, labels, and redshifts.

Parameters:
  • input_csvs (list of str) – List of input CSV file paths.

  • allowed_types (list) – List of allowed types for labels.

  • fits_dir (str, optional) – Directory where fit parameters are stored. Defaults to None.

  • needs_posteriors (boolean, optional) – Indicates whether to load posterior samples. Defaults to True.

  • sampler (str, optional) – The sampler to get posteriors from. Defaults to None.

Returns:

Tuple of names, labels and redshifts.

Return type:

tuple of np.ndarray

Notes

Maps groups of similar labels to a single representative label name (e.g., “SN Ic”, “SNIc-BL”, and “21” all become “SN Ibc”).
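The grouping described in this note can be illustrated with a small lookup table. This is a minimal sketch, not the module's actual mapping: the names ALIASES and canonical_label are hypothetical, and the table holds only the three aliases mentioned above.

```python
# Hypothetical alias table; only the three examples from the note are listed.
ALIASES = {
    "SN Ic": "SN Ibc",
    "SNIc-BL": "SN Ibc",
    "21": "SN Ibc",
}


def canonical_label(raw_label):
    """Return the representative label for raw_label, or the input if unmapped."""
    key = str(raw_label).strip()  # numeric codes such as 21 are matched as strings
    return ALIASES.get(key, key)


raw = ["SN Ic", "SNIc-BL", 21, "SN Ia"]
mapped = [canonical_label(r) for r in raw]
```

Labels without an entry pass through unchanged here; whether the real function keeps or drops such rows depends on allowed_types.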

generate_K_fold(features, classes, num_folds)[source]

Generates a set of K test sets and corresponding training sets.

Parameters:
  • features (list) – Input features.

  • classes (list) – Input classes.

  • num_folds (int) – Number of folds. If -1, sets num_folds=len(features), which gives leave-one-out cross-validation.

Returns:

Generator yielding the indices for training and test sets.

Return type:

generator
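To make the generator's contract concrete, the sketch below yields (train, test) index pairs and treats num_folds=-1 as one fold per sample. It is an illustration under assumptions, not the module's code: the name k_fold_indices, the seed argument, and the round-robin class balancing are all hypothetical choices.

```python
import random
from collections import defaultdict


def k_fold_indices(features, classes, num_folds, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    n = len(features)
    if num_folds == -1:  # documented behavior: one fold per sample
        num_folds = n
    rng = random.Random(seed)

    # Group sample indices by class, then deal them round-robin into folds
    # so that each fold receives a similar class mix.
    by_class = defaultdict(list)
    for idx, label in enumerate(classes):
        by_class[label].append(idx)

    folds = [[] for _ in range(num_folds)]
    slot = 0
    for label in sorted(by_class, key=str):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            folds[slot % num_folds].append(idx)
            slot += 1

    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, sorted(test)


folds = list(k_fold_indices(list(range(10)), ["a"] * 6 + ["b"] * 4, num_folds=5))
loo = list(k_fold_indices(list(range(10)), ["a"] * 6 + ["b"] * 4, num_folds=-1))
```

Every sample index appears in exactly one test set, so the K test sets partition the data.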

tally_each_class(labels)[source]

Prints the number of samples with each class label.

Parameters:

labels (list) – Input labels.
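A per-class tally of this kind can be sketched with collections.Counter. The helper name tally, the printed format, and the returned Counter are assumptions made for illustration; the documented function only states that it prints the counts.

```python
from collections import Counter


def tally(labels):
    """Print one 'label: count' line per class and return the counts."""
    counts = Counter(labels)
    for label, count in sorted(counts.items()):
        print(f"{label}: {count}")
    return counts


counts = tally(["SN Ia", "SN II", "SN Ia", "SN Ibc"])
```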

oversample_using_posteriors(lc_names, labels, goal_per_class, fits_dir, sampler=None, redshifts=None, oversample_redshifts=False)[source]

Oversamples, drawing from posteriors of a certain fit.

Parameters:
  • lc_names (list of str) – Lightcurve names.

  • labels (list) – List of labels.

  • goal_per_class (int) – Target number of samples per class after oversampling.

  • fits_dir (str) – Where fit parameters are stored.

  • sampler (str, optional) – The name of the sampler to use. Defaults to None.

  • redshifts (list, optional) – List of redshift values. Defaults to None.

  • oversample_redshifts (boolean, optional) – Indicates whether to oversample redshifts. Defaults to False.

Returns:

Tuple containing oversampled features, labels, and redshifts.

Return type:

tuple of np.ndarray
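One plausible reading of posterior-based oversampling is sketched below: for each class, repeatedly pick a lightcurve of that class and take one of its posterior draws until goal_per_class samples exist. Everything here is an assumption for illustration, including the function name, the in-memory posteriors dictionary (the real function reads fits from fits_dir), and the omission of redshift handling.

```python
import random


def oversample_from_posteriors(names, labels, posteriors, goal_per_class, seed=0):
    """Draw goal_per_class posterior samples for every class label."""
    rng = random.Random(seed)
    out_features, out_labels = [], []
    for cls in sorted(set(labels)):
        members = [n for n, lab in zip(names, labels) if lab == cls]
        for _ in range(goal_per_class):
            name = rng.choice(members)  # pick a lightcurve of this class
            out_features.append(rng.choice(posteriors[name]))  # one of its draws
            out_labels.append(cls)
    return out_features, out_labels


# Hypothetical posterior draws: two parameter vectors per lightcurve.
posteriors = {
    "lc1": [[0.1, 1.0], [0.2, 1.1]],
    "lc2": [[0.3, 0.9], [0.4, 1.2]],
    "lc3": [[5.0, 2.0], [5.1, 2.1]],
}
feats, labs = oversample_from_posteriors(
    ["lc1", "lc2", "lc3"], ["SN Ia", "SN Ia", "SN II"], posteriors, goal_per_class=4
)
```

Because a class with a single lightcurve is filled entirely from that object's posterior, rare classes gain diversity only from fit uncertainty, not from new objects.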

normalize_features(features, mean=None, std=None)[source]

Normalizes the features for feeding into the neural network.

Parameters:
  • features (numpy array) – Input features. Must be a 2-D array where each row corresponds to a data point and each column to a feature.

  • mean (ndarray, optional) – Mean values for normalization. Defaults to None.

  • std (ndarray, optional) – Standard deviation values for normalization. Defaults to None.

Returns:

Tuple containing normalized features, mean values, and standard deviation values.

Return type:

tuple of np.ndarray
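The signature suggests a common pattern: compute the statistics on the training set, then pass them back in when normalizing other data. The sketch below assumes the normalization is an ordinary per-column z-score; the name standardize is hypothetical, and constant columns (zero standard deviation) are not handled.

```python
import numpy as np


def standardize(features, mean=None, std=None):
    """Return (normalized, mean, std) using per-column statistics."""
    features = np.asarray(features, dtype=float)
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = features.std(axis=0)
    return (features - mean) / std, mean, std


train = np.array([[1.0, 10.0], [3.0, 30.0]])
test = np.array([[2.0, 20.0]])

train_norm, mu, sigma = standardize(train)
# Reuse the training statistics so both sets share one scale.
test_norm, _, _ = standardize(test, mean=mu, std=sigma)
```

Reusing mu and sigma keeps test data from influencing the scale the network was trained on.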

oversample_smote(features, labels)[source]

Uses SMOTE to oversample data from rarer classes.
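SMOTE creates synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbors. The sketch below shows only that interpolation step with NumPy; it is not the module's implementation, and the function name, the k and seed arguments, and the toy data are assumptions.

```python
import numpy as np


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating toward near neighbors."""
    minority = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    # Indices of the k nearest neighbors per row (requires k < len(minority)).
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for row in range(n_new):
        base = rng.integers(len(minority))
        other = neighbors[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[row] = minority[base] + gap * (minority[other] - minority[base])
    return synthetic


rare = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_rows = smote_like_samples(rare, n_new=4)
```

Each synthetic row lies on a line segment between two real minority samples, so the new points stay inside the region the class already occupies.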