src.superphot_plus.format_data_ztf
This module provides functions for importing, preprocessing, and manipulating ZTF lightcurve data.
Module Contents
Functions
| import_labels_only | Filters CSVs for rows where label is in allowed_types and returns names, labels, and redshifts. |
| generate_K_fold | Generates set of K test sets and corresponding training sets. |
| tally_each_class | Prints the number of samples with each class label. |
| oversample_using_posteriors | Oversamples, drawing from posteriors of a certain fit. |
| normalize_features | Normalizes the features for feeding into the neural network. |
| | Uses SMOTE to oversample data from rarer classes. |
- import_labels_only(input_csvs, allowed_types, fits_dir=None, needs_posteriors=True, sampler=None)[source]
Filters CSVs for rows where label is in allowed_types and returns names, labels, and redshifts.
- Parameters:
input_csvs (list of str) – List of input CSV file paths.
allowed_types (list) – List of allowed types for labels.
fits_dir (str, optional) – Directory where fit parameters are stored. Defaults to None.
needs_posteriors (bool, optional) – Whether to load posterior samples. Defaults to True.
sampler (str, optional) – Name of the sampler to get posteriors from. Defaults to None.
- Returns:
Tuple of names, labels and redshifts.
- Return type:
tuple of np.ndarray
Notes
Maps groups of similar labels to a single representative label name (e.g., “SN Ic”, “SNIc-BL”, and “21” all become “SN Ibc”).
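The label grouping described in the note can be sketched as a simple lookup table. This is an illustration only: the three aliases come from the note, while the `LABEL_ALIASES` table and the `canonicalize_label` helper are hypothetical names, and the library's actual mapping may cover many more labels.

```python
# Hypothetical alias table; only these three aliases come from the note above.
LABEL_ALIASES = {
    "SN Ic": "SN Ibc",
    "SNIc-BL": "SN Ibc",
    "21": "SN Ibc",
}


def canonicalize_label(label):
    """Return the representative label, or the cleaned label if no alias is known."""
    key = str(label).strip()  # numeric codes such as 21 are compared as strings
    return LABEL_ALIASES.get(key, key)


raw_labels = ["SN Ic", "SNIc-BL", 21, "SN Ia"]
canonical = [canonicalize_label(lab) for lab in raw_labels]
```

Labels without a known alias pass through unchanged, so filtering against allowed_types can happen after this step.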
- generate_K_fold(features, classes, num_folds)[source]
Generates set of K test sets and corresponding training sets.
- Parameters:
features (list) – Input features.
classes (list) – Input classes.
num_folds (int) – Number of folds. If -1, sets num_folds=len(features).
- Returns:
Generator yielding the indices for training and test sets.
- Return type:
generator
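A minimal sketch of a generator with this interface is shown below. It assumes the folds are balanced by class (samples of each class are dealt round-robin across folds); the documentation does not state how the library actually splits the data, so `k_fold_indices` and its `seed` parameter are hypothetical. Passing num_folds=-1 gives one fold per sample, i.e. leave-one-out.

```python
import numpy as np


def k_fold_indices(features, classes, num_folds, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    n = len(features)
    if num_folds == -1:
        num_folds = n  # leave-one-out: every sample is its own test set
    rng = np.random.default_rng(seed)
    classes = np.asarray(classes)

    # Assign a fold to every sample, class by class, in round-robin order
    # so that each class is spread as evenly as possible across folds.
    fold_of = np.empty(n, dtype=int)
    counter = 0
    for label in np.unique(classes):
        idx = np.flatnonzero(classes == label)
        rng.shuffle(idx)
        for i in idx:
            fold_of[i] = counter % num_folds
            counter += 1

    all_idx = np.arange(n)
    for k in range(num_folds):
        test_mask = fold_of == k
        yield all_idx[~test_mask], all_idx[test_mask]


features = np.arange(20).reshape(10, 2)
classes = ["a"] * 6 + ["b"] * 4
folds = list(k_fold_indices(features, classes, num_folds=5))
```

Each yielded pair indexes into the original arrays, so `features[train_idx]` and `features[test_idx]` give the training and test sets for that fold.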
- tally_each_class(labels)[source]
Prints the number of samples with each class label.
- Parameters:
labels (list) – Input labels.
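Counting labels of this kind can be sketched with `collections.Counter`. The helper name `tally_labels`, the printed format, and the returned counter are assumptions for illustration; the documented function is only described as printing the tallies.

```python
from collections import Counter


def tally_labels(labels):
    """Print how many samples carry each class label and return the counts."""
    counts = Counter(labels)
    for label, count in sorted(counts.items()):
        print(f"{label}: {count}")
    return counts


counts = tally_labels(["SN Ia", "SN II", "SN Ia", "SN Ibc"])
```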
- oversample_using_posteriors(lc_names, labels, goal_per_class, fits_dir, sampler=None, redshifts=None, oversample_redshifts=False)[source]
Oversamples, drawing from posteriors of a certain fit.
- Parameters:
lc_names (list of str) – Lightcurve names.
labels (list) – List of labels.
goal_per_class (int) – Number of samples per class.
fits_dir (str) – Where fit parameters are stored.
sampler (str, optional) – The name of the sampler to use. Defaults to None.
redshifts (list, optional) – List of redshift values. Defaults to None.
oversample_redshifts (bool, optional) – Whether to oversample redshifts. Defaults to False.
- Returns:
Tuple containing oversampled features, labels, and redshifts.
- Return type:
tuple of np.ndarray
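The idea of oversampling from posteriors can be sketched as follows: for each class, repeatedly pick one of its lightcurves and take one random draw from that lightcurve's posterior samples until the class reaches goal_per_class. This sketch makes several assumptions: the posterior samples are passed in as an in-memory dictionary (the hypothetical `posteriors` argument) rather than loaded from fits_dir, redshifts are ignored, and the helper name and `seed` parameter are invented for illustration.

```python
import numpy as np


def oversample_from_posteriors(posteriors, lc_names, labels, goal_per_class, seed=0):
    """Draw goal_per_class feature vectors per class from posterior samples.

    posteriors: dict mapping lightcurve name -> array of shape (n_draws, n_params).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    out_features, out_labels = [], []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Choose the source lightcurve of each new sample, with replacement.
        chosen = rng.choice(members, size=goal_per_class, replace=True)
        for i in chosen:
            draws = posteriors[lc_names[i]]
            out_features.append(draws[rng.integers(len(draws))])
            out_labels.append(label)
    return np.array(out_features), np.array(out_labels)


demo_rng = np.random.default_rng(1)
lc_names = ["lc1", "lc2", "lc3"]
labels = ["SN Ia", "SN Ia", "SN II"]
posteriors = {name: demo_rng.normal(size=(50, 3)) for name in lc_names}
new_features, new_labels = oversample_from_posteriors(
    posteriors, lc_names, labels, goal_per_class=5
)
```

Because every class ends up with the same number of samples, a rare class with a single lightcurve (here "SN II") is represented by several distinct posterior draws rather than by exact copies of one point.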
- normalize_features(features, mean=None, std=None)[source]
Normalizes the features for feeding into the neural network.
- Parameters:
features (np.ndarray) – Input features. Must be a 2-D array where each row corresponds to a data point and each column to a feature.
mean (ndarray, optional) – Mean values for normalization. Defaults to None.
std (ndarray, optional) – Standard deviation values for normalization. Defaults to None.
- Returns:
Tuple containing normalized features, mean values, and standard deviation values.
- Return type:
tuple of np.ndarray
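A minimal sketch of this kind of normalization, assuming standard z-scoring per feature column (subtract the mean, divide by the standard deviation), is given below. The helper name `normalize_features_sketch` is hypothetical and the library's exact computation may differ. The optional mean and std arguments allow statistics computed on the training set to be reused on a test set.

```python
import numpy as np


def normalize_features_sketch(features, mean=None, std=None):
    """Z-score each feature column; compute mean/std only if not supplied."""
    features = np.asarray(features, dtype=float)
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = features.std(axis=0)
    return (features - mean) / std, mean, std


train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# Fit the statistics on the training set, then reuse them for the test set.
train_norm, mean, std = normalize_features_sketch(train)
test_norm, _, _ = normalize_features_sketch(test, mean=mean, std=std)
```

Reusing the training statistics keeps both sets on the same scale and avoids leaking information from the test set into the normalization. A feature column with zero variance would cause a division by zero in this sketch and would need separate handling.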