popmon.analysis.profiling package
Submodules
popmon.analysis.profiling.hist_profiler module
- class popmon.analysis.profiling.hist_profiler.HistProfiler(read_key, store_key, features=None, ignore_features=None, var_timestamp=None, hist_col='histogram', index_col='date', stats_functions=None)
Bases: Module
Generate profiles of histograms using default statistical functions.
Profiles are:
1 dim histograms, all: ‘count’, ‘filled’, ‘distinct’, ‘nan’, ‘most_probable_value’, ‘overflow’, ‘underflow’.
1 dim histograms, numeric: mean, std, min, max, p01, p05, p16, p50, p84, p95, p99.
1 dim histograms, boolean: fraction of true entries.
2 dim histograms: count, phi_k correlation constant, p-value and Z-score of contingency test.
n dim histograms: count (n >= 3)
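A few of the 1 dim profiles listed above can be computed directly from a histogram's bin labels and bin counts. The sketch below is a minimal illustration, not popmon's implementation. It assumes numeric bin labels are bin centres, and it takes quantiles at bin resolution (the first bin whose cumulative fraction of entries reaches q). The helper names `weighted_quantile` and `numeric_profiles` are hypothetical, and popmon's exact definitions (for example of ‘distinct’) may differ.

```python
import numpy as np


def weighted_quantile(bin_labels, bin_counts, q):
    """Smallest bin label whose cumulative fraction of entries reaches q (0 <= q <= 1)."""
    labels = np.asarray(bin_labels, dtype=float)
    counts = np.asarray(bin_counts, dtype=float)
    order = np.argsort(labels)
    labels, counts = labels[order], counts[order]
    cdf = np.cumsum(counts) / counts.sum()
    # first bin where the cumulative fraction is >= q
    idx = int(np.searchsorted(cdf, q, side="left"))
    return labels[min(idx, len(labels) - 1)]


def numeric_profiles(bin_labels, bin_counts):
    """Count-weighted profiles of a 1 dim numeric histogram (assumes labels are bin centres)."""
    labels = np.asarray(bin_labels, dtype=float)
    counts = np.asarray(bin_counts, dtype=float)
    mean = np.average(labels, weights=counts)
    std = np.sqrt(np.average((labels - mean) ** 2, weights=counts))
    filled = counts > 0
    profiles = {
        "count": counts.sum(),
        "distinct": int(np.count_nonzero(counts)),  # assumption: number of non-empty bins
        "most_probable_value": labels[np.argmax(counts)],
        "mean": mean,
        "std": std,
        "min": labels[filled].min(),
        "max": labels[filled].max(),
    }
    quantiles = {"p01": 0.01, "p05": 0.05, "p16": 0.16, "p50": 0.50,
                 "p84": 0.84, "p95": 0.95, "p99": 0.99}
    for name, q in quantiles.items():
        profiles[name] = weighted_quantile(labels, counts, q)
    return profiles
```

For example, `numeric_profiles([1, 2, 3, 4], [1, 2, 3, 4])` gives a count of 10, a mean of 3.0, a std of 1.0 and a p50 of 3.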
- Parameters
read_key (str) – key of the input test data to read from the datastore
store_key (str) – key of the output data to store in the datastore
features (list) – features of data-frames to pick up from input data (optional)
ignore_features (list) – features to ignore (optional)
var_timestamp (list) – list of timestamp variables (optional)
hist_col (str) – key for histogram in split dictionary
index_col (str) – key for index in split dictionary
stats_functions (dict) – dictionary mapping function names to functions with signature function(bin_labels, bin_counts)
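A stats_functions dictionary maps a profile name to a function of (bin_labels, bin_counts). The sketch below shows that shape with two hypothetical functions, `profile_weighted_mean` and `profile_max_bin_fraction`. It does not instantiate HistProfiler, and whether custom functions replace or extend the default profiles is not stated here.

```python
import numpy as np


def profile_weighted_mean(bin_labels, bin_counts):
    """Count-weighted mean of numeric bin labels."""
    return float(np.average(np.asarray(bin_labels, dtype=float), weights=bin_counts))


def profile_max_bin_fraction(bin_labels, bin_counts):
    """Fraction of all entries that fall in the most populated bin."""
    counts = np.asarray(bin_counts, dtype=float)
    total = counts.sum()
    return float(counts.max() / total) if total > 0 else float("nan")


# function_name -> function(bin_labels, bin_counts)
stats_functions = {
    "weighted_mean": profile_weighted_mean,
    "max_bin_fraction": profile_max_bin_fraction,
}

# Applying every function to one histogram yields a dict of profile values.
labels, counts = [0.5, 1.5, 2.5], [2, 6, 2]
profile = {name: func(labels, counts) for name, func in stats_functions.items()}
```

Here `profile` evaluates to a weighted mean of 1.5 and a max-bin fraction of 0.6.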
- __init__(read_key, store_key, features=None, ignore_features=None, var_timestamp=None, hist_col='histogram', index_col='date', stats_functions=None)
Module initialization
- transform(data)
Central function of the module.
Typically transform() takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
data (dict) – input data
- Returns
updated output datastore
- Return type
dict
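The read-transform-store pattern described for transform() can be sketched with a plain dictionary as the datastore. `ToyModule` and `run_pipeline` below are hypothetical stand-ins written for illustration; they are not popmon's Module or Pipeline classes.

```python
from typing import Any, Callable, Dict, List


class ToyModule:
    """Hypothetical stand-in for a pipeline module (not popmon's Module class)."""

    def __init__(self, read_key: str, store_key: str, func: Callable[[Any], Any]):
        self.read_key = read_key
        self.store_key = store_key
        self.func = func

    def transform(self, datastore: Dict[str, Any]) -> Dict[str, Any]:
        data = datastore[self.read_key]     # take something from the datastore
        result = self.func(data)            # do something to it
        datastore[self.store_key] = result  # put the result back
        return datastore                    # hand the datastore to the next module


def run_pipeline(modules: List[ToyModule], datastore: Dict[str, Any]) -> Dict[str, Any]:
    """Call transform() of each module in turn, passing the datastore along."""
    for module in modules:
        datastore = module.transform(datastore)
    return datastore


ds = run_pipeline(
    [
        ToyModule("raw", "doubled", lambda xs: [2 * x for x in xs]),
        ToyModule("doubled", "total", sum),
    ],
    {"raw": [1, 2, 3]},
)
```

After the run, `ds["doubled"]` is `[2, 4, 6]` and `ds["total"]` is 12. The second module reads what the first one stored.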
popmon.analysis.profiling.profiles module
- popmon.analysis.profiling.profiles.profile_fraction_of_true(bin_labels, bin_counts)
Compute fraction of ‘true’ labels
- Parameters
bin_labels – array of bin labels. If bin_labels is not an array, a conversion is attempted.
bin_counts – array of bin counts corresponding to bin_labels. If bin_counts is not an array, a conversion is attempted.
- Returns
fraction of ‘true’ labels
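A minimal sketch of a fraction-of-true profile follows. It assumes that ‘true’ and ‘false’ bins are labelled by booleans or by the strings "True"/"true" and "False"/"false". It also assumes that any other labels (for example a NaN bin) are ignored, and that NaN is returned when there are no true or false entries. The function name is hypothetical, and this is not necessarily how popmon's profile_fraction_of_true is implemented.

```python
import numpy as np


def fraction_of_true(bin_labels, bin_counts):
    """Entries in 'true' bins divided by entries in 'true' plus 'false' bins."""
    # booleans become "True"/"False"; lower-case everything for comparison
    labels = np.char.lower(np.asarray(bin_labels).astype(str))
    counts = np.asarray(bin_counts, dtype=float)
    n_true = counts[labels == "true"].sum()
    n_false = counts[labels == "false"].sum()
    total = n_true + n_false
    # assumption: no true/false entries at all -> NaN
    return float(n_true / total) if total > 0 else float("nan")
```

For example, `fraction_of_true([True, False], [3, 1])` returns 0.75.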
popmon.analysis.profiling.pull_calculator module
- class popmon.analysis.profiling.pull_calculator.ExpandingPullCalculator(read_key, shift=1, features=None, store_key=None, suffix_mean='_exp_mean', suffix_std='_exp_std', suffix_pull='_exp_pull', *args, **kwargs)
Bases: PullCalculator
Pull calculation based on expanding mean and standard deviation
- __init__(read_key, shift=1, features=None, store_key=None, suffix_mean='_exp_mean', suffix_std='_exp_std', suffix_pull='_exp_pull', *args, **kwargs)
Initialize an instance of ExpandingPullCalculator.
- Parameters
read_key (str) – key of input data to read from data store
shift (int) – shift of the window, default is 1.
features (list) – list of features to calculate pull for. (optional)
store_key (str) – key of the output data to store in the datastore (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to mean and std functions (optional)
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
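The expanding pull of a single profiled metric can be sketched with pandas. This sketch makes three assumptions. The mean and std at each time slot are computed over all earlier slots, lagged by `shift` so the current value is excluded. The std uses pandas' default estimator (ddof=1). Column names follow metric + suffix. The function name `expanding_pull` is hypothetical, and popmon's estimator choices may differ.

```python
import pandas as pd


def expanding_pull(series: pd.Series, shift: int = 1,
                   suffix_mean: str = "_exp_mean", suffix_std: str = "_exp_std",
                   suffix_pull: str = "_exp_pull") -> pd.DataFrame:
    """Pull of each value relative to the expanding mean/std of preceding values."""
    name = series.name
    # expanding statistics, lagged by `shift` so the current value is not included
    mean = series.expanding().mean().shift(shift)
    std = series.expanding().std().shift(shift)
    pull = (series - mean) / std
    return pd.DataFrame({
        name: series,
        name + suffix_mean: mean,
        name + suffix_std: std,
        name + suffix_pull: pull,
    })


s = pd.Series([1.0, 2.0, 3.0, 10.0], name="count")
out = expanding_pull(s)
```

The last value, 10, is compared against the mean (2.0) and std (1.0) of the first three values, which gives a pull of 8.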
- class popmon.analysis.profiling.pull_calculator.PullCalculator(func_mean, func_std, apply_to_key, assign_to_key=None, store_key=None, suffix_mean='_mean', suffix_std='_std', suffix_pull='_pull', features=None, *args, **kwargs)
Bases: Pipeline
Base module for pull calculation, based on mean and standard deviation calculation
Steps, each performed by an ApplyFunc module:
1. calculate the standard deviation seen in a metric;
2. calculate the mean seen in a metric;
3. calculate the pull of a metric as pull = (metric - mean) / std.
The pull is then stored as a new column.
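The three steps can be sketched without the pipeline machinery. In this sketch func_mean and func_std are plain callables applied to each metric's time series, and the outputs are stored under metric + suffix. In popmon the steps run as ApplyFunc modules on datastore objects; the function `add_pull_columns` and the dict-of-arrays input are assumptions made for illustration.

```python
import numpy as np


def add_pull_columns(metrics, func_mean, func_std,
                     suffix_mean="_mean", suffix_std="_std", suffix_pull="_pull"):
    """metrics: dict of metric name -> 1-D sequence of values over time."""
    out = dict(metrics)
    for name, values in metrics.items():
        values = np.asarray(values, dtype=float)
        std = func_std(values)    # step 1: standard deviation of the metric
        mean = func_mean(values)  # step 2: mean of the metric
        out[name + suffix_std] = std
        out[name + suffix_mean] = mean
        # step 3: pull = (metric - mean) / std, stored as a new "column"
        out[name + suffix_pull] = (values - mean) / std
    return out


out = add_pull_columns({"p50": [1, 2, 3, 4, 5]}, np.mean, np.std)
```

With np.mean and np.std, the value 3 has a pull of 0 and the value 5 has a pull of 2 / sqrt(2), which is approximately 1.414.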
- __init__(func_mean, func_std, apply_to_key, assign_to_key=None, store_key=None, suffix_mean='_mean', suffix_std='_std', suffix_pull='_pull', features=None, *args, **kwargs)
Initialize an instance of PullCalculator.
- Parameters
func_mean (str) – function applied to calculate the mean of the profiled statistics
func_std (str) – function applied to calculate the std of the profiled statistics
apply_to_key (str) – key of the input data to apply funcs to.
assign_to_key (str) – key of the input data to assign function applied-output to. (optional)
store_key (str) – key of the output data to store in the datastore (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean. Default is _mean.
suffix_std (str) – suffix of std. std column = metric + suffix_std. Default is _std.
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull. Default is _pull.
features (list) – list of features to calculate pull for. Default is all.
args (tuple) – residual args passed on to func_mean and func_std (optional)
kwargs (dict) – residual kwargs passed on to func_mean and func_std (optional)
- class popmon.analysis.profiling.pull_calculator.RefMedianMadPullCalculator(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)
Bases: PullCalculator
Pull calculation based on reference median and median absolute deviation (MAD)
- __init__(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)
Initialize an instance of RefMedianMadPullCalculator.
- Parameters
reference_key (str) – key of input data to read from data store
assign_to_key (str) – key of output data to store in data store
store_key (str) – key of the output data to store in the datastore (optional)
features (list) – list of features to calculate pull for.
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to mean and std functions (optional)
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
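A median/MAD pull replaces mean and std with robust counterparts taken from the reference data. The sketch below scales the MAD by 1.4826 so that it estimates the std for normally distributed data. Whether popmon applies this scaling factor is an assumption, and the function name `ref_median_mad_pull` is hypothetical.

```python
import numpy as np

# assumption: normal-consistency factor so that MAD * 1.4826 estimates the std
MAD_TO_STD = 1.4826


def ref_median_mad_pull(reference, values):
    """Pull of `values` relative to the median and scaled MAD of `reference`."""
    ref = np.asarray(reference, dtype=float)
    median = np.median(ref)
    mad = MAD_TO_STD * np.median(np.abs(ref - median))
    return (np.asarray(values, dtype=float) - median) / mad


pulls = ref_median_mad_pull([1, 2, 3, 4, 100], [3.0, 3.0 + MAD_TO_STD])
```

The outlier 100 in the reference barely moves the median (3) or the MAD (1 before scaling). With the reference mean and std, it would dominate both.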
- class popmon.analysis.profiling.pull_calculator.ReferencePullCalculator(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)
Bases: PullCalculator
Pull calculation based on reference mean and standard deviation
- __init__(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)
Initialize an instance of ReferencePullCalculator.
- Parameters
reference_key (str) – key of input data to read from data store
assign_to_key (str) – key of output data to store in data store
store_key (str) – key of the output data to store in the datastore (optional)
features (list) – list of features to calculate pull for. (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to mean and std functions (optional)
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
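With a reference, the mean and std are computed once from the reference values and applied to every value of the data being monitored. This is a minimal pandas sketch under stated assumptions: pandas' default std estimator (ddof=1) is used, and columns are named metric + suffix. The function name `reference_pull` is hypothetical.

```python
import pandas as pd


def reference_pull(reference: pd.Series, current: pd.Series,
                   suffix_mean: str = "_ref_mean", suffix_std: str = "_ref_std",
                   suffix_pull: str = "_ref_pull") -> pd.DataFrame:
    """Pull of each current value relative to the reference mean and std."""
    name = current.name
    ref_mean = reference.mean()
    ref_std = reference.std()  # pandas default: sample std (ddof=1)
    return pd.DataFrame({
        name: current,
        name + suffix_mean: ref_mean,  # scalar broadcast to every row
        name + suffix_std: ref_std,
        name + suffix_pull: (current - ref_mean) / ref_std,
    })


out = reference_pull(pd.Series([2.0, 4.0, 6.0]), pd.Series([4.0, 8.0], name="p50"))
```

The reference has a mean of 4 and a std of 2, so the current values 4 and 8 get pulls of 0 and 2.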
- class popmon.analysis.profiling.pull_calculator.RollingPullCalculator(read_key, window, shift=1, features=None, store_key=None, suffix_mean='_roll_mean', suffix_std='_roll_std', suffix_pull='_roll_pull', *args, **kwargs)
Bases: PullCalculator
Pull calculation based on rolling mean and standard deviation
- __init__(read_key, window, shift=1, features=None, store_key=None, suffix_mean='_roll_mean', suffix_std='_roll_std', suffix_pull='_roll_pull', *args, **kwargs)
Initialize an instance of RollingPullCalculator.
- Parameters
read_key (str) – key of input data to read from data store
window (int) – size of rolling window
shift (int) – shift of the window, default is 1.
features (list) – list of features to calculate pull for.
store_key (str) – key of the output data to store in the datastore (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to mean and std functions (optional)
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
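The rolling variant differs from the expanding one only in that the mean and std come from the last `window` values. As in the expanding sketch, the statistics are assumed to be lagged by `shift` so that the current value is excluded, and pandas' default std (ddof=1) is used. The function name `rolling_pull` is hypothetical.

```python
import pandas as pd


def rolling_pull(series: pd.Series, window: int, shift: int = 1,
                 suffix_mean: str = "_roll_mean", suffix_std: str = "_roll_std",
                 suffix_pull: str = "_roll_pull") -> pd.DataFrame:
    """Pull of each value relative to the mean/std of the preceding rolling window."""
    name = series.name
    # rolling statistics over `window` values, lagged by `shift`
    mean = series.rolling(window).mean().shift(shift)
    std = series.rolling(window).std().shift(shift)
    return pd.DataFrame({
        name: series,
        name + suffix_mean: mean,
        name + suffix_std: std,
        name + suffix_pull: (series - mean) / std,
    })


s = pd.Series([1.0, 2.0, 3.0, 4.0, 20.0], name="count")
out = rolling_pull(s, window=3)
```

The value 20 is compared against the window [2, 3, 4], which has a mean of 3 and a std of 1, so its pull is 17. The first `window` rows have no pull.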