popmon.analysis.profiling package

Submodules

popmon.analysis.profiling.hist_profiler module

class popmon.analysis.profiling.hist_profiler.HistProfiler(read_key, store_key, features=None, ignore_features=None, var_timestamp=None, hist_col='histogram', index_col='date', stats_functions=None)

Bases: Module

Generate profiles of histograms using default statistical functions.

Profiles are:

1 dim histograms, all: ‘count’, ‘filled’, ‘distinct’, ‘nan’, ‘most_probable_value’, ‘overflow’, ‘underflow’.
1 dim histograms, numeric: mean, std, min, max, p01, p05, p16, p50, p84, p95, p99.
1 dim histograms, boolean: fraction of true entries.
2 dim histograms: count, phi_k correlation constant, p-value and Z-score of contingency test.
n dim histograms: count (n >= 3)
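Here is a minimal sketch of how numeric profiles such as the mean, the standard deviation and a quantile like p50 can be estimated from a 1 dim histogram given as bin centers and bin counts. It assumes every entry sits at its bin center and reads quantiles off step-wise, with no interpolation within a bin. It is not popmon's implementation, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def hist_mean(bin_centers: Sequence[float], bin_counts: Sequence[float]) -> float:
    """Weighted mean of the bin centers (hypothetical helper)."""
    total = sum(bin_counts)
    return sum(c * n for c, n in zip(bin_centers, bin_counts)) / total


def hist_std(bin_centers: Sequence[float], bin_counts: Sequence[float]) -> float:
    """Weighted (population) standard deviation of the bin centers."""
    mu = hist_mean(bin_centers, bin_counts)
    total = sum(bin_counts)
    var = sum(n * (c - mu) ** 2 for c, n in zip(bin_centers, bin_counts)) / total
    return math.sqrt(var)


def hist_quantile(bin_centers: Sequence[float], bin_counts: Sequence[float], q: float) -> float:
    """First bin center at which the cumulative fraction of entries reaches q."""
    total = sum(bin_counts)
    cumulative = 0.0
    for c, n in zip(bin_centers, bin_counts):
        cumulative += n
        if cumulative / total >= q:
            return c
    return bin_centers[-1]


centers = [0.5, 1.5, 2.5, 3.5]
counts = [1, 3, 3, 1]
print(hist_mean(centers, counts))            # 2.0
print(round(hist_std(centers, counts), 4))   # 0.866
print(hist_quantile(centers, counts, 0.5))   # 1.5 (a step-wise "p50")
```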

Parameters

read_key (str) – key of the input test data to read from the datastore
store_key (str) – key of the output data to store in the datastore
features (list) – features of data-frames to pick up from input data (optional)
ignore_features (list) – features to ignore (optional)
var_timestamp (list) – list of timestamp variables (optional)
hist_col (str) – key for histogram in split dictionary
index_col (str) – key for index in split dictionary
stats_functions (dict) – dictionary mapping each function name to a function(bin_labels, bin_counts) (optional)
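The stats_functions parameter takes a dictionary that maps a profile name to a callable with the signature function(bin_labels, bin_counts). The sketch below builds such a dictionary and evaluates it on one histogram. The profile names and functions are hypothetical, and this reference does not say whether a custom dictionary replaces or extends the default profiles.

```python
from typing import Callable, Dict, Sequence


def n_empty_bins(bin_labels: Sequence, bin_counts: Sequence[float]) -> int:
    """Number of bins with zero entries (hypothetical custom profile)."""
    return sum(1 for n in bin_counts if n == 0)


def max_bin_fraction(bin_labels: Sequence, bin_counts: Sequence[float]) -> float:
    """Fraction of all entries that fall in the most populated bin."""
    total = sum(bin_counts)
    return max(bin_counts) / total if total else float("nan")


# name -> function(bin_labels, bin_counts), the shape stats_functions is documented to take
custom_stats: Dict[str, Callable] = {
    "n_empty_bins": n_empty_bins,
    "max_bin_fraction": max_bin_fraction,
}

labels = ["a", "b", "c", "d"]
counts = [5, 0, 15, 0]
profiles = {name: fn(labels, counts) for name, fn in custom_stats.items()}
print(profiles)  # {'n_empty_bins': 2, 'max_bin_fraction': 0.75}
```

You would then pass such a dictionary as stats_functions=custom_stats when constructing HistProfiler.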

__init__(read_key, store_key, features=None, ignore_features=None, var_timestamp=None, hist_col='histogram', index_col='date', stats_functions=None): Module initialization

transform(data)

Central function of the module.

Typically transform() takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters: data (dict) – input datastore
Returns: updated output datastore
Return type: dict
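The read, transform and store pattern described for transform() can be sketched with a plain dictionary as the datastore. ToyModule below is a hypothetical stand-in, not popmon's Module base class. It reads the value under read_key, applies a function, stores the result under store_key, and returns the same datastore so the next module in the chain can pick it up.

```python
from typing import Any, Callable, Dict


class ToyModule:
    """Hypothetical stand-in illustrating the read_key/store_key convention."""

    def __init__(self, read_key: str, store_key: str, func: Callable[[Any], Any]):
        self.read_key = read_key
        self.store_key = store_key
        self.func = func

    def transform(self, datastore: Dict[str, Any]) -> Dict[str, Any]:
        data = datastore[self.read_key]               # take something from the datastore
        datastore[self.store_key] = self.func(data)   # do something, put the result back
        return datastore                              # hand it on to the next module


# Chain two toy modules: the second reads what the first stored.
pipeline = [
    ToyModule("hists", "totals", lambda h: {k: sum(v) for k, v in h.items()}),
    ToyModule("totals", "grand_total", lambda t: sum(t.values())),
]
datastore = {"hists": {"age": [3, 4, 1], "income": [2, 2]}}
for module in pipeline:
    datastore = module.transform(datastore)
print(datastore["totals"], datastore["grand_total"])  # {'age': 8, 'income': 4} 12
```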

popmon.analysis.profiling.profiles module

popmon.analysis.profiling.profiles.profile_fraction_of_true(bin_labels, bin_counts)

Compute fraction of ‘true’ labels

Parameters

bin_labels – array of bin labels (e.g. True/False) of a boolean histogram
bin_counts – array of bin counts, i.e. the number of entries for each label

Returns

fraction of ‘true’ labels
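A minimal sketch of the computation is shown below. It assumes that labels are either booleans or the strings "True"/"False" (in any case), and that any other label makes the fraction undefined (NaN). popmon's own function may treat labels differently, and the function name here is hypothetical.

```python
import math
from typing import Sequence, Union

Label = Union[bool, str]


def fraction_of_true_sketch(bin_labels: Sequence[Label], bin_counts: Sequence[float]) -> float:
    """Entries with a true label divided by all entries with a true or false label."""
    n_true = 0.0
    n_false = 0.0
    for label, count in zip(bin_labels, bin_counts):
        if isinstance(label, str):
            lowered = label.lower()
            if lowered not in ("true", "false"):
                return math.nan  # not a boolean histogram
            label = lowered == "true"  # normalise "True"/"false" etc. to a bool
        elif not isinstance(label, bool):
            return math.nan
        if label:
            n_true += count
        else:
            n_false += count
    total = n_true + n_false
    return n_true / total if total > 0 else math.nan


print(fraction_of_true_sketch([True, False], [30, 10]))    # 0.75
print(fraction_of_true_sketch(["true", "False"], [1, 3]))  # 0.25
```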

popmon.analysis.profiling.pull_calculator module

class popmon.analysis.profiling.pull_calculator.ExpandingPullCalculator(read_key, shift=1, features=None, store_key=None, suffix_mean='_exp_mean', suffix_std='_exp_std', suffix_pull='_exp_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the expanding mean and standard deviation

__init__(read_key, shift=1, features=None, store_key=None, suffix_mean='_exp_mean', suffix_std='_exp_std', suffix_pull='_exp_pull', *args, **kwargs)

Initialize an instance of ExpandingPullCalculator.

Parameters

read_key (str) – key of input data to read from data store
shift (int) – shift of the window, default is 1.
features (list) – list of features to calculate pull for. (optional)
store_key (str) – key of the output data to store in the datastore (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to the mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to the mean and std functions (optional)

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters: datastore (dict) – input datastore
Returns: updated output datastore
Return type: dict
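Here is a stdlib sketch of what an expanding pull with shift=1 computes for a single metric time series. For each time slot, the mean and standard deviation are taken over all earlier slots, so the window grows over time and the current value is excluded. The pull is then (metric - mean) / std. Two choices here are assumptions of the sketch rather than documented behavior of ExpandingPullCalculator: it uses the sample standard deviation, and it returns NaN when fewer than two earlier points exist.

```python
import math
import statistics
from typing import List, Sequence


def expanding_pull(values: Sequence[float], shift: int = 1) -> List[float]:
    """Pull of each value w.r.t. the mean/std of all values at least `shift` slots earlier."""
    pulls = []
    for i, x in enumerate(values):
        # Expanding window: indices 0 .. i - shift (empty for the first slots).
        history = values[: max(i - shift + 1, 0)]
        if len(history) < 2:
            pulls.append(math.nan)  # assumption: need two points for a sample std
            continue
        mean = statistics.mean(history)
        std = statistics.stdev(history)  # sample std (n - 1 denominator), an assumption
        pulls.append((x - mean) / std if std > 0 else math.nan)
    return pulls


series = [10.0, 12.0, 11.0, 13.0, 30.0]
print([round(p, 2) for p in expanding_pull(series)])  # [nan, nan, 0.0, 2.0, 14.33]
```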

class popmon.analysis.profiling.pull_calculator.PullCalculator(func_mean, func_std, apply_to_key, assign_to_key=None, store_key=None, suffix_mean='_mean', suffix_std='_std', suffix_pull='_pull', features=None, *args, **kwargs)

Bases: Pipeline

Base module for pull calculation, based on mean and standard deviation calculation

Steps as performed by ApplyFunc modules:

calculate standard deviation seen in a metric.
calculate mean seen in a metric.
calculate pull of a metric as: pull = (metric - mean) / std.

The pull is then stored as a new column.
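The three steps above can be sketched as a function with pluggable mean and standard deviation functions, which play the roles of func_mean and func_std. It follows the documented column convention (metric + suffix). To keep it simple, it works on a plain dict of lists instead of a data frame. The function names, and the global mean/std used as example functions, are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence

Series = List[float]
SeriesFunc = Callable[[Sequence[float]], Series]


def add_pull_columns(
    table: Dict[str, Series],
    metrics: Sequence[str],
    func_mean: SeriesFunc,
    func_std: SeriesFunc,
    suffix_mean: str = "_mean",
    suffix_std: str = "_std",
    suffix_pull: str = "_pull",
) -> Dict[str, Series]:
    """Add <metric>_std, <metric>_mean and <metric>_pull columns to `table`."""
    for metric in metrics:
        values = table[metric]
        std = func_std(values)    # step 1: std seen in the metric, one value per row
        mean = func_mean(values)  # step 2: mean seen in the metric, one value per row
        table[metric + suffix_std] = std
        table[metric + suffix_mean] = mean
        # step 3: pull = (metric - mean) / std, stored as a new column
        table[metric + suffix_pull] = [
            (x - m) / s if s else float("nan") for x, m, s in zip(values, mean, std)
        ]
    return table


def global_mean(values: Sequence[float]) -> Series:
    """Example func_mean: one overall mean, repeated for every row."""
    return [statistics.mean(values)] * len(values)


def global_std(values: Sequence[float]) -> Series:
    """Example func_std: one overall population std, repeated for every row."""
    return [statistics.pstdev(values)] * len(values)


table = {"p50": [1.0, 3.0, 1.0, 3.0]}
add_pull_columns(table, ["p50"], global_mean, global_std)
print(sorted(table))      # ['p50', 'p50_mean', 'p50_pull', 'p50_std']
print(table["p50_pull"])  # [-1.0, 1.0, -1.0, 1.0]
```

The expanding, rolling and reference variants documented in this module differ only in which mean and std functions they plug in.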

__init__(func_mean, func_std, apply_to_key, assign_to_key=None, store_key=None, suffix_mean='_mean', suffix_std='_std', suffix_pull='_pull', features=None, *args, **kwargs)

Initialize an instance of PullCalculator.

Parameters

func_mean (str) – function applied to calculate the mean of the profiled statistics
func_std (str) – function applied to calculate the standard deviation of the profiled statistics
apply_to_key (str) – key of the input data to apply funcs to.
assign_to_key (str) – key of the input data to assign function applied-output to. (optional)
store_key (str) – key of the output data to store in the datastore (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean. default is _mean.
suffix_std (str) – suffix of std. std column = metric + suffix_std. default is _std.
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull. default is _pull.
features (list) – list of features to calculate pull for. default is all.
args (tuple) – residual args passed on to func_mean and func_std (optional)
kwargs (dict) – residual kwargs passed on to func_mean and func_std (optional)

class popmon.analysis.profiling.pull_calculator.RefMedianMadPullCalculator(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the reference median and median absolute deviation (MAD)

__init__(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Initialize an instance of RefMedianMadPullCalculator.

Parameters

reference_key (str) – key of input data to read from data store
assign_to_key (str) – key of output data to store in data store
store_key (str) – key of the output data to store in the datastore (optional)
features (list) – list of features to calculate pull for. (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to the mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to the mean and std functions (optional)

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters: datastore (dict) – input datastore
Returns: updated output datastore
Return type: dict
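RefMedianMadPullCalculator uses the median and the median absolute deviation (MAD) of a reference in place of the mean and standard deviation. The stdlib sketch below shows the idea. It scales the MAD by 1.4826 so that it estimates the standard deviation of normally distributed data. That scaling is a common convention and an assumption here, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence

# Assumed normal-consistency factor (about 1 / 0.6745), so that the scaled MAD
# estimates the standard deviation for normally distributed data.
MAD_TO_STD = 1.4826


def ref_median_mad_pull(values: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Pull of each value w.r.t. the median and (scaled) MAD of a reference sample."""
    med = statistics.median(reference)
    mad = statistics.median([abs(r - med) for r in reference]) * MAD_TO_STD
    return [(x - med) / mad if mad > 0 else float("nan") for x in values]


reference = [10, 11, 12, 13, 100]  # one large outlier in the reference
pulls = ref_median_mad_pull([12, 15], reference)
print([round(p, 2) for p in pulls])  # [0.0, 2.02]
```

The outlier at 100 barely moves the median (12) or the MAD (1 before scaling). This robustness is the usual reason to prefer median/MAD over mean/std for a reference.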

class popmon.analysis.profiling.pull_calculator.ReferencePullCalculator(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the reference mean and standard deviation

__init__(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Initialize an instance of ReferencePullCalculator.

Parameters

reference_key (str) – key of input data to read from data store
assign_to_key (str) – key of output data to store in data store
store_key (str) – key of the output data to store in the datastore (optional)
features (list) – list of features to calculate pull for. (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to the mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to the mean and std functions (optional)

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters: datastore (dict) – input datastore
Returns: updated output datastore
Return type: dict
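With a reference, every time slot is compared against the same fixed statistics instead of against its own history. The minimal sketch below computes the mean and standard deviation once from a reference sample and then applies them to each new value. The use of the sample standard deviation and the function name are assumptions.

```python
import statistics
from typing import List, Sequence


def reference_pull(values: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Pull of each value w.r.t. the mean and std of a fixed reference sample."""
    mean = statistics.mean(reference)
    std = statistics.stdev(reference)  # sample std; an assumption of this sketch
    # The same reference statistics are used for every time slot.
    return [(x - mean) / std if std > 0 else float("nan") for x in values]


reference = [9.0, 10.0, 11.0]  # e.g. a metric's values over a reference period
print(reference_pull([10.0, 12.5, 7.0], reference))  # [0.0, 2.5, -3.0]
```

Unlike the median/MAD variant, a single outlier in the reference here inflates both the mean and the standard deviation.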

class popmon.analysis.profiling.pull_calculator.RollingPullCalculator(read_key, window, shift=1, features=None, store_key=None, suffix_mean='_roll_mean', suffix_std='_roll_std', suffix_pull='_roll_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the rolling mean and standard deviation

__init__(read_key, window, shift=1, features=None, store_key=None, suffix_mean='_roll_mean', suffix_std='_roll_std', suffix_pull='_roll_pull', *args, **kwargs)

Initialize an instance of RollingPullCalculator.

Parameters

read_key (str) – key of input data to read from data store
window (int) – size of rolling window
shift (int) – shift of the window, default is 1.
features (list) – list of features to calculate pull for. (optional)
store_key (str) – key of the output data to store in the datastore (optional)
suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean
suffix_std (str) – suffix of std. std column = metric + suffix_std
suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull
args (tuple) – residual args passed on to the mean and std functions (optional)
kwargs (dict) – residual kwargs passed on to the mean and std functions (optional)

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters: datastore (dict) – input datastore
Returns: updated output datastore
Return type: dict
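Here is a stdlib sketch of a rolling pull with the window and shift parameters. For each slot, the mean and standard deviation come from the window slots that end shift slots earlier. Returning NaN until a full window is available, and using the sample standard deviation, are assumptions of this sketch. The function name is hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def rolling_pull(values: Sequence[float], window: int, shift: int = 1) -> List[float]:
    """Pull of each value w.r.t. the mean/std of the `window` slots ending `shift` slots earlier."""
    pulls = []
    for i, x in enumerate(values):
        end = max(i - shift + 1, 0)   # exclusive end: history stops `shift` slots before i
        start = max(end - window, 0)  # keep at most `window` points
        history = values[start:end]
        # Assumption: NaN until a full window (and at least two points) is available.
        if len(history) < max(window, 2):
            pulls.append(math.nan)
            continue
        mean = statistics.mean(history)
        std = statistics.stdev(history)  # sample std, an assumption
        pulls.append((x - mean) / std if std > 0 else math.nan)
    return pulls


series = [9.0, 10.0, 11.0, 12.0, 20.0]
print(rolling_pull(series, window=3))  # [nan, nan, nan, 2.0, 9.0]
```

Compared with the expanding variant, old slots drop out of the window. The reference statistics therefore track slow drifts instead of accumulating the whole history.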