popmon.analysis.profiling package

Submodules

popmon.analysis.profiling.hist_profiler module

class popmon.analysis.profiling.hist_profiler.HistProfiler(read_key, store_key, features=None, ignore_features=None, var_timestamp=None, hist_col='histogram', index_col='date', stats_functions=None)

Bases: Module

Generate profiles of histograms using default statistical functions.

Profiles are:

  • 1 dim histograms, all: ‘count’, ‘filled’, ‘distinct’, ‘nan’, ‘most_probable_value’, ‘overflow’, ‘underflow’.

  • 1 dim histograms, numeric: mean, std, min, max, p01, p05, p16, p50, p84, p95, p99.

  • 1 dim histograms, boolean: fraction of true entries.

  • 2 dim histograms: count, phi_k correlation constant, p-value and Z-score of contingency test.

  • n dim histograms: count (n >= 3)
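As an illustration of the numeric profiles listed above, the sketch below derives a weighted mean and standard deviation from the bin labels and bin counts of a 1 dim histogram. The helper name profile_mean_std is hypothetical and the use of the population (not sample) standard deviation is an assumption; this is not the HistProfiler implementation.

```python
import math


def profile_mean_std(bin_labels, bin_counts):
    """Weighted mean and population std of a 1-dim histogram (illustrative only)."""
    total = sum(bin_counts)
    if total == 0:
        # An empty histogram has no defined mean or std.
        return float("nan"), float("nan")
    mean = sum(x * w for x, w in zip(bin_labels, bin_counts)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(bin_labels, bin_counts)) / total
    return mean, math.sqrt(var)


# Three bins centred at 1, 2 and 3 holding 1, 2 and 1 entries.
mean, std = profile_mean_std([1.0, 2.0, 3.0], [1, 2, 1])
```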

Parameters
  • read_key (str) – key of the input test data to read from the datastore

  • store_key (str) – key of the output data to store in the datastore

  • features (list) – features of data-frames to pick up from input data (optional)

  • ignore_features (list) – features to ignore (optional)

  • var_timestamp (list) – list of timestamp variables (optional)

  • hist_col (str) – key for histogram in split dictionary

  • index_col (str) – key for index in split dictionary

  • stats_functions (dict) – function_name, function(bin_labels, bin_counts) dictionary
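Based on the description of stats_functions above, a custom entry might look like the following sketch: a dictionary mapping a function name to a callable that takes bin_labels and bin_counts. The weighted_median function is a hypothetical example, and how HistProfiler combines such entries with its defaults is not shown here.

```python
def weighted_median(bin_labels, bin_counts):
    """Return the first label at which the cumulative count reaches half the total."""
    pairs = sorted(zip(bin_labels, bin_counts))
    total = sum(count for _, count in pairs)
    if total == 0:
        return float("nan")
    running = 0
    for label, count in pairs:
        running += count
        if running >= total / 2:
            return label


# Shape described in the parameter list: {function_name: function(bin_labels, bin_counts)}
stats_functions = {"weighted_median": weighted_median}

# Applying every registered function to one histogram yields one profile value per name.
profile = {
    name: func([10, 20, 30], [1, 1, 5]) for name, func in stats_functions.items()
}
```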

__init__(read_key, store_key, features=None, ignore_features=None, var_timestamp=None, hist_col='histogram', index_col='date', stats_functions=None)

Module initialization

transform(data)

Central function of the module.

Typically transform() takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

data (dict) – input datastore

Returns

updated output datastore

Return type

dict
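The read, process, store pattern described for transform() can be sketched with a plain dictionary as the datastore. The DoubleValues class is hypothetical and does not derive from popmon's Module base class; it only mirrors the read_key and store_key convention used in this package.

```python
class DoubleValues:
    """Toy module: reads a list from the datastore and stores a doubled copy."""

    def __init__(self, read_key, store_key):
        self.read_key = read_key
        self.store_key = store_key

    def transform(self, datastore):
        data = datastore[self.read_key]                    # take something from the datastore
        datastore[self.store_key] = [2 * v for v in data]  # put the result back
        return datastore                                   # hand on to the next module


datastore = DoubleValues("raw", "doubled").transform({"raw": [1, 2, 3]})
```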

popmon.analysis.profiling.profiles module

popmon.analysis.profiling.profiles.profile_fraction_of_true(bin_labels, bin_counts)

Compute fraction of ‘true’ labels

Parameters
  • bin_labels – Array containing the bin labels of the histogram. If it is not an array, a conversion is attempted.

  • bin_counts – Array containing the number of entries for each label in bin_labels. If it is not an array, a conversion is attempted.

Returns

fraction of ‘true’ labels
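A minimal sketch of this computation, assuming that labels are booleans or the strings 'true' and 'false' in any letter case; the actual profile_fraction_of_true may treat labels differently. The function name fraction_of_true is hypothetical.

```python
def fraction_of_true(bin_labels, bin_counts):
    """Share of all entries that fall in bins labelled true (illustrative only)."""
    total = sum(bin_counts)
    if total == 0:
        return float("nan")
    true_count = sum(
        count
        for label, count in zip(bin_labels, bin_counts)
        if str(label).lower() == "true"  # matches True, 'True' and 'true'
    )
    return true_count / total


frac_bool = fraction_of_true([True, False], [3, 1])
frac_str = fraction_of_true(["true", "false"], [1, 3])
```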

popmon.analysis.profiling.pull_calculator module

class popmon.analysis.profiling.pull_calculator.ExpandingPullCalculator(read_key, shift=1, features=None, store_key=None, suffix_mean='_exp_mean', suffix_std='_exp_std', suffix_pull='_exp_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the expanding mean and standard deviation

__init__(read_key, shift=1, features=None, store_key=None, suffix_mean='_exp_mean', suffix_std='_exp_std', suffix_pull='_exp_pull', *args, **kwargs)

Initialize an instance of ExpandingPullCalculator.

Parameters
  • read_key (str) – key of input data to read from data store

  • shift (int) – shift of the window, default is 1.

  • features (list) – list of features to calculate pull for. (optional)

  • store_key (str) – key of the output data to store in the datastore (optional)

  • suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean

  • suffix_std (str) – suffix of std. std column = metric + suffix_std

  • suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull

  • args (tuple, optional) – residual args passed on to mean and std functions

  • kwargs (dict, optional) – residual kwargs passed on to mean and std functions
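The effect of an expanding window combined with shift can be sketched in plain Python. With shift=1, each value is compared only with the values that precede it. The function name expanding_pull, the use of the sample standard deviation, and the choice to return None where fewer than two reference points exist are assumptions made for illustration.

```python
import statistics


def expanding_pull(values, shift=1):
    """Pull of each value against the mean/std of all values up to `shift` steps back."""
    pulls = []
    for i, value in enumerate(values):
        # With shift=1 the history is values[0..i-1], excluding the current value.
        history = values[: max(i - shift + 1, 0)]
        if len(history) < 2:
            pulls.append(None)  # the sample std needs at least two points
            continue
        mean = statistics.mean(history)
        std = statistics.stdev(history)
        pulls.append((value - mean) / std if std > 0 else None)
    return pulls


pulls = expanding_pull([1.0, 3.0, 2.0, 6.0])
```

The last value, 6.0, is compared with the history [1.0, 3.0, 2.0], which has mean 2.0 and sample standard deviation 1.0.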

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

class popmon.analysis.profiling.pull_calculator.PullCalculator(func_mean, func_std, apply_to_key, assign_to_key=None, store_key=None, suffix_mean='_mean', suffix_std='_std', suffix_pull='_pull', features=None, *args, **kwargs)

Bases: Pipeline

Base module for pull calculation, based on mean and standard deviation calculation

Steps as performed by ApplyFunc modules:

  • calculate standard deviation seen in a metric.

  • calculate mean seen in a metric.

  • calculate pull of a metric as: pull = (metric - mean) / std.

The pull is then stored as a new column.
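The column naming and the pull formula can be illustrated with a single row held in a dictionary. The helper add_pull_columns is hypothetical; it only shows how metric + suffix yields the new column names and how the pull follows from the mean and std.

```python
def add_pull_columns(row, metric, mean, std,
                     suffix_mean="_mean", suffix_std="_std", suffix_pull="_pull"):
    """Return a copy of `row` with mean, std and pull columns added for `metric`."""
    out = dict(row)
    out[metric + suffix_mean] = mean
    out[metric + suffix_std] = std
    # pull = (metric - mean) / std; left undefined when std is zero
    out[metric + suffix_pull] = (row[metric] - mean) / std if std else None
    return out


result = add_pull_columns({"count": 120.0}, "count", mean=100.0, std=10.0)
```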

__init__(func_mean, func_std, apply_to_key, assign_to_key=None, store_key=None, suffix_mean='_mean', suffix_std='_std', suffix_pull='_pull', features=None, *args, **kwargs)

Initialize an instance of PullCalculator.

Parameters
  • func_mean (str) – applied-function to calculate mean of profiled statistics

  • func_std (str) – applied-function to calculate std of profiled statistics

  • apply_to_key (str) – key of the input data to apply funcs to.

  • assign_to_key (str) – key of the input data to assign function applied-output to. (optional)

  • store_key (str) – key of the output data to store in the datastore (optional)

  • suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean. default is _mean.

  • suffix_std (str) – suffix of std. std column = metric + suffix_std. default is _std.

  • suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull. default is _pull.

  • features (list) – list of features to calculate pull for. default is all.

  • args (tuple, optional) – residual args passed on to func_mean and func_std

  • kwargs (dict, optional) – residual kwargs passed on to func_mean and func_std

class popmon.analysis.profiling.pull_calculator.RefMedianMadPullCalculator(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the reference median and median absolute deviation (MAD)

__init__(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Initialize an instance of RefMedianMadPullCalculator.

Parameters
  • reference_key (str) – key of input data to read from data store

  • assign_to_key (str) – key of output data to store in data store

  • store_key (str) – key of the output data to store in the datastore (optional)

  • features (list) – list of features to calculate pull for.

  • suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean

  • suffix_std (str) – suffix of std. std column = metric + suffix_std

  • suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull

  • args (tuple, optional) – residual args passed on to mean and std functions

  • kwargs (dict, optional) – residual kwargs passed on to mean and std functions
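A median and MAD based pull replaces the mean with the reference median and the standard deviation with the median absolute deviation, which makes it less sensitive to outliers in the reference. The sketch below uses the raw MAD without any scaling factor; whether RefMedianMadPullCalculator applies a scaling is not stated here, so treat that as an assumption. The function name median_mad_pull is hypothetical.

```python
import statistics


def median_mad_pull(value, reference):
    """Pull of `value` relative to the median and raw MAD of `reference`."""
    med = statistics.median(reference)
    mad = statistics.median([abs(x - med) for x in reference])
    return (value - med) / mad if mad > 0 else None


# The outlier 100.0 barely affects the median (3.0) or the MAD (1.0).
pull = median_mad_pull(5.0, [1.0, 2.0, 3.0, 4.0, 100.0])
```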

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

class popmon.analysis.profiling.pull_calculator.ReferencePullCalculator(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the reference mean and standard deviation

__init__(reference_key, assign_to_key, store_key=None, features=None, suffix_mean='_ref_mean', suffix_std='_ref_std', suffix_pull='_ref_pull', *args, **kwargs)

Initialize an instance of ReferencePullCalculator.

Parameters
  • reference_key (str) – key of input data to read from data store

  • assign_to_key (str) – key of output data to store in data store

  • store_key (str) – key of the output data to store in the datastore (optional)

  • features (list) – list of features to calculate pull for. (optional)

  • suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean

  • suffix_std (str) – suffix of std. std column = metric + suffix_std

  • suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull

  • args (tuple, optional) – residual args passed on to mean and std functions

  • kwargs (dict, optional) – residual kwargs passed on to mean and std functions
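With a fixed reference, the same mean and standard deviation are used for every value being tested. A minimal sketch, assuming the sample standard deviation and a hypothetical helper name reference_pulls:

```python
import statistics


def reference_pulls(values, reference):
    """Pull of each value relative to the mean/std of a fixed reference set."""
    mean = statistics.mean(reference)
    std = statistics.stdev(reference)
    return [(v - mean) / std if std > 0 else None for v in values]


# The reference has mean 10.0 and sample std 2.0.
pulls = reference_pulls([10.0, 14.0], [8.0, 10.0, 12.0])
```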

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

class popmon.analysis.profiling.pull_calculator.RollingPullCalculator(read_key, window, shift=1, features=None, store_key=None, suffix_mean='_roll_mean', suffix_std='_roll_std', suffix_pull='_roll_pull', *args, **kwargs)

Bases: PullCalculator

Pull calculation based on the rolling mean and standard deviation

__init__(read_key, window, shift=1, features=None, store_key=None, suffix_mean='_roll_mean', suffix_std='_roll_std', suffix_pull='_roll_pull', *args, **kwargs)

Initialize an instance of RollingPullCalculator.

Parameters
  • read_key (str) – key of input data to read from data store

  • window (int) – size of rolling window

  • shift (int) – shift of the window, default is 1.

  • features (list) – list of features to calculate pull for.

  • store_key (str) – key of the output data to store in the datastore (optional)

  • suffix_mean (str) – suffix of mean. mean column = metric + suffix_mean

  • suffix_std (str) – suffix of std. std column = metric + suffix_std

  • suffix_pull (str) – suffix of pull. pull column = metric + suffix_pull

  • args (tuple, optional) – residual args passed on to mean and std functions

  • kwargs (dict, optional) – residual kwargs passed on to mean and std functions
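The interplay of window and shift can be sketched in plain Python: each value is compared with the mean and standard deviation of the `window` values ending `shift` steps before it. The function name rolling_pull, the sample standard deviation, and the requirement of a completely filled window are assumptions made for illustration.

```python
import statistics


def rolling_pull(values, window, shift=1):
    """Pull of each value against the mean/std of a trailing window of fixed size."""
    pulls = []
    for i, value in enumerate(values):
        end = max(i - shift + 1, 0)      # window ends `shift` steps before index i
        start = max(end - window, 0)
        history = values[start:end]
        if len(history) < window or window < 2:
            pulls.append(None)           # window not yet filled, or too small for a std
            continue
        mean = statistics.mean(history)
        std = statistics.stdev(history)
        pulls.append((value - mean) / std if std > 0 else None)
    return pulls


pulls = rolling_pull([1.0, 3.0, 2.0, 6.0, 4.0], window=2)
```

Unlike the expanding variant, old values drop out of the reference once they leave the window, so the pull reacts to recent behaviour only.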

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict