popmon.analysis.comparison package

Submodules

popmon.analysis.comparison.comparisons module

popmon.analysis.comparison.comparisons.googl_test(bins_1, bins_2)

Google-paper test

Reference link: https://mlsys.org/Conferences/2019/doc/2019/167.pdf

Parameters
  • bins_1 – first array of bin entries

  • bins_2 – second array of bin entries

Returns

maximum difference between the two entry distributions

Return type

float
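
As a rough illustration of what "maximum difference between the two entry distributions" can mean, the sketch below normalizes each array of bin entries to fractions and takes the largest absolute per-bin difference. The helper name max_fraction_difference and the handling of all-zero arrays are assumptions of this sketch, not a copy of popmon's implementation.

```python
def max_fraction_difference(bins_1, bins_2):
    """Largest absolute per-bin difference between two normalized distributions."""
    if len(bins_1) != len(bins_2):
        raise ValueError("both arrays must have the same number of bins")

    def normalize(bins):
        total = sum(bins)
        # Assumption: an all-zero array is left as-is to avoid dividing by zero.
        return [b / total for b in bins] if total else list(bins)

    fractions_1 = normalize(bins_1)
    fractions_2 = normalize(bins_2)
    return max(abs(a - b) for a, b in zip(fractions_1, fractions_2))


# Two histograms with the same total but shifted bin contents.
print(max_fraction_difference([1, 1, 2], [2, 1, 1]))
```

Because both inputs are normalized first, histograms with different totals but the same shape give a difference of zero.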

popmon.analysis.comparison.comparisons.ks_prob(testscore)

KS probability corresponding to a KS test score

Copyright ROOT: formulas translated from C++ to Python, but otherwise not modified.
Reference: https://root.cern.ch/doc/master/classTH1.html#TH1:KolmogorovTest
GNU license: https://root.cern.ch/license
All modifications copyright INGA WB.

Parameters

testscore (float) – Kolmogorov-Smirnov test score

Returns

approximate pvalue for the Kolmogorov-Smirnov test score

Return type

float
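
For intuition about how a KS test score maps to an approximate p-value, the sketch below evaluates the textbook asymptotic Kolmogorov series, P(z) = 2 * sum over j >= 1 of (-1)^(j-1) * exp(-2 j^2 z^2). This is not the piecewise approximation used in the ROOT-derived code. The function name kolmogorov_pvalue, the number of terms, and the cutoff of 0.2 below which the p-value is treated as 1 are assumptions of this sketch.

```python
import math


def kolmogorov_pvalue(testscore, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov series."""
    # Assumption: for small scores the series converges slowly and the
    # p-value is indistinguishable from 1, so return 1 directly.
    if testscore < 0.2:
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        sign = 1.0 if j % 2 == 1 else -1.0
        total += sign * math.exp(-2.0 * j * j * testscore * testscore)
    # Clip to [0, 1] to guard against rounding.
    return min(1.0, max(0.0, 2.0 * total))


for score in (0.1, 1.0, 2.0):
    print(score, kolmogorov_pvalue(score))
```

Larger test scores give smaller p-values, so a high score indicates that the two distributions are unlikely to share a common parent distribution.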

popmon.analysis.comparison.comparisons.ks_test(hist_1, hist_2)

KS-test for two histograms with different number of entries

Copyright ROOT: formulas translated from C++ to Python, but otherwise not modified.
Reference: https://root.cern.ch/doc/master/classTH1.html#TH1:KolmogorovTest
GNU license: https://root.cern.ch/license
All modifications copyright INGA WB.

Parameters
  • hist_1 – 1D array with bin counts of the histogram_1

  • hist_2 – 1D array with bin counts of the histogram_2

Returns

ks_score: Kolmogorov-Smirnov Test score

Return type

float
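
A minimal sketch of a KS score for binned data, assuming both histograms share the same binning: take the largest distance between the two normalized cumulative distributions and scale it by the effective number of entries. The name ks_score and the choice to return NaN for an empty histogram are assumptions of this sketch.

```python
import math
from itertools import accumulate


def ks_score(hist_1, hist_2):
    """KS test score for two binned histograms with identical binning."""
    if len(hist_1) != len(hist_2) or len(hist_1) == 0:
        raise ValueError("histograms must be non-empty and share the same bins")
    sum_1, sum_2 = sum(hist_1), sum(hist_2)
    if sum_1 == 0 or sum_2 == 0:
        return float("nan")  # assumption: undefined for an empty histogram
    # Normalized cumulative distributions of both histograms.
    cdf_1 = [c / sum_1 for c in accumulate(hist_1)]
    cdf_2 = [c / sum_2 for c in accumulate(hist_2)]
    max_distance = max(abs(a - b) for a, b in zip(cdf_1, cdf_2))
    # Scale by the effective number of entries of the two histograms.
    return max_distance * math.sqrt(sum_1 * sum_2 / (sum_1 + sum_2))


print(ks_score([10, 0], [0, 10]))
print(ks_score([5, 5], [50, 50]))
```

The scaling factor is what lets histograms with different numbers of entries be compared: the same shape difference yields a higher score when both histograms hold more entries.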

popmon.analysis.comparison.comparisons.uu_chi2(n, m)

Normalized Chi^2 formula for two histograms with different number of entries

Copyright ROOT: formulas translated from C++ to Python, but otherwise not modified.
Reference: https://root.cern.ch/doc/master/classTH1.html#a6c281eebc0c0a848e7a0d620425090a5
GNU license: https://root.cern.ch/license
All modifications copyright INGA WB.

Parameters
  • n – 1d array with bin counts of the reference set

  • m – 1d array with bin counts of the test set

Returns

tuple of floats (chi2_value, chi2_norm, z_score, p_value, res)
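
The sketch below shows the standard chi-square statistic for two unweighted histograms with totals N and M, chi2 = (1 / (N * M)) * sum over bins of (M * n_i - N * m_i)^2 / (n_i + m_i), together with its value per degree of freedom. It covers only the first two elements of the returned tuple. The name chi2_unweighted and the error handling are assumptions of this sketch.

```python
def chi2_unweighted(n, m):
    """Chi2 and chi2/ndf for two unweighted histograms with identical binning."""
    if len(n) != len(m):
        raise ValueError("histograms must share the same bins")
    total_n, total_m = sum(n), sum(m)
    if total_n == 0 or total_m == 0:
        raise ValueError("both histograms need at least one entry")
    chi2 = 0.0
    bins_used = 0
    for n_i, m_i in zip(n, m):
        if n_i + m_i == 0:
            continue  # bins empty in both histograms carry no information
        bins_used += 1
        chi2 += (total_m * n_i - total_n * m_i) ** 2 / (n_i + m_i)
    chi2 /= total_n * total_m
    # One degree of freedom is lost because only the shapes are compared.
    ndf = bins_used - 1
    chi2_norm = chi2 / ndf if ndf > 0 else float("nan")
    return chi2, chi2_norm


print(chi2_unweighted([10, 20], [20, 10]))
```

Weighting each bin count by the other histogram's total is what makes the statistic usable when the reference and test sets contain different numbers of entries.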

popmon.analysis.comparison.hist_comparer module

class popmon.analysis.comparison.hist_comparer.ExpandingHistComparer(read_key, store_key, shift=1, hist_col='histogram', suffix='expanding')

Bases: HistComparer

Compare histogram to previous expanding histograms

__init__(read_key, store_key, shift=1, hist_col='histogram', suffix='expanding')

Initialize an instance of ExpandingHistComparer.

Parameters
  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • shift (int) – shift of expanding window. default is 1.

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

  • suffix (str) – column/key of expanding histogram. default is ‘expanding’ -> column = ‘histogram_expanding’
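
To make the expanding window and its shift concrete, the sketch below builds, for each time slot, the bin-wise sum of all histograms up to shift slots earlier. With shift=1 the current slot is compared only against strictly earlier data. The function name expanding_reference, the list-of-lists representation, and the use of None where no earlier slot exists are assumptions of this sketch.

```python
def expanding_reference(histograms, shift=1):
    """For each time slot, sum the bin counts of all slots up to `shift` earlier."""
    n_bins = len(histograms[0])
    references = []
    for t in range(len(histograms)):
        last = t - shift  # index of the most recent slot included
        if last < 0:
            references.append(None)  # no earlier slots available yet
            continue
        summed = [0] * n_bins
        for hist in histograms[: last + 1]:
            summed = [a + b for a, b in zip(summed, hist)]
        references.append(summed)
    return references


time_slots = [[1, 0], [0, 2], [3, 3]]
print(expanding_reference(time_slots, shift=1))
```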

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict
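
The datastore pattern described above can be sketched with plain dictionaries. All three classes below (AddOne, Double, SimplePipeline) are hypothetical stand-ins written for this illustration. They are not popmon classes and only mimic the read-transform-store flow.

```python
class AddOne:
    """Hypothetical module: reads a value, increments it, stores the result."""

    def __init__(self, read_key, store_key):
        self.read_key = read_key
        self.store_key = store_key

    def transform(self, datastore):
        datastore[self.store_key] = datastore[self.read_key] + 1
        return datastore


class Double(AddOne):
    """Hypothetical module: reads a value, doubles it, stores the result."""

    def transform(self, datastore):
        datastore[self.store_key] = datastore[self.read_key] * 2
        return datastore


class SimplePipeline:
    """Hypothetical pipeline: runs each module's transform() in order."""

    def __init__(self, modules):
        self.modules = modules

    def transform(self, datastore):
        for module in self.modules:
            datastore = module.transform(datastore)
        return datastore


pipeline = SimplePipeline([AddOne("x", "x_plus_one"), Double("x_plus_one", "result")])
output = pipeline.transform({"x": 3})
print(output)
```

Each module reads from one key and writes to another, so the output of one step becomes the input of the next without the modules knowing about each other.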

class popmon.analysis.comparison.hist_comparer.ExpandingNormHistComparer(read_key, store_key, shift=1, hist_col='histogram')

Bases: NormHistComparer

Compare histogram to previous expanding normalized histograms

__init__(read_key, store_key, shift=1, hist_col='histogram')

Initialize an instance of ExpandingNormHistComparer.

Parameters
  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • shift (int) – shift of expanding window. default is 1.

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

class popmon.analysis.comparison.hist_comparer.HistComparer(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', suffix='comp', *args, **kwargs)

Bases: Pipeline

Base pipeline to compare histogram to previous rolling histograms

__init__(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', suffix='comp', *args, **kwargs)

Initialize an instance of HistComparer.

Parameters
  • func_hist_collector – histogram collection function

  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • assign_to_key (str) – key of the input data to assign function applied-output to. (optional)

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

  • suffix (str) – column/key of comparison histogram. default is ‘comp’ -> column = ‘histogram_comp’

  • args – (tuple, optional): residual args passed on to func_hist_collector

  • kwargs – (dict, optional): residual kwargs passed on to func_hist_collector

class popmon.analysis.comparison.hist_comparer.NormHistComparer(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', *args, **kwargs)

Bases: Pipeline

Base pipeline to compare histogram to normalized histograms

__init__(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', *args, **kwargs)

Initialize an instance of NormHistComparer.

Parameters
  • func_hist_collector – histogram collection function

  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • assign_to_key (str) – key of the input data to assign function applied-output to. (optional)

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

  • args – (tuple, optional): residual args passed on to func_hist_collector

  • kwargs – (dict, optional): residual kwargs passed on to func_hist_collector

class popmon.analysis.comparison.hist_comparer.PreviousHistComparer(read_key, store_key, hist_col='histogram', suffix='prev1')

Bases: RollingHistComparer

Compare histogram to previous histograms

__init__(read_key, store_key, hist_col='histogram', suffix='prev1')

Initialize an instance of PreviousHistComparer.

Parameters
  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

  • suffix (str) – column/key of previous histogram. default is ‘prev1’ -> column = ‘histogram_prev1’

class popmon.analysis.comparison.hist_comparer.ReferenceHistComparer(reference_key, assign_to_key, store_key, hist_col='histogram', suffix='ref')

Bases: HistComparer

Compare histogram to reference histograms

__init__(reference_key, assign_to_key, store_key, hist_col='histogram', suffix='ref')

Initialize an instance of ReferenceHistComparer.

Parameters
  • reference_key (str) – key of reference data to read from data store

  • assign_to_key (str) – key of input data to read from data store, to which the reference histogram is assigned for comparison

  • store_key (str) – key of output data to store in data store

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

  • suffix (str) – column/key of reference histogram. default is ‘ref’ -> column = ‘histogram_ref’

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

class popmon.analysis.comparison.hist_comparer.ReferenceNormHistComparer(reference_key, assign_to_key, store_key, hist_col='histogram')

Bases: NormHistComparer

Compare histogram to reference normalized histograms

__init__(reference_key, assign_to_key, store_key, hist_col='histogram')

Initialize an instance of ReferenceNormHistComparer.

Parameters
  • reference_key (str) – key of reference data to read from data store

  • assign_to_key (str) – key of input data to read from data store, to which the reference histogram is assigned for comparison

  • store_key (str) – key of output data to store in data store

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

class popmon.analysis.comparison.hist_comparer.RollingHistComparer(read_key, store_key, window, shift=1, hist_col='histogram', suffix='roll')

Bases: HistComparer

Compare histogram to previous rolling histograms

__init__(read_key, store_key, window, shift=1, hist_col='histogram', suffix='roll')

Initialize an instance of RollingHistComparer.

Parameters
  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • window (int) – size of rolling window

  • shift (int) – shift of rolling window. default is 1.

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

  • suffix (str) – column/key of rolling histogram. default is ‘roll’ -> column = ‘histogram_roll’

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict
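
To illustrate how the window and shift parameters of RollingHistComparer interact, the sketch below sums the histograms of the window time slots that end shift slots before the current one. The function name rolling_reference, the list-of-lists representation, and the choice to return None until a full window is available are assumptions of this sketch.

```python
def rolling_reference(histograms, window, shift=1):
    """For each time slot, sum the `window` histograms ending `shift` slots earlier."""
    n_bins = len(histograms[0])
    references = []
    for t in range(len(histograms)):
        last = t - shift          # most recent slot in the window
        first = last - window + 1  # oldest slot in the window
        if first < 0:
            references.append(None)  # assumption: require a full window
            continue
        summed = [0] * n_bins
        for hist in histograms[first : last + 1]:
            summed = [a + b for a, b in zip(summed, hist)]
        references.append(summed)
    return references


time_slots = [[1, 0], [0, 2], [3, 3], [1, 1]]
print(rolling_reference(time_slots, window=2, shift=1))
```

With window=2 and shift=1, the slot at index 2 is compared against the sum of slots 0 and 1, and the slot at index 3 against the sum of slots 1 and 2.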

class popmon.analysis.comparison.hist_comparer.RollingNormHistComparer(read_key, store_key, window, shift=1, hist_col='histogram')

Bases: NormHistComparer

Compare histogram to previous rolling normalized histograms

__init__(read_key, store_key, window, shift=1, hist_col='histogram')

Initialize an instance of RollingNormHistComparer.

Parameters
  • read_key (str) – key of input data to read from data store

  • store_key (str) – key of output data to store in data store

  • window (int) – size of rolling window

  • shift (int) – shift of rolling window. default is 1.

  • hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’

transform(datastore)

Central function of the pipeline.

Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.

Parameters

datastore (dict) – input datastore

Returns

updated output datastore

Return type

dict

popmon.analysis.comparison.hist_comparer.hist_compare(row, hist_name1='', hist_name2='')

Function to compare two histograms

Apply statistical tests to compare two input histograms, such as: Chi2, KS, Pearson, max probability difference. For categorical histograms, also check for unknown labels.

Parameters
  • row (pd.Series) – row to apply compare function to

  • hist_name1 (str) – name of histogram one to compare

  • hist_name2 (str) – name of histogram two to compare

Returns

pandas Series with popular comparison metrics.
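
As a rough illustration of the kind of metrics listed above, the sketch below compares two categorical histograms given as label-to-count dictionaries. It computes a Pearson correlation, a max probability difference, and an unknown-labels flag. The function name compare_categorical, the dictionary representation, and the definition of "unknown" as labels present in the current histogram but absent from the reference are assumptions of this sketch. It also assumes both histograms contain at least one entry.

```python
import math


def compare_categorical(current, reference):
    """Compare two categorical histograms given as {label: count} dicts."""
    labels = sorted(set(current) | set(reference))
    cur = [current.get(label, 0) for label in labels]
    ref = [reference.get(label, 0) for label in labels]

    # Labels seen in the current histogram but absent from the reference.
    unknown_labels = any(label not in reference for label in current)

    # Max probability difference between the normalized distributions.
    cur_total, ref_total = sum(cur), sum(ref)
    max_prob_diff = max(
        abs(c / cur_total - r / ref_total) for c, r in zip(cur, ref)
    )

    # Pearson correlation between the aligned bin counts.
    cur_mean = cur_total / len(labels)
    ref_mean = ref_total / len(labels)
    cov = sum((c - cur_mean) * (r - ref_mean) for c, r in zip(cur, ref))
    norm = math.sqrt(
        sum((c - cur_mean) ** 2 for c in cur)
        * sum((r - ref_mean) ** 2 for r in ref)
    )
    pearson = cov / norm if norm else float("nan")

    return {
        "unknown_labels": unknown_labels,
        "max_prob_diff": max_prob_diff,
        "pearson": pearson,
    }


result = compare_categorical({"a": 5, "b": 5, "c": 10}, {"a": 10, "b": 5})
print(result)
```

Aligning both histograms on the union of their labels before computing any metric keeps a label that is missing on one side from being silently dropped.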