popmon.analysis.comparison package
Submodules
popmon.analysis.comparison.comparisons module
- popmon.analysis.comparison.comparisons.googl_test(bins_1, bins_2)
Google-paper test
Reference link: https://mlsys.org/Conferences/2019/doc/2019/167.pdf
- Parameters
bins_1 – first array of bin entries
bins_2 – second array of bin entries
- Returns
maximum difference between the two entry distributions
- Return type
float
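As an illustration of the return value described above, the following is a minimal sketch that normalizes both arrays of bin entries and takes the largest absolute per-bin difference. The helper name `max_abs_prob_diff` and the handling of all-zero input are assumptions made for this example and are not taken from the popmon source.

```python
import numpy as np


def max_abs_prob_diff(bins_1, bins_2):
    """Hypothetical helper: largest per-bin gap between two normalized histograms."""

    def normalize(bins):
        bins = np.asarray(bins, dtype=float)
        total = bins.sum()
        # Assumption: an all-zero histogram is left untouched to avoid dividing by zero.
        return bins / total if total > 0 else bins

    return float(np.max(np.abs(normalize(bins_1) - normalize(bins_2))))


# The two inputs have the same number of bins; the result is close to 0.1.
print(max_abs_prob_diff([10, 30, 60], [20, 30, 50]))
```

The inputs are assumed to share the same binning, so that entry i of both arrays refers to the same bin.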
- popmon.analysis.comparison.comparisons.ks_prob(testscore)
KS-probability corresponding to KS test score
Copyright ROOT: Formulas translated from c++ to python, but formulas otherwise not modified. Reference: https://root.cern.ch/doc/master/classTH1.html#TH1:KolmogorovTest GNU license: https://root.cern.ch/license All modifications copyright INGA WB.
- Parameters
testscore (float) – Kolmogorov-Smirnov test score
- Returns
approximate pvalue for the Kolmogorov-Smirnov test score
- Return type
float
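To make the mapping from test score to p-value concrete, here is a sketch based on the textbook asymptotic Kolmogorov series, Q(z) = 2 Σ_{j≥1} (−1)^(j−1) exp(−2 j² z²). It is not the ROOT-derived implementation referenced above, which uses its own piecewise approximation. The name `kolmogorov_pvalue_series`, the cut-off at 0.2, and the number of terms are assumptions for illustration.

```python
import math


def kolmogorov_pvalue_series(z, n_terms=100):
    """Hypothetical helper: asymptotic Kolmogorov survival function Q(z)."""
    z = abs(z)
    if z < 0.2:
        # Assumption: below this score Q(z) is treated as exactly 1,
        # because the alternating series converges slowly near zero.
        return 1.0
    total = 0.0
    for j in range(1, n_terms + 1):
        sign = 1.0 if j % 2 == 1 else -1.0
        total += sign * math.exp(-2.0 * j * j * z * z)
    # Clip to [0, 1] to guard against rounding error.
    return min(1.0, max(0.0, 2.0 * total))


for score in (0.5, 1.0, 2.0):
    print(score, kolmogorov_pvalue_series(score))
```

Larger scores map to smaller p-values, so a large KS score signals that the two distributions are unlikely to be the same.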
- popmon.analysis.comparison.comparisons.ks_test(hist_1, hist_2)
KS-test for two histograms with different number of entries
Copyright ROOT: Formulas translated from c++ to python, but formulas otherwise not modified. Reference: https://root.cern.ch/doc/master/classTH1.html#TH1:KolmogorovTest GNU license: https://root.cern.ch/license All modifications copyright INGA WB.
- Parameters
hist_1 – 1D array with bin counts of the first histogram
hist_2 – 1D array with bin counts of the second histogram
- Returns
ks_score: Kolmogorov-Smirnov Test score
- Return type
float
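A KS score for binned data with different numbers of entries can be sketched as the maximum distance between the two normalized cumulative distributions, scaled by an effective sample size, in the spirit of the ROOT KolmogorovTest referenced above. The function name `ks_score_sketch` and the choice to return NaN for empty or mismatched input are assumptions for this example.

```python
import numpy as np


def ks_score_sketch(hist_1, hist_2):
    """Hypothetical helper: scaled maximum distance between two binned CDFs."""
    hist_1 = np.asarray(hist_1, dtype=float)
    hist_2 = np.asarray(hist_2, dtype=float)
    # Assumption: mismatched or empty input yields NaN.
    if hist_1.shape != hist_2.shape or hist_1.size == 0:
        return float("nan")
    sum_1, sum_2 = hist_1.sum(), hist_2.sum()
    if sum_1 == 0 or sum_2 == 0:
        return float("nan")
    # Normalizing each cumulative sum makes the histograms comparable
    # even when they hold different numbers of entries.
    cdf_1 = np.cumsum(hist_1) / sum_1
    cdf_2 = np.cumsum(hist_2) / sum_2
    max_distance = np.max(np.abs(cdf_1 - cdf_2))
    # Scale by the effective number of entries.
    return float(max_distance * np.sqrt(sum_1 * sum_2 / (sum_1 + sum_2)))


print(ks_score_sketch([5, 10, 5], [50, 100, 50]))  # same shape, different size
print(ks_score_sketch([10, 0], [0, 10]))           # fully separated
```

A score of this kind is what a function such as ks_prob would then convert into an approximate p-value.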
- popmon.analysis.comparison.comparisons.uu_chi2(n, m)
Normalized Chi^2 formula for two histograms with different number of entries
Copyright ROOT: Formulas translated from c++ to python, but formulas otherwise not modified. Reference: https://root.cern.ch/doc/master/classTH1.html#a6c281eebc0c0a848e7a0d620425090a5 GNU License: https://root.cern.ch/license All modifications copyright INGA WB.
- Parameters
n – 1d array with bin counts of the reference set
m – 1d array with bin counts of the test set
- Returns
tuple of floats (chi2_value, chi2_norm, z_score, p_value, res)
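The core of an unweighted-unweighted chi-square comparison can be sketched as follows, using the formula from the ROOT documentation referenced above: chi2 = (1 / (N·M)) Σ_i (M·n_i − N·m_i)² / (n_i + m_i), where N and M are the total counts. This sketch computes only the chi-square value and its normalized form. The name `chi2_unweighted_sketch`, the NaN conventions, and the use of "non-empty bins minus one" as the number of degrees of freedom are assumptions for illustration.

```python
import numpy as np


def chi2_unweighted_sketch(n, m):
    """Hypothetical helper: returns (chi2_value, chi2_norm) for two count arrays."""
    n = np.asarray(n, dtype=float)
    m = np.asarray(m, dtype=float)
    total_n, total_m = n.sum(), m.sum()
    if n.shape != m.shape or total_n == 0 or total_m == 0:
        return float("nan"), float("nan")
    # Drop bins that are empty in both histograms to avoid dividing by zero.
    filled = (n + m) > 0
    n, m = n[filled], m[filled]
    dof = n.size - 1
    chi2_value = np.sum((total_m * n - total_n * m) ** 2 / (n + m)) / (total_n * total_m)
    chi2_norm = chi2_value / dof if dof > 0 else float("nan")
    return float(chi2_value), float(chi2_norm)


print(chi2_unweighted_sketch([10, 20, 0], [20, 10, 0]))
```

Because each term is weighted by the totals N and M, histograms with the same shape but different numbers of entries give a chi-square of zero.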
popmon.analysis.comparison.hist_comparer module
- class popmon.analysis.comparison.hist_comparer.ExpandingHistComparer(read_key, store_key, shift=1, hist_col='histogram', suffix='expanding')
Bases:
HistComparer
Compare histogram to previous expanding histograms
- __init__(read_key, store_key, shift=1, hist_col='histogram', suffix='expanding')
Initialize an instance of ExpandingHistComparer.
- Parameters
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
shift (int) – shift of expanding window. default is 1.
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
suffix (str) – column/key of expanding histogram. default is ‘expanding’ -> column = ‘histogram_expanding’
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
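The pipeline pattern described for transform() can be illustrated with a toy example in which each module reads from a dict-based datastore, writes its result back, and hands the datastore to the next module. All class names below (`AddOne`, `Double`, `MiniPipeline`) are hypothetical and are not part of popmon.

```python
class AddOne:
    """Hypothetical module: reads 'x' and stores 'x_plus_one'."""

    def transform(self, datastore):
        datastore["x_plus_one"] = datastore["x"] + 1
        return datastore


class Double:
    """Hypothetical module: reads 'x_plus_one' and stores 'doubled'."""

    def transform(self, datastore):
        datastore["doubled"] = datastore["x_plus_one"] * 2
        return datastore


class MiniPipeline:
    """Hypothetical pipeline: calls transform() of each module in order."""

    def __init__(self, modules):
        self.modules = modules

    def transform(self, datastore):
        for module in self.modules:
            datastore = module.transform(datastore)
        return datastore


result = MiniPipeline([AddOne(), Double()]).transform({"x": 3})
print(result)
```

Keys such as read_key and store_key play the role that the hard-coded "x" and "doubled" keys play in this sketch.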
- class popmon.analysis.comparison.hist_comparer.ExpandingNormHistComparer(read_key, store_key, shift=1, hist_col='histogram')
Bases:
NormHistComparer
Compare histogram to previous expanding normalized histograms
- __init__(read_key, store_key, shift=1, hist_col='histogram')
Initialize an instance of ExpandingNormHistComparer.
- Parameters
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
shift (int) – shift of expanding window. default is 1.
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
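The idea behind the expanding comparers above is that each time slot is compared against the accumulated histogram of earlier slots. The sketch below builds such an expanding reference from a sequence of bin-count arrays. The function name `expanding_reference`, the interpretation of shift as "number of most recent slots to exclude, counted from the current one", and the use of NaN where no history exists are assumptions for illustration.

```python
import numpy as np


def expanding_reference(hists, shift=1):
    """Hypothetical helper: row t holds the sum of rows 0 .. t - shift."""
    hists = np.asarray(hists, dtype=float)  # shape: (n_slots, n_bins)
    cumulative = np.cumsum(hists, axis=0)
    reference = np.full_like(hists, np.nan)
    # Assumption: shift is a non-negative integer.
    if 0 <= shift < len(hists):
        reference[shift:] = cumulative[: len(hists) - shift]
    return reference


slots = [[1, 0], [0, 2], [3, 3]]
print(expanding_reference(slots, shift=1))
```

With shift=1 the current slot is excluded from its own reference, so a slot is never compared against itself.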
- class popmon.analysis.comparison.hist_comparer.HistComparer(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', suffix='comp', *args, **kwargs)
Bases:
Pipeline
Base pipeline to compare histogram to previous rolling histograms
- __init__(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', suffix='comp', *args, **kwargs)
Initialize an instance of HistComparer.
- Parameters
func_hist_collector – histogram collection function
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
assign_to_key (str) – key of the input data to assign function applied-output to. (optional)
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
suffix (str) – column/key of comparison histogram. default is ‘comp’ -> column = ‘histogram_comp’
args – (tuple, optional): residual args passed on to func_hist_collector
kwargs – (dict, optional): residual kwargs passed on to func_hist_collector
- class popmon.analysis.comparison.hist_comparer.NormHistComparer(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', *args, **kwargs)
Bases:
Pipeline
Base pipeline to compare histogram to normalized histograms
- __init__(func_hist_collector, read_key, store_key, assign_to_key=None, hist_col='histogram', *args, **kwargs)
Initialize an instance of NormHistComparer.
- Parameters
func_hist_collector – histogram collection function
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
assign_to_key (str) – key of the input data to assign function applied-output to. (optional)
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
args – (tuple, optional): residual args passed on to func_hist_collector
kwargs – (dict, optional): residual kwargs passed on to func_hist_collector
- class popmon.analysis.comparison.hist_comparer.PreviousHistComparer(read_key, store_key, hist_col='histogram', suffix='prev1')
Bases:
RollingHistComparer
Compare histogram to previous histograms
- __init__(read_key, store_key, hist_col='histogram', suffix='prev1')
Initialize an instance of PreviousHistComparer.
- Parameters
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
suffix (str) – column/key of previous histogram. default is ‘prev1’ -> column = ‘histogram_prev1’
- class popmon.analysis.comparison.hist_comparer.ReferenceHistComparer(reference_key, assign_to_key, store_key, hist_col='histogram', suffix='ref')
Bases:
HistComparer
Compare histogram to reference histograms
- __init__(reference_key, assign_to_key, store_key, hist_col='histogram', suffix='ref')
Initialize an instance of ReferenceHistComparer.
- Parameters
reference_key (str) – key of reference data to read from data store
assign_to_key (str) – key of input data to read from data store and assign the comparison output to
store_key (str) – key of output data to store in data store
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
suffix (str) – column/key of reference histogram. default is ‘ref’ -> column = ‘histogram_ref’
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
- class popmon.analysis.comparison.hist_comparer.ReferenceNormHistComparer(reference_key, assign_to_key, store_key, hist_col='histogram')
Bases:
NormHistComparer
Compare histogram to reference normalized histograms
- __init__(reference_key, assign_to_key, store_key, hist_col='histogram')
Initialize an instance of ReferenceNormHistComparer.
- Parameters
reference_key (str) – key of reference data to read from data store
assign_to_key (str) – key of input data to read from data store and assign the comparison output to
store_key (str) – key of output data to store in data store
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
- class popmon.analysis.comparison.hist_comparer.RollingHistComparer(read_key, store_key, window, shift=1, hist_col='histogram', suffix='roll')
Bases:
HistComparer
Compare histogram to previous rolling histograms
- __init__(read_key, store_key, window, shift=1, hist_col='histogram', suffix='roll')
Initialize an instance of RollingHistComparer.
- Parameters
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
window (int) – size of rolling window
shift (int) – shift of rolling window. default is 1.
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
suffix (str) – column/key of rolling histogram. default is ‘roll’ -> column = ‘histogram_roll’
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
- class popmon.analysis.comparison.hist_comparer.RollingNormHistComparer(read_key, store_key, window, shift=1, hist_col='histogram')
Bases:
NormHistComparer
Compare histogram to previous rolling normalized histograms
- __init__(read_key, store_key, window, shift=1, hist_col='histogram')
Initialize an instance of RollingNormHistComparer.
- Parameters
read_key (str) – key of input data to read from data store
store_key (str) – key of output data to store in data store
window (int) – size of rolling window
shift (int) – shift of rolling window. default is 1.
hist_col (str) – column/key in input df/dict that contains the histogram. default is ‘histogram’
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
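The window and shift parameters of the rolling comparers above can be illustrated with a small sketch that sums the bin counts of the preceding `window` time slots. The function name `rolling_reference`, the requirement of a full window before a reference is produced, and the exact meaning of shift (how many slots the window end lies before the slot after the current one) are assumptions for this example.

```python
import numpy as np


def rolling_reference(hists, window, shift=1):
    """Hypothetical helper: row t holds the sum of `window` rows ending at t - shift."""
    hists = np.asarray(hists, dtype=float)  # shape: (n_slots, n_bins)
    reference = np.full_like(hists, np.nan)
    # Assumption: shift is a non-negative integer and window is positive.
    for t in range(len(hists)):
        end = t - shift + 1   # exclusive upper bound of the window
        start = end - window  # inclusive lower bound of the window
        if start >= 0:
            # Assumption: only full windows produce a reference histogram.
            reference[t] = hists[start:end].sum(axis=0)
    return reference


slots = [[1, 0], [0, 2], [3, 3], [4, 1]]
print(rolling_reference(slots, window=2, shift=1))
```

With window=1 and shift=1 this reduces to comparing each slot with the one directly before it.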
- popmon.analysis.comparison.hist_comparer.hist_compare(row, hist_name1='', hist_name2='')
Function to compare two histograms
Apply statistical tests to compare two input histograms, such as: Chi2, KS, Pearson, max probability difference. For categorical histograms, also check for unknown labels.
- Parameters
row (pd.Series) – row to apply compare function to
hist_name1 (str) – name of histogram one to compare
hist_name2 (str) – name of histogram two to compare
- Returns
pandas Series with popular comparison metrics.
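To give a feel for the kind of metrics listed above, here is a sketch that compares two categorical histograms given as label-to-count dicts and reports a Pearson correlation, the maximum probability difference, and whether the first histogram contains labels absent from the second. The function name `compare_categorical_counts`, the dict-based input, the output keys, and the direction of the unknown-label check are assumptions for illustration and do not reflect the actual hist_compare output.

```python
import numpy as np


def compare_categorical_counts(counts_1, counts_2):
    """Hypothetical helper: a few comparison metrics for two label -> count dicts."""
    # Align both histograms on the union of their labels.
    labels = sorted(set(counts_1) | set(counts_2))
    bins_1 = np.array([counts_1.get(label, 0) for label in labels], dtype=float)
    bins_2 = np.array([counts_2.get(label, 0) for label in labels], dtype=float)
    # Assumption: both histograms contain at least one entry.
    probs_1 = bins_1 / bins_1.sum()
    probs_2 = bins_2 / bins_2.sum()
    return {
        "pearson": float(np.corrcoef(bins_1, bins_2)[0, 1]),
        "max_prob_diff": float(np.max(np.abs(probs_1 - probs_2))),
        # Assumption: "unknown" means present in the first, absent from the second.
        "unknown_labels": bool(set(counts_1) - set(counts_2)),
    }


metrics = compare_categorical_counts(
    {"a": 10, "b": 30, "z": 10},
    {"a": 20, "b": 60},
)
print(metrics)
```

In a row-wise setting, hist_name1 and hist_name2 would select the two histograms from the row before a comparison of this kind is applied.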