popmon.pipeline package
Submodules
popmon.pipeline.amazing_pipeline module
- class popmon.pipeline.amazing_pipeline.AmazingPipeline(histogram_path, **kwargs)
Bases:
Pipeline
- __init__(histogram_path, **kwargs)
Initialization of the pipeline
- Parameters
modules (list) – modules of the pipeline.
logger – logger to be used by each module.
- popmon.pipeline.amazing_pipeline.run()
Example that runs the self-reference pipeline and produces a monitoring report.
popmon.pipeline.dataset_splitter module
- popmon.pipeline.dataset_splitter.split_dataset(dataset, split, time_axis)
Split a dataset into a reference and remaining part based on split params.
- Parameters
dataset (pd.DataFrame|pyspark.sql.DataFrame) – dataset as input
split (Any) – split details; the meaning depends on the type:
if integer, then the reference will be the first split instances;
if float, then split will be used as a ratio (e.g. 0.5 returns a 50/50 split);
otherwise, split is interpreted as a condition, where the records for which the condition is true are considered the reference, and the other records the remaining dataset.
time_axis (str) – the time axis
- Returns
tuple of reference, dataset
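The three interpretations of split can be pictured with a small, library-free sketch. This is not popmon's implementation: split_records is a hypothetical helper that works on a list of dicts instead of a dataframe, it assumes records are ordered by the time axis before slicing, and it assumes the condition is given as a Python callable.

```python
def split_records(records, split, time_axis):
    """Illustrative stand-in for the documented split semantics (hypothetical helper)."""
    # Assumption: "the first `split` instances" means first along the time axis.
    ordered = sorted(records, key=lambda record: record[time_axis])

    if isinstance(split, int):
        # Integer: the reference is the first `split` records.
        cut = split
    elif isinstance(split, float):
        # Float: `split` is the fraction of records used as reference.
        cut = int(len(ordered) * split)
    else:
        # Otherwise: `split` is a condition (a callable in this sketch).
        reference = [r for r in ordered if split(r)]
        remaining = [r for r in ordered if not split(r)]
        return reference, remaining

    return ordered[:cut], ordered[cut:]


records = [{"t": i, "x": i * 10} for i in range(10)]

ref_int, rest_int = split_records(records, 3, "t")
ref_frac, rest_frac = split_records(records, 0.5, "t")
ref_cond, rest_cond = split_records(records, lambda r: r["x"] < 20, "t")
```

In each case the first element of the returned tuple is the reference part and the second is the remaining dataset, matching the documented return value.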
popmon.pipeline.metrics module
- popmon.pipeline.metrics.df_stability_metrics(df, settings=None, time_width=None, time_offset=0, var_dtype=None, reference=None, **kwargs)
Create a data stability monitoring html datastore for given pandas or spark dataframe.
- Parameters
df – input pandas/spark dataframe to be profiled and monitored over time.
settings (popmon.config.Settings) – popmon configuration object
time_width –
bin width of time axis. str or number (ns). note: bin_specs takes precedence. (optional)
Examples: '1w', 3600e9 (number of ns), anything understood by pd.Timedelta(time_width).value
time_offset –
bin offset of time axis. str or number (ns). note: bin_specs takes precedence. (optional)
Examples: '1-1-2020', 0 (number of ns since 1-1-1970), anything parsed by pd.Timestamp(time_offset).value
var_dtype (dict) – dictionary with specified datatype per feature. auto-guessed when not provided.
reference – reference dataframe or histograms. default is None
kwargs – residual keyword arguments, passed on to stability_report()
- Returns
dict with results of metrics pipeline
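The numeric forms of time_width and time_offset are counts of nanoseconds. The sketch below uses only the standard library to show what those numbers look like for the documented examples; timedelta_to_ns and datetime_to_ns are hypothetical helpers, not part of popmon, and the offset is assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def timedelta_to_ns(delta):
    """Convert a timedelta to an integer number of nanoseconds."""
    # Integer arithmetic avoids floating-point rounding on large values.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000


def datetime_to_ns(moment):
    """Nanoseconds elapsed since 1-1-1970 (UTC) for a timezone-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return timedelta_to_ns(moment - epoch)


one_week_ns = timedelta_to_ns(timedelta(weeks=1))    # numeric form of a '1w' bin width
one_hour_ns = timedelta_to_ns(timedelta(hours=1))    # equals 3600e9
offset_ns = datetime_to_ns(datetime(2020, 1, 1, tzinfo=timezone.utc))  # '1-1-2020'
```

Passing the string forms ('1w', '1-1-2020') is usually more readable; the numeric forms are mainly useful when the bin width or offset is computed elsewhere.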
- popmon.pipeline.metrics.stability_metrics(hists, settings, reference=None)
Create a data stability monitoring datastore for given dict of input histograms.
- Parameters
hists (dict) – input histograms to be profiled and monitored over time.
settings (popmon.config.Settings) – popmon configuration object
reference – histograms used as reference. default is None
- Returns
dict with results of metrics pipeline
popmon.pipeline.metrics_pipelines module
- class popmon.pipeline.metrics_pipelines.ExpandingReferenceMetricsPipeline(settings, hists_key='test_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists')
Example metrics pipeline for comparing test data with itself (expanding test set)
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
- Returns
assembled expanding reference pipeline
- class popmon.pipeline.metrics_pipelines.ExternalReferenceMetricsPipeline(settings, hists_key='test_hists', ref_hists_key='ref_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists', ref_hists_key='ref_hists')
Example metrics pipeline for comparing test data with other (full) external reference set
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
ref_hists_key (str) – key to reference histograms in datastore. default is ‘ref_hists’
- Returns
assembled external reference pipeline
- class popmon.pipeline.metrics_pipelines.RollingReferenceMetricsPipeline(settings, hists_key='test_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists')
Example metrics pipeline for comparing test data with itself (rolling test set)
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
- Returns
assembled rolling reference pipeline
- class popmon.pipeline.metrics_pipelines.SelfReferenceMetricsPipeline(settings, hists_key)
Bases:
Pipeline
- __init__(settings, hists_key)
Example metrics pipeline for comparing test data with itself (full test set)
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
- Returns
assembled self reference pipeline
- popmon.pipeline.metrics_pipelines.get_dynamic_bound_modules(pull_rules)
Generate dynamic traffic light boundaries, based on traffic lights for normalized residuals, used for plotting in popmon_profiles report.
- popmon.pipeline.metrics_pipelines.get_splitting_modules(hists_key, features, time_axis)
Split the test histograms. Each histogram with datetime i is compared with histogram i-1, resulting in a chi2 comparison of the histograms.
- popmon.pipeline.metrics_pipelines.get_static_bound_modules(pull_rules)
Generate static traffic light boundaries, based on traffic lights for normalized residuals, used for plotting in popmon_profiles report.
- popmon.pipeline.metrics_pipelines.get_traffic_light_modules(monitoring_rules)
Expand all (wildcard) static traffic light bounds and apply them. Applied to both the profiles and comparisons datasets.
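As a rough illustration of what "expanding wildcard bounds and applying them" in get_traffic_light_modules can mean, the sketch below matches rule patterns against metric names and turns a value into a traffic light. Everything here is an assumption made for illustration: expand_rules and traffic_light are hypothetical helpers, the rule layout [red_high, yellow_high, yellow_low, red_low] and the 0/1/2 encoding (green/yellow/red) are assumed conventions, and popmon's own modules may behave differently.

```python
from fnmatch import fnmatchcase


def expand_rules(rules, metric_names):
    """Map each metric name to the bounds of the first wildcard pattern it matches."""
    expanded = {}
    for name in metric_names:
        for pattern, bounds in rules.items():
            if fnmatchcase(name, pattern):
                expanded[name] = bounds
                break
    return expanded


def traffic_light(value, bounds):
    """Return 2 (red), 1 (yellow) or 0 (green) for a value and its four bounds."""
    red_high, yellow_high, yellow_low, red_low = bounds
    if value > red_high or value < red_low:
        return 2
    if value > yellow_high or value < yellow_low:
        return 1
    return 0


# Hypothetical rule: every metric ending in "_pull" shares the same bounds.
rules = {"*_pull": [7, 4, -4, -7]}
metric_names = ["x_mean_pull", "x_mean", "y_std_pull"]

expanded = expand_rules(rules, metric_names)
lights = {name: traffic_light(5.0, bounds) for name, bounds in expanded.items()}
```

Metrics that match no pattern, such as x_mean here, simply receive no bounds in this sketch.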
popmon.pipeline.report module
- class popmon.pipeline.report.StabilityReport(datastore, read_key='html_report')
Bases:
object
Representation layer of the report.
The stability report module wraps the representation functionality of the report after running the pipeline and generating the report. The report can be represented as an HTML string, an HTML file or a Jupyter notebook’s cell output.
- __init__(datastore, read_key='html_report')
Initialize an instance of StabilityReport.
- Parameters
read_key (str) – key of HTML report data to read from data store. default is html_report.
- regenerate(store_key='html_report', sections_key='report_sections', settings=None)
Regenerate HTML report with different plot settings.
- Parameters
sections_key (str) – key to store sections data in the datastore. default is ‘report_sections’.
store_key (str) – key to store the HTML report data in the datastore. default is ‘html_report’
settings (Settings) – configuration to regenerate the report
- Return HTML
HTML report in an iframe
- to_file(filename)
Store HTML report in the local file system.
- Parameters
filename (str) – filename for the HTML report
- to_html(escape=False)
HTML code representation of the report (represented as a string).
- Parameters
escape (bool) – escape characters which could conflict with other HTML code. default: False
- Return str
HTML code of the report
- to_notebook_iframe(width='100%', height='100%')
HTML representation of the class (report) embedded in an iframe.
- Parameters
width (str) – width of the frame to be shown
height (str) – height of the frame to be shown
- Return HTML
HTML report in an iframe
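The escape option of to_html and the iframe embedding of to_notebook_iframe are related: HTML placed inside another document's attribute has to be escaped first. The sketch below shows the idea with the standard library only. It is an assumption about the general technique, not popmon's code; to_html_sketch and to_iframe_sketch are hypothetical names, and the use of the srcdoc attribute is assumed.

```python
import html


def to_html_sketch(report_html, escape=False):
    """Return the report HTML, optionally escaped so it can sit inside other HTML."""
    return html.escape(report_html) if escape else report_html


def to_iframe_sketch(report_html, width="100%", height="100%"):
    """Embed the escaped report in an iframe via the srcdoc attribute."""
    srcdoc = to_html_sketch(report_html, escape=True)
    return f'<iframe srcdoc="{srcdoc}" width="{width}" height="{height}"></iframe>'


raw = '<h1 class="title">Report & "stats"</h1>'
escaped = to_html_sketch(raw, escape=True)
iframe = to_iframe_sketch(raw, width="100%", height="600px")
```

Without escaping, the double quotes in the report would terminate the srcdoc attribute early and break the surrounding markup.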
- popmon.pipeline.report.df_stability_report(df, settings=None, time_width=None, time_offset=0, var_dtype=None, reference=None, split=None, **kwargs)
Create a data stability monitoring html report for given pandas or spark dataframe.
- Parameters
df – input pandas/spark dataframe to be profiled and monitored over time.
settings (popmon.config.Settings) – popmon configuration object
time_width –
bin width of time axis. str or number (ns). note: bin_specs takes precedence. (optional)
Examples: '1w', 3600e9 (number of ns), anything understood by pd.Timedelta(time_width).value
time_offset –
bin offset of time axis. str or number (ns). note: bin_specs takes precedence. (optional)
Examples: '1-1-2020', 0 (number of ns since 1-1-1970), anything parsed by pd.Timestamp(time_offset).value
var_dtype (dict) – dictionary with specified datatype per feature. auto-guessed when not provided.
reference – reference dataframe or histograms. default is None
- Returns
dict with results of reporting pipeline
- popmon.pipeline.report.stability_report(hists, settings=None, reference=None, **kwargs)
Create a data stability monitoring html report for given dict of input histograms.
- Parameters
hists (dict) – input histograms to be profiled and monitored over time.
settings (popmon.config.Settings) – popmon configuration object
reference – histograms used as reference. default is None
kwargs – when settings=None, parameters such as features and time_axis can be passed
- Returns
dict with results of reporting pipeline
popmon.pipeline.report_pipelines module
- class popmon.pipeline.report_pipelines.ExpandingReference(settings, hists_key='test_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists')
Example pipeline for comparing test data with itself (expanding test set)
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
- Returns
assembled expanding reference pipeline
- class popmon.pipeline.report_pipelines.ExternalReference(settings, hists_key='test_hists', ref_hists_key='ref_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists', ref_hists_key='ref_hists')
Example pipeline for comparing test data with other (full) external reference set
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
ref_hists_key (str) – key to reference histograms in datastore. default is ‘ref_hists’
- Returns
assembled external reference pipeline
- class popmon.pipeline.report_pipelines.ReportPipe(settings, sections_key='report_sections', store_key='html_report')
Bases:
Pipeline
Pipeline of modules for generating sections and a final report.
- __init__(settings, sections_key='report_sections', store_key='html_report')
Initialize an instance of Report.
- Parameters
settings (Settings) – the configuration object
sections_key (str) – key to store sections data in the datastore
store_key (str) – key to store the HTML report data in the datastore
- transform(datastore)
Central function of the pipeline.
Calls transform() of each module in the pipeline. Typically, transform() of a module takes something from the datastore, does something to it, and puts the results back into the datastore again, to be passed on to the next module in the pipeline.
- Parameters
datastore (dict) – input datastore
- Returns
updated output datastore
- Return type
dict
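The transform() contract described above (each module reads from the datastore, writes its result back, and hands the datastore to the next module) can be sketched in a few lines. The classes below are hypothetical stand-ins written for illustration, not popmon's Pipeline or module classes.

```python
class SumModule:
    """Reads a list from the datastore and stores its sum under another key."""

    def __init__(self, read_key, store_key):
        self.read_key = read_key
        self.store_key = store_key

    def transform(self, datastore):
        datastore[self.store_key] = sum(datastore[self.read_key])
        return datastore


class ScaleModule:
    """Reads a number from the datastore and stores a scaled copy."""

    def __init__(self, read_key, store_key, factor):
        self.read_key = read_key
        self.store_key = store_key
        self.factor = factor

    def transform(self, datastore):
        datastore[self.store_key] = datastore[self.read_key] * self.factor
        return datastore


class SketchPipeline:
    """Calls transform() of each module in order, passing the datastore along."""

    def __init__(self, modules):
        self.modules = modules

    def transform(self, datastore):
        for module in self.modules:
            datastore = module.transform(datastore)
        return datastore


pipeline = SketchPipeline(
    [SumModule("counts", "total"), ScaleModule("total", "half_total", 0.5)]
)
result = pipeline.transform({"counts": [1, 2, 3]})
```

Because every module communicates only through datastore keys, modules can be reordered or replaced as long as the keys they read have been written by an earlier step.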
- class popmon.pipeline.report_pipelines.RollingReference(settings, hists_key='test_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists')
Example pipeline for comparing test data with itself (rolling test set)
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
- Returns
assembled rolling reference pipeline
- class popmon.pipeline.report_pipelines.SelfReference(settings, hists_key='test_hists')
Bases:
Pipeline
- __init__(settings, hists_key='test_hists')
Example pipeline for comparing test data with itself (full test set)
- Parameters
hists_key (str) – key to test histograms in datastore. default is ‘test_hists’
- Returns
assembled self reference pipeline