Profiles: tracking a metric over time

Available profiles

The following metrics are implemented:

Dimension   Histogram Type   Metric
---------   --------------   ------
Any         Any              Count
Any         Any              Entropy
1D          Any              Filled
1D          Any              Distinct
1D          Any              Underflow, Overflow
1D          Any              NaN
1D          Any              Mode
1D          Numeric          Mean
1D          Numeric          1%, 5%, 16%, 50% (median), 84%, 95%, 99% percentiles
1D          Numeric          Standard deviation
1D          Numeric          Min, Max
1D          Categorical      Fraction of True
2D          Any              PhiK Correlation
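To make a few of the listed metrics concrete, here is a minimal pure-Python sketch that computes count, entropy, mode and fraction of True from bin labels and bin counts. It is an illustration only, not popmon's implementation: the function names are hypothetical, and the entropy is assumed to be Shannon entropy with the natural logarithm, which may differ from popmon's own definition.

```python
import math


def sketch_count(bin_counts):
    """Total number of entries over all bins."""
    return sum(bin_counts)


def sketch_entropy(bin_counts):
    """Shannon entropy in nats (assumption: natural log, empty bins skipped)."""
    total = sum(bin_counts)
    if total == 0:
        return float("nan")
    probs = [c / total for c in bin_counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def sketch_mode(bin_labels, bin_counts):
    """Label of the fullest bin; on ties the first such bin wins."""
    idx = max(range(len(bin_counts)), key=lambda i: bin_counts[i])
    return bin_labels[idx]


def sketch_fraction_true(bin_labels, bin_counts):
    """Share of entries whose label is True (boolean categorical histogram)."""
    total = sum(bin_counts)
    if total == 0:
        return float("nan")
    n_true = sum(c for label, c in zip(bin_labels, bin_counts) if label is True)
    return n_true / total


# Hypothetical boolean histogram: 30 True entries, 10 False entries.
labels = [True, False]
counts = [30, 10]

count = sketch_count(counts)                      # 40
fraction = sketch_fraction_true(labels, counts)   # 0.75
mode = sketch_mode(labels, counts)                # True
entropy = sketch_entropy(counts)
```

Evaluating such functions on the histogram of each time slice is what turns a single statistic into a profile that can be followed over time.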

The profiles registry can be consulted for the available profiles:

from popmon.analysis import Profiles

print(Profiles.get_keys())

Profile extensions

To enable a profile extension:

  • Install the required package(s), for example via popmon’s extras: pip install popmon[extension_name]

  • To show the profile values in your report, either:
    • Add the relevant values to the show_stats list: settings.report.show_stats.append("[value_prefix]*"), or

    • Show all statistics: settings.report.extended_report = True
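The trailing * in "[value_prefix]*" suggests that entries of show_stats act as shell-style wildcard patterns. The sketch below illustrates that idea with Python's standard fnmatch module. It is an assumption about the matching behaviour, not popmon's code, and the statistic names and the helper select_stats are hypothetical.

```python
from fnmatch import fnmatch


def select_stats(available, patterns):
    """Keep the statistic names that match at least one wildcard pattern."""
    return [name for name in available if any(fnmatch(name, p) for p in patterns)]


# Hypothetical statistic names; real names depend on the installed extension.
available = ["mean", "std", "diptest_value", "diptest_pvalue"]

show_stats = ["mean"]
show_stats.append("diptest*")  # prefix pattern, analogous to "[value_prefix]*"

selected = select_stats(available, show_stats)
# selected == ["mean", "diptest_value", "diptest_pvalue"]
```

A single prefix pattern therefore covers every value an extension produces, without listing each one separately.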

Available profile extensions:

  • diptest: Hartigan & Hartigan’s dip test for unimodality. Available for 1D numerical histograms. Uses the diptest package by Ralph Urlus.

Custom profiles

Tracking custom metrics over time is straightforward. The following code snippet registers a new metric with popmon.

import numpy as np

from popmon.analysis.profiling.profiles import Profiles


@Profiles.register(key="name_of_profile", description="<description_for_report>", dim=2)
def your_profile_function_name(hist) -> float:
    """Write your function to profile the histogram."""
    # Placeholder logic: replace with your own computation on `hist`.
    return np.sum(hist)

Variations:

  • A profile function may return multiple values for efficiency (e.g. several quantiles can be computed in one call rather than one at a time)

@Profiles.register(
    key=["key1", "key2"], description=["Statistic 1", "Statistic 2"], dim=None
)
def your_profile_function_name(hist) -> tuple:
    # `your_logic` is a placeholder for your own computation.
    result1, result2 = your_logic(hist)
    return result1, result2

  • A profile may work on the histogram, or on the value counts/labels (also for efficiency). The latter applies when the htype parameter is passed (1D only)

@Profiles.register(
    key="name_of_profile", description="<description_for_report>", dim=1, htype="all"
)
def your_profile_function_name(bin_labels, bin_counts) -> float:
    return bin_counts.sum()

  • Profiles may depend on variable type (possible values for htype: num, cat, all).

@Profiles.register(
    key="name_of_profile", description="<description_for_report>", dim=1, htype="num"
)
def your_profile_function_name(bin_labels, bin_counts) -> float:
    return bin_counts.sum()
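The variations above can be pictured with a toy decorator-based registry. The sketch below is not popmon's implementation: ToyRegistry, its run method and the matching rules (dim=None matches any dimension, htype "all" matches any variable type) are assumptions chosen only to illustrate how one function can provide several keys and how htype can restrict where a profile applies.

```python
class ToyRegistry:
    """Toy stand-in for a profile registry (NOT popmon's implementation)."""

    def __init__(self):
        self._entries = []       # list of (keys, dim, htype, func)
        self._descriptions = {}  # key -> description

    def register(self, key, description, dim=None, htype=None):
        # Accept a single key/description or a list of them.
        keys = [key] if isinstance(key, str) else list(key)
        descs = [description] if isinstance(description, str) else list(description)
        if len(keys) != len(descs):
            raise ValueError("each key needs exactly one description")

        def decorator(func):
            self._entries.append((keys, dim, htype, func))
            self._descriptions.update(zip(keys, descs))
            return func

        return decorator

    def get_keys(self):
        return list(self._descriptions)

    def run(self, *args, dim, htype):
        """Call every registered function that applies to (dim, htype)."""
        results = {}
        for keys, f_dim, f_htype, func in self._entries:
            if f_dim not in (None, dim) or f_htype not in (None, "all", htype):
                continue
            values = func(*args)
            if len(keys) == 1:
                values = (values,)  # single-key functions return a bare value
            results.update(zip(keys, values))
        return results


toy = ToyRegistry()


@toy.register(key=["min", "max"], description=["Minimum", "Maximum"], dim=1, htype="num")
def min_max(bin_labels, bin_counts):
    # Only bins that contain entries contribute to min/max.
    filled = [label for label, c in zip(bin_labels, bin_counts) if c > 0]
    return min(filled), max(filled)


@toy.register(key="count", description="Number of entries", dim=1, htype="all")
def count_entries(bin_labels, bin_counts):
    return sum(bin_counts)


num_result = toy.run([0.5, 1.5, 2.5], [0, 4, 6], dim=1, htype="num")
cat_result = toy.run(["a", "b"], [1, 2], dim=1, htype="cat")
```

Here num_result holds min, max and count, while cat_result holds only count, because min_max was registered for numeric histograms only.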

If you have developed a custom profile that could be used generically, please consider contributing it to the package.