pycanon.utility package

Module contents

Module with functions for calculating the utility of an anonymized dataset.

pycanon.utility.average_ecsize(data_raw: DataFrame, data_anon: DataFrame, quasi_ident: Union[List, ndarray], sup=True) → float

Calculate the average equivalence class size metric.

Parameters:
  • data_raw (pandas dataframe) – dataframe with the raw data under study.

  • data_anon (pandas dataframe) – dataframe with the anonymized data.

  • quasi_ident (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.

  • sup (boolean) – defaults to True. Set to True if suppression has been applied to the original dataset (some records may have been deleted).
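As a rough illustration of what this metric measures, the sketch below computes the average equivalence class size in plain Python on rows stored as dictionaries. It assumes the definition commonly used in the literature, C_avg = n / (number of equivalence classes × k), with k taken as the size of the smallest class and n taken from the raw data when suppression was applied. The function name `average_ecsize_sketch` and the toy data are hypothetical, and this is not pycanon's implementation.

```python
from collections import Counter


def average_ecsize_sketch(rows_raw, rows_anon, quasi_ident, sup=True):
    """Illustrative C_avg = n / (number of classes * k); not pycanon's code."""
    # Group the anonymized rows by their quasi-identifier values.
    class_sizes = Counter(tuple(row[q] for q in quasi_ident) for row in rows_anon)
    # Assumption: k is the size of the smallest equivalence class.
    k = min(class_sizes.values())
    # Assumption: with suppression, n counts the records of the raw dataset.
    n = len(rows_raw) if sup else len(rows_anon)
    return n / (len(class_sizes) * k)


# Hypothetical toy data: six raw records, one of which was suppressed.
rows_raw = [{"age": age, "zip": z} for age, z in [
    (23, "28001"), (27, "28002"), (25, "28003"),
    (41, "39001"), (45, "39005"), (62, "39009"),
]]
rows_anon = (
    [{"age": "20-30", "zip": "28***"}] * 3
    + [{"age": "40-50", "zip": "39***"}] * 2
)

c_avg_sup = average_ecsize_sketch(rows_raw, rows_anon, ["age", "zip"])
c_avg_nosup = average_ecsize_sketch(rows_raw, rows_anon, ["age", "zip"], sup=False)
print(c_avg_sup, c_avg_nosup)
```

Under these assumptions a value of 1 would mean every class has exactly k records, and larger values indicate classes that are bigger than k-anonymity strictly requires.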

pycanon.utility.classification_metric(data_raw: DataFrame, data_anon: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray]) → float

Calculate the classification metric.

Parameters:
  • data_raw (pandas dataframe) – dataframe with the raw data under study.

  • data_anon (pandas dataframe) – dataframe with the anonymized data.

  • quasi_ident (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – list with the names of the dataframe columns that are the sensitive attributes.
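The idea behind a classification metric can be sketched in plain Python as follows. This assumes the definition usually attributed to Iyengar: a record is penalized when it is suppressed or when its label differs from the majority label of its equivalence class, and the total penalty is divided by the number of raw records. The sketch handles a single sensitive attribute, the name `classification_metric_sketch` and the toy data are hypothetical, and pycanon's own computation may differ in detail.

```python
from collections import Counter, defaultdict


def classification_metric_sketch(rows_raw, rows_anon, quasi_ident, label):
    """Illustrative penalty-based metric; not pycanon's code."""
    # Count the labels that appear in each equivalence class.
    labels_by_class = defaultdict(Counter)
    for row in rows_anon:
        key = tuple(row[q] for q in quasi_ident)
        labels_by_class[key][row[label]] += 1
    # Records whose label is not the majority label of their class.
    minority = sum(
        sum(counts.values()) - max(counts.values())
        for counts in labels_by_class.values()
    )
    # Assumption: every suppressed record is penalized as well.
    suppressed = len(rows_raw) - len(rows_anon)
    return (minority + suppressed) / len(rows_raw)


# Hypothetical toy data: six raw records, one of which was suppressed.
rows_raw = [{"age": a, "disease": d} for a, d in [
    (23, "flu"), (27, "flu"), (25, "cold"),
    (41, "flu"), (45, "cold"), (62, "flu"),
]]
rows_anon = [
    {"age": "20-30", "disease": "flu"},
    {"age": "20-30", "disease": "flu"},
    {"age": "20-30", "disease": "cold"},
    {"age": "40-50", "disease": "flu"},
    {"age": "40-50", "disease": "cold"},
]

cm = classification_metric_sketch(rows_raw, rows_anon, ["age"], "disease")
print(cm)
```

In this toy data each of the two classes contains one minority record and one raw record was suppressed, so three of the six records are penalized.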

pycanon.utility.discernability_metric(data_raw: DataFrame, data_anon: DataFrame, quasi_ident: Union[List, ndarray]) → float

Calculate the discernability metric.

Parameters:
  • data_raw (pandas dataframe) – dataframe with the raw data under study.

  • data_anon (pandas dataframe) – dataframe with the anonymized data. The metric assumes that all the equivalence classes have more than k records, and gives each suppressed record a penalty equal to the size of the input dataset.

  • quasi_ident (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.
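The penalty scheme described for this metric can be sketched in plain Python. The sketch assumes the usual formulation in which each retained record is charged the size of its equivalence class (so a class of size s contributes s × s), and each suppressed record is charged the size of the input dataset. The name `discernability_sketch` and the toy data are hypothetical, and this is not pycanon's implementation.

```python
from collections import Counter


def discernability_sketch(rows_raw, rows_anon, quasi_ident):
    """Illustrative discernability penalty; not pycanon's code."""
    # Size of each equivalence class in the anonymized data.
    class_sizes = Counter(tuple(row[q] for q in quasi_ident) for row in rows_anon)
    # Each record in a class of size s is charged s, giving s * s per class.
    retained_penalty = sum(s * s for s in class_sizes.values())
    # Assumption: suppressed records are those missing from the anonymized data.
    suppressed = len(rows_raw) - len(rows_anon)
    # Each suppressed record is charged the size of the raw dataset.
    return retained_penalty + suppressed * len(rows_raw)


# Hypothetical toy data: six raw records, one of which was suppressed.
rows_raw = [{"age": a} for a in (23, 27, 25, 41, 45, 62)]
rows_anon = [{"age": "20-30"}] * 3 + [{"age": "40-50"}] * 2

dm = discernability_sketch(rows_raw, rows_anon, ["age"])
print(dm)
```

Here the classes of size 3 and 2 contribute 9 and 4, and the single suppressed record contributes 6. Lower values mean records remain easier to tell apart, which corresponds to less information loss.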

pycanon.utility.sizes_ec(data: DataFrame, quasi_ident: Union[List, ndarray]) → dict

Calculate statistics associated with the equivalence classes.

Parameters:
  • data (pandas dataframe) – dataframe with the anonymized data.

  • quasi_ident (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.
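The text above does not say which statistics are returned, so the following is only a guess at the kind of summary such a function might produce: the number of equivalence classes and the minimum, maximum, mean and median class size. The function name `ec_size_stats_sketch`, the dictionary keys and the toy data are all hypothetical and should not be read as the keys of the dictionary returned by `sizes_ec`.

```python
from collections import Counter
from statistics import mean, median


def ec_size_stats_sketch(rows, quasi_ident):
    """Illustrative summary of equivalence class sizes; not pycanon's code."""
    # Rows sharing the same quasi-identifier values form one equivalence class.
    class_sizes = Counter(tuple(row[q] for q in quasi_ident) for row in rows)
    sizes = sorted(class_sizes.values())
    # Hypothetical key names, chosen only for this example.
    return {
        "n_classes": len(sizes),
        "min": sizes[0],
        "max": sizes[-1],
        "mean": mean(sizes),
        "median": median(sizes),
    }


# Hypothetical anonymized rows forming two classes of sizes 3 and 2.
rows = [{"age": "20-30"}] * 3 + [{"age": "40-50"}] * 2

stats = ec_size_stats_sketch(rows, ["age"])
print(stats)
```

In this sketch the smallest class size is also the k for which the data is k-anonymous with respect to the chosen quasi-identifiers.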

pycanon.utility.stats_quasi_ident(data: DataFrame, quasi_ident: str) → dict

Calculate statistics associated with a given quasi-identifier.

Parameters:
  • data (pandas dataframe) – dataframe with the anonymized data.

  • quasi_ident (string) – name of the dataframe column that is the quasi-identifier under study.