pycanon.utility package
Module contents
Module with functions for calculating the utility of anonymized data.
- pycanon.utility.average_ecsize(data_raw: DataFrame, data_anon: DataFrame, quasi_ident: Union[List, ndarray], sup=True) -> float
Calculate the average equivalence class size metric.
- Parameters:
data_raw (pandas dataframe) – dataframe with the raw data under study.
data_anon (pandas dataframe) – dataframe with the anonymized data.
quasi_ident (list of strings) – list with the names of the columns of the dataframe that are quasi-identifiers.
sup (boolean) – defaults to True. If True, suppression has been applied to the original dataset (some records may have been deleted).
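To make the metric concrete, here is a minimal plain-Python sketch of one common definition, the normalized average equivalence class size C_avg = n / (number of classes × k). It is an assumption that pycanon follows this formula, and the handling of `sup` (count the raw records when suppression was applied) is likewise assumed. The helper names are hypothetical, and rows are plain dicts such as those produced by `DataFrame.to_dict("records")`.

```python
from collections import Counter


def ec_sizes(rows, quasi_ident):
    """Count records per equivalence class (identical quasi-identifier values)."""
    return Counter(tuple(row[q] for q in quasi_ident) for row in rows)


def average_ecsize_sketch(rows_raw, rows_anon, quasi_ident, sup=True):
    """Assumed definition: n / (number of classes * k). Not pycanon's own code."""
    sizes = ec_sizes(rows_anon, quasi_ident)
    k = min(sizes.values())  # k-anonymity level: size of the smallest class
    # Assumption: with suppression, n is the size of the original dataset.
    n = len(rows_raw) if sup else len(rows_anon)
    return n / (len(sizes) * k)


raw = [
    {"age": 23, "zip": "47677"},
    {"age": 27, "zip": "47602"},
    {"age": 35, "zip": "47905"},
    {"age": 38, "zip": "47909"},
    {"age": 41, "zip": "47906"},
]
# Generalized data; the fifth raw record has been suppressed.
anon = [
    {"age": "20-29", "zip": "476**"},
    {"age": "20-29", "zip": "476**"},
    {"age": "30-39", "zip": "479**"},
    {"age": "30-39", "zip": "479**"},
]

with_sup = average_ecsize_sketch(raw, anon, ["age", "zip"], sup=True)
without_sup = average_ecsize_sketch(raw, anon, ["age", "zip"], sup=False)
print(with_sup, without_sup)
```

Under this definition a value of 1 means every equivalence class has exactly k records, and larger values indicate classes bigger than k-anonymity requires.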
- pycanon.utility.classification_metric(data_raw: DataFrame, data_anon: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray]) -> float
Calculate the classification metric.
- Parameters:
data_raw (pandas dataframe) – dataframe with the raw data under study.
data_anon (pandas dataframe) – dataframe with the anonymized data.
quasi_ident (list of strings) – list with the names of the columns of the dataframe that are quasi-identifiers.
sens_att (list of strings) – list with the names of the columns of the dataframe that are the sensitive attributes.
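As an illustration only, the sketch below follows a widely used formulation of the classification metric: each suppressed record, and each record whose label differs from the majority label of its equivalence class, costs one unit, and the total is divided by the number of raw records. Whether pycanon computes exactly this is an assumption. The function name is hypothetical, and for simplicity it takes a single label column, whereas the documented `sens_att` is a list.

```python
from collections import Counter, defaultdict


def classification_metric_sketch(rows_raw, rows_anon, quasi_ident, label_col):
    """Assumed formulation: (suppressed + minority-label records) / n_raw."""
    # Collect the labels that fall into each equivalence class.
    labels_by_ec = defaultdict(list)
    for row in rows_anon:
        key = tuple(row[q] for q in quasi_ident)
        labels_by_ec[key].append(row[label_col])

    # One penalty unit per suppressed record.
    penalty = len(rows_raw) - len(rows_anon)
    # One penalty unit per record outside its class's majority label.
    for labels in labels_by_ec.values():
        majority_count = Counter(labels).most_common(1)[0][1]
        penalty += len(labels) - majority_count
    return penalty / len(rows_raw)


raw = [
    {"age": 23, "disease": "flu"},
    {"age": 27, "disease": "flu"},
    {"age": 35, "disease": "flu"},
    {"age": 38, "disease": "cancer"},
    {"age": 41, "disease": "flu"},
]
# The fifth raw record has been suppressed.
anon = [
    {"age": "20-29", "disease": "flu"},
    {"age": "20-29", "disease": "flu"},
    {"age": "30-39", "disease": "flu"},
    {"age": "30-39", "disease": "cancer"},
]

cm = classification_metric_sketch(raw, anon, ["age"], "disease")
print(cm)
```

Here one record is suppressed and one record in the "30-39" class is in the minority, so two of the five raw records are penalized.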
- pycanon.utility.discernability_metric(data_raw: DataFrame, data_anon: DataFrame, quasi_ident: Union[List, ndarray]) -> float
Calculate the discernability metric.
- Parameters:
data_raw (pandas dataframe) – dataframe with the raw data under study.
data_anon (pandas dataframe) – dataframe with the anonymized data. It is assumed that all the equivalence classes have more than k records, and each suppressed record is given a penalty equal to the size of the input dataset.
quasi_ident (list of strings) – list with the names of the columns of the dataframe that are quasi-identifiers.
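The penalty scheme described for this metric can be sketched in plain Python as follows. The sketch assumes the usual form of the metric: each record in an equivalence class is charged the size of that class (so a class of size s contributes s²), and each suppressed record is charged the size of the raw dataset. The function name is hypothetical and this is not pycanon's own code.

```python
from collections import Counter


def discernability_sketch(rows_raw, rows_anon, quasi_ident):
    """Assumed form: sum of squared class sizes + n_suppressed * n_raw."""
    sizes = Counter(tuple(row[q] for q in quasi_ident) for row in rows_anon)
    # Records present in the raw data but missing after anonymization.
    n_suppressed = len(rows_raw) - len(rows_anon)
    return sum(s * s for s in sizes.values()) + n_suppressed * len(rows_raw)


raw = [{"age": a} for a in (23, 27, 35, 38, 41)]
# Two classes of size 2; the fifth raw record has been suppressed.
anon = [
    {"age": "20-29"},
    {"age": "20-29"},
    {"age": "30-39"},
    {"age": "30-39"},
]

dm = discernability_sketch(raw, anon, ["age"])
print(dm)
```

In this example the two classes contribute 2² + 2² = 8 and the single suppressed record contributes 5, for a total of 13. Lower values mean records are less indistinguishable from one another, that is, less utility has been lost.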
- pycanon.utility.sizes_ec(data: DataFrame, quasi_ident: Union[List, ndarray]) -> dict
Calculate statistics associated with the equivalence classes.
- Parameters:
data (pandas dataframe) – dataframe with the anonymized data.
quasi_ident (list of strings) – list with the names of the columns of the dataframe that are quasi-identifiers.
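The keys of the returned dictionary are not listed here, so the following is only a sketch of the kind of summary that can be derived from equivalence class sizes. The function name and every dictionary key in it are hypothetical and should not be read as pycanon's actual output format.

```python
from collections import Counter
from statistics import mean, median


def ec_size_stats_sketch(rows, quasi_ident):
    """Summarize equivalence class sizes; the keys are illustrative assumptions."""
    counts = Counter(tuple(row[q] for q in quasi_ident) for row in rows)
    sizes = sorted(counts.values())
    return {
        "n_classes": len(sizes),
        "min": sizes[0],  # also the k-anonymity level of the data
        "max": sizes[-1],
        "mean": mean(sizes),
        "median": median(sizes),
    }


anon = (
    [{"age": "20-29", "zip": "476**"}] * 3
    + [{"age": "30-39", "zip": "479**"}] * 2
    + [{"age": "40-49", "zip": "479**"}]
)

stats = ec_size_stats_sketch(anon, ["age", "zip"])
print(stats)
```

A large gap between the minimum and maximum class size suggests that some records were generalized more than the target k required.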
- pycanon.utility.stats_quasi_ident(data: DataFrame, quasi_ident: str) -> dict
Calculate statistics associated with a given quasi-identifier.
- Parameters:
data (pandas dataframe) – dataframe with the anonymized data.
quasi_ident (string) – name of the column of the dataframe that is the quasi-identifier under study.