pycanon.anonymity.utils package
Submodules
pycanon.anonymity.utils.aux_anonymity module
Module with functions that calculate the anonymity properties of a dataset: k-anonymity, (alpha,k)-anonymity, l-diversity, entropy l-diversity, (c,l)-diversity, basic beta-likeness, enhanced beta-likeness, t-closeness and delta-disclosure privacy.
- pycanon.anonymity.utils.aux_anonymity.aux_calculate_beta(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: str) → Tuple[ndarray, list]
Beta calculation for basic and enhanced beta-likeness.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.
sens_att_value (string) – sensitive attribute under study.
- Returns:
proportion of each value of the sensitive attribute in the entire database and distance from the proportion in each equivalence class.
- Return type:
np.array and list.
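A minimal sketch of the computation described above, written in plain Python rather than with pandas and not taken from pycanon's source. It assumes the distance is the relative difference (q - p) / p between the proportion q of a value inside an equivalence class and its proportion p in the whole dataset, and keeps the largest such distance per class. The name `beta_sketch` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def proportions(values, domain):
    """Share of each domain value among the given values."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def beta_sketch(rows, quasi_ident, sens_att):
    """Return overall proportions and one distance per equivalence class."""
    domain = sorted({row[sens_att] for row in rows})
    p = proportions([row[sens_att] for row in rows], domain)

    # Group the sensitive values by quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in quasi_ident)].append(row[sens_att])

    # Assumed distance: largest relative gain (q - p) / p within each class.
    dist = []
    for values in groups.values():
        q = proportions(values, domain)
        dist.append(max((qi - pi) / pi for pi, qi in zip(p, q)))
    return p, dist


rows = [
    {"zip": "A", "disease": "flu"},
    {"zip": "A", "disease": "flu"},
    {"zip": "B", "disease": "flu"},
    {"zip": "B", "disease": "cancer"},
]
p, dist = beta_sketch(rows, ["zip"], "disease")
beta = max(dist)
```

Under this assumption, the largest distance over all classes is the smallest beta for which the sample data would satisfy basic beta-likeness.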
- pycanon.anonymity.utils.aux_anonymity.aux_calculate_delta_disclosure(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: str) → float
Delta calculation for delta-disclosure privacy.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.
sens_att_value (string) – sensitive attribute under study.
- Returns:
delta for the introduced SA.
- Return type:
float.
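A minimal sketch of a delta calculation in plain Python, not pycanon's implementation. It assumes the usual definition of delta-disclosure privacy, in which delta bounds |log(q / p)| for the proportion q of a sensitive value in an equivalence class and its proportion p in the whole dataset. Skipping values that do not occur in a class is a simplification of this sketch; `delta_sketch` and the sample rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def delta_sketch(rows, quasi_ident, sens_att):
    """Largest |log(q / p)| over all classes and the values present in them."""
    overall = Counter(row[sens_att] for row in rows)

    # Group the sensitive values by quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in quasi_ident)].append(row[sens_att])

    delta = 0.0
    for values in groups.values():
        # Only values present in the class are visited (assumption: absent
        # values are skipped, since log(0) is undefined).
        for value, count in Counter(values).items():
            q = count / len(values)
            p = overall[value] / len(rows)
            delta = max(delta, abs(math.log(q / p)))
    return delta


rows = [
    {"zip": "A", "disease": "flu"},
    {"zip": "A", "disease": "flu"},
    {"zip": "B", "disease": "flu"},
    {"zip": "B", "disease": "cancer"},
]
delta = delta_sketch(rows, ["zip"], "disease")
```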
- pycanon.anonymity.utils.aux_anonymity.aux_t_closeness_num(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: str) → float
Obtain t for t-closeness.
Function used for numerical attributes: the definition of the EMD is used.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.
sens_att_value (string) – sensitive attribute under study.
- Returns:
t for the introduced SA (numerical).
- Return type:
float.
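A minimal sketch of t for a numerical attribute in plain Python, not pycanon's implementation. It assumes the ordered-distance form of the Earth Mover's Distance (EMD) commonly used for t-closeness: with m sorted values, the EMD is the sum of the absolute running sums of p_i - q_i, divided by m - 1. The names `ordered_emd` and `t_closeness_num_sketch` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def ordered_emd(p, q):
    """EMD between two distributions over the same sorted domain."""
    m = len(p)
    if m == 1:
        return 0.0
    running, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi      # cumulative difference up to this value
        total += abs(running)
    return total / (m - 1)


def t_closeness_num_sketch(rows, quasi_ident, sens_att):
    """Largest EMD between a class distribution and the overall one."""
    domain = sorted({row[sens_att] for row in rows})
    overall = Counter(row[sens_att] for row in rows)
    p = [overall[v] / len(rows) for v in domain]

    # Group the sensitive values by quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in quasi_ident)].append(row[sens_att])

    t = 0.0
    for values in groups.values():
        counts = Counter(values)
        q = [counts[v] / len(values) for v in domain]
        t = max(t, ordered_emd(p, q))
    return t


rows = [
    {"zip": "A", "salary": 10},
    {"zip": "A", "salary": 20},
    {"zip": "B", "salary": 30},
    {"zip": "B", "salary": 30},
]
t = t_closeness_num_sketch(rows, ["zip"], "salary")
```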
- pycanon.anonymity.utils.aux_anonymity.aux_t_closeness_str(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: list) → float
Obtain t for t-closeness.
Function used for categorical attributes: the metric “Equal Distance” is used.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.
sens_att_value (string) – sensitive attribute under study.
- Returns:
t for the introduced SA (categorical).
- Return type:
float.
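A minimal sketch of t for a categorical attribute in plain Python, not pycanon's implementation. It assumes that the "Equal Distance" metric treats any two distinct values as being at distance 1, so the EMD reduces to half the sum of |p_i - q_i|. The name `t_closeness_str_sketch` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def t_closeness_str_sketch(rows, quasi_ident, sens_att):
    """Largest equal-distance EMD between a class and the whole dataset."""
    domain = sorted({row[sens_att] for row in rows})
    overall = Counter(row[sens_att] for row in rows)

    # Group the sensitive values by quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in quasi_ident)].append(row[sens_att])

    t = 0.0
    for values in groups.values():
        counts = Counter(values)
        # Equal distance: half the L1 distance between the distributions.
        emd = 0.5 * sum(
            abs(overall[v] / len(rows) - counts[v] / len(values))
            for v in domain
        )
        t = max(t, emd)
    return t


rows = [
    {"zip": "A", "disease": "flu"},
    {"zip": "A", "disease": "flu"},
    {"zip": "B", "disease": "flu"},
    {"zip": "B", "disease": "cancer"},
]
t = t_closeness_str_sketch(rows, ["zip"], "disease")
```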
- pycanon.anonymity.utils.aux_anonymity.get_equiv_class(data: DataFrame, quasi_ident: Union[list, ndarray]) → list
Find the equivalence classes present in the dataset.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.
- Returns:
equivalence classes.
- Return type:
list.
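To illustrate what an equivalence class is, the sketch below groups row indices that share the same quasi-identifier values. It uses plain Python dictionaries instead of a dataframe and is not pycanon's implementation; `equivalence_classes` and the sample rows are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_ident):
    """Group row indices by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[col] for col in quasi_ident)
        groups[key].append(idx)
    return list(groups.values())


rows = [
    {"age": "30-40", "zip": "130**", "disease": "flu"},
    {"age": "30-40", "zip": "130**", "disease": "cancer"},
    {"age": "40-50", "zip": "148**", "disease": "flu"},
]
classes = equivalence_classes(rows, ["age", "zip"])

# The size of the smallest class is the k of k-anonymity.
k = min(len(ec) for ec in classes)
```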
pycanon.anonymity.utils.aux_functions module
Module with different auxiliary functions.
- pycanon.anonymity.utils.aux_functions.check_qi(data: DataFrame, quasi_ident: Union[List, ndarray]) → None
Check if the entered quasi-identifiers are valid.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.
- pycanon.anonymity.utils.aux_functions.check_sa(data: DataFrame, sens_att: Union[List, ndarray]) → None
Check if the entered sensitive attributes are valid.
- Parameters:
data (pandas dataframe) – dataframe with the data under study.
sens_att (list of strings) – list with the name of the columns of the dataframe that are the sensitive attributes.
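The two checks above do not state what makes a column name valid. The sketch below assumes it means that every given name is a column of the data, and that a failed check raises a ValueError; both are assumptions, and `check_columns` is a hypothetical helper, not part of pycanon.

```python
def check_columns(columns, names, kind):
    """Raise ValueError if any of the given names is not a known column."""
    missing = [name for name in names if name not in columns]
    if missing:
        raise ValueError(f"Unknown {kind}: {missing}")


columns = ["age", "zip", "disease"]

# Passes silently: both names are columns of the data.
check_columns(columns, ["age", "zip"], "quasi-identifiers")

# Fails: "salary" is not a column, so the error is caught and reported.
try:
    check_columns(columns, ["salary"], "sensitive attributes")
    failed = False
except ValueError:
    failed = True
```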
- pycanon.anonymity.utils.aux_functions.convert(ec_set: set) → list
Convert a set with an equivalence class to a list.
- Parameters:
ec_set (set) – set which will be converted into a list.
- Returns:
equivalence class as a list.
- Return type:
list.
- pycanon.anonymity.utils.aux_functions.intersect(tmp: list) → list
Intersect two sets: the first and the second of the given list.
- Parameters:
tmp (list of numpy arrays) – list of sets sorted in decreasing order of cardinality.
- Returns:
list obtained when intersecting the first and the second sets of the given list.
- Return type:
list.
- pycanon.anonymity.utils.aux_functions.read_file(file_name: Union[str, Path], sep: str = ',') → DataFrame
Read the given file. Returns a pandas dataframe.
- Parameters:
file_name (string or pathlib.Path) – file with the data under study.
sep (string) – delimiter to use for a csv file.
- Returns:
dataframe with the data.
- Return type:
pandas dataframe.
Module contents
Package containing auxiliary functions related to privacy models.