pycanon.anonymity.utils package#

Submodules#

pycanon.anonymity.utils.aux_anonymity module#

Module with functions that calculate anonymity properties of a dataset.

The properties covered are k-anonymity, (alpha,k)-anonymity, l-diversity, entropy l-diversity, (c,l)-diversity, basic beta-likeness, enhanced beta-likeness, t-closeness and delta-disclosure privacy.

pycanon.anonymity.utils.aux_anonymity.aux_calculate_beta(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: str) → Tuple[ndarray, list]#

Beta calculation for basic and enhanced beta-likeness.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.

  • sens_att_value (string) – sensitive attribute under study.

Returns:

proportion of each value of the sensitive attribute in the entire database and distance from the proportion in each equivalence class.

Return type:

np.array and list.
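
The quantities described above can be illustrated with a minimal, dependency-free sketch. It does not call pycanon: rows are plain dictionaries standing in for dataframe rows, the helper names `proportions` and `max_relative_gain` are hypothetical, and the distance used, the relative gain (q - p) / p between the proportion q in an equivalence class and the overall proportion p, is an assumption based on the usual definition of basic beta-likeness.

```python
from collections import Counter


def proportions(values):
    """Relative frequency of each distinct value in a list."""
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}


def max_relative_gain(rows, quasi_ident, sens_att):
    """Overall proportions plus, per equivalence class, max (q - p) / p."""
    overall = proportions([row[sens_att] for row in rows])
    # Group the sensitive values by their quasi-identifier combination.
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in quasi_ident)
        groups.setdefault(key, []).append(row[sens_att])
    gains = []
    for values in groups.values():
        local = proportions(values)
        gains.append(max((q - overall[v]) / overall[v] for v, q in local.items()))
    return overall, gains


rows = [
    {"age": "30-40", "disease": "flu"},
    {"age": "30-40", "disease": "flu"},
    {"age": "40-50", "disease": "flu"},
    {"age": "40-50", "disease": "cancer"},
]
overall, gains = max_relative_gain(rows, ["age"], "disease")
beta = max(gains)  # largest relative gain over all equivalence classes
```

Under this definition, the data satisfies basic beta-likeness for any beta at least as large as the value computed here.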

pycanon.anonymity.utils.aux_anonymity.aux_calculate_delta_disclosure(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: str) → float#

Delta calculation for delta-disclosure privacy.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.

  • sens_att_value (string) – sensitive attribute under study.

Returns:

delta for the given sensitive attribute (SA).

Return type:

float.
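
As a rough illustration of what such a delta measures, the sketch below follows the usual definition of delta-disclosure privacy: the largest absolute value of log(q / p), where q is the proportion of a sensitive value in an equivalence class and p its proportion in the whole dataset. It is not pycanon's implementation. The function name `delta_disclosure` is hypothetical, rows are plain dictionaries, and values absent from a class (q = 0) are skipped, which is a simplifying assumption.

```python
import math
from collections import Counter


def delta_disclosure(rows, quasi_ident, sens_att):
    """Largest |log(q / p)| over all equivalence classes and observed values."""
    all_values = [row[sens_att] for row in rows]
    overall = {v: c / len(all_values) for v, c in Counter(all_values).items()}
    # Group the sensitive values by their quasi-identifier combination.
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in quasi_ident)
        groups.setdefault(key, []).append(row[sens_att])
    delta = 0.0
    for values in groups.values():
        for value, count in Counter(values).items():
            q = count / len(values)
            delta = max(delta, abs(math.log(q / overall[value])))
    return delta


rows = [
    {"age": "30-40", "disease": "flu"},
    {"age": "30-40", "disease": "flu"},
    {"age": "40-50", "disease": "flu"},
    {"age": "40-50", "disease": "cancer"},
]
delta = delta_disclosure(rows, ["age"], "disease")
```

Here the largest ratio comes from "cancer" in the second class (0.5 against 0.25 overall), so delta equals log(2).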

pycanon.anonymity.utils.aux_anonymity.aux_t_closeness_num(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: str) → float#

Obtain t for t-closeness.

Function used for numerical attributes: the definition of the EMD is used.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.

  • sens_att_value (string) – sensitive attribute under study.

Returns:

t for the given sensitive attribute (numerical).

Return type:

float.
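
For numerical attributes, the EMD is commonly computed with the ordered-distance formula: sort the m distinct values, accumulate the differences between the class distribution and the overall distribution, and average the absolute cumulative sums over m - 1. The sketch below implements that formula in plain Python. The name `ordered_emd` is hypothetical, and whether pycanon computes it in exactly this way is an assumption.

```python
from collections import Counter


def ordered_emd(class_values, all_values):
    """EMD between a class and the whole dataset under ordered distance."""
    domain = sorted(set(all_values))
    m = len(domain)
    if m < 2:
        return 0.0  # a single distinct value leaves nothing to compare
    p = Counter(class_values)
    q = Counter(all_values)
    cumulative = 0.0
    total = 0.0
    for value in domain:
        cumulative += p[value] / len(class_values) - q[value] / len(all_values)
        total += abs(cumulative)
    return total / (m - 1)


salaries = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # sensitive values, in thousands
classes = [[3, 4, 5], [6, 8, 11]]         # sensitive values of two classes
distances = [ordered_emd(ec, salaries) for ec in classes]
t = max(distances)  # the dataset is t-close for this value of t
```

The first class concentrates on the lowest salaries and is therefore farther from the overall distribution (0.375) than the second, which is spread across the range (about 0.167).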

pycanon.anonymity.utils.aux_anonymity.aux_t_closeness_str(data: DataFrame, quasi_ident: Union[list, ndarray], sens_att_value: list) → float#

Obtain t for t-closeness.

Function used for categorical attributes: the metric “Equal Distance” is used.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.

  • sens_att_value (string) – sensitive attribute under study.

Returns:

t for the given sensitive attribute (categorical).

Return type:

float.
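
With the equal-distance metric, every pair of distinct categories is at distance 1, and the EMD reduces to half the sum of absolute differences between the class distribution and the overall distribution. The sketch below shows that computation in plain Python. The name `equal_distance_t` is hypothetical and the code is an illustration of the metric, not pycanon's implementation.

```python
from collections import Counter


def equal_distance_t(rows, quasi_ident, sens_att):
    """Largest equal-distance EMD between any class and the whole dataset."""
    all_values = [row[sens_att] for row in rows]
    overall = {v: c / len(all_values) for v, c in Counter(all_values).items()}
    # Group the sensitive values by their quasi-identifier combination.
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in quasi_ident)
        groups.setdefault(key, []).append(row[sens_att])
    t = 0.0
    for values in groups.values():
        local = Counter(values)  # categories absent from the class count as 0
        emd = 0.5 * sum(abs(local[v] / len(values) - p) for v, p in overall.items())
        t = max(t, emd)
    return t


rows = [
    {"age": "30-40", "disease": "flu"},
    {"age": "30-40", "disease": "flu"},
    {"age": "40-50", "disease": "flu"},
    {"age": "40-50", "disease": "cancer"},
]
t = equal_distance_t(rows, ["age"], "disease")
```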

pycanon.anonymity.utils.aux_anonymity.get_equiv_class(data: DataFrame, quasi_ident: Union[list, ndarray]) → list#

Find the equivalence classes present in the dataset.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.

Returns:

equivalence classes.

Return type:

list.
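
An equivalence class is the set of records that share the same values for all quasi-identifiers. The sketch below shows the idea in plain Python, using a list of dictionaries in place of a dataframe. The name `equivalence_classes` is hypothetical, and the exact structure of the list returned by pycanon may differ.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_ident):
    """Group row indices by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[col] for col in quasi_ident)
        groups[key].append(index)
    return list(groups.values())


rows = [
    {"age": "30-40", "zip": "130**", "disease": "flu"},
    {"age": "30-40", "zip": "130**", "disease": "cancer"},
    {"age": "40-50", "zip": "148**", "disease": "flu"},
]
classes = equivalence_classes(rows, ["age", "zip"])
k = min(len(ec) for ec in classes)  # size of the smallest class
```

The size of the smallest class is the k for which the data is k-anonymous, which is why most of the other calculations start from this grouping.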

pycanon.anonymity.utils.aux_functions module#

Module with auxiliary functions.

pycanon.anonymity.utils.aux_functions.check_qi(data: DataFrame, quasi_ident: Union[List, ndarray]) → None#

Check if the entered quasi-identifiers are valid.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – list with the name of the columns of the dataframe that are quasi-identifiers.

pycanon.anonymity.utils.aux_functions.check_sa(data: DataFrame, sens_att: Union[List, ndarray]) → None#

Check if the entered sensitive attributes are valid.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • sens_att (list of strings) – list with the name of the columns of the dataframe that are the sensitive attributes.
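
Both checks return None, which suggests they signal invalid input by raising an exception. A minimal sketch of that kind of validation follows. The helper name `check_columns`, the choice of `ValueError` and the message text are all assumptions, not pycanon's documented behaviour.

```python
def check_columns(columns, names, kind):
    """Raise ValueError if any requested name is not an existing column."""
    missing = [name for name in names if name not in columns]
    if missing:
        raise ValueError(f"Invalid {kind}, not in the data: {missing}")


columns = ["age", "zip", "disease"]

# Valid names pass silently and the call returns None.
check_columns(columns, ["age", "zip"], "quasi-identifiers")

# An unknown name is reported through the exception message.
try:
    check_columns(columns, ["diagnosis"], "sensitive attributes")
except ValueError as err:
    message = str(err)
```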

pycanon.anonymity.utils.aux_functions.convert(ec_set: set) → list#

Convert a set with an equivalence class to a list.

Parameters:

ec_set (set) – set which will be converted into a list.

Returns:

equivalence class as a list.

Return type:

list.

pycanon.anonymity.utils.aux_functions.intersect(tmp: list) → list#

Intersect two sets: the first and the second of the given list.

Parameters:

tmp (list of numpy arrays) – list of sets sorted in decreasing order of cardinality.

Returns:

list obtained when intersecting the first and the second sets of the given list.

Return type:

list.

pycanon.anonymity.utils.aux_functions.read_file(file_name: Union[str, Path], sep: str = ',') → DataFrame#

Read the given file. Returns a pandas dataframe.

Parameters:
  • file_name (string or pathlib.Path) – file with the data under study.

  • sep (string) – delimiter to use for a csv file.

Returns:

dataframe with the data.

Return type:

pandas dataframe.

Module contents#

Package containing auxiliary functions related to privacy models.