pycanon.anonymity package

Subpackages

Module contents

Module with functions that calculate the parameters of several anonymity models for a dataset.

The supported models are k-anonymity, (alpha,k)-anonymity, l-diversity, entropy l-diversity, recursive (c,l)-diversity, basic beta-likeness, enhanced beta-likeness, t-closeness and delta-disclosure privacy.

pycanon.anonymity.alpha_k_anonymity(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → Tuple[float, int]

Calculate alpha and k for (alpha,k)-anonymity.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

alpha and k values for (alpha,k)-anonymity.

Return type:

alpha is a float, k is an int.
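
As a rough illustration of what these two values mean, the sketch below computes them in plain Python under the usual definition: k is the size of the smallest equivalence class, and alpha is the highest relative frequency of any sensitive value inside a class. It is not pycanon's implementation; the helper `alpha_k_sketch`, the toy records, and the restriction to a single sensitive attribute are assumptions made for the example.

```python
from collections import Counter, defaultdict


def alpha_k_sketch(records, quasi_ident, sens_att):
    """Return (alpha, k) for a single sensitive attribute (illustrative only)."""
    # Group sensitive values by equivalence class (rows sharing all QI values).
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[q] for q in quasi_ident)].append(row[sens_att])
    # k: size of the smallest equivalence class.
    k = min(len(values) for values in classes.values())
    # alpha: highest relative frequency of a sensitive value in any class.
    alpha = max(
        max(Counter(values).values()) / len(values)
        for values in classes.values()
    )
    return alpha, k


# Hypothetical toy data: two equivalence classes of sizes 2 and 3.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
alpha, k = alpha_k_sketch(records, ["age", "zip"], "disease")
print(alpha, k)
```

Here the second class has "flu" in two of its three rows, so alpha is 2/3, and the smallest class has two rows, so k is 2.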

pycanon.anonymity.basic_beta_likeness(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → float

Calculate beta for basic beta-likeness.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

beta value for basic beta-likeness.

Return type:

float.
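
To make the returned value concrete, the following sketch computes beta under the common definition of basic beta-likeness: the largest relative gain (q - p) / p, where p is the frequency of a sensitive value in the whole table and q its frequency inside an equivalence class. This is an independent illustration, not pycanon's code; `basic_beta_sketch`, the toy records, and the single sensitive attribute are assumptions.

```python
from collections import Counter, defaultdict


def basic_beta_sketch(records, quasi_ident, sens_att):
    """Return the largest relative gain (q - p) / p over all classes (illustrative only)."""
    n = len(records)
    # p: frequency of each sensitive value in the whole table.
    overall = {v: c / n for v, c in Counter(r[sens_att] for r in records).items()}
    # Group sensitive values by equivalence class.
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[q] for q in quasi_ident)].append(row[sens_att])
    beta = 0.0
    for values in classes.values():
        for value, count in Counter(values).items():
            q = count / len(values)  # frequency inside the class
            beta = max(beta, (q - overall[value]) / overall[value])
    return beta


# Hypothetical toy data.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
beta = basic_beta_sketch(records, ["age", "zip"], "disease")
print(beta)
```

In this data "cancer" goes from 1/5 of the table to 1/2 of the first class, a relative gain of 1.5, which is the largest gain and therefore the value of beta.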

pycanon.anonymity.delta_disclosure(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → float

Calculate delta for delta-disclosure privacy.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

delta value for delta-disclosure privacy.

Return type:

float.
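
The sketch below illustrates one way to read delta, assuming the usual definition: the largest absolute value of log(q / p), where p is the frequency of a sensitive value in the whole table and q its frequency inside an equivalence class. Only values that actually occur in a class are considered here, which is an assumption of this example (a missing value would give log 0). The helper `delta_sketch` and the toy records are hypothetical and do not reproduce pycanon's implementation.

```python
import math
from collections import Counter, defaultdict


def delta_sketch(records, quasi_ident, sens_att):
    """Return max |log(q / p)| over classes and present values (illustrative only)."""
    n = len(records)
    # p: frequency of each sensitive value in the whole table.
    overall = {v: c / n for v, c in Counter(r[sens_att] for r in records).items()}
    # Group sensitive values by equivalence class.
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[q] for q in quasi_ident)].append(row[sens_att])
    delta = 0.0
    for values in classes.values():
        for value, count in Counter(values).items():
            q = count / len(values)  # frequency inside the class
            delta = max(delta, abs(math.log(q / overall[value])))
    return delta


# Hypothetical toy data.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
delta = delta_sketch(records, ["age", "zip"], "disease")
print(delta)
```

The largest ratio comes from "cancer" in the first class (1/2 against 1/5 overall), so delta equals log(2.5) for this data.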

pycanon.anonymity.enhanced_beta_likeness(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → float

Calculate beta for enhanced beta-likeness.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

beta value for enhanced beta-likeness.

Return type:

float.

pycanon.anonymity.entropy_l_diversity(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → float

Calculate l for entropy l-diversity.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

l value for entropy l-diversity.

Return type:

float.
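
For intuition, the sketch below follows the textbook condition for entropy l-diversity, namely that the entropy of the sensitive values in every equivalence class is at least log(l). Under that assumption the largest l that holds is the exponential of the smallest class entropy. Whether pycanon rounds or post-processes this value is not stated here, so the helper `entropy_l_sketch` and the toy records should be read as a hypothetical illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy_l_sketch(records, quasi_ident, sens_att):
    """Return exp(min class entropy) for one sensitive attribute (illustrative only)."""
    # Group sensitive values by equivalence class.
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[q] for q in quasi_ident)].append(row[sens_att])

    def entropy(values):
        # Shannon entropy (natural log) of the value distribution in one class.
        n = len(values)
        return -sum(c / n * math.log(c / n) for c in Counter(values).values())

    return math.exp(min(entropy(values) for values in classes.values()))


# Hypothetical toy data.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
l_entropy = entropy_l_sketch(records, ["age", "zip"], "disease")
print(round(l_entropy, 2))
```

The first class is perfectly balanced (exp of its entropy is 2), while the second is skewed towards "flu", which pulls the result slightly below 2.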

pycanon.anonymity.k_anonymity(data: DataFrame, quasi_ident: Union[List, ndarray]) → int

Calculate k for k-anonymity.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

Returns:

k value for k-anonymity.

Return type:

int.
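
A minimal sketch of the quantity being computed, assuming the standard definition: k is the number of rows in the smallest equivalence class, where an equivalence class is the set of rows sharing the same values for all quasi-identifiers. The helper `k_anonymity_sketch` and the toy records are hypothetical and work on plain dictionaries instead of a pandas dataframe; they are not pycanon's implementation.

```python
from collections import Counter


def k_anonymity_sketch(records, quasi_ident):
    """Return the size of the smallest equivalence class (illustrative only)."""
    # Count how many rows share each combination of quasi-identifier values.
    class_sizes = Counter(tuple(row[q] for q in quasi_ident) for row in records)
    return min(class_sizes.values())


# Hypothetical toy data: two equivalence classes of sizes 2 and 3.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
k = k_anonymity_sketch(records, ["age", "zip"])
print(k)  # 2
```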

pycanon.anonymity.l_diversity(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → int

Calculate l for l-diversity.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

l value for l-diversity.

Return type:

int.
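
As an illustration of the value returned, the sketch below uses the standard (distinct) l-diversity definition: l is the smallest number of distinct sensitive values found in any equivalence class. The helper `l_diversity_sketch`, the toy records, and the single sensitive attribute are assumptions for the example and do not reproduce pycanon's handling of multiple sensitive attributes.

```python
from collections import defaultdict


def l_diversity_sketch(records, quasi_ident, sens_att):
    """Return the minimum number of distinct sensitive values per class (illustrative only)."""
    # Collect the set of sensitive values seen in each equivalence class.
    classes = defaultdict(set)
    for row in records:
        classes[tuple(row[q] for q in quasi_ident)].add(row[sens_att])
    return min(len(values) for values in classes.values())


# Hypothetical toy data.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
l = l_diversity_sketch(records, ["age", "zip"], "disease")
print(l)  # 2
```

Both classes contain two distinct diseases, so l is 2 even though the second class has three rows.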

pycanon.anonymity.recursive_c_l_diversity(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], imp=False, gen=True) → Tuple[float, int]

Calculate c and l for recursive (c,l)-diversity.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

c and l values for recursive (c,l)-diversity.

Return type:

c is a float, l is an int.

pycanon.anonymity.t_closeness(data: DataFrame, quasi_ident: Union[List, ndarray], sens_att: Union[List, ndarray], gen=True) → float

Calculate t for t-closeness.

Parameters:
  • data (pandas dataframe) – dataframe with the data under study.

  • quasi_ident (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • sens_att (list of strings) – names of the dataframe columns that are sensitive attributes.

  • gen (boolean) – defaults to True. If True, the calculation is generalized to the case of multiple sensitive attributes (SA); if False, the set of quasi-identifiers (QI) is updated for each SA.

Returns:

t value for t-closeness.

Return type:

float.
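
The sketch below illustrates t for a categorical sensitive attribute. It assumes the equal ground distance often used for categorical values, under which the Earth Mover's Distance between a class distribution and the overall distribution reduces to half the sum of absolute differences; t is then the largest such distance over all equivalence classes. Numerical attributes are usually treated with an ordered distance instead, which this example does not cover. The helper `t_closeness_sketch` and the toy records are hypothetical and are not pycanon's implementation.

```python
from collections import Counter, defaultdict


def t_closeness_sketch(records, quasi_ident, sens_att):
    """Return the max equal-distance EMD between class and overall distributions."""
    n = len(records)
    # Overall distribution of the sensitive attribute.
    overall = {v: c / n for v, c in Counter(r[sens_att] for r in records).items()}
    # Group sensitive values by equivalence class.
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[q] for q in quasi_ident)].append(row[sens_att])
    t = 0.0
    for values in classes.values():
        counts = Counter(values)
        # Half the L1 distance; values absent from the class have frequency 0.
        distance = 0.5 * sum(
            abs(counts.get(value, 0) / len(values) - p)
            for value, p in overall.items()
        )
        t = max(t, distance)
    return t


# Hypothetical toy data.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]
t = t_closeness_sketch(records, ["age", "zip"], "disease")
print(round(t, 3))
```

For this data the first class is the furthest from the overall distribution (distance 0.3, against 0.2 for the second class), so t is approximately 0.3.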