nnsa.edfreadpy.anonymization package

Submodules

nnsa.edfreadpy.anonymization.anonymization module

Module for anonymization of data files/records.

Functions:

compute_anonymized_date(date, id, seed_offset)

Return an anonymized date based on the original date and an ID (e.g. a patient file ID).

compute_days_to_shift(id, seed_offset)

Compute the number of days to shift a date (e.g. by adding) when converting it to its anonymized date.

extract_patient_file_id(filename[, base_id])

Takes a file name and extracts the patient file id from it, assuming some general structuring of the filenames.

nnsa.edfreadpy.anonymization.anonymization.compute_anonymized_date(date, id, seed_offset)[source]

Return an anonymized date based on the original date and an ID (e.g. a patient file ID) by adding a random number of days. The specified ID is used as a seed, so the same number of days is added to all dates with the same id.

Parameters:
  • date (datetime.date) – date to anonymize.

  • id (str) – string that is used to seed the random generator.

  • seed_offset (int) – offset to the seed that is determined by id.

Returns:

anonymized_date (datetime.date) – random (anonymized) date.
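The behaviour described above can be illustrated with a minimal sketch. It is not the package's implementation: the function name sketch_anonymized_date, the SHA-256 based seed, and the max_days bound are all assumptions made for illustration. It shows why seeding on the ID matters: dates from the same patient file keep their relative spacing after the shift.

```python
import datetime
import hashlib
import random


def sketch_anonymized_date(date, id, seed_offset, max_days=365):
    """Hypothetical stand-in for compute_anonymized_date (illustration only)."""
    # Assumption: derive the seed from a hash of the ID plus the offset.
    digest = hashlib.sha256(id.encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16) + seed_offset)
    # Assumption: shift by 1..max_days days; the real range is not documented here.
    days = rng.randint(1, max_days)
    return date + datetime.timedelta(days=days)


first = sketch_anonymized_date(datetime.date(2020, 1, 1), "EEG43", 0)
second = sketch_anonymized_date(datetime.date(2020, 1, 11), "EEG43", 0)
# Same ID and offset, so both dates move by the same amount.
print((second - first).days)  # 10
```

Because the shift depends only on the ID and the offset, intervals between recordings of one patient are preserved while the absolute dates are hidden.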

nnsa.edfreadpy.anonymization.anonymization.compute_days_to_shift(id, seed_offset)[source]

Compute the number of days to shift a date (e.g. by adding) when the date needs to be converted to its anonymized date. The number of days is chosen randomly after first seeding the random generator with the specified id, so that this function returns the same output when called with the same id.

Parameters:
  • id (str) – id that determines the anonymization (runs with the same id will return equal numbers).

  • seed_offset (int) – offset to the seed that is determined by id.

Returns:

(int) – number of days to shift (random, seeded by id).
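As a rough sketch of the seeding idea, assuming a hash-derived seed and a private random generator (neither is confirmed by this documentation), a repeatable shift could be computed as follows. The name sketch_days_to_shift and the max_days parameter are hypothetical.

```python
import hashlib
import random


def sketch_days_to_shift(id, seed_offset, max_days=365):
    """Hypothetical stand-in for compute_days_to_shift (illustration only)."""
    # Hash the ID so the seed is stable across Python processes
    # (the built-in hash() of a str is randomized per process).
    digest = hashlib.sha256(id.encode("utf-8")).hexdigest()
    seed = int(digest, 16) + seed_offset
    # A private Random instance leaves the global random state untouched.
    rng = random.Random(seed)
    # Assumed bound; the actual range used by the package is not documented here.
    return rng.randint(1, max_days)


# Repeated calls with the same id and offset agree.
print(sketch_days_to_shift("EEG43", 0) == sketch_days_to_shift("EEG43", 0))  # True
```

In this sketch, seed_offset acts as a second input to the seed, so a different offset generally yields a different mapping from IDs to shifts.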

nnsa.edfreadpy.anonymization.anonymization.extract_patient_file_id(filename, base_id=None)[source]

Takes a file name and extracts the patient file id from it, assuming some general structuring of the filenames. E.g. extract_patient_file_id('EEG43a_1') returns 'EEG43'.

Parameters:
  • filename (str) – filename. May or may not include directory path and/or extension.

  • base_id (str, optional) – Base for the patient id. The extracted patient file id will contain at least this base id. Raises an error if the base id cannot be identified in the filename. Not case sensitive.

Returns:

(str) – the id specifying the patient (e.g. EEG43).
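The filename structure that the function assumes is not spelled out here. The sketch below is one possible interpretation, assuming the id consists of leading letters followed by digits; the function name sketch_extract_patient_file_id, the regular expressions, and the ValueError are assumptions for illustration, not the package's actual rules.

```python
import os
import re


def sketch_extract_patient_file_id(filename, base_id=None):
    """Hypothetical stand-in for extract_patient_file_id (illustration only)."""
    # Drop any directory path and extension.
    stem = os.path.splitext(os.path.basename(filename))[0]
    if base_id is None:
        # Assumed pattern: leading letters followed by digits, e.g. 'EEG43'.
        match = re.match(r"[A-Za-z]+\d+", stem)
    else:
        # Case-insensitive search for the base id plus any digits that follow.
        match = re.search(re.escape(base_id) + r"\d*", stem, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Could not find a patient file id in {filename!r}")
    return match.group(0)


print(sketch_extract_patient_file_id("EEG43a_1"))  # EEG43
```

With base_id given, this sketch returns the matched text as it appears in the filename, so the case of the result follows the filename rather than the base id.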

nnsa.edfreadpy.anonymization.config module

Module with general constants for anonymization.

Module contents

Package for anonymizing clinical data files.