EDF reader
Clinical datasets are typically stored as EDF files. For a description of the EDF file format, see https://www.edfplus.info/. The nnsa package contains an EdfReader class, which was developed for fast reading of EDF(+) files. This script demonstrates how to read EDF(+) files. An example of how to quickly read an entire EEG dataset from EDF files is provided in the example script quick_edf_read.py.
Link to script: io/edf_reader.py
import os
from nnsa import EdfReader
from nnsa.edfreadpy.io.config import EDFPLUS_TYPES
from nnsa.utils.code_performance import Timer
First, define the path to an EDF file on your PC that you want to read (include the file extension).
# Specify path to an EDF(+) file.
filepath = 'C:/data_temp/test.edf'
As is good practice when reading files, the EdfReader can be used as a context manager to ensure that the EDF file is closed after reading, even if an error occurs. The EdfReader has properties that can be accessed like attributes. These properties only hold data once they have been accessed, i.e. initializing an EdfReader object reads nothing from the file into memory. The first time that e.g. EdfReader.file_header is accessed, the file header is read from the EDF file and stored in the EdfReader object. Subsequent accesses to EdfReader.file_header then simply return the stored header.
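The read-once behaviour described above can be illustrated with a small stand-alone sketch. It is not the EdfReader implementation: the class LazyHeaderReader, its read_count attribute and the in-memory stream are hypothetical and only show the general pattern of a context manager combined with a lazily cached property.

```python
import io


class LazyHeaderReader:
    """Hypothetical stand-in for a reader with a lazily loaded header."""

    def __init__(self, stream):
        self._stream = stream
        self._file_header = None  # Nothing is read at construction time.
        self.read_count = 0       # Counts how often the stream is actually read.

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when an error occurred inside the block.
        self._stream.close()
        return False  # Do not suppress exceptions.

    @property
    def file_header(self):
        # Read from the stream on first access only, then reuse the stored value.
        if self._file_header is None:
            self._stream.seek(0)
            self._file_header = self._stream.read(8)
            self.read_count += 1
        return self._file_header


# Demo with an in-memory "file" (first 8 bytes play the role of a header).
stream = io.BytesIO(b'0'.ljust(8) + b'rest of the file')
with LazyHeaderReader(stream) as reader:
    reads_before_access = reader.read_count
    first = reader.file_header   # Triggers the read.
    second = reader.file_header  # Returns the stored header.
```

After the with block, the stream is closed and the header was read from it exactly once.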
The file header contains information about the contents of the edf file. Let’s read the file header.
# Open the edf file with EdfReader.
with EdfReader(filepath) as r:
    # Read/get the file header.
    hdr = r.file_header
r.file_header returns a dictionary, in which the keys are strings representing the fields in the (normal) EDF header (no EDF+ fields are returned by r.file_header). The values are the entries in the corresponding fields stored as a suitable data type (e.g. str, int, float).
print('\nFile header:', hdr)
# E.g. print the number of signals in the file (last field in the EDF header):
print('Number of signals in file:', hdr['num_signals'])
# Print all keys of the header dictionary:
print('Keys in r.file_header:', list(hdr.keys()))
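To make the idea of "header fields stored as suitable data types" concrete, here is a minimal sketch that parses the fixed 256-byte EDF file header from an in-memory stream. The field widths follow the EDF specification; the function name, the key names (apart from num_signals) and the synthetic header values are assumptions for illustration and may differ from what EdfReader returns.

```python
import io

# Layout of the fixed 256-byte EDF file header: (key, width in bytes).
# Widths follow the EDF specification; the key names are illustrative.
FILE_HEADER_FIELDS = [
    ('version', 8),
    ('patient_id', 80),
    ('recording_id', 80),
    ('startdate', 8),
    ('starttime', 8),
    ('num_header_bytes', 8),
    ('reserved', 44),
    ('num_data_records', 8),
    ('data_record_duration', 8),
    ('num_signals', 4),
]
INT_FIELDS = {'num_header_bytes', 'num_data_records', 'num_signals'}
FLOAT_FIELDS = {'data_record_duration'}


def parse_file_header(stream):
    """Read the ASCII header fields in order and convert numeric fields."""
    header = {}
    for key, width in FILE_HEADER_FIELDS:
        text = stream.read(width).decode('ascii').strip()
        if key in INT_FIELDS:
            header[key] = int(text)
        elif key in FLOAT_FIELDS:
            header[key] = float(text)
        else:
            header[key] = text
    return header


# Build a synthetic header (space-padded ASCII fields) for demonstration.
values = ['0', 'X X X X', 'Startdate X X X X', '01.01.85', '00.00.00',
          '768', '', '10', '1', '2']
raw = b''.join(
    value.ljust(width).encode('ascii')
    for value, (_, width) in zip(values, FILE_HEADER_FIELDS)
)
demo_hdr = parse_file_header(io.BytesIO(raw))
print('Number of signals in demo header:', demo_hdr['num_signals'])
```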
Since the file header has fields for the patient and the startdate, the edf file may not be anonymous. To remove this sensitive information from the EDF file, we can use the anonymize_and_save function.
# This function changes the startdate and patient_id fields, and in case the file is an EDF+ also
# the startdate in the recording_id field. The date is changed by adding a random amount of days to the
# original startdate.
with EdfReader(filepath) as r:
    # Using the property is_anonymized, you can check whether the loaded header information is anonymized (see the
    # is_anonymized code in EdfReader to see which checks are performed).
    print('Is anonymized:', r.is_anonymized)
    print(r.file_header)
    # We can anonymize the header information and save a new EDF file with the anonymized header information:
    base, filename = os.path.split(filepath)
    filepath_out = os.path.join(base, 'anonymized', filename)
    r.anonymize_and_save(
        filepath_out,  # New EDF filepath to save to (cannot be an existing file).
        seed_offset=100,  # Choose a random seed_offset, which determines how the date is anonymized.
        check_is_anonymized=False,  # Force anonymization, even if r.is_anonymized is True (defaults to True).
    )
# Let us now open the anonymized file and check its header.
with EdfReader(filepath_out) as r:
    # Verify that it is anonymized:
    print('Is anonymized:', r.is_anonymized)
    print(r.file_header)
# Remove the anonymized file that was created.
os.remove(filepath_out)
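The idea of changing the date by a random number of days can be sketched as follows. This is not the code behind anonymize_and_save: the function shift_startdate, its max_days parameter and the way the seed is used are assumptions, shown only to illustrate a reproducible (seeded) date shift on the dd.mm.yy format of the EDF startdate field.

```python
import random
from datetime import datetime, timedelta

DATE_FORMAT = '%d.%m.%y'  # EDF stores the startdate as dd.mm.yy.


def shift_startdate(startdate, seed, max_days=3650):
    """Shift a dd.mm.yy date string by a seeded random number of days.

    Hypothetical helper: the same seed always yields the same shift.
    Note that Python's two-digit year pivot differs slightly from the
    EDF clipping date convention (1985), which this sketch ignores.
    """
    rng = random.Random(seed)
    offset_days = rng.randint(1, max_days)
    original = datetime.strptime(startdate, DATE_FORMAT)
    shifted = original + timedelta(days=offset_days)
    return shifted.strftime(DATE_FORMAT), offset_days


shifted_date, offset = shift_startdate('01.01.20', seed=100)
print('Shifted startdate:', shifted_date, f'(+{offset} days)')
```

Using a fixed seed makes the shift reproducible, which helps when several files of the same recording session must be shifted consistently.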
The signal headers contain information on each signal in the EDF file. Let’s read the signal headers.
with EdfReader(filepath) as r:
    # Read/get the signal headers.
    s_hdrs = r.signal_headers
r.signal_headers returns a dictionary, in which the keys are strings representing the fields in an EDF signal header. The values are lists that contain the entries for the corresponding fields for each signal in the file, stored as a suitable data type (e.g. str, int, float).
print('\nSignal headers:', s_hdrs)
# E.g. print the list of labels of all the signals in the file:
print('Signal labels:', s_hdrs['label'])
# E.g. get the physical dimension of the 3rd signal (index 2) in the file.
print('Physical dimension of 3rd signal:', s_hdrs['physical_dimension'][2])
# Print the list of all keys in the dictionary:
print('Keys in r.signal_headers:', list(s_hdrs.keys()))
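Because the signal headers are organized as one list per field, it can be convenient to regroup them into one dictionary per signal. The sketch below does this on a small hand-made dictionary; the helper headers_per_signal and the demo values are hypothetical, and only the keys label and physical_dimension are taken from the example above.

```python
# Hand-made stand-in for a signal header dictionary (one list per field).
demo_signal_headers = {
    'label': ['EEG Fp1', 'EEG Fp2', 'ECG'],
    'physical_dimension': ['uV', 'uV', 'mV'],
}


def headers_per_signal(signal_headers):
    """Turn a dict of per-field lists into a list of per-signal dicts."""
    keys = list(signal_headers)
    return [dict(zip(keys, values)) for values in zip(*signal_headers.values())]


per_signal = headers_per_signal(demo_signal_headers)

# Look up a signal header by its label instead of by its index.
by_label = {header['label']: header for header in per_signal}
print('Header of the 3rd signal:', per_signal[2])
print('Physical dimension of EEG Fp2:', by_label['EEG Fp2']['physical_dimension'])
```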
We can collect some other additional information about the file and the signals. This will also include the EDF+ fields in case of an EDF+ file.
with EdfReader(filepath) as r:
    a_info = r.additional_info
r.additional_info returns a dictionary, in which the keys are strings representing several kinds of information about the file or signals in the file.
print('\nAdditional info:', a_info)
# E.g. print the file type (EDF or EDF+):
print('File type:', a_info['filetype'])
# Or, in case of EDF+ file, print the investigator:
if a_info['filetype'] in EDFPLUS_TYPES:
    print('Investigator:', a_info['recording_id']['investigator'])
# Print the list of all keys in the dictionary:
print('Keys in r.additional_info:', list(a_info.keys()))
Now let’s read some signals. To read data of a particular channel, use r.read_signal() and specify the channel either by its index (int) or label (str). E.g. read the entire first signal (first channel) in the EDF file. Note that the channel count starts at zero (like indexing in Python).
with EdfReader(filepath) as r:
    # Read first channel.
    channel_index = 0
    signal_0 = r.read_signal(channel=channel_index)  # Returns a numpy array.
    # Read the second channel by specifying its label (which you can look up in the signal headers).
    channel_label = r.signal_headers['label'][1]
    signal_1 = r.read_signal(channel=channel_label)
    # To only read a specific part of a signal, specify the index of the start and the stop sample.
    # E.g. read first 500 samples of the signal in the third channel.
    signal_2 = r.read_signal(channel=2, start=0, stop=500)
    assert len(signal_2) == 500
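Since start and stop are sample indices, a time window in seconds first has to be converted using the sample frequency of the signal. The helper below is a hypothetical sketch of that conversion; the function name and the sample frequency of 250 Hz are assumptions, and in practice the sample frequency would have to be looked up for the signal in question.

```python
def time_window_to_samples(start_s, stop_s, sample_rate_hz):
    """Convert a time window in seconds to (start, stop) sample indices."""
    start = int(round(start_s * sample_rate_hz))
    stop = int(round(stop_s * sample_rate_hz))
    if not 0 <= start <= stop:
        raise ValueError('Expected 0 <= start_s <= stop_s.')
    return start, stop


# With an assumed sample frequency of 250 Hz, the first 2 seconds
# correspond to the first 500 samples.
start, stop = time_window_to_samples(0.0, 2.0, sample_rate_hz=250)
print('start:', start, 'stop:', stop)
```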
The read_signal() method accepts an optional argument efficiency, which can be set to either ‘memory’ or ‘speed’. With the ‘memory’ option, only the bytes of the signal you requested are read from the file, so the amount of memory used is proportional to the amount of data that you want to read. With the ‘speed’ option, the entire file is read into memory first, and only then are the requested parts extracted and returned. This latter approach is typically faster (unless you only want to read a small part of a signal). Another advantage of the ‘speed’ option is that the entire (raw) data is stored in the EdfReader object for quick later access, so the file does not have to be read again if you want to read another channel. This makes it especially fast when reading multiple channels. Therefore, the efficiency argument defaults to ‘speed’. I suggest using the ‘memory’ option only when the entire EDF file does not fit in the RAM, or when the part you want to read is significantly smaller than the entire file.
Let’s compare the read times obtained with the two options for reading one signal.
with EdfReader(filepath) as r:
    # Read fourth channel.
    channel_index = 3
    print(f'Comparing reading speed for reading one signal ("{s_hdrs["label"][channel_index]}").')
    with Timer(prefix='efficiency="memory":'):
        signal_memory = r.read_signal(channel=channel_index, efficiency='memory')
    with Timer(prefix='efficiency="speed":'):
        signal_speed = r.read_signal(channel=channel_index, efficiency='speed')
    # Verify that the two signals are the same.
    assert all(signal_speed == signal_memory)
    # Now the entire file is read into the RAM. You can flush this data to free some memory
    # (approx. the size of the EDF file):
    r.flush_all_digital_data()
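The trade-off between the ‘memory’ and ‘speed’ options can be illustrated with a simplified sketch on an in-memory file. It is not how EdfReader is implemented: the functions read_partial and read_all_then_slice and the cache dictionary are hypothetical, and the "file" holds a single signal of 2-byte integers, whereas a real EDF file interleaves the signals per data record.

```python
import io
from array import array

ITEM_SIZE = array('h').itemsize  # Size in bytes of one stored sample.


def read_partial(stream, start, stop):
    """'memory'-like: seek to the requested samples and read only those bytes."""
    stream.seek(start * ITEM_SIZE)
    samples = array('h')
    samples.frombytes(stream.read((stop - start) * ITEM_SIZE))
    return samples


def read_all_then_slice(stream, start, stop, cache):
    """'speed'-like: read everything once, keep it in a cache, then slice."""
    if 'raw' not in cache:
        stream.seek(0)
        raw = array('h')
        raw.frombytes(stream.read())
        cache['raw'] = raw  # Later requests reuse this without touching the file.
    return cache['raw'][start:stop]


# Demo: a fake single-signal file with 1000 samples.
stream = io.BytesIO(array('h', range(1000)).tobytes())
cache = {}
part = read_partial(stream, 100, 110)
sliced = read_all_then_slice(stream, 100, 110, cache)
print('Same samples from both strategies:', part == sliced)
```

The cached strategy holds all 1000 samples in memory after the first call, which mirrors why flushing the stored data frees roughly the size of the file.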
In case of an EDF+ file, the file can contain annotations. You can read the annotations in the file using r.read_annotations(). This method collects each annotation as an edfreadpy.Annotation object in an edfreadpy.AnnotationSet object. The Annotation object is a simple object with three attributes: text, onset, and duration. The AnnotationSet is basically just a list of Annotation objects. Check out the class definitions and examples in the annotations directory for more information.
with EdfReader(filepath) as r:
    annotations = r.read_annotations()  # Returns an AnnotationSet object.
print(annotations)
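To give a feeling for working with such a collection, the sketch below uses a hypothetical DemoAnnotation dataclass with the three attributes named above (text, onset, duration) and a plain list as a stand-in for the set. These are not the nnsa classes, whose methods may differ, and the example annotations are made up.

```python
from dataclasses import dataclass


@dataclass
class DemoAnnotation:
    """Hypothetical stand-in with the three attributes of an annotation."""
    text: str
    onset: float           # Seconds from the start of the recording.
    duration: float = 0.0  # Seconds; 0.0 for an instantaneous event.


# A plain list plays the role of the annotation set in this sketch.
demo_annotations = [
    DemoAnnotation('Movement', onset=120.0, duration=30.0),
    DemoAnnotation('Eyes closed', onset=15.5),
    DemoAnnotation('Sleep', onset=300.0, duration=600.0),
]

# Sort by onset and select the annotations lasting at least 10 seconds.
by_onset = sorted(demo_annotations, key=lambda a: a.onset)
long_events = [a.text for a in by_onset if a.duration >= 10.0]
print('First annotation:', by_onset[0])
print('Long events:', long_events)
```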
Also for EDF+ files, the data can be discontinuous (EDF+D). The read_signal() method accepts an optional argument discontinuous_mode (str) that determines how discontinuous data files (EDF+D) are handled: if ‘longest’, the longest continuous segment is returned; if ‘all’, a list of all continuous segments is returned; if ‘fill’ or ‘full’, all continuous segments are merged, with the gaps filled with nan; if ‘ignore’, the discontinuous signal is returned as if it were continuous.
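The four modes can be sketched on toy data as follows. This is an illustration of the described behaviour, not the nnsa implementation: the function combine_segments and the representation of segments as (start_index, samples) tuples are assumptions.

```python
import math


def combine_segments(segments, mode='longest'):
    """Combine continuous segments of a discontinuous signal.

    segments: list of (start_index, samples) tuples, sorted by start_index
    and non-overlapping (hypothetical representation).
    """
    if mode == 'longest':
        return max(segments, key=lambda seg: len(seg[1]))[1]
    if mode == 'all':
        return [samples for _, samples in segments]
    if mode == 'ignore':
        # Concatenate as if there were no gaps.
        return [x for _, samples in segments for x in samples]
    if mode in ('fill', 'full'):
        merged = []
        first_start = segments[0][0]
        for start, samples in segments:
            # Pad with nan up to the position where this segment starts.
            gap = (start - first_start) - len(merged)
            merged.extend([math.nan] * gap)
            merged.extend(samples)
        return merged
    raise ValueError(f'Unknown mode: {mode!r}')


# Two segments with a gap of 3 missing samples between them.
segments = [(0, [1.0, 2.0]), (5, [3.0, 4.0, 5.0])]
longest = combine_segments(segments, 'longest')
filled = combine_segments(segments, 'fill')
ignored = combine_segments(segments, 'ignore')
print('fill:', filled)
```

Note that only the fill variant preserves the original timing of the samples; with ignore, the samples after a gap shift forward in time.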