nnsa.utils package
Submodules
nnsa.utils.arrays module
This module contains functions dealing with numpy arrays.
Functions:
| apply_delay | Delay x with delay seconds wrt y and truncate. |
| clamp | Apply a clamping function to the array x. |
| count_nans | Count the number of NaN (Not a Number) values in array x. |
| do_for_axis | Apply a function operating on 1D arrays on a multidimensional array x, operating along the axis. |
| get_bin_edges | Get bin edges for variable x and n_bins. |
| histogram_features_per_channel | Compute features of the distribution of the channel data in x using a histogram representation. |
| interp_nan | Linearly interpolate nan values in x. |
| mirror_boundaries | Mirror array x at the start and end of the array, along a specified dimension. |
| moving_average | Compute the moving (centered) average of x, with window size n. |
| moving_mad | Compute the moving (centered) median absolute deviation (MAD) of x, with window size n. |
| moving_max | Compute the moving (centered) max of x, with window size n. |
| moving_mean | Compute the moving (centered) mean of x, with window size n. |
| moving_median | Compute the moving (centered) median of x, with window size n. |
| moving_std | Compute the moving (centered) standard deviation of x, with window size n. |
| nanfun | Compute things like nanmean(x, axis=axis), but set the output to np.nan if more than max_nan_frac proportion of the data was nan. |
| strided_x | Segment 1D array x using efficient striding. |
- nnsa.utils.arrays.apply_delay(x, y, delay, fs=1, d_min=None, d_max=None)[source]
Delay x with delay seconds wrt y and truncate.
- Parameters:
x (np.ndarray) – 1D array to delay.
y (np.ndarray) – 1D array.
delay (float or int) – number of samples (or seconds if fs is given) to delay x by.
fs (float, optional) – sample frequency of x and y. If specified, the delay can be given in seconds.
- Returns:
x_out (np.ndarray) – delayed version of x, with length len(x) - abs(delay), where delay is expressed in samples.
y_out (np.ndarray) – truncated version of y, with the same length as x_out.
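The alignment described above can be pictured with the sketch below. It is not the nnsa implementation: the helper name delay_and_truncate is hypothetical, the delay is taken in whole samples, and the choice of which ends are trimmed (for a positive delay, x loses its tail and y loses its head) is an assumption.

```python
import numpy as np

def delay_and_truncate(x, y, delay_samples):
    """Hypothetical helper: shift x relative to y and keep only the overlap."""
    x = np.asarray(x)
    y = np.asarray(y)
    d = int(delay_samples)
    if d == 0:
        return x.copy(), y.copy()
    if d > 0:
        # x is delayed: x[t - d] is paired with y[t] (assumed convention).
        return x[:-d], y[d:]
    # Negative delay: x is advanced, so x[t + |d|] is paired with y[t].
    return x[-d:], y[:d]

x = np.arange(10)
y = np.arange(10) * 10
x_out, y_out = delay_and_truncate(x, y, 3)
# Both outputs have length len(x) - abs(delay).
```

With a sample frequency fs, a delay in seconds would first be converted to samples, for example with int(round(delay * fs)).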
- nnsa.utils.arrays.clamp(x, neg_thres=-inf, pos_thres=inf)[source]
Apply a clamping function to the array x.
References
L. Webb, M. Kauppila, J. A. Roberts, S. Vanhatalo, and N. J. Stevenson, “Automated detection of artefacts in neonatal EEG with residual neural networks,” Computer Methods and Programs in Biomedicine, vol. 208, p. 106194, Sep. 2021, doi: 10.1016/j.cmpb.2021.106194.
- nnsa.utils.arrays.count_nans(x, **kwargs)[source]
Count the number of NaN (Not a Number) values in array x.
- Parameters:
x (np.ndarray) – the array in which to count the NaNs.
**kwargs (optional) – optional keyword arguments for np.sum() to specify e.g. the axis along which to count the NaNs.
- Returns:
(np.int32 or np.ndarray of np.int32) – the number of NaN values in x, counted along the axis passed via **kwargs if one is given.
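Since the keyword arguments are documented as being forwarded to np.sum(), the behaviour can be approximated as follows. This is a sketch under that assumption, and count_nans_sketch is a hypothetical name rather than the library function.

```python
import numpy as np

def count_nans_sketch(x, **kwargs):
    """Count NaNs by summing the boolean NaN mask; kwargs go to np.sum()."""
    return np.sum(np.isnan(x), **kwargs)

x = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, 6.0]])
total = count_nans_sketch(x)            # count over the whole array
per_row = count_nans_sketch(x, axis=1)  # count along the second axis
```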
- nnsa.utils.arrays.do_for_axis(x, fun, axis)[source]
Apply a function operating on 1D arrays on a multidimensional array x, operating along the axis.
- Parameters:
x (np.ndarray) – multidimensional array.
fun (function) – function that takes in one argument: a 1D array.
axis (int) – the axis in x along which the function should be applied.
*args – positional arguments for fun.
**kwargs – keyword arguments for fun.
- Returns:
result (np.ndarray) – the result. Has the same number of dimensions as x. The length of the axis dimension might differ from x depending on what fun returns.
Examples
>>> x = np.random.rand(10, 20, 30)
>>> # Mean along second axis using numpy.
>>> mean1 = np.mean(x, axis=1, keepdims=True)
>>> mean1.shape
(10, 1, 30)
>>> # Mean along second axis using this function.
>>> mean2 = do_for_axis(x, fun=np.mean, axis=1, keepdims=True)
>>> mean2.shape
(10, 1, 30)
>>> np.max(np.abs(mean1 - mean2)) < 1e-15
True
>>> # Cumsum.
>>> cumsum1 = np.cumsum(x, axis=1)
>>> cumsum1.shape
(10, 20, 30)
>>> cumsum2 = do_for_axis(x, fun=np.cumsum, axis=1)
>>> cumsum2.shape
(10, 20, 30)
>>> np.max(np.abs(cumsum1 - cumsum2)) < 1e-15
True
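For comparison, NumPy ships a related utility, np.apply_along_axis. The sketch below uses it with a hypothetical 1D function peak_to_peak; it is not how do_for_axis is implemented. Note one difference: np.apply_along_axis drops the axis when the 1D function returns a scalar, whereas do_for_axis is documented to keep the number of dimensions.

```python
import numpy as np

x = np.random.default_rng(0).random((10, 20, 30))

def peak_to_peak(v):
    """Hypothetical function that only understands 1D input."""
    return v.max() - v.min()

# Scalar-valued function: the axis is removed from the result.
ptp = np.apply_along_axis(peak_to_peak, 1, x)

# Array-valued function: the axis is kept with the returned length.
csum = np.apply_along_axis(np.cumsum, 1, x)
```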
- nnsa.utils.arrays.get_bin_edges(x, n_bins)[source]
Get bin edges for variable x and n_bins.
- Parameters:
x (np.ndarray) – array to divide into bins.
n_bins (int) – number of bins to divide into.
- Returns:
bin_edges (np.ndarray) – array with length n_bins + 1, containing bin edges.
- nnsa.utils.arrays.histogram_features_per_channel(x, bins, channel_labels=None, ignore_nan=True)[source]
Compute features of the distribution of the channel data in x using a histogram representation.
To compute the histogram features on a 1D array, reshape it to a one-channel array, i.e. x.reshape(1, -1).
- Parameters:
x (np.ndarray) – array containing data with dimensions (channels, segments).
bins (np.ndarray or list) – bin edges of the histogram, see np.histogram().
channel_labels (list, optional) – list of the channel/feature labels corresponding to the rows of x.
ignore_nan (bool, optional) – if True, ignore nan values in x. Defaults to True.
- Returns:
(pd.DataFrame) – dataframe with the channel labels as row index and the histogram features as columns.
- nnsa.utils.arrays.interp_nan(x, max_nan_length=None, axis=-1)[source]
Linearly interpolate nan values in x.
- Parameters:
x (np.ndarray) – data array.
max_nan_length (int, optional) – maximum number of consecutive nan samples to interpolate. If specified, gaps longer than max_nan_length are not interpolated and their nans are retained. Additionally, no extrapolation is done if max_nan_length is specified. If None, all nan values are interpolated or extrapolated, no matter the length of the missing data. Defaults to None.
axis (int, optional) – the axis of x which to interpolate along. Defaults to -1.
- Returns:
x_interp (np.ndarray) – data array with same shape as x, where np.nan values have been linearly interpolated.
Examples
>>> x = np.array([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0])
>>> interp_nan(x)
array([1. , 1. , 1. , 1.33333333, 1.66666667, 2. , 2. , 1. , 0. ])
>>> x = np.array([1, np.nan, np.nan, np.nan, np.nan, 2, 2, np.nan, 0])
>>> interp_nan(x, max_nan_length=3)
array([ 1., nan, nan, nan, nan, 2., 2., 1., 0.])
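The basic 1D case (no max_nan_length) can be sketched with np.interp. This is an illustration only, with the hypothetical name interp_nan_1d. Note that np.interp holds the first and last valid values constant beyond the ends, which may differ from how interp_nan extrapolates.

```python
import numpy as np

def interp_nan_1d(x):
    """Fill NaNs in a 1D array by linear interpolation over sample index."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    valid = ~np.isnan(x)
    # Evaluate the piecewise-linear curve through the valid samples at every index.
    return np.interp(idx, idx[valid], x[valid])

x = np.array([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0])
filled = interp_nan_1d(x)
```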
- nnsa.utils.arrays.mirror_boundaries(x, n_left=0, n_right=0, axis=-1)[source]
Mirror array x at the start and end of the array, along a specified dimension.
- Parameters:
x (np.ndarray) – data array.
n_left (int, optional) – number of samples to mirror at the start of the array. Cannot be greater than x.shape[axis]. Defaults to 0.
n_right (int, optional) – number of samples to mirror at the end of the array. Cannot be greater than x.shape[axis]. Defaults to 0.
axis (int, optional) – axis along which to mirror. Defaults to -1.
- Returns:
x (np.ndarray) – original array x, with n_left mirrored samples prepended (before the start) and n_right mirrored samples appended (after the end).
Examples
>>> x = np.arange(10)
>>> mirror_boundaries(x)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> mirror_boundaries(x, n_left=3)
array([3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> mirror_boundaries(x, n_left=3, n_right=2)
array([3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7])
>>> x2 = np.arange(8).reshape(4, 2)
>>> x2
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
>>> mirror_boundaries(x2, n_left=2, n_right=1, axis=0)
array([[4, 5],
       [2, 3],
       [0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [4, 5]])
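The outputs in these examples match what NumPy's np.pad produces in "reflect" mode, where the edge sample itself is not repeated. The sketch below shows that equivalent call; whether mirror_boundaries uses np.pad internally is not stated in this documentation.

```python
import numpy as np

x = np.arange(10)
# Pad 3 mirrored samples on the left and 2 on the right.
mirrored = np.pad(x, (3, 2), mode="reflect")

x2 = np.arange(8).reshape(4, 2)
# Pad only along axis 0; axis 1 gets no padding.
mirrored2 = np.pad(x2, ((2, 1), (0, 0)), mode="reflect")
```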
- nnsa.utils.arrays.moving_average(x, n, axis=-1, maintain_nan=True)[source]
Compute the moving (centered) average of x, with window size n.
Ignores np.nan values, but outputs np.nan at locations where x is np.nan if maintain_nan is True. The output at index i is the average of a window of x, with the center of the window located at x[i]. Mirrors the input at the borders to reduce boundary effects.
Adapted from: https://stackoverflow.com/questions/39919050/calculate-moving-average-in-numpy-array-with-nans.
- Parameters:
x (np.ndarray) – data array.
n (int) – window size/number of taps. If -1 or larger than 2*len(x), computes global (non-moving) average.
axis (int, optional) – axis along which to compute the moving average. Defaults to -1.
maintain_nan (bool, optional) – if True, output is nan at locations where x is nan. If False, the local average is returned even if the current sample was a nan. If all values in the local average are nan, returns nan at that index. Defaults to True.
- Returns:
ret (np.ndarray) – the moving average of x. Has same shape as x.
counts (np.ndarray) – the number of non-np.nan samples that participated in the average. Has same shape as ret.
Examples
>>> x = np.arange(10).astype(float)
>>> moving_average(x, n=3)[0]
array([0.66666667, 1. , 2. , 3. , 4. , 5. , 6. , 7. , 8. , 8.33333333])
>>> moving_average(x, n=2)[0]
array([0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5])
>>> x[[2, 4]] = np.nan
>>> x
array([ 0., 1., nan, 3., nan, 5., 6., 7., 8., 9.])
>>> moving_average(x, n=3)[0]
array([0.66666667, 0.5 , nan, 3. , nan, 5.5 , 6. , 7. , 8. , 8.33333333])
>>> moving_average(x, n=3, maintain_nan=False)[0]
array([0.66666667, 0.5 , 2. , 3. , 4. , 5.5 , 6. , 7. , 8. , 8.33333333])
>>> x2 = np.arange(8).reshape(4, 2)
>>> x2
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
>>> moving_average(x2, n=3, axis=0)[0]
array([[1.33333333, 2.33333333],
       [2. , 3. ],
       [4. , 5. ],
       [4.66666667, 5.66666667]])
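The combination of mirroring, NaN-ignoring and counting can be sketched for the 1D, odd-window case as below. This is an illustration consistent with the n=3 examples above, not the library code; the name centered_nanmean_1d is hypothetical, and even window sizes and multidimensional input are left out.

```python
import numpy as np

def centered_nanmean_1d(x, n, maintain_nan=True):
    """Centered moving average that ignores NaNs (1D input, odd n only)."""
    if n % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    x = np.asarray(x, dtype=float)
    half = n // 2
    padded = np.pad(x, half, mode="reflect")  # mirror the borders
    valid = ~np.isnan(padded)
    kernel = np.ones(n)
    # Windowed sum of the valid samples and number of valid samples per window.
    sums = np.convolve(np.where(valid, padded, 0.0), kernel, mode="valid")
    counts = np.convolve(valid.astype(float), kernel, mode="valid")
    with np.errstate(invalid="ignore", divide="ignore"):
        out = sums / counts  # windows without valid samples become NaN
    if maintain_nan:
        out[np.isnan(x)] = np.nan
    return out, counts

x = np.arange(10.0)
avg, counts = centered_nanmean_1d(x, n=3)

x_nan = x.copy()
x_nan[[2, 4]] = np.nan
avg_nan, _ = centered_nanmean_1d(x_nan, n=3)
```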
- nnsa.utils.arrays.moving_mad(x, n, axis=-1, maintain_nan=True, max_nan_frac=1, k=1.4826)[source]
Compute the moving (centered) median absolute deviation (MAD) of x, with window size n.
The output at index i is the MAD of a window of x, with the center of the window located at x[i]. Mirrors the input at the borders to reduce boundary effects.
- Parameters:
x (np.ndarray) – data array.
n (int) – window size/number of taps. If n=-1 or None, computes metric over the entire array.
axis (int, optional) – axis along which to compute the moving metric. Defaults to -1.
maintain_nan (bool, optional) – if True, output is nan at locations where x is nan. If False, the local metric is returned even if the current sample was a nan (if other samples in window were not nan). Defaults to True.
max_nan_frac (float) – fraction between 0 and 1 specifying how many nans (proportion) are maximally allowed in a local window. If more nans are in the window, the output is nan for the corresponding output sample.
k (float) – scaling factor for MAD. If k=1.4826, MAD ~ std for Gaussian data. See also https://en.wikipedia.org/wiki/Median_absolute_deviation.
- Returns:
running_val (np.ndarray) – the moving MAD of x. Has same shape as x.
Examples
>>> x = np.array([[0, 0, 0, 1, 2, 3, 3, 2, 10, 10],
...               [0, 0, 0, np.nan, np.nan, np.nan, 3, 2, 10, 10]])
>>> moving_mad(x, n=3, axis=-1)
array([[0. , 0. , 0. , 1.4826, 1.4826, 0. , 0. , 1.4826, 0. , 0. ],
       [0. , 0. , 0. , nan, nan, nan, 0.7413, 1.4826, 0. , 0. ]])
>>> moving_mad(x, n=3, axis=0, maintain_nan=False, max_nan_frac=1)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
>>> moving_mad(x, n=-1)
array([[2.2239, 2.2239, 2.2239, 2.2239, 2.2239, 2.2239, 2.2239, 2.2239, 2.2239, 2.2239],
       [2.9652, 2.9652, 2.9652, nan, nan, nan, 2.9652, 2.9652, 2.9652, 2.9652]])
>>> moving_mad(x, n=3, max_nan_frac=0)
array([[0. , 0. , 0. , 1.4826, 1.4826, 0. , 0. , 1.4826, 0. , 0. ],
       [0. , 0. , nan, nan, nan, nan, nan, 1.4826, 0. , 0. ]])
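The role of the scaling factor k can be checked with a small global (non-moving) computation. The sketch below assumes the usual definition, k times the median of the absolute deviations from the median, and uses the hypothetical name scaled_mad. It reproduces the value reported for n=-1 on the first row above.

```python
import numpy as np

def scaled_mad(x, k=1.4826):
    """k * median(|x - median(x)|), ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    med = np.nanmedian(x)
    return k * np.nanmedian(np.abs(x - med))

row = np.array([0, 0, 0, 1, 2, 3, 3, 2, 10, 10])
global_mad = scaled_mad(row)  # median is 2, median absolute deviation is 1.5

# For Gaussian data the scaled MAD approximates the standard deviation.
rng = np.random.default_rng(0)
gauss = rng.normal(loc=0.0, scale=2.0, size=100_000)
ratio = scaled_mad(gauss) / gauss.std()
```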
- nnsa.utils.arrays.moving_max(x, n, axis=-1, maintain_nan=True, max_nan_frac=1)[source]
Compute the moving (centered) max of x, with window size n.
The output at index i is the maximum of a window of x, with the center of the window located at x[i]. Mirrors the input at the borders to reduce boundary effects.
- Parameters:
x (np.ndarray) – data array.
n (int) – window size/number of taps. If n=-1 or None, computes metric over the entire array.
axis (int, optional) – axis along which to compute the moving metric. Defaults to -1.
maintain_nan (bool, optional) – if True, output is nan at locations where x is nan. If False, the local metric is returned even if the current sample was a nan (if other samples in window were not nan). Defaults to True.
max_nan_frac (float) – fraction between 0 and 1 specifying how many nans (proportion) are maximally allowed in a local window. If more nans are in the window, the output is nan for the corresponding output sample.
- Returns:
running_val (np.ndarray) – the moving max of x. Has same shape as x.
Examples
>>> x = np.array([[0, 0, 0, 1, 2, 3, 3, 2, 10, 10],
...               [0, 0, 0, np.nan, np.nan, np.nan, 3, 2, 10, 10]])
>>> moving_max(x, n=3, axis=-1)
array([[ 0., 0., 1., 2., 3., 3., 3., 10., 10., 10.],
       [ 0., 0., 0., nan, nan, nan, 3., 10., 10., 10.]])
>>> moving_max(x, n=3, axis=0, maintain_nan=False, max_nan_frac=1)
array([[ 0., 0., 0., 1., 2., 3., 3., 2., 10., 10.],
       [ 0., 0., 0., 1., 2., 3., 3., 2., 10., 10.]])
>>> moving_max(x, n=-1)
array([[10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
       [10., 10., 10., nan, nan, nan, 10., 10., 10., 10.]])
>>> moving_max(x, n=3, max_nan_frac=0)
array([[ 0., 0., 1., 2., 3., 3., 3., 10., 10., 10.],
       [ 0., 0., nan, nan, nan, nan, nan, 10., 10., 10.]])
- nnsa.utils.arrays.moving_mean(x, n, axis=-1, maintain_nan=True, max_nan_frac=1)[source]
Compute the moving (centered) mean of x, with window size n.
The output at index i is the mean of a window of x, with the center of the window located at x[i]. Mirrors the input at the borders to reduce boundary effects.
- Parameters:
x (np.ndarray) – data array.
n (int) – window size/number of taps. If n=-1 or None, computes metric over the entire array.
axis (int, optional) – axis along which to compute the moving metric. Defaults to -1.
maintain_nan (bool, optional) – if True, output is nan at locations where x is nan. If False, the local metric is returned even if the current sample was a nan (if other samples in window were not nan). Defaults to True.
max_nan_frac (float) – fraction between 0 and 1 specifying how many nans (proportion) are maximally allowed in a local window. If more nans are in the window, the output is nan for the corresponding output sample.
- Returns:
running_val (np.ndarray) – the moving mean of x. Has same shape as x.
Examples
>>> x = np.array([[0, 0, 0, 1, 2, 3, 3, 2, 10, 10],
...               [0, 0, 0, np.nan, np.nan, np.nan, 3, 2, 10, 10]])
>>> moving_mean(x, n=3, axis=-1)
array([[ 0. , 0. , 0.33333333, 1. , 2. , 2.66666667, 2.66666667, 5. , 7.33333333, 10. ],
       [ 0. , 0. , 0. , nan, nan, nan, 2.5 , 5. , 7.33333333, 10. ]])
>>> moving_mean(x, n=3, axis=0, maintain_nan=False, max_nan_frac=1)
array([[ 0., 0., 0., 1., 2., 3., 3., 2., 10., 10.],
       [ 0., 0., 0., 1., 2., 3., 3., 2., 10., 10.]])
>>> moving_mean(x, n=-1)
array([[3.1 , 3.1 , 3.1 , 3.1 , 3.1 , 3.1 , 3.1 , 3.1 , 3.1 , 3.1 ],
       [3.57142857, 3.57142857, 3.57142857, nan, nan, nan, 3.57142857, 3.57142857, 3.57142857, 3.57142857]])
>>> moving_mean(x, n=3, max_nan_frac=0)
array([[ 0. , 0. , 0.33333333, 1. , 2. , 2.66666667, 2.66666667, 5. , 7.33333333, 10. ],
       [ 0. , 0. , nan, nan, nan, nan, nan, 5. , 7.33333333, 10. ]])
- nnsa.utils.arrays.moving_median(x, n, axis=-1, maintain_nan=True, max_nan_frac=1)[source]
Compute the moving (centered) median of x, with window size n.
The output at index i is the median of a window of x, with the center of the window located at x[i]. Mirrors the input at the borders to reduce boundary effects.
- Parameters:
x (np.ndarray) – data array.
n (int) – window size/number of taps. If n=-1 or None, computes metric over the entire array.
axis (int, optional) – axis along which to compute the moving metric. Defaults to -1.
maintain_nan (bool, optional) – if True, output is nan at locations where x is nan. If False, the local metric is returned even if the current sample was a nan (if other samples in window were not nan). Defaults to True.
max_nan_frac (float) – fraction between 0 and 1 specifying how many nans (proportion) are maximally allowed in a local window. If more nans are in the window, the output is nan for the corresponding output sample.
- Returns:
running_val (np.ndarray) – the moving median of x. Has same shape as x.
Examples
>>> x = np.array([[0, 0, 0, 1, 2, 3, 3, 2, 10, 10],
...               [0, 0, 0, np.nan, np.nan, np.nan, 3, 2, 10, 10]])
>>> moving_median(x, n=3, axis=-1)
array([[ 0. , 0. , 0. , 1. , 2. , 3. , 3. , 3. , 10. , 10. ],
       [ 0. , 0. , 0. , nan, nan, nan, 2.5, 3. , 10. , 10. ]])
>>> moving_median(x, n=3, axis=0, maintain_nan=False, max_nan_frac=1)
array([[ 0., 0., 0., 1., 2., 3., 3., 2., 10., 10.],
       [ 0., 0., 0., 1., 2., 3., 3., 2., 10., 10.]])
>>> moving_median(x, n=-1)
array([[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [ 2., 2., 2., nan, nan, nan, 2., 2., 2., 2.]])
>>> moving_median(x, n=3, max_nan_frac=0)
array([[ 0., 0., 0., 1., 2., 3., 3., 3., 10., 10.],
       [ 0., 0., nan, nan, nan, nan, nan, 3., 10., 10.]])
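The moving_* functions share the same parameters (n, maintain_nan, max_nan_frac), which suggests a common windowing scheme. One way to picture it for 1D input and odd n is the sketch below, built on NumPy's sliding_window_view (NumPy 1.20 or later). The helper name moving_nanfun_1d is hypothetical, and this is an illustration consistent with the n=3 examples in this section rather than the nnsa implementation.

```python
import warnings
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_nanfun_1d(x, n, fun=np.nanmedian, maintain_nan=True, max_nan_frac=1.0):
    """Apply a NaN-aware reducer to centered windows of a 1D array (odd n)."""
    x = np.asarray(x, dtype=float)
    half = n // 2
    padded = np.pad(x, half, mode="reflect")   # mirror the borders
    windows = sliding_window_view(padded, n)   # shape (len(x), n) for odd n
    nan_frac = np.isnan(windows).mean(axis=1)  # proportion of NaNs per window
    with warnings.catch_warnings():
        # All-NaN windows make the reducer warn; the NaN result is what we want.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        out = fun(windows, axis=1)
    out = np.where(nan_frac > max_nan_frac, np.nan, out)
    if maintain_nan:
        out[np.isnan(x)] = np.nan
    return out

row = np.array([0, 0, 0, np.nan, np.nan, np.nan, 3, 2, 10, 10])
median_default = moving_nanfun_1d(row, n=3)
median_strict = moving_nanfun_1d(row, n=3, max_nan_frac=0)
max_default = moving_nanfun_1d(row, n=3, fun=np.nanmax)
```

Swapping fun for np.nanmean or np.nanstd gives the corresponding moving mean or standard deviation under the same assumptions.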
- nnsa.utils.arrays.moving_std(x, n, axis=-1, maintain_nan=True, max_nan_frac=1, ddof=0)[source]
Compute the moving (centered) standard deviation of x, with window size n.
The output at index i is the standard deviation of a window of x, with the center of the window located at x[i]. Mirrors the input at the borders to reduce boundary effects.
- Parameters:
x (np.ndarray) – data array.
n (int) – window size/number of taps. If n=-1 or None, computes metric over the entire array.
axis (int, optional) – axis along which to compute the moving metric. Defaults to -1.
maintain_nan (bool, optional) – if True, output is nan at locations where x is nan. If False, the local metric is returned even if the current sample was a nan (if other samples in window were not nan). Defaults to True.
max_nan_frac (float) – fraction between 0 and 1 specifying how many nans (proportion) are maximally allowed in a local window. If more nans are in the window, the output is nan for the corresponding output sample.
ddof (int) – delta degrees of freedom: the divisor used when computing the std is N - ddof, where N is the number of samples (see np.std()).
- Returns:
running_val (np.ndarray) – the moving std of x. Has same shape as x.
Examples
>>> x = np.array([[0, 0, 0, 1, 2, 3, 3, 2, 10, 10],
...               [0, 0, 0, np.nan, np.nan, np.nan, 3, 2, 10, 10]])
>>> moving_std(x, n=3, axis=-1)
array([[0. , 0. , 0.47140452, 0.81649658, 0.81649658, 0.47140452, 0.47140452, 3.55902608, 3.77123617, 0. ],
       [0. , 0. , 0. , nan, nan, nan, 0.5 , 3.55902608, 3.77123617, 0. ]])
>>> moving_std(x, n=3, axis=0, maintain_nan=False, max_nan_frac=1)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
>>> moving_std(x, n=-1)
array([[3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 , 3.6180105 ],
       [4.20398256, 4.20398256, 4.20398256, nan, nan, nan, 4.20398256, 4.20398256, 4.20398256, 4.20398256]])
>>> moving_std(x, n=3, max_nan_frac=0)
array([[0. , 0. , 0.47140452, 0.81649658, 0.81649658, 0.47140452, 0.47140452, 3.55902608, 3.77123617, 0. ],
       [0. , 0. , nan, nan, nan, nan, nan, 3.55902608, 3.77123617, 0. ]])
- nnsa.utils.arrays.nanfun(fun, x, axis=None, max_nan_frac=0.5, **kwargs)[source]
Compute things like nanmean(x, axis=axis), but set the output to np.nan if more than max_nan_frac proportion of the data was nan.
- Parameters:
fun – a function that accepts an array and an axis parameter, such as np.nanmean, np.nanmedian.
x (np.ndarray) – array on which to apply fun.
axis (int) – the axis along which to apply the function.
max_nan_frac (float) – maximum fraction of nan.
**kwargs (optional) – for fun.
- Returns:
y (np.ndarray) – result of the fun(x, axis=axis) with nans where there were too many nans.
Examples
>>> x = np.array([[0, 1, 2, 3, np.nan, np.nan, 6],
...               [0, np.nan, np.nan, 3, np.nan, 5, np.nan]])
>>> nanfun(np.nanmean, x, axis=1, max_nan_frac=0.5)
array([2.4, nan])
>>> nanfun(np.nanmean, x, axis=1, max_nan_frac=1)
array([2.4 , 2.66666667])
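The masking rule can be sketched in a few lines. The name nanfun_sketch is hypothetical and the sketch assumes that fun reduces the given axis (so keyword arguments such as keepdims are not handled); it reproduces the two outputs above.

```python
import warnings
import numpy as np

def nanfun_sketch(fun, x, axis=None, max_nan_frac=0.5, **kwargs):
    """Apply a NaN-aware reducer, then blank results with too many NaNs."""
    x = np.asarray(x, dtype=float)
    with warnings.catch_warnings():
        # Reducers such as np.nanmean warn on all-NaN slices.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        y = np.asarray(fun(x, axis=axis, **kwargs), dtype=float)
    nan_frac = np.mean(np.isnan(x), axis=axis)  # proportion of NaNs per slice
    return np.where(nan_frac > max_nan_frac, np.nan, y)

x = np.array([[0, 1, 2, 3, np.nan, np.nan, 6],
              [0, np.nan, np.nan, 3, np.nan, 5, np.nan]])
half_allowed = nanfun_sketch(np.nanmean, x, axis=1, max_nan_frac=0.5)
all_allowed = nanfun_sketch(np.nanmean, x, axis=1, max_nan_frac=1)
```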
- nnsa.utils.arrays.strided_x(x, n, stride=1)[source]
Segment 1D array x using efficient striding.
From this post: http://stackoverflow.com/a/40085052/3293881
- Parameters:
x (np.ndarray) – 1D array to segment.
n (int) – segment length (samples).
stride (int) – the stride/step length (samples).
- Returns:
x_seg (np.ndarray) – array with shape (-1, n).
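A comparable segmentation can be obtained with NumPy's sliding_window_view (NumPy 1.20 or later), as sketched below. This is not the linked Stack Overflow recipe nor necessarily what strided_x does internally; segment_1d is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def segment_1d(x, n, stride=1):
    """Return overlapping segments of length n, taken every `stride` samples."""
    x = np.asarray(x)
    # All length-n windows first, then keep every stride-th one (still a view).
    return sliding_window_view(x, n)[::stride]

segments = segment_1d(np.arange(10), n=4, stride=3)
```

The result is a read-only view on the input, so no data is copied until the segments are modified or passed to a function that copies.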
nnsa.utils.code_performance module
This module contains functions dealing with assessing code performance.
Classes:
| Timer | Implements a context-manager timer. |
Functions:
Decorator that prints elapsed time and change in available memory when calling the decorated function. |
- class nnsa.utils.code_performance.Timer(prefix='', ndecimals=None)[source]
Bases: object
Implements a context-manager timer.
- Parameters:
ndecimals (int) – number of decimals to print (for the seconds). By default rounds to 0 decimals, i.e., whole seconds.
Examples
>>> with Timer():
...     s = [x**2 for x in range(10000000)]
Elapsed time: 0:00:02.172509
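For readers unfamiliar with the pattern, a context-manager timer can be built from the standard library alone. The class below is a minimal sketch with the hypothetical name SketchTimer; it mimics the printed format shown above but does not implement ndecimals and is not the nnsa class.

```python
import time
from datetime import timedelta

class SketchTimer:
    """Minimal context-manager timer (illustrative only)."""

    def __init__(self, prefix=""):
        self.prefix = prefix
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        print(f"{self.prefix}Elapsed time: {timedelta(seconds=self.elapsed)}")
        return False  # do not swallow exceptions raised inside the block

with SketchTimer(prefix="squares: ") as timer:
    squares = [v ** 2 for v in range(100_000)]
```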
nnsa.utils.config module
Handy constants.
nnsa.utils.conversions module
Functions:
| convert_time_scale | Convert a time array in seconds to a specified scale. |
| revert_time_scale | Convert a time array in time_scale to seconds. |
- nnsa.utils.conversions.convert_time_scale(time, time_scale)[source]
Convert a time array in seconds to a specified scale.
- Parameters:
time (np.ndarray, float, int) – time array in seconds.
time_scale (str) – the time scale to convert to. Choose from ‘seconds’, ‘minutes’, ‘hours’.
- Returns:
time (np.ndarray) – a copy of the time array, scaled to the requested time scale.
- nnsa.utils.conversions.revert_time_scale(time, time_scale)[source]
Convert a time array in time_scale to seconds.
- Parameters:
time (float, int, np.ndarray) – time array in arbitrary scale.
time_scale (str) – the time scale. Choose from ‘seconds’, ‘minutes’, ‘hours’.
- Returns:
time (np.ndarray) – a copy of the time array, rescaled to seconds.
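The two conversions are inverses of each other. The sketch below illustrates this with a lookup table of seconds per unit; the names SECONDS_PER_UNIT, to_time_scale and to_seconds are hypothetical and only the three documented scales are covered.

```python
import numpy as np

# Seconds per unit for the three documented scales.
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}

def to_time_scale(time_s, time_scale):
    """Convert seconds to the requested scale (returns a new array)."""
    return np.asarray(time_s, dtype=float) / SECONDS_PER_UNIT[time_scale]

def to_seconds(time, time_scale):
    """Convert a time array in time_scale back to seconds."""
    return np.asarray(time, dtype=float) * SECONDS_PER_UNIT[time_scale]

t_seconds = np.array([0, 1800, 7200])
t_hours = to_time_scale(t_seconds, "hours")
round_trip = to_seconds(t_hours, "hours")
```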
nnsa.utils.dataframes module
Functions:
| boxplot_paired | Boxplot of column x vs column y in data, connecting points for rows with same id. |
| collect_high_cor_values | Collect all multimodal (absolute) correlation values higher than min_val. |
| compute_correlation_matrix | Compute correlations between x_columns and y_columns in data. |
| different_regression_scores | Compute the scores for a condition having effect on the relation between each of columns and predictor in df. |
| feature_correction_lin_reg | Apply feature correction by subtracting a linear regression model from the feature. |
| pair_separation_scores | Compute cluster quality for the clustering power of every feature in x1_columns paired with every feature in x2_columns. |
| pairplot | Create scatterplots of each possible pair of (specified) columns in the dataframe. |
| pairplot_best | Create scatterplots of n pairs of columns in df, that separate samples based on groupby best. |
| pca | Do a PCA on the columns of df. |
| plot_column_data | Create a figure and plot each specified column of a DataFrame in a subplot. |
| read_dataframe | Automatically detect the filetype and read the file into a DataFrame using pandas. |
| regplots | Plot regression plots for each feature in columns against feature x. |
|
Scatter plot of column x vs column y in data, connecting points for rows with same id. |
|
Standardize (zero mean, unit standard deviation) columns in a DataFrame. |
- nnsa.utils.dataframes.boxplot_paired(x, y, data, id, hue=None, **kwargs)[source]
Boxplot of column x vs column y in data, connecting points for rows with same id.
- Parameters:
x (str) – name of the column in data for the x-axis.
y (str) – name of the column in data for the y-axis.
data (pd.DataFrame) – DataFrame containing the data.
id (str) – column name of the subject/patient/sample ID.
hue (str, optional) – name of the column for colouring of the boxes.
kwargs (dict, optional) – optional keyword arguments common for sns.boxplot() and sns.lineplot().
- nnsa.utils.dataframes.collect_high_cor_values(df_cor, min_val=0.8)[source]
Collect all multimodal (absolute) correlation values higher than min_val.
Only considers correlations between different modalities (encoded by the first 3 characters).
- Parameters:
df_cor (pd.DataFrame) – correlation dataframe (df.corr()).
min_val (float) – absolute correlation values higher than min_val are collected.
- Returns:
series_report (pd.Series) – series with high correlation scores.
- nnsa.utils.dataframes.compute_correlation_matrix(data, x_columns=None, y_columns=None, control_var=None, corr_method='pearson')[source]
Compute correlations between x_columns and y_columns in data.
- Parameters:
data (pd.DataFrame, np.ndarray) – dataframe or arrays with columns.
x_columns (list) – list with column names for x. If None, takes all numeric columns in data.
y_columns (list) – list with column names for y. If None, x_columns will be used.
control_var (str) – variable to control for. If specified, computes partial correlation values.
corr_method (str, optional) – which correlation function to use (‘pearson’, ‘spearman’, ‘kendall’).
- Returns:
df_corr (pd.DataFrame) – dataframe with correlation values. Columns correspond to x_columns and indices to y_columns.
df_pval (pd.DataFrame) – dataframe with p-values corresponding to the correlations (same shape as df_corr).
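How the partial correlation is computed when control_var is given is not specified here. A common definition, used in the sketch below as an assumption, is the Pearson correlation between the residuals of x and y after each has been linearly regressed on the control variable. The helper name partial_corr is hypothetical.

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson correlation of x and y after regressing out z from both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])  # intercept + control
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# x and y are related only through the shared driver z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + 0.1 * rng.normal(size=500)
y = z + 0.1 * rng.normal(size=500)

plain_r = np.corrcoef(x, y)[0, 1]   # high, because both follow z
partial_r = partial_corr(x, y, z)   # close to zero once z is controlled for
```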
- nnsa.utils.dataframes.different_regression_scores(df, predictor, condition, columns=None)[source]
Compute the scores for a condition having effect on the relation between each of columns and predictor in df.
Returns 1 - p-value, where the p-value tests whether the condition contributes significantly to the linear regression model.
Fits the following model and assesses whether the condition terms are significant: column ~ 1 + predictor + condition + condition*predictor.
# TODO :param df: :param predictor: :param condition: :param columns:
Returns:
- nnsa.utils.dataframes.feature_correction_lin_reg(x, df, columns=None, except_columns=None, groupby=None, x_ref=0, corr_pvalue_threshold=0.05, postfix=None)[source]
Apply feature correction by subtracting a linear regression model from the feature.
Useful in the following example: you want to investigate the influence of a (categorical) feature groupby on feature y (in columns), but feature x also influences feature y and is correlated with feature groupby. To eliminate the influence of feature x on feature y, this function first groups the data by the groupby feature. For the set of samples in each group, a linear regression model is fitted that predicts y as a function of x. Subsequently, this regression model is evaluated at the feature value x of each sample and subtracted from feature y. This removes the influence of feature x on feature y while taking into account that the groupby feature might influence y as well.
- Parameters:
x (str) – the feature (column name in df) to correct for. For this feature, a linear regression model is fitted with the target feature y. Subsequently, this linear regression model is used to subtract the influence of feature x from feature y.
df (pd.DataFrame) – DataFrame containing the data.
columns (str or list, optional) – the feature column name(s) of the feature(s) to correct. If None, all numerical features, except feature x, will be corrected. Defaults to None.
except_columns (str or list, optional) – the column name(s) of the features not to correct (will be removed from columns). If None, no columns will be removed. Defaults to None.
groupby (str, optional) – a categorical feature that is expected to have an additional influence on y. If None, no grouping is applied. I.e., a linear regression model is fitted based on all samples, assuming that there are no features (predictors) correlating with x. If specified, the features are corrected per group. Defaults to None.
x_ref (float, optional) – reference value for feature x. After subtracting the contribution of feature x to feature y, it is possible to add the expected contribution at a fixed reference value for x. Defaults to 0.
corr_pvalue_threshold (float, optional) – the maximum p-value of the Pearson correlation coefficient (as returned by scipy.stats.pearsonr()) in order to correct the feature. Only features that correlate significantly with x will be corrected, i.e. features with a correlation p-value <= corr_pvalue_threshold will be corrected. Defaults to 0.05.
postfix (str, optional) – a postfix to add to the feature name(s) (column name(s)) that are corrected. If None, a standard postfix will be added indicating the correction. Defaults to None.
- Returns:
df_out (pd.DataFrame) – a new DataFrame with the same number of features/columns, but where columns y are corrected and are optionally renamed using the postfix.
Examples
>>> df = pd.DataFrame(data=np.tile(np.random.rand(25).reshape(-1, 1), (1, 2)), columns=['random_1', 'random_2'])
>>> df_corrected = feature_correction_lin_reg('random_1', df, columns='random_2')
>>> print(np.all(df_corrected['random_2_linsub_random_1'] < 1e-15))
True
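For a single group, the correction amounts to fitting a line and removing it. The sketch below shows that step with np.polyfit; the name subtract_linear_trend is hypothetical, and the way the reference value is added back (evaluating the fitted line, intercept included, at x_ref) is an assumption about what "expected contribution" means in the parameter description.

```python
import numpy as np

def subtract_linear_trend(x, y, x_ref=0.0):
    """Remove the fitted linear dependence of y on x, re-centered at x_ref."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line y ~ x
    fitted = slope * x + intercept
    # Assumption: add back the model prediction at the reference value.
    return y - fitted + (slope * x_ref + intercept)

rng = np.random.default_rng(0)
x = rng.random(50)
y = 2.0 * x + 1.0            # y depends on x only
corrected = subtract_linear_trend(x, y, x_ref=0.0)
```

With a groupby feature, the same fit and subtraction would be repeated separately on the samples of each group.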
- nnsa.utils.dataframes.pair_separation_scores(df, groupby, x1_columns=None, x2_columns=None, standardize_columns=True, split_train=0.75, separation_metric='roc_auc', verbose=0)[source]
Compute cluster quality for the clustering power of every feature in x1_columns paired with every feature in x2_columns.
- Parameters:
df (pd.DataFrame) – DataFrame containing the data.
groupby (str) – categorical column in df that classifies the data.
x1_columns (list, optional) – list of (numeric) columns that hold the first features. If None, all numeric columns are used, except groupby. Defaults to None.
x2_columns (list, optional) – list of (numeric) columns that hold the second features. If None, all numeric columns are used, except groupby. Defaults to None.
standardize_columns (bool, optional) – if True, first standardize the features. If False, do not standardize. Defaults to True.
split_train (float or None, optional) – if a float, the data will be split in train and test set, where split_train is the fraction of data to train on. Must be between 0 and 1. If None, the data is not split and training and testing is done on the entire data set. Defaults to 0.75.
separation_metric (str, optional) – the separation metric to use. Choose from: ‘silhouette’, ‘roc_auc’, ‘accuracy’. Defaults to ‘roc_auc’.
verbose (int, optional) – verbosity level. Defaults to 0.
- Returns:
(pd.DataFrame) – table with cluster scores for each pair of features.
- nnsa.utils.dataframes.pairplot(*args, **kwargs)[source]
Create scatterplots of each possible pair of (specified) columns in the dataframe.
Wrapper of seaborn’s pairplot.
- Parameters:
*args – see sns.pairplot().
**kwargs – see sns.pairplot().
- Returns:
see sns.pairplot()
- nnsa.utils.dataframes.pairplot_best(df, groupby, n, scores=None, **kwargs)[source]
Create scatterplots of n pairs of columns in df, that separate samples based on groupby best.
Based on cluster quality metric returned by nnsa.classifiers.evaluation.cluster_quality().
- Parameters:
df (pd.DataFrame) – DataFrame containing the data.
groupby (str) – categorical column in df that classifies the data.
n (int) – number of scatterplots/pairs to show.
**kwargs (optional) – keyword arguments for pair_separation_scores().
- Returns:
axes (list) – list of Axis objects in which the data is plotted.
- nnsa.utils.dataframes.pca(df, n_components, columns=None, standardize_columns=True, concatenate=True, **kwargs)[source]
Do a PCA on the columns of df.
- Parameters:
df (pd.DataFrame) – DataFrame containing the data.
n_components (int) – number of components for decomposition/projection.
columns (list, optional) – list of (numeric) columns that hold the features. If None, all numeric columns are used. Defaults to None.
standardize_columns (bool, optional) – if True, first standardize the features. If False, do not standardize. Defaults to True.
concatenate (bool, optional) – if True, a new DataFrame consisting of the original df and the principal components is returned. If False, only a DataFrame with the principal components is returned. Defaults to True.
**kwargs (optional) – keyword arguments to pass to sklearn.decomposition.PCA().
- Returns:
principal_df (pd.DataFrame) – dataframe with principal components.
pca_ (sklearn.decomposition.pca.PCA) – object containing info on the decomposition, see sklearn.decomposition.PCA().
- nnsa.utils.dataframes.plot_column_data(df, columns=None, groupby=None, hue=None, kind='box', sharex=True, sharey='none', all_axes=None, **kwargs)[source]
Create a figure and plot each specified column of a DataFrame in a subplot.
- Parameters:
df (pd.DataFrame) – pandas DataFrame of which to plot (specified) columns.
columns (list, optional) – list of (numeric) columns to plot. If None, all numeric columns are plotted. Defaults to None.
groupby (str, optional) – categorical column to split the data on (per subplot, a separate plot is drawn per group).
hue (str, optional) – categorical column to split the data on within a group, using colors.
kind (str, optional) – the type of plot to draw. Choose from ‘strip’, ‘box’, ‘violin’, ‘swarm’, ‘line’, ‘dist’. Defaults to ‘box’.
sharex (bool or str, optional) – option to share the x-axis, see plt.subplots(). Defaults to True.
sharey (str, optional) – option to share the y-axis, see plt.subplots(). Defaults to ‘none’.
all_axes (list, optional) – list with an axis handle for each subplot/feature to plot. Must have at least the same length as columns. If None, a new figure with subplot axes is created. Defaults to None.
**kwargs (optional) – optional keyword arguments passed to the seaborn plot function.
- Returns:
all_axes (list) – list with an axis handle for each subplot.
- nnsa.utils.dataframes.read_dataframe(filepath, **kwargs)[source]
Automatically detect the filetype and read the file into a DataFrame using pandas.
- Parameters:
filepath (str) – path to a file that can be read as a pandas DataFrame. Compatible file types/extensions are: csv, xlsx.
**kwargs (optional) – keyword arguments for the read function.
- Returns:
df (pd.DataFrame) – the content of the file as a DataFrame.
- nnsa.utils.dataframes.regplots(df, x, columns=None, hue=None, axes=None)[source]
Plot regression plots for each feature in columns against feature x.
- Parameters:
df (pd.DataFrame) – DataFrame containing the data.
x (str) – name of the feature to compute the regression to.
columns (list or str, optional) – list of (numeric) columns to plot against x. Can also be a string to plot one column. If None, all numeric columns are plotted. Defaults to None.
hue (str, optional) – name of a (categorical) feature to control the coloring of the datapoints. If None, all datapoints are plotted in the same color. Defaults to None.
axes (list, optional) – list with Axes objects to plot in. If None, a new figure with subplots is created. Defaults to None.
- Returns:
axes (list) – the axes objects in which the plots are made.
- nnsa.utils.dataframes.scatter_paired(x, y, data, id, hue=None, style=None, **kwargs)[source]
Scatter plot of column x vs column y in data, connecting points for rows with same id.
- Parameters:
x (str) – name of the column in data for the x-axis.
y (str) – name of the column in data for the y-axis.
data (pd.DataFrame) – DataFrame containing the data.
id (str) – column name of the subject/patient/sample ID.
hue (str, optional) – name of the column for colouring of the scatter points.
style (str, optional) – name of the column for the style of the scatter points.
**kwargs (optional) – keyword arguments common to sns.scatterplot() and sns.lineplot().
- nnsa.utils.dataframes.standardize(df, columns=None, inplace=False)[source]
Standardize (zero mean, unit standard deviation) columns in a DataFrame.
See sklearn.preprocessing.StandardScaler() for additional info about the standardization.
- Parameters:
df (pd.DataFrame) – pandas DataFrame of which to standardize (specified) columns.
columns (list or str, optional) – list (or str) specifying the numeric column(s) to standardize. If None, all numeric columns are standardized. Defaults to None.
inplace (bool, optional) – if True, the values of the columns in df are replaced by the standardized values. If False, a new DataFrame is returned. Defaults to False.
- Returns:
df_out (pd.DataFrame) – DataFrame object with the standardized values (if inplace is False).
nnsa.utils.dictionaries module
This module contains functions dealing with dictionaries.
Functions:
|
Put a value in dictionary d with nested keys specified by the specified keys list. |
|
Traverse a nested/multi-level dictionary and create a new one-level dictionary. |
|
Return a string that prints a list of items, where each item is a pair of objects. |
|
Update a nested dictionary d with update dictionary u, maintaining deeper levels of d that are not in u. |
|
Restore the original dictionary after flattening the dict with flatten_dict(). |
|
Write a dictionary to a csv, structuring it as a table with the keys of the dict as column headers. |
- nnsa.utils.dictionaries.add_nested_dict(d, keys, value)[source]
Put a value in dictionary d with nested keys specified by the specified keys list.
Set d[‘a’][‘b’][‘c’] = value, when keys = [‘a’, ‘b’, ‘c’]
- Parameters:
d (dict) – dictionary to add the value with nested keys to.
keys (list) – list of nested keys.
value – the value to put in the dictionary.
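A minimal sketch of how such a nested assignment could work, assuming intermediate levels are created on demand. The helper name set_nested is hypothetical and is not the nnsa implementation:

```python
def set_nested(d, keys, value):
    """Set d[keys[0]][keys[1]]...[keys[-1]] = value, creating levels as needed."""
    node = d
    for key in keys[:-1]:
        # Descend one level, creating an empty dict if the key is missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value


config = {}
set_nested(config, ['a', 'b', 'c'], 1)
print(config)  # {'a': {'b': {'c': 1}}}
```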
- nnsa.utils.dictionaries.flatten_dict(d, path='', d_out=None)[source]
Traverse a nested/multi-level dictionary and create a new one-level dictionary.
Item d[‘a’][‘b’] is mapped to key ‘a/b’ in the output dictionary. The reverse operation is achieved by the unflatten_dict() function.
- Parameters:
d (dict) – dictionary to flatten.
path (str, optional) – prefix for the flattened keys. Only needed for recursive calls; the user does not need to specify this (i.e. leave it as ‘’). Defaults to ‘’.
d_out (dict or None, optional) – if a dict is specified, this dict is updated with the flattened key and value pairs. Needed for recursive calls. If None, the output dictionary is a new empty dictionary. Defaults to None.
- Returns:
d_out (dict) – one-level dictionary with same values as input d, but with flattened keys.
Examples
>>> d = {'A': {'a': 2, 'b': True}, 'B': {'a': 10, 'b': False}}
>>> flatten_dict(d)
{'A/a': 2, 'A/b': True, 'B/a': 10, 'B/b': False}
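The recursion behind this kind of flattening can be sketched as follows. This is an illustrative stand-in, not the nnsa source; the function name flatten and the sep argument are assumptions:

```python
def flatten(d, path='', sep='/'):
    """Return a one-level dict whose keys are the nested keys joined by sep."""
    out = {}
    for key, value in d.items():
        full_key = f'{path}{sep}{key}' if path else str(key)
        if isinstance(value, dict):
            # Recurse into the sub-dictionary, carrying the key prefix along.
            out.update(flatten(value, full_key, sep))
        else:
            out[full_key] = value
    return out


nested = {'A': {'a': 2, 'b': True}, 'B': {'a': 10, 'b': False}}
flat = flatten(nested)
print(flat)  # {'A/a': 2, 'A/b': True, 'B/a': 10, 'B/b': False}
```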
- nnsa.utils.dictionaries.itemize_items(items)[source]
Return a string that prints a list of items, where each item is a pair of objects.
Handy for printing dictionaries, e.g. if d is some dict: print(itemize_items(d.items()))
- Parameters:
items (iterable) – iterable yielding two values.
- Returns:
(str) – string in which the items are printed underneath each other with indentation.
- nnsa.utils.dictionaries.nested_update(d, other=None, accept_new_key=True, **kwargs)[source]
Update a nested dictionary d with update dictionary u, maintaining deeper levels of d that are not in u.
Adapted from https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth
- Parameters:
d (dict) – (nested) dictionary to update.
other (dict or iterable, optional) – dictionary or iterable of key, value pairs with which to update.
accept_new_key (bool or str, optional) – If True, accepts new keys (keys that do not exist in d). If False, raises an error when attempting to update a key that does not exist in d (including deeper levels). If ‘parameters_mode’, does not accept new keys when updating a nnsa.Parameters object, but accepts new keys in ordinary dict objects. Defaults to True (this is the default behaviour of Python dict.update()).
**kwargs (optional) – keyword arguments with which to update (in which the keyword is the key).
- Returns:
d (dict) – updated dictionary (in place, so return is in fact redundant).
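A rough sketch of a recursive update with an accept_new_key switch, under the assumption that only plain mappings are involved. The name deep_update is hypothetical, and the ‘parameters_mode’ option is not covered here:

```python
from collections.abc import Mapping


def deep_update(d, other, accept_new_key=True):
    """Update d in place with other, merging nested mappings level by level."""
    for key, value in other.items():
        if not accept_new_key and key not in d:
            raise KeyError(f'Key {key!r} does not exist in the dictionary.')
        if isinstance(value, Mapping) and isinstance(d.get(key), Mapping):
            # Both sides are mappings: merge instead of overwriting.
            deep_update(d[key], value, accept_new_key)
        else:
            d[key] = value
    return d


params = {'filter': {'low': 1, 'high': 30}, 'fs': 250}
deep_update(params, {'filter': {'high': 20}})
print(params)  # {'filter': {'low': 1, 'high': 20}, 'fs': 250}
```

Unlike dict.update(), the untouched entry params['filter']['low'] survives the update.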
- nnsa.utils.dictionaries.unflatten_dict(d)[source]
Restore the original dictionary after flattening the dict with flatten_dict().
Item d[‘a/b’] is mapped to [‘a’][‘b’] in the output dictionary. The reverse operation is achieved by the flatten_dict() function.
Examples
>>> d = {'A': {'a': 2, 'b': True}, 'B': {'a': 10, 'b': False}}
>>> d_flat = flatten_dict(d)
>>> print(d == unflatten_dict(d_flat))
True
- Parameters:
d (dict) – one-level dictionary with keys representing nested dictionary keys, separated by ‘/’.
- Returns:
unflat_dict (dict) – unflattened, i.e. nested, dictionary with same values as input d.
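The inverse mapping can be illustrated by splitting each key on the separator and rebuilding the levels. This is a sketch under the assumption that keys never contain ‘/’ themselves; the name unflatten is hypothetical:

```python
def unflatten(flat, sep='/'):
    """Rebuild a nested dict from keys such as 'a/b/c'."""
    out = {}
    for joined_key, value in flat.items():
        *parents, leaf = joined_key.split(sep)
        node = out
        for key in parents:
            # Create intermediate levels on the way down.
            node = node.setdefault(key, {})
        node[leaf] = value
    return out


restored = unflatten({'A/a': 2, 'A/b': True, 'B/a': 10})
print(restored)  # {'A': {'a': 2, 'b': True}, 'B': {'a': 10}}
```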
- nnsa.utils.dictionaries.write_dict_to_csv_as_table(filepath, table_dict)[source]
Write a dictionary to a csv, structuring it as a table with the keys of the dict as column headers.
- Parameters:
filepath (str) – filepath to save the csv to.
table_dict (dict) – dictionary that contains the table data. The values of the dictionary must be a list, and each element of the list will be put on a new row. The number of elements in a list may vary between the columns, i.e. under each column a varying number of elements may be put.
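One way to lay out columns of unequal length as rows is itertools.zip_longest, which pads the shorter columns. The sketch below assumes empty cells for missing values; the helper names table_rows and write_table are hypothetical and may differ from what nnsa does:

```python
import csv
from itertools import zip_longest


def table_rows(table_dict):
    """Return a header row followed by data rows, padding short columns with ''."""
    header = list(table_dict.keys())
    body = [list(row) for row in zip_longest(*table_dict.values(), fillvalue='')]
    return [header] + body


def write_table(filepath, table_dict):
    """Write the rows produced by table_rows() to a csv file."""
    with open(filepath, 'w', newline='') as f:
        csv.writer(f).writerows(table_rows(table_dict))


rows = table_rows({'subject': ['s1', 's2', 's3'], 'score': [0.5, 0.7]})
print(rows)  # [['subject', 'score'], ['s1', 0.5], ['s2', 0.7], ['s3', '']]
```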
nnsa.utils.dummy_data module
Functions:
|
Add artefacts to x with lengths from min_len to max_len each with a probability of p. |
|
Generate test data with artefacts. |
|
Generate a random toy EEG example. |
|
Generate a time series based on a random walk. |
|
Generate random timeseries from AR1 model. |
|
Randomly insert n artefacts into x at random locations with a specific minimum and maximum length, along a given axis. |
|
Return q = x + p*y, with p, such that corr(q, y) == corr(z, y). |
- nnsa.utils.dummy_data.add_artefacts(x, p, min_len=1, max_len=None, seed=None, fill_value=nan)[source]
Add artefacts to x with lengths from min_len to max_len each with a probability of p.
- nnsa.utils.dummy_data.generate_af_data(n_repeat, fs, n_samples, n_cos=100, snr=None, n_pad=None, identical=True, af_len=1, squeeze=True)[source]
Generate test data with artefacts.
- nnsa.utils.dummy_data.generate_eeg(fs=250, duration=600, amplitude=75, f_low=0, f_high=20, n_f=50, seed=None)[source]
Generate a random toy EEG example.
- Parameters:
fs (float, optional) – sample frequency in Hz.
duration (float, optional) – duration in seconds.
amplitude (float, optional) – max amplitude of EEG signal.
f_low (float, optional) – minimum frequency to be present in the EEG (in Hz).
f_high (float, optional) – maximum frequency to be present in the EEG (in Hz).
n_f (int, optional) – number of frequencies to add to the EEG.
seed (int, optional) – seed for the random generator.
- Returns:
signal (np.ndarray) – 1D signal array.
- nnsa.utils.dummy_data.generate_timeseries(size, axis=-1, demean=False, seed=43)[source]
Generate a time series based on a random walk.
- Parameters:
size (tuple or int) – size of time series (number of samples).
axis (int) – axis corresponding to time.
demean (bool) – if True, subtracts the mean from the resulting random walk.
seed (int) – seed for the random generator.
- Returns:
x (np.ndarray) – random walk time series.
- nnsa.utils.dummy_data.generate_timeseries_AR1(n, r, seed=None)[source]
Generate random timeseries from AR1 model.
- Parameters:
n (int) – number of samples to generate.
r (float) – lag-1 autocorrelation.
seed (int) – seed for random generator.
- Returns:
yr (np.ndarray) – generated time series (1D array).
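For orientation, an AR(1) process follows y[t] = r * y[t-1] + e[t]. The sketch below is a pure-Python illustration with the noise scaled so that the stationary variance stays at 1. That scaling, the name ar1_series, and the list return type are assumptions; the nnsa function returns a numpy array and may be parameterized differently:

```python
import random


def ar1_series(n, r, seed=None):
    """Generate n samples of an AR(1) process with lag-1 autocorrelation r."""
    if n <= 0:
        return []
    rng = random.Random(seed)
    noise_scale = (1 - r ** 2) ** 0.5  # keeps the stationary variance at 1
    y = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        # Each sample is a scaled copy of the previous one plus fresh noise.
        y.append(r * y[-1] + noise_scale * rng.gauss(0, 1))
    return y


series = ar1_series(500, 0.8, seed=1)
```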
nnsa.utils.event_detections module
Functions:
|
Compute the onsets and offsets of events in detected. |
|
Remove detected events that last > max_duration or < min_duration (number of samples). |
- nnsa.utils.event_detections.get_onsets_offsets(detected, fs=None)[source]
Compute the onsets and offsets of events in detected.
Treats nans as not detected.
- Parameters:
detected (np.ndarray) – 1D array with 1s (detected) and 0s (not-detected).
fs (float, optional) – sampling frequency (optional). If given, the returned onsets and offsets are in seconds. If not given, in samples.
- Returns:
onsets (np.ndarray) – array with indices (or times) corresponding to onsets.
offsets (np.ndarray) – array with indices (or times) corresponding to offsets.
Examples
>>> get_onsets_offsets(np.array([1, 1, 1, 0, 0, 1, 1, 0]))
(array([0, 5], dtype=int64), array([3, 7], dtype=int64))
>>> get_onsets_offsets(np.array([0, 1, 1, 1, 0, 1, 1, 1]))
(array([1, 5], dtype=int64), array([4, 8], dtype=int64))
>>> get_onsets_offsets(np.array([np.nan, 0, 1, np.nan, 1, 0]))
(array([2, 4], dtype=int64), array([3, 5], dtype=int64))
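The examples suggest that offsets are exclusive end indices (the first sample after an event). A possible way to obtain such indices is to difference a zero-padded mask. This is a sketch consistent with the examples above, not necessarily how nnsa implements it; the name onsets_offsets is hypothetical:

```python
import numpy as np


def onsets_offsets(detected, fs=None):
    """Return start indices and exclusive end indices of runs of 1s."""
    # NaN compares unequal to 1, so NaNs count as not detected.
    mask = np.asarray(detected, dtype=float) == 1
    # Pad with zeros so events touching either edge still produce a transition.
    padded = np.concatenate(([0], mask.astype(int), [0]))
    changes = np.diff(padded)
    onsets = np.flatnonzero(changes == 1)
    offsets = np.flatnonzero(changes == -1)
    if fs is not None:
        # Convert sample indices to seconds.
        onsets, offsets = onsets / fs, offsets / fs
    return onsets, offsets


on, off = onsets_offsets([0, 1, 1, 1, 0, 1, 1, 1])
print(on, off)  # [1 5] [4 8]
```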
- nnsa.utils.event_detections.time_threshold(detected, min_duration=0, max_duration=inf)[source]
Remove detected events that last > max_duration or < min_duration (number of samples).
- Parameters:
detected (np.ndarray) – 1D array containing 1s at samples with detected events and 0s at samples without the event (e.g. burst mask).
min_duration (int, optional) – minimum number of samples that a detected event must last. If the duration of the event (in number of samples) is less than this value, the event is removed by replacing the 1s with 0s (convert those samples from detected to undetected). Defaults to 0.
max_duration (int, optional) – maximum number of samples that a detected event can last. If the duration of the event (in number of samples) is larger than this value, the event is removed by replacing the 1s with 0s (convert those samples from detected to undetected). Defaults to np.inf.
- Returns:
detected_joined (np.ndarray) – new array with same shape as input detected.
Examples
>>> detected = np.array([1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1])
>>> print(detected)
[1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 1 1 0 1]
>>> detected_thresholded = time_threshold(detected, min_duration=3)
>>> print(detected_thresholded)
[1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0]
>>> detected_thresholded = time_threshold(detected, max_duration=3)
>>> print(detected_thresholded)
[0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0 0 0 1]
>>> detected_thresholded = time_threshold(detected, min_duration=3, max_duration=4)
>>> print(detected_thresholded)
[0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0]
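The same duration filtering can be illustrated in pure Python by grouping consecutive equal samples into runs. This is only a sketch of the idea on plain lists; the name duration_filter is hypothetical, and the nnsa function operates on numpy arrays:

```python
from itertools import groupby


def duration_filter(detected, min_duration=0, max_duration=float('inf')):
    """Zero out runs of 1s whose length falls outside [min_duration, max_duration]."""
    out = []
    for value, run in groupby(detected):
        run = list(run)
        if value == 1 and not (min_duration <= len(run) <= max_duration):
            # Event too short or too long: mark its samples as not detected.
            run = [0] * len(run)
        out.extend(run)
    return out


detected = [1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
filtered = duration_filter(detected, min_duration=3, max_duration=4)
print(filtered)
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```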
nnsa.utils.keras module
Deprecated module. All functionality has moved to nnsa.keras.data.
Functions:
|
|
|
nnsa.utils.mathematics module
Functions related to math.
Functions:
|
Derivative of abs(x). |
|
Compute entropy of a probability distribution p along an axis. |
|
Convert magnitude to decibels. |
|
Compute the cofactors of all elements of a matrix. |
|
Returns the next power of 2 such that 2**nextpow2(x) >= x. |
- nnsa.utils.mathematics.abs_der(x)[source]
Derivative of abs(x).
- Parameters:
x (np.ndarray) – array with values. The derivative is computed for every value in the array.
- Returns:
(np.ndarray) – array with the same shape as x containing the derivative of every entry.
- nnsa.utils.mathematics.compute_entropy(p, axis=-1)[source]
Compute entropy of a probability distribution p along an axis.
- Parameters:
p (np.ndarray) – probability distribution(s).
axis (int) – axis along which to compute the entropy.
- Returns:
entropy (float or np.ndarray) – entropy of distribution(s) in p.
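As a reminder of the underlying formula, Shannon entropy is H(p) = -sum(p_i * log(p_i)), with zero-probability terms contributing nothing. The sketch below handles a single 1D distribution. The logarithm base used by nnsa is not stated in this documentation, so the base argument and the name shannon_entropy are assumptions:

```python
import math


def shannon_entropy(p, base=math.e):
    """Entropy of a discrete distribution given as a sequence of probabilities."""
    # Terms with p_i == 0 are skipped, following the convention 0 * log(0) = 0.
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


h_uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25], base=2)
h_certain = shannon_entropy([1.0, 0.0])
```

A uniform distribution over four outcomes carries 2 bits of entropy, whereas a distribution concentrated on one outcome carries none.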
- nnsa.utils.mathematics.matrix_cofactor(matrix, raise_error=True)[source]
Compute the cofactors of all elements of a matrix.
From: https://www.geeksforgeeks.org/how-to-find-cofactor-of-a-matrix-using-numpy/
- Parameters:
matrix (np.ndarray) – input matrix to find the cofactors of.
raise_error (bool) – raise error if an Exception occurs (True) or do not raise an exception and return nans (False).
- Returns:
cofactor (np.ndarray) – matrix with same shape as input containing the cofactors of each element in the input matrix.
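For an invertible matrix A, the cofactor matrix equals det(A) times the transpose of inv(A). A minimal numpy sketch of that identity follows; singular matrices are not handled here, and the name cofactor_matrix is hypothetical:

```python
import numpy as np


def cofactor_matrix(matrix):
    """Cofactor matrix of an invertible square matrix."""
    matrix = np.asarray(matrix, dtype=float)
    # adj(A) = det(A) * inv(A), and the cofactor matrix is adj(A) transposed.
    return np.linalg.inv(matrix).T * np.linalg.det(matrix)


a = np.array([[1.0, 2.0], [3.0, 4.0]])
cof = cofactor_matrix(a)  # approximately [[4, -3], [-2, 1]]
```

np.linalg.inv raises numpy.linalg.LinAlgError for singular input, which is presumably the kind of exception the raise_error option refers to.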
- nnsa.utils.mathematics.nextpow2(x)[source]
Returns the next power of 2 such that 2**nextpow2(x) >= x.
- Parameters:
x (float or np.ndarray) – number to compute the next power of two of. If an array is given, the next power of two is computed element-wise.
- Returns:
(np.int32 or np.ndarray of np.int32) – element-wise next power of two of x.
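Going by the relation 2**nextpow2(x) >= x, the returned value is the exponent rather than the power itself. A scalar sketch under that reading, with the hypothetical name next_pow2_exponent, restricted to positive x:

```python
import math


def next_pow2_exponent(x):
    """Smallest integer p such that 2**p >= x, for x > 0."""
    if x <= 0:
        raise ValueError('x must be positive')
    return math.ceil(math.log2(x))


# Typical use: choose an FFT length that is a power of two.
n_samples = 1000
n_fft = 2 ** next_pow2_exponent(n_samples)
print(n_fft)  # 1024
```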
nnsa.utils.normalization module
Contains functions for normalizing data.
Functions:
|
Normalize the signals a and b by division by max and flipping the sign if needed. |
|
Normalize the EEG data per channel. |
- nnsa.utils.normalization.align(a, b, scale=True, flip_sign=True)[source]
Normalize the signals a and b by division by max and flipping the sign if needed.
- Parameters:
a (np.ndarray) – 1D array.
b (np.ndarray) – 1D array.
scale (bool, optional) – if True, scales the signals by dividing by their maximum.
flip_sign (bool, optional) – if True, flips the sign of b if it results in a better match between a and b.
- Returns:
a_out (np.ndarray) – transformed a.
b_out (np.ndarray) – transformed b.
- nnsa.utils.normalization.normalize_eeg_channels(x, mean=None, std=None)[source]
Normalize the EEG data per channel.
- Parameters:
x (np.ndarray) – EEG data with shape (batch_size, num_samples, num_channels), where num_samples are the number of samples in one EEG signal, i.e. it corresponds to the time dimension of the EEG data.
mean (np.ndarray, optional) – array of shape (num_channels) with a value for each channel used to normalize the data. If None, the mean of the EEG data in x will be computed per channel.
std (np.ndarray, optional) – array of shape (num_channels) with a value for each channel used to normalize the data. If None, the std of the EEG data in x will be computed per channel.
- Returns:
x_normalized (np.ndarray) – array with same shape as x containing normalized values.
mean (np.ndarray) – array of shape (num_channels) with a value for each channel used to normalize the data.
std (np.ndarray) – array of shape (num_channels) with a value for each channel used to normalize the data.
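Returning mean and std makes it possible to reuse the statistics of one dataset on another, e.g. training versus held-out data. The sketch below assumes plain z-scoring, (x - mean) / std, over the batch and time axes; the exact normalization in nnsa may differ, and the name normalize_channels is hypothetical:

```python
import numpy as np


def normalize_channels(x, mean=None, std=None):
    """Z-score x of shape (batch_size, num_samples, num_channels) per channel."""
    x = np.asarray(x, dtype=float)
    if mean is None:
        mean = x.mean(axis=(0, 1))  # one value per channel
    if std is None:
        std = x.std(axis=(0, 1))
    # Broadcasting applies the per-channel statistics along the last axis.
    return (x - mean) / std, mean, std


rng = np.random.default_rng(0)
train = rng.normal(5, 2, size=(4, 100, 3))
held_out = rng.normal(5, 2, size=(2, 100, 3))

train_norm, mean, std = normalize_channels(train)
# Reuse the training statistics for the held-out data.
held_out_norm, _, _ = normalize_channels(held_out, mean, std)
```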
nnsa.utils.objects module
General functions for Python objects.
Functions:
|
Return string with basic information about object. |
|
Convert a string containing the name of a class to a callable object. |
|
Returns a list with names of public attributes of an object. |
|
Returns a list with names of public (callable) methods of an object. |
|
Print object summary: class name, public attributes, public methods. |
- nnsa.utils.objects.basic_repr(obj)[source]
Return string with basic information about object.
Useful as a default for __repr__ method.
- Parameters:
obj (object) – the object to print.
- Returns:
(str) – string with info about object (class name and attributes).
- nnsa.utils.objects.convert_to_nnsa_class_callable(class_name)[source]
Convert a string containing the name of a class to a callable object.
- Parameters:
class_name (str) – string of a class name defined in the nnsa package.
- Returns:
class_def (type) – class definition corresponding to input class_name.
Examples
>>> from nnsa import ResultBase
>>> a = convert_to_nnsa_class_callable('ResultBase')(None)
>>> b = ResultBase(None)  # Equivalent to the above statement.
>>> assert_equal(a, b)
- nnsa.utils.objects.list_attrs(obj)[source]
Returns a list with names of public attributes of an object.
- Parameters:
obj (object) – the object whose attributes to list.
- Returns:
(list of str) – list with strings containing the attribute names of the object.
nnsa.utils.other module
Utils that do not need their own category.
Functions:
|
Convert a string to the specified data type. |
|
Convert a string to the corresponding data type, automatically choosing the data type. |
|
Enumerate n labels with text label. |
|
Compute the inter quartile range. |
|
Check if a string is numeric. |
|
Print error message and traceback of error e which is handled/caught. |
- nnsa.utils.other.convert_string(value, target_type)[source]
Convert a string to the specified data type.
Supported types: strings, numbers, tuples, lists, dicts, booleans, and None.
- Parameters:
value (str) – string to be converted.
target_type (str) – string specifying the type to convert the string to (from type(a).__name__)
- Returns:
converted_value – value converted to corresponding data type.
- nnsa.utils.other.convert_string_auto(value)[source]
Convert a string to the corresponding data type, automatically choosing the data type.
Supported types: strings, numbers, tuples, lists, dicts, booleans, and None. Note that if the first character of value is alphabetical, it is considered a string, unless it is ‘False’, ‘True’, or ‘None’.
E.g.:
a = convert_string_auto('10')
print(a)  # 10
print(type(a))  # <class 'int'>
- Parameters:
value (str) – string to be converted.
- Returns:
converted_value – value converted to corresponding data type.
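A comparable conversion can be sketched with the standard library's ast.literal_eval, falling back to the raw string when the text is not a Python literal. This is a stand-in for illustration, not the nnsa implementation; the name parse_value is hypothetical, and literal_eval also accepts a few types not listed above (e.g. sets and bytes):

```python
import ast


def parse_value(text):
    """Convert text to a Python literal if possible, otherwise return it unchanged."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a literal (e.g. a plain word): keep it as a string.
        return text


values = [parse_value(s) for s in ('10', '3.5', '[1, 2]', 'True', 'None', 'hello')]
print(values)  # [10, 3.5, [1, 2], True, None, 'hello']
```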
- nnsa.utils.other.enumerate_label(n, label='Label')[source]
Enumerate n labels with text label.
- Parameters:
n (int) – number of labels to create.
label (str, optional) – label text.
- Returns:
(list of str) – enumerated labels.
- nnsa.utils.other.iqr(x, *args, **kwargs)[source]
Compute the inter quartile range.
Ignores nan values.
- Parameters:
x (np.ndarray) – array with the data.
*args (optional) – positional arguments for np.percentile.
**kwargs (optional) – keyword arguments for np.percentile.
- Returns:
(float) – IQR.
- nnsa.utils.other.is_numeric(s)[source]
Check if a string is numeric.
- Parameters:
s (str) – string to check.
- Returns:
(bool) – True or False indicating whether the string is a number.
- nnsa.utils.other.print_exception_info(e)[source]
Print error message and traceback of error e which is handled/caught.
- Parameters:
e (Exception-derived) – exception to print.
Examples
>>> a = [1, 2, 3]
>>> try:
...     idx = a.index(0)
... except ValueError as ex:
...     print_exception_info(ex)
...     idx = None
>>> print(idx)
None
nnsa.utils.paths module
This module contains functions dealing with file and directory paths.
Functions:
|
Check if directory in filepath exists and create the directory if not. |
|
Check validity of extension of a filename for writing data file. |
|
Check if filepath already exists and raise an error if it does. |
|
Get the filename of a filepath, without extension and with spaces replaced by '_'. |
|
Return a list with paths of files living in directory and with pattern in the filename. |
|
Return a directory path for saving the output that a script generates. |
|
Open a file dialog and let the user select a (new) file or directory. |
|
Recursively split a filepath (ignores the drive) and return its parts in a tuple. |
- nnsa.utils.paths.check_directory_exists(directory=None, filepath=None)[source]
Check if directory in filepath exists and create the directory if not.
Specify either directory or filepath.
- Parameters:
directory (str, optional) – path to a directory.
filepath (str, optional) – path of a file. Checks the corresponding directory.
- nnsa.utils.paths.check_file_extension(filepath, valid_extensions)[source]
Check validity of extension of a filename for writing data file.
- Parameters:
filepath (str) – file path of a file.
valid_extensions (str or list of str) – string specifying the valid extension or a list of (case sensitive) extensions that are valid (without leading dot).
- Returns:
(str) – file extension (without leading dot).
- nnsa.utils.paths.check_filename_exists(filepath)[source]
Check if filepath already exists and raise an error if it does.
- Parameters:
filepath (str) – file path of a file.
- nnsa.utils.paths.get_filename(filepath)[source]
Get the filename of a filepath, without extension and with spaces replaced by ‘_’.
- Parameters:
filepath (str) – filepath.
- Returns:
filename (str) – filename without file extension and without white spaces.
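The described behaviour maps onto two os.path calls plus a string replacement. A small sketch with the hypothetical name stem_with_underscores:

```python
import os


def stem_with_underscores(filepath):
    """Return the filename without directory or extension, spaces replaced by '_'."""
    base = os.path.basename(filepath)
    stem, _ext = os.path.splitext(base)
    return stem.replace(' ', '_')


name = stem_with_underscores('data/my recording 01.edf')
print(name)  # my_recording_01
```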
- nnsa.utils.paths.get_filepaths(directory, pattern, case_sensitive=False, subdirectories=False, raise_error=False)[source]
Return a list with paths of files living in directory and with pattern in the filename.
See fnmatch.fnmatch() for the pattern matching. Use * for wildcards.
The paths are absolute if directory is absolute and relative if directory is relative.
- Parameters:
directory (str) – path to directory in which the files reside.
pattern (str) – pattern that the returned filenames must contain.
case_sensitive (bool, optional) – if True, pattern is case-sensitive. If False, pattern is not case-sensitive. Defaults to False.
subdirectories (bool) – if True, also look for files in subdirectories. If False, only look for files directly in the provided directory. Defaults to False.
raise_error (bool) – set to True to raise an error if there were no files found.
- Returns:
matching_filepaths (list) – filepaths of files in directory that match pattern.
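A sketch of how such a search could be assembled from os.walk and fnmatch. Case-insensitive matching is done here by lowercasing both sides before fnmatch.fnmatchcase, which is an assumption about the approach; the name find_files is hypothetical and raise_error is omitted:

```python
import fnmatch
import os
import tempfile


def find_files(directory, pattern, case_sensitive=False, subdirectories=False):
    """Return paths of files under directory whose names match pattern."""
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if case_sensitive:
                candidate, pat = name, pattern
            else:
                candidate, pat = name.lower(), pattern.lower()
            if fnmatch.fnmatchcase(candidate, pat):
                matches.append(os.path.join(root, name))
        if not subdirectories:
            break  # Only the top-level directory was requested.
    return matches


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.EDF', 'b.txt', os.path.join('sub', 'c.edf')):
        open(os.path.join(tmp, rel), 'w').close()
    top_level = [os.path.basename(p) for p in find_files(tmp, '*.edf')]
    recursive = sorted(os.path.basename(p)
                       for p in find_files(tmp, '*.edf', subdirectories=True))

print(top_level)  # ['a.EDF']
print(recursive)  # ['a.EDF', 'c.edf']
```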
- nnsa.utils.paths.get_output_dir(output_root, create_unique=False)[source]
Return a directory path for saving the output that a script generates.
A directory under output_root is created using the path of the script that calls this function. The output directory mirrors the structure of the code that generates the output. E.g.: if a file nnsa/python/scripts/example.py calls this function, and output_root is nnsa/output, then the returned directory path is nnsa/output/python/scripts/example.
If the directory does not exist, the directory is created.
- Parameters:
output_root (str) – path to the output root directory. The output dir will be located under this root directory.
create_unique (bool, optional) – if True, creates a unique output directory with a name based on the current date and time. If False, does not create this additional unique directory. Defaults to False.
- Returns:
dir_out (str) – path to a directory for saving outputs of the script that calls this function.
- nnsa.utils.paths.select_path(dialog_type, iconbitmap=None, **kwargs)[source]
Open a file dialog and let the user select a (new) file or directory.
- Parameters:
dialog_type (str) – if ‘select_file’, lets the user select an existing file. If ‘select_files’, lets the user select multiple existing files. If ‘select_directory’, lets the user select a directory. If ‘saveas_file’, lets the user create a new file(name).
**kwargs (optional) – optional keyword arguments for tkinter’s filedialog functions. E.g.:
filetypes (list): sequence of (label, pattern) tuples. The same label may occur with several patterns.
initialdir (str): initial directory.
title (str): message box title.
- Returns:
path (str) – the selected path.
nnsa.utils.pkl module
Functions:
|
|
|
nnsa.utils.plotting module
This module contains functions dealing with matplotlib plots.
Classes:
|
https://matplotlib.org/gallery/animation/image_slices_viewer.html |
|
Select and deselect points in a matplotlib plot. |
Functions:
|
Color the background of the plot according to c. |
Compute an appropriate linewidth for a noisy signal (e.g. EEG). |
|
|
Add enumeration to axes. |
|
Format x-axes (time in seconds) as h:mm:ss, or change to hours or minutes. |
Maximizes the current figure window. |
|
|
|
|
Make a pie chart for the data x using similar syntax as seaborn. |
|
Remove axis ticks and label of specified axis. |
|
Save a figure to several different output formats. |
|
Scale figsize based on predefined width. |
|
Standard plot style. |
|
Apply vertical shading of an axis background for epochs defined by onsets and durations. |
|
Plot a stripplot and overlay the box of a boxplot. |
|
Return a suitable number of rows and columns for a subplot figure with n plots. |
- class nnsa.utils.plotting.IndexTracker(ax, data, plot_fun)[source]
Bases: object
https://matplotlib.org/gallery/animation/image_slices_viewer.html
fig, ax = plt.subplots(1, 1)
X = np.random.rand(20, 20, 40)
def plot_fun(data, ind):
    plt.plot(data[ind, 0], data[ind, 1])
tracker = IndexTracker(ax, X, plot_fun)
fig.canvas.mpl_connect('scroll_event', tracker.onscroll)
plt.show()
Methods:
onscroll(event)
update()
- class nnsa.utils.plotting.PointPicker(fig, points=None, ax=None, **kwargs)[source]
Bases: object
Select and deselect points in a matplotlib plot.
Collect selected points in a list (self.points). Highlights selected points in the plot.
Selected points can be saved to an Excel file using self.save_points(‘filename.xls’). Saved points in Excel (with x and y columns) can be loaded using self.load_points(‘filename.xlsx’).
- Parameters:
points (list) – optional list with tuples of (x, y) coordinates that should already be included in the selection.
ax (plt.Axes) – axes in which to highlight the specified points in points.
**kwargs (optional) – kwargs for ax.scatter() to control how the selected points are highlighted (e.g. c=’y’, s=100).
Examples
>>> fig, ax = plt.subplots()
>>> ax.set_title('click on points')
Text(0.5, 1.0, 'click on points')
>>> line, = ax.plot(np.random.rand(100), 'o',
...                 picker=True, pickradius=5)  # 5 points tolerance
>>> picker = PointPicker(fig)
Methods:
load_points(filepath) – Load points from an Excel file with columns x and y.
onpick(event)
save_points(filepath, **kwargs) – Save point coordinates to an Excel file.
Attributes:
points
xpoints – Return list of x coordinates of selected points.
ypoints – Return list of y coordinates of selected points.
- load_points(filepath)[source]
Load points from an Excel file with columns x and y.
Hint: save as .xls to be able to have the Excel file open in Excel while loading in Python.
- property points
- property xpoints
Return list of x coordinates of selected points.
- property ypoints
Return list of y coordinates of selected points.
- nnsa.utils.plotting.color_background(x, c, ylim=None, ax=None, **kwargs)[source]
Color the background of the plot according to c.
- Parameters:
x (np.ndarray) – x-locations of the levels in c.
c (np.ndarray) – array with color intensities (same shape as x).
ylim (np.ndarray, optional) – optional lower and upper y limit to color.
ax (plt.axes, optional) – axes to color.
**kwargs (optional) – keyword arguments for plt.contourf.
- nnsa.utils.plotting.compute_linewidth(y)[source]
Compute an appropriate linewidth for a noisy signal (e.g. EEG).
- Parameters:
y (np.ndarray) – data array that is plotted.
- Returns:
linewidth (float) – an appropriate linewidth for plotting the data y.
- nnsa.utils.plotting.enumerate_axes(axes, xloc=-0.1, yloc=1.05, style='alphabet', capitalize=False, postfix='', **kwargs)[source]
Add enumeration to axes. E.g. a, b, c.
- Parameters:
axes (list, tuple, np.ndarray) – list of axes to enumerate.
xloc (float) – x-coordinate for the text. By default this is in normalized axis coordinates. Specify transform (as kwargs) to use a different coordinate system.
yloc (float) – y-coordinate for the text. By default this is in normalized axis coordinates. Specify transform (as kwargs) to use a different coordinate system.
style (str) – specifies which enumeration style to use. Choose from: ‘alphabet’.
capitalize (bool) – whether to capitalize the enumeration.
postfix (str) – optional postfix to add to the enumeration. E.g. ‘)’.
**kwargs (dict, optional) – for ax.text().
- nnsa.utils.plotting.format_time_axis(time_scale=None, relative=False, ax=None)[source]
Format x-axes (time in seconds) as h:mm:ss, or change to hours or minutes.
- Parameters:
time_scale (str) – the time scale to convert to. Choose from ‘seconds’, ‘minutes’, ‘hours’ or None (which displays the time as h:mm:ss).
relative (bool) – if True, sets the first time point (xlim()[0]) to 0.
ax (plt.Axes) – axes to change. If None, takes the current axes.
- nnsa.utils.plotting.pieplot(x, weight=None, data=None, ax=None, add_legend=True, order=None, palette=None, normalize=True, add_labels=True, legend_kwargs=None, **kwargs)[source]
Make a pie chart for the data x using similar syntax as seaborn.
- nnsa.utils.plotting.remove_ticks(axis, **kwargs)[source]
Remove axis ticks and label of specified axis.
- Parameters:
axis (str or tuple) – the axis to remove the ticks and label from. Either ‘x’, ‘y’, ‘xy’ or (‘x’, ‘y’).
- nnsa.utils.plotting.save_fig_as(figname=None, directory='', filepath=None, info=None, formats=None, verbose=1, **kwargs)[source]
Save a figure to several different output formats.
- Parameters:
figname (str) – name of the figure, will be the filename.
directory (str) – directory in which to save.
filepath (str) – instead of specifying figname and directory, you can specify filepath. filepath will be os.path.join(directory, figname).
info (str) – info which will be written to a .txt file with the same name (e.g. the path to the script creating the figure). If None, no .txt file will be created.
formats (tuple, list) – list with formats to save to, e.g. (“eps”, “tiff”, “png”, “pdf”, “svg”). If specified, this overrides any extension that was specified in figname or filepath.
verbose (int) – if 1, prints a message on success.
**kwargs – for plt.savefig().
- nnsa.utils.plotting.scale_figsize(figsize, width, unit='cm')[source]
Scale figsize based on predefined width.
- Parameters:
figsize (list, tuple) – (width, height) ratio.
width (float) – desired width of the figure, in the unit given by unit.
unit (str) – ‘inch’ or ‘cm’. The unit of width.
- Returns:
new_figsize (np.ndarray) – rescaled figsize in inches, with figsize[0] equal to width.
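The rescaling amounts to fixing the width (converted to inches) and deriving the height from the aspect ratio. A sketch of that arithmetic, returning a tuple rather than a numpy array; the name rescale_figsize is hypothetical:

```python
def rescale_figsize(figsize, width, unit='cm'):
    """Return (width, height) in inches with the aspect ratio of figsize."""
    if unit == 'cm':
        width_in = width / 2.54  # 1 inch = 2.54 cm
    elif unit == 'inch':
        width_in = width
    else:
        raise ValueError(f'Unknown unit: {unit!r}')
    ratio = figsize[1] / figsize[0]  # height relative to width
    return (width_in, width_in * ratio)


size = rescale_figsize((4, 3), 8, unit='inch')
print(size)  # (8, 6.0)
```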
- nnsa.utils.plotting.set_plot_style(backend=None)[source]
Standard plot style.
- Parameters:
backend (str, optional) – matplotlib backend to use. If None, the current backend is used. Defaults to None.
- nnsa.utils.plotting.shade_axis(onsets, durations, labels=None, color=None, alpha=0.4, orientation='horizontal', add_legend=True, legend_kwargs=None, ax=None)[source]
Apply vertical shading of an axis background for epochs defined by onsets and durations.
- Parameters:
onsets (iterable) – onsets for the epochs to shade (in the dimension of the x-axis).
durations (iterable) – durations of the epochs to shade.
labels (iterable, optional) – labels corresponding to the epochs to shade.
color (dict or str, optional) – dict mapping a label to a color or one color for all.
alpha (float, optional) – the transparency level of the shading. Defaults to 0.4.
orientation (str, optional) – whether the plot is ‘horizontal’ (onsets are on the x-axis), or ‘vertical’ (onsets are on the y-axis). Defaults to ‘horizontal’.
add_legend (bool, optional) – if True, add a legend explaining the shading colours (only if color is a dict). If False, adds no legend. Defaults to True.
legend_kwargs (dict, optional) – if add_legend is True, legend_kwargs are passed to the plt.legend() function as optional keyword arguments. Defaults to None.
ax (plt.Axes, optional) – matplotlib axis to shade. If None, the current axis will be used. Defaults to None.
- Returns:
h (list) – list with handles for the shading spans.
- nnsa.utils.plotting.stripboxplot(x=None, y=None, hue=None, style=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, markers=None, ax=None, mediansize=0.75, boxkwargs=None, stripkwargs=None, legendkwargs=None)[source]
Plot a stripplot and overlay the box of a boxplot.
- Parameters:
Most inputs – common inputs to seaborn’s boxplot and stripplot functions. See seaborn.boxplot and/or seaborn.stripplot.
style – column in data for which to use different markers in the stripplot.
markers – list or dict specifying the markers to be used for each unique marker category.
mediansize – length of the median line in the boxplot, as a fraction of the boxplot width.
boxkwargs – kwargs for seaborn’s boxplot function.
stripkwargs – kwargs for seaborn’s stripplot function.
legendkwargs – kwargs for matplotlib’s legend.
- Returns:
ax – axes handle.
- nnsa.utils.plotting.subplot_rows_columns(n, minimize='rows')[source]
Return a suitable number of rows and columns for a subplot figure with n plots.
- Parameters:
n (int) – number of plots in the subplot figure.
minimize (str, optional) – if ‘rows’, the number of rows will be <= number of columns. If ‘columns’, the number of columns will be <= number of rows.
- Returns:
nrows (int) – number of rows for the subplot figure.
ncols (int) – number of columns for the subplot figure.
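One common way to pick such a grid is to take the ceiling of the square root of n for the longer side and derive the shorter side from it. The sketch below follows that approach; it is an assumption about a reasonable strategy, and rows_columns_sketch is a hypothetical name, so the library's results may differ.

```python
import math


def rows_columns_sketch(n, minimize='rows'):
    """Return (nrows, ncols) for a near-square grid holding n plots (n >= 1)."""
    long_side = math.ceil(math.sqrt(n))
    short_side = math.ceil(n / long_side)
    if minimize == 'rows':
        return short_side, long_side  # nrows <= ncols
    if minimize == 'columns':
        return long_side, short_side  # ncols <= nrows
    raise ValueError(f"Invalid value for minimize: {minimize!r}")


print(rows_columns_sketch(5))             # (2, 3)
print(rows_columns_sketch(5, 'columns'))  # (3, 2)
```

The returned pair can be passed directly to plt.subplots(nrows, ncols).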
nnsa.utils.scalebars module
Classes:
|
Functions:
|
Add scalebars to axes |
- class nnsa.utils.scalebars.AnchoredScaleBar(*args: Any, **kwargs: Any)[source]
Bases: AnchoredOffsetbox
- nnsa.utils.scalebars.add_scalebar(ax, matchx=True, matchy=True, hidex=True, hidey=True, **kwargs)[source]
Add scalebars to axes.
Adds a set of scale bars to ax, matching the size to the ticks of the plot and optionally hiding the x and y axes.
- Parameters:
ax – the axis to attach ticks to.
matchx, matchy – if True, set size of scale bars to spacing between ticks. If False, size should be set using sizex and sizey params.
hidex, hidey – if True, hide x-axis and y-axis of parent.
**kwargs – additional arguments passed to AnchoredScaleBar.
- Returns:
the created scalebar object.
nnsa.utils.segmentation module
Functions:
|
The total number of segments that segment_generator() will generate. |
|
Segment the data in x along the specified axis and return an array with all segments. |
|
Compute the segment time array given the number of segments, the length of one segment and the overlap between segments. |
|
Return a generator that segments the data in x along the specified axis. |
- nnsa.utils.segmentation.compute_n_segments(x, segment_length, overlap=0, fs=1, axis=0)[source]
The total number of segments that segment_generator() will generate.
- Parameters:
x (np.ndarray) – see segment_generator().
segment_length (float) – see segment_generator().
overlap (float, optional) – see segment_generator().
fs (float, optional) – see segment_generator().
axis (int, optional) – see segment_generator().
- Returns:
n_segments (int) – number of segments that segment_generator() will generate.
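The count follows from the step between successive segment starts, which is the segment length minus the overlap. Below is a minimal sketch that works on the number of samples directly; the name n_segments_sketch, the rounding to whole samples, and the choice to drop an incomplete trailing segment are assumptions rather than confirmed library behaviour.

```python
def n_segments_sketch(n_samples, segment_length, overlap=0, fs=1):
    """Count the full segments that fit in a signal of n_samples samples."""
    seg = int(round(segment_length * fs))  # segment length in samples
    ov = int(round(overlap * fs))          # overlap in samples
    step = seg - ov                        # shift between segment starts
    if step <= 0:
        raise ValueError("overlap must be smaller than segment_length")
    if n_samples < seg:
        return 0
    # Assumption: a trailing partial segment is discarded.
    return (n_samples - seg) // step + 1


# 11 samples, length 3, overlap 1: segments start at samples 0, 2, 4, 6, 8.
print(n_segments_sketch(11, segment_length=3, overlap=1))  # 5
```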
- nnsa.utils.segmentation.get_all_segments(x, segment_length, overlap=0, fs=1, axis=0)[source]
Segment the data in x along the specified axis and return an array with all segments.
- Parameters:
x (np.ndarray) – see segment_generator().
segment_length (float) – see segment_generator().
overlap (float, optional) – see segment_generator().
fs (float, optional) – see segment_generator().
axis (int, optional) – see segment_generator().
- Returns:
all_segments (np.ndarray) – array with all segments, where the first axis corresponds to the segments.
- nnsa.utils.segmentation.get_segment_times(num_segments, segment_length, overlap, offset=None)[source]
Compute the segment time array given the number of segments, the length of one segment and the overlap between segments.
- Parameters:
num_segments (int) – total number of segments.
segment_length (float) – segment length (in seconds).
overlap (float) – overlap between successive segments (in seconds).
offset (float, optional) – offset for the segment times. If None, the offset will equal segment_length/2, so that the segment times will fall in the middle of the segments. Defaults to None.
- Returns:
segment_times (np.ndarray) – time (in seconds) array for the axis corresponding to segments.
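Given the parameter descriptions, the segment times form an arithmetic sequence that starts at the offset and advances by segment_length - overlap. A hedged sketch of that arithmetic follows; segment_times_sketch is a hypothetical name and it returns a list instead of an np.ndarray.

```python
def segment_times_sketch(num_segments, segment_length, overlap, offset=None):
    """Return the time (in seconds) assigned to each segment."""
    if offset is None:
        # Documented default: place each time at the middle of its segment.
        offset = segment_length / 2
    step = segment_length - overlap  # time between successive segment starts
    return [offset + i * step for i in range(num_segments)]


# Five 3-second segments with 1 second of overlap.
print(segment_times_sketch(5, segment_length=3, overlap=1))
# [1.5, 3.5, 5.5, 7.5, 9.5]
```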
- nnsa.utils.segmentation.segment_generator(x, segment_length, overlap=0, fs=1, axis=0, error_mode='raise')[source]
Return a generator that segments the data in x along the specified axis.
- Parameters:
x (np.ndarray) – array to be segmented.
segment_length (float) – length of the segment in seconds (specify fs). If None, uses entire signal length (will yield 1 segment).
overlap (float, optional) – overlap between successive segments in seconds (specify fs). Defaults to 0.
fs (float, optional) – sample frequency. By default fs is 1, meaning that the segment_length and overlap can be given as number of samples. Defaults to 1.
axis (int, optional) – the axis along which to segment the data. Defaults to 0.
- Yields:
(np.ndarray) – the next segment.
Examples
>>> x = np.arange(11)
>>> seg_gen = segment_generator(x, segment_length=3, overlap=1, fs=1)
>>> np.asarray(list(seg_gen))
array([[ 0,  1,  2],
       [ 2,  3,  4],
       [ 4,  5,  6],
       [ 6,  7,  8],
       [ 8,  9, 10]])
nnsa.utils.testing module
General functions useful for testing.
Functions:
|
Tests if two objects are equal (like numpy's assert_equal). |
|
Make a beep sound on Windows systems. |
- nnsa.utils.testing.assert_equal(actual, desired)[source]
Tests if two objects are equal (like numpy’s assert_equal).
Recursive algorithm. Base case is when the desired object is an object accepted by numpy’s assert_equal (scalars, lists, tuples, dictionaries, numpy arrays and None). In that case, assert_equal will be called to compare the objects. In the other cases, where the desired object is some custom, arbitrary object, the function will be called recursively on the objects’ attributes.
- Parameters:
actual (object) – the object to check.
desired (object) – the expected object.
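The recursive strategy described above can be sketched in plain Python. This version is an illustration only: it uses == where the library calls numpy's assert_equal, the name assert_equal_sketch and the Montage class are hypothetical, and it assumes custom objects expose their attributes through __dict__.

```python
def assert_equal_sketch(actual, desired):
    """Recursively compare two objects, descending into custom objects."""
    base_types = (int, float, complex, str, bytes, list, tuple, dict,
                  type(None))
    if isinstance(desired, base_types):
        # Base case: directly comparable values.
        assert actual == desired, f"{actual!r} != {desired!r}"
        return
    # Recursive case: compare the attributes of a custom object one by one.
    assert type(actual) is type(desired), "objects have different types"
    for name, value in vars(desired).items():
        assert_equal_sketch(getattr(actual, name), value)


class Montage:
    """Hypothetical custom object used only for this example."""

    def __init__(self, channels, fs):
        self.channels = channels
        self.fs = fs


# Passes silently because all attributes match.
assert_equal_sketch(Montage(['C3', 'C4'], 250), Montage(['C3', 'C4'], 250))
```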
Module contents
Module containing commonly used utility functions.
Functions:
|
Check if directory in filepath exists and create the directory if not. |
|
Check validity of extension of a filename for writing data file. |
|
Check if filepath already exists and raise an error if it does. |
|
Maximizes the current figure window. |
|
Decorator that prints elapsed time and change in available memory when calling the decorated function. |
|
|
Return a suitable number of rows and columns for a subplot figure with n plots. |
- nnsa.utils.check_directory_exists(directory=None, filepath=None)[source]
Check if directory in filepath exists and create the directory if not.
Specify either directory or filepath.
- Parameters:
directory (str, optional) – path to a directory.
filepath (str, optional) – path of a file. Checks the corresponding directory.
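A check-and-create step of this kind can be written with the standard library alone. The sketch below assumes that os.makedirs with exist_ok=True gives the intended behaviour; ensure_directory is a hypothetical name, and returning the directory is an addition made for the example.

```python
import os
import tempfile


def ensure_directory(directory=None, filepath=None):
    """Create the directory (or the parent directory of filepath) if missing."""
    if directory is None:
        if filepath is None:
            raise ValueError("Specify either directory or filepath.")
        directory = os.path.dirname(filepath)
    if directory:
        # exist_ok=True makes the call a no-op when the directory exists.
        os.makedirs(directory, exist_ok=True)
    return directory


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'results', 'run1', 'features.csv')
    created = ensure_directory(filepath=target)
    created_exists = os.path.isdir(created)
print(created_exists)
```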
- nnsa.utils.check_file_extension(filepath, valid_extensions)[source]
Check validity of extension of a filename for writing data file.
- Parameters:
filepath (str) – file path of a file.
valid_extensions (str or list of str) – string specifying the valid extension or a list of (case sensitive) extensions that are valid (without leading dot).
- Returns:
(str) – file extension (without leading dot).
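As an illustration of the case-sensitive check, the sketch below extracts the extension with os.path.splitext and compares it with the allowed values. The name file_extension_sketch and the choice of ValueError for an invalid extension are assumptions; the library may report the problem differently.

```python
import os


def file_extension_sketch(filepath, valid_extensions):
    """Return the extension of filepath if it is one of valid_extensions."""
    if isinstance(valid_extensions, str):
        valid_extensions = [valid_extensions]
    # splitext keeps the leading dot, so strip it before comparing.
    ext = os.path.splitext(filepath)[1].lstrip('.')
    if ext not in valid_extensions:  # comparison is case sensitive
        raise ValueError(
            f"Invalid extension {ext!r}; expected one of {list(valid_extensions)}")
    return ext


print(file_extension_sketch('data/recording.hdf5', ['csv', 'hdf5']))  # hdf5
```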
- nnsa.utils.check_filename_exists(filepath)[source]
Check if filepath already exists and raise an error if it does.
- Parameters:
filepath (str) – file path of a file.
- nnsa.utils.print_efficiency(f)[source]
Decorator that prints elapsed time and change in available memory when calling the decorated function.
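A decorator of this kind might look like the sketch below. It is not the library's implementation: the name print_timing_and_memory is hypothetical, and it reports the change in memory traced by Python's tracemalloc as a stand-in for the change in available system memory.

```python
import functools
import time
import tracemalloc


def print_timing_and_memory(f):
    """Print elapsed time and traced memory change for each call of f."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        was_tracing = tracemalloc.is_tracing()
        if not was_tracing:
            tracemalloc.start()
        before, _ = tracemalloc.get_traced_memory()
        start = time.perf_counter()
        try:
            return f(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            after, _ = tracemalloc.get_traced_memory()
            if not was_tracing:
                tracemalloc.stop()
            print(f"{f.__name__}: {elapsed:.6f} s, "
                  f"{after - before:+d} bytes traced")
    return wrapper


@print_timing_and_memory
def build_list(n):
    return list(range(n))


result = build_list(1000)
```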
- nnsa.utils.subplot_rows_columns(n, minimize='rows')[source]
Return a suitable number of rows and columns for a subplot figure with n plots.
- Parameters:
n (int) – number of plots in the subplot figure.
minimize (str, optional) – if ‘rows’, the number of rows will be <= number of columns. If ‘columns’, the number of columns will be <= number of rows.
- Returns:
nrows (int) – number of rows for the subplot figure.
ncols (int) – number of columns for the subplot figure.