heterogeneity_index

Heterogeneity Index.

Implementation details

The core function computing the components should return three arrays. However, Dask does not support multiple outputs. I tried adding an np.stack operation after getting the components, but this threw Dask off completely: it would compute the components three times.

Instead, the core function returns a single output array, with an additional dimension at the end that corresponds to the components.

Note

I put the components dimension last (y, x, c) because the main loop is over y/x. Putting it on the fastest-varying loop is presumably more efficient, but I would have to run benchmarks to confirm it.

To apply numba.guvectorize we have a slight issue since all the output dimensions must also appear in the inputs. That means we have to pass a dummy argument of size 3 (c).
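The layout described above can be sketched with plain NumPy. This is only an illustration of the single (y, x, c) output, not the library's Numba kernel: the function name `core_components_stacked` and the placeholder values it writes are hypothetical.

```python
import numpy as np

COMPONENTS_NAMES = ["stdev", "skew", "bimod"]


def core_components_stacked(field: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the core function: one output of shape (y, x, c)."""
    ny, nx = field.shape
    out = np.full((ny, nx, len(COMPONENTS_NAMES)), np.nan)
    for iy in range(ny):
        for ix in range(nx):
            # Placeholder values, NOT the real statistics.
            out[iy, ix, :] = field[iy, ix] * np.arange(1, 4)
    return out


field = np.arange(6, dtype=float).reshape(2, 3)
stacked = core_components_stacked(field)

# Split the trailing dimension back into three (y, x) arrays.
stdev, skew, bimod = (stacked[..., i] for i in range(stacked.shape[-1]))
```

With numba.guvectorize, the corresponding layout string would presumably look something like "(y,x),(c)->(y,x,c)", where the (c) input is the dummy argument of size 3.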

COMPONENTS_NAMES = ['stdev', 'skew', 'bimod']

Components short names, in their order of appearance in function signatures.

DEFAULT_DIMS: list[Hashable] = ['lat', 'lon']

Default dimension names.

Used for Xarray input where the dims argument is None and window_size is not a Mapping.

apply_coefficients(components: Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]], coefficients: Mapping[str, float]) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

apply_coefficients(components: Sequence[Array], coefficients: Mapping[str, float]) → Array

apply_coefficients(components: Dataset | Sequence[DataArray], coefficients: Mapping[str, float]) → DataArray

Return Heterogeneity Index computed from un-normalized components.

Parameters:

components – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays (from Numpy, Dask, or Xarray) in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients – Dictionary of the components normalization coefficients. If the coefficient for the HI is present, it is applied; otherwise it is taken equal to 1.

Returns:

Normalized HI (single variable).
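As a rough sketch of what this amounts to for Numpy inputs: the HI is described in this module as the sum of the three normalized components, optionally scaled by a final coefficient. The function name `apply_coefficients_sketch` and the key name "HI" for the final coefficient are assumptions made for illustration, and any extra treatment the library applies to individual components is ignored.

```python
import numpy as np

COMPONENTS_NAMES = ["stdev", "skew", "bimod"]


def apply_coefficients_sketch(components, coefficients):
    """Sum the normalized components, then apply the optional HI coefficient."""
    hi = sum(
        coefficients[name] * np.asarray(comp)
        for name, comp in zip(COMPONENTS_NAMES, components)
    )
    # "HI" is an assumed key name; a missing key means a coefficient of 1.
    return coefficients.get("HI", 1.0) * hi


components = [np.array([1.0, 2.0]), np.array([0.5, 0.5]), np.array([0.0, 4.0])]
coefficients = {"stdev": 2.0, "skew": 4.0, "bimod": 0.5}

hi = apply_coefficients_sketch(components, coefficients)
hi_scaled = apply_coefficients_sketch(components, {**coefficients, "HI": 0.5})
```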

coefficient_hi(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)

Compute final normalization coefficient for the HI.

Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values but can be changed with the parameters quantile_target and hi_limit).

Parameters:

components (Dataset | Sequence[DataArray] | Sequence[Array] | Sequence[NDArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays (from Numpy, Dask, or Xarray) in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – Limit value below which the fraction quantile_target of the normalized HI values should fall.
kwargs (Any) – Arguments passed to either numpy.histogram(), dask.array.histogram() or xarray_histogram.core.histogram().

Returns:

Coefficient to normalize the HI with.

Return type:

float
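The idea can be sketched with numpy.histogram: build the cumulative distribution of the un-normalized HI, locate the value reached at quantile_target, and divide hi_limit by it. This is a minimal sketch under assumptions, not the library's implementation: the function name `coefficient_hi_sketch`, the fixed number of bins, and the choice of the right bin edge are all illustrative, and the HI is passed in directly instead of being built from components and coefficients.

```python
import numpy as np


def coefficient_hi_sketch(hi, quantile_target=0.95, hi_limit=9.5, bins=1000):
    """Coefficient such that about `quantile_target` of hi * coef lies below `hi_limit`."""
    hi = np.asarray(hi, dtype=float)
    hi = hi[np.isfinite(hi)]  # ignore NaN / inf values

    counts, edges = np.histogram(hi, bins=bins)
    cdf = np.cumsum(counts) / counts.sum()

    # First bin whose cumulative fraction reaches the target.
    idx = int(np.searchsorted(cdf, quantile_target))
    quantile_value = edges[idx + 1]  # right edge of that bin
    return hi_limit / quantile_value


rng = np.random.default_rng(0)
hi = rng.uniform(0.0, 20.0, size=100_000)

coef = coefficient_hi_sketch(hi)
fraction_below = float(np.mean(hi * coef < 9.5))
```

The precision of the result is bounded by the bin width, which is presumably why the histogram arguments are exposed through kwargs.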

coefficient_hi_dask(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)

Compute final normalization coefficient for the HI.

Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values but can be changed with the parameters quantile_target and hi_limit).

Parameters:

components (Sequence[Array]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – Limit value below which the fraction quantile_target of the normalized HI values should fall.
kwargs (Any) – Arguments passed to dask.array.histogram().

Returns:

Coefficient to normalize the HI with.

Return type:

float

coefficient_hi_numpy(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)

Compute final normalization coefficient for the HI.

Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values but can be changed with the parameters quantile_target and hi_limit).

Parameters:

components (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – Limit value below which the fraction quantile_target of the normalized HI values should fall.
kwargs (Any) – Arguments passed to numpy.histogram().

Returns:

Coefficient to normalize the HI with.

Return type:

float

coefficient_hi_xarray(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)

Compute final normalization coefficient for the HI.

Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values but can be changed with the parameters quantile_target and hi_limit).

Parameters:

components (Dataset | Sequence[DataArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – Limit value below which the fraction quantile_target of the normalized HI values should fall.
kwargs (Any) – Arguments passed to xarray_histogram.core.histogram().

Returns:

Coefficient to normalize the HI with.

Return type:

float

coefficients_components(components)

Find normalization coefficients for all components.

Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.

Coefficients are computed over the full range of data contained in input parameter components.

Parameters:

components (Dataset | Sequence[DataArray] | Sequence[Array] | Sequence[NDArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays (from Numpy, Dask, or Xarray) in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).

Returns:

Dictionary containing coefficients for each component.

Return type:

dict[str, float]
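One way to make components contribute equally to the variance of their sum is to scale each one by the inverse of its standard deviation, so that every normalized component has unit variance. The sketch below assumes this is the definition used; the source does not spell out the formula, and the function name `coefficients_components_sketch` is hypothetical.

```python
import numpy as np

COMPONENTS_NAMES = ["stdev", "skew", "bimod"]


def coefficients_components_sketch(components):
    """Return one coefficient per component: the inverse of its standard deviation."""
    return {
        name: 1.0 / float(np.nanstd(comp))
        for name, comp in zip(COMPONENTS_NAMES, components)
    }


rng = np.random.default_rng(1)
# Three synthetic components with very different spreads.
components = [rng.normal(scale=s, size=10_000) for s in (0.2, 3.0, 50.0)]

coefficients = coefficients_components_sketch(components)

# The inputs are left untouched; the coefficients are applied separately.
normalized_variances = [
    float(np.nanvar(coefficients[name] * comp))
    for name, comp in zip(COMPONENTS_NAMES, components)
]
```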

coefficients_components_dask(components)

Find normalization coefficients for all components.

Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.

Coefficients are computed over the full range of data contained in input parameter components.

Parameters:

components (Sequence[Array]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).

Returns:

Dictionary containing coefficients for each component.

Return type:

dict[str, float]

coefficients_components_numpy(components)

Find normalization coefficients for all components.

Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.

Coefficients are computed over the full range of data contained in input parameter components.

Parameters:

components (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).

Returns:

Dictionary containing coefficients for each component.

Return type:

dict[str, float]

coefficients_components_xarray(components)

Find normalization coefficients for all components.

Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.

Coefficients are computed over the full range of data contained in input parameter components.

Parameters:

components (Dataset | Sequence[DataArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).

Returns:

Dictionary containing coefficients for each component.

Return type:

dict[str, float]

components_dask(input_field, window_size, bins_width=0.1, bins_shift=0.0, axes=None, **kwargs)

Compute components from Dask array.

Parameters:

input_field (Array) – Array of the input field from which to compute the heterogeneity index.
window_size (int | Sequence[int]) – Total size of the moving window, in pixels. If an integer, the size is taken identical for both axes. Otherwise it must be a sequence of 2 integers specifying the window size along each axis. The order must then follow that of the data. For instance, for data arranged as ('time', 'lat', 'lon'), if we specify window_size=[3, 5], the window will be of size 3 along latitude and size 5 along longitude.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality.
bins_shift (float) – If non-zero, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data.
axes (Sequence[int] | None) – Indices of the y/lat and x/lon axes on which to work. If None (default), the last two axes are used.
kwargs – See available kwargs for universal functions at Generalized universal function API.

Returns:

Tuple of components, in the order of COMPONENTS_NAMES.

Return type:

tuple[Array, Array, Array]

components_numpy(input_field, window_size, bins_width=0.1, bins_shift=0.0, axes=None, **kwargs)

Compute components from a Numpy array.

Parameters:

input_field (ndarray[tuple[int, ...], _DT]) – Array of the input field from which to compute the heterogeneity index.
window_size (int | Sequence[int]) – Total size of the moving window, in pixels. If an integer, the size is taken identical for both axes. Otherwise it must be a sequence of 2 integers specifying the window size along each axis. The order must then follow that of the data. For instance, for data arranged as ('time', 'lat', 'lon'), if we specify window_size=[3, 5], the window will be of size 3 along latitude and size 5 along longitude.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality.
bins_shift (float) – If non-zero, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data.
axes (Sequence[int] | None) – Indices of the y/lat and x/lon axes on which to work. If None (default), the last two axes are used.
kwargs – See available kwargs for universal functions at Generalized universal function API.

Returns:

Tuple of components, in the order of COMPONENTS_NAMES.

Return type:

tuple[ndarray[tuple[int, ...], _DT], ndarray[tuple[int, ...], _DT], ndarray[tuple[int, ...], _DT]]
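To illustrate how window_size and axes relate to the data layout, here is a small NumPy sketch that takes 3 x 5 windows over the last two axes of a ('time', 'lat', 'lon') array and computes a local standard deviation. It uses numpy.lib.stride_tricks.sliding_window_view as a stand-in for the library's compiled loop, and it simply drops the edges; how the library treats edges and invalid values is not covered here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Data arranged as (time, lat, lon).
field = np.random.default_rng(2).normal(size=(4, 10, 12))

window_size = [3, 5]  # 3 pixels along lat, 5 pixels along lon
axes = (-2, -1)       # the last two axes, as when axes=None

# One window per valid position: shape (time, lat - 2, lon - 4, 3, 5).
windows = sliding_window_view(field, window_shape=tuple(window_size), axis=axes)

# Reduce over the two window dimensions to get one value per position.
local_std = windows.std(axis=(-2, -1))
```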

components_xarray(input_field, window_size, bins_width=0.1, bins_shift=True, dims=None)

Compute components from Xarray data.

Parameters:

input_field (DataArray) – Array of the input field from which to compute the heterogeneity index.
window_size (int | Mapping[Hashable, int]) – Total size of the moving window, in pixels. If a single integer, the size is taken identical for both axes. Otherwise it can be a mapping of the dimension names to the window size along each axis.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality.
bins_shift (float | bool) – If a non-zero float, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data. If True (default), whether to shift and by which amount is determined using the input metadata. Set to 0 or False to not shift bins.
axes – Indices of the y/lat and x/lon axes on which to work. If None (default), the last two axes are used.
kwargs – See available kwargs for universal functions at Generalized universal function API.
dims (Collection[Hashable] | None) – Names of the dimensions along which to apply the algorithm. Order is irrelevant, no reordering will be made between the two dimensions. If the window_size argument is given as a mapping, its keys are used instead. If not specified, the module-wide variable DEFAULT_DIMS is used, which defaults to ['lat', 'lon'].

Returns:

Dataset containing the three components.

Return type:

Dataset

get_components_from_values(values, bins_width, bins_shift)

Compiled with Numba

Options: nopython: True, nogil: True
Signatures:
- (Array(float32, 1, 'A', False, aligned=True), float64, float64) -> array(float32, 1d, A)
- (Array(float64, 1, 'A', False, aligned=True), float64, float64) -> array(float64, 1d, A)

Compute components from sequence of values (in the sliding window).

Parameters:

values (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Array of values from the sliding window. Should only contain valid (finite) values.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality. Must have the same units and data type as the input array.
bins_shift (float) – If non-zero, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data.
kwargs – See available kwargs for universal functions at Generalized universal function API.

Returns:

Array of the three components (scalar values), in this order: standard deviation, skewness, and bimodality.

Return type:

ndarray[tuple[Any, ...], dtype[_ScalarT]]
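For the first two components, a plain NumPy sketch of the per-window computation could look as follows. It assumes population (biased) estimators for the standard deviation and the skewness, which the source does not specify, and it leaves out the bimodality because its formula is not given here. The function name `stdev_and_skew_sketch` is hypothetical.

```python
import numpy as np


def stdev_and_skew_sketch(values):
    """Population standard deviation and skewness of a 1D array of finite values."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations**2)  # second central moment (variance)
    m3 = np.mean(deviations**3)  # third central moment
    stdev = np.sqrt(m2)
    # A constant window has no spread; report zero skewness in that case.
    skew = m3 / m2**1.5 if m2 > 0 else 0.0
    return np.array([stdev, skew])


symmetric = stdev_and_skew_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
right_tailed = stdev_and_skew_sketch([1.0, 1.0, 1.0, 1.0, 6.0])
```

A symmetric window gives a skewness of zero, while a window with a single high outlier gives a positive skewness.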