heterogeneity_index
Heterogeneity Index.
Implementation details
The core function computing the components should return three arrays. However, Dask
does not support functions with multiple outputs. I tried adding an np.stack operation
after getting the components, but it threw Dask off completely (it computed the
components three times, a real mess).
Instead, the core function returns a single output array, with an additional dimension at the end that corresponds to the components.
Note
I decided to put the components dimension last (y,x,c) because the main loop is on y/x. I guess it is better to put it on the fastest loop, but I would have to run benchmarks to make sure it is more efficient.
To apply numba.guvectorize we have a slight issue since all the output dimensions
must also appear in the inputs. That means we have to pass a dummy argument of size 3
(c).
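The layout described above can be illustrated with a small NumPy sketch. Here `toy_core` is a hypothetical stand-in that writes placeholder values rather than the real components; it only shows how a single (y, x, c) output is filled in the y/x loop and then split back into three arrays.

```python
import numpy as np

COMPONENTS_NAMES = ["stdev", "skew", "bimod"]


def toy_core(field):
    """Hypothetical stand-in for the core function (placeholder values only)."""
    ny, nx = field.shape
    out = np.empty((ny, nx, len(COMPONENTS_NAMES)), dtype=field.dtype)
    for iy in range(ny):
        for ix in range(nx):
            # One write per pixel fills the three "components" at once.
            out[iy, ix, :] = field[iy, ix] * np.array([1.0, 2.0, 3.0])
    return out


field = np.arange(6, dtype=np.float64).reshape(2, 3)
stacked = toy_core(field)  # single output of shape (y, x, c) = (2, 3, 3)

# Split the trailing dimension back into one array per component.
components = {name: stacked[..., i] for i, name in enumerate(COMPONENTS_NAMES)}
```

With the components dimension last, each pixel's three values are contiguous in memory, which is the motivation given in the note above.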
- COMPONENTS_NAMES = ['stdev', 'skew', 'bimod']
Short names of the components, in their order of appearance in function signatures.
- DEFAULT_DIMS: list[Hashable] = ['lat', 'lon']
Default dimension names.
Used for Xarray input where the dims argument is None and window_size is not a Mapping.
- apply_coefficients(components: Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]], coefficients: Mapping[str, float]) → ndarray[tuple[Any, ...], dtype[_ScalarT]]
- apply_coefficients(components: Sequence[Array], coefficients: Mapping[str, float]) → Array
- apply_coefficients(components: Dataset | Sequence[DataArray], coefficients: Mapping[str, float]) → DataArray
Return the Heterogeneity Index computed from un-normalized components.
- Parameters:
components – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays (from Numpy, Dask, or Xarray) in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients – Dictionary of the components normalization coefficients. If the coefficient for the HI is present, it will be applied, otherwise it will be taken equal to 1.
- Returns:
Normalized HI (single variable).
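As a rough illustration of the behaviour described above, the following NumPy sketch sums the normalized components and applies an optional HI coefficient. It rests on two assumptions that are not confirmed by this page: that normalizing means multiplying each component by its coefficient, and that the HI coefficient is stored under the key `"HI"`. The function name is hypothetical, and any preprocessing the library may apply to the components is left out.

```python
import numpy as np

COMPONENTS_NAMES = ["stdev", "skew", "bimod"]


def apply_coefficients_sketch(components, coefficients):
    """Sum the normalized components, then apply the optional HI coefficient."""
    # Assumption: normalization is a multiplication by the coefficient.
    hi = sum(
        coefficients[name] * comp
        for name, comp in zip(COMPONENTS_NAMES, components)
    )
    # Assumption: the HI coefficient lives under the key "HI"; defaults to 1.
    return coefficients.get("HI", 1.0) * hi


components = [np.array([1.0, 2.0]), np.array([0.5, 0.5]), np.array([2.0, 0.0])]
coefficients = {"stdev": 2.0, "skew": 4.0, "bimod": 1.0}

hi_without = apply_coefficients_sketch(components, coefficients)
hi_with = apply_coefficients_sketch(components, {**coefficients, "HI": 0.5})
```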
- coefficient_hi(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)
Compute final normalization coefficient for the HI.
Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values, which can be changed with the parameters quantile_target and hi_limit.)
- Parameters:
components (Dataset | Sequence[DataArray] | Sequence[Array] | Sequence[NDArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays (from Numpy, Dask, or Xarray) in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – See quantile_target.
kwargs (Any) – Arguments passed to either numpy.histogram(), dask.array.histogram() or xarray_histogram.core.histogram().
- Returns:
Coefficient to normalize the HI with.
- Return type:
- coefficient_hi_dask(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)
Compute final normalization coefficient for the HI.
Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values, which can be changed with the parameters quantile_target and hi_limit.)
- Parameters:
components (Sequence[Array]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – See quantile_target.
kwargs (Any) – Arguments passed to dask.array.histogram().
- Returns:
Coefficient to normalize the HI with.
- Return type:
- coefficient_hi_numpy(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)
Compute final normalization coefficient for the HI.
Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values, which can be changed with the parameters quantile_target and hi_limit.)
- Parameters:
components (Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – See quantile_target.
kwargs (Any) – Arguments passed to numpy.histogram().
- Returns:
Coefficient to normalize the HI with.
- Return type:
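The idea of a histogram-based coefficient can be sketched as follows. This is not the library's implementation: the function name, the default of 1000 bins, and the choice of taking the upper edge of the bin where the cumulative fraction first reaches quantile_target are all assumptions made for illustration. The input is the already summed HI, before the final normalization.

```python
import numpy as np


def coefficient_hi_sketch(hi, quantile_target=0.95, hi_limit=9.5, **kwargs):
    """Return c such that about `quantile_target` of c * hi is below `hi_limit`."""
    hi = hi[np.isfinite(hi)]
    kwargs.setdefault("bins", 1000)  # assumed default, for illustration only
    counts, edges = np.histogram(hi, **kwargs)
    # Cumulative fraction of values up to the right edge of each bin.
    cdf = np.cumsum(counts) / counts.sum()
    # First bin whose cumulative fraction reaches the target.
    idx = int(np.searchsorted(cdf, quantile_target))
    quantile_value = edges[idx + 1]
    return hi_limit / quantile_value


hi = np.linspace(0.0, 1.0, 10001)
coef = coefficient_hi_sketch(hi)
fraction_below = np.mean(coef * hi <= 9.5)
```

The precision of the result is bounded by the bin width, so a finer histogram gives a coefficient closer to the exact quantile.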
- coefficient_hi_xarray(components, coefficients, quantile_target=0.95, hi_limit=9.5, **kwargs)
Compute final normalization coefficient for the HI.
Returns a coefficient to normalize the HI (the sum of the three normalized components) such that 95% of its values are below a limit value of 9.5. (These are the default values, which can be changed with the parameters quantile_target and hi_limit.)
- Parameters:
components (Dataset | Sequence[DataArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
coefficients (Mapping[str, float]) – Dictionary of the components normalization coefficients.
quantile_target (float) – Fraction of HI values that should be below hi_limit once normalized. Should be between 0 and 1.
hi_limit (float) – See quantile_target.
kwargs (Any) – Arguments passed to xarray_histogram.core.histogram().
- Returns:
Coefficient to normalize the HI with.
- Return type:
- coefficients_components(components)
Find normalization coefficients for all components.
Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.
Coefficients are computed over the full range of data contained in the input parameter components.
- Parameters:
components (Dataset | Sequence[DataArray] | Sequence[Array] | Sequence[NDArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays (from Numpy, Dask, or Xarray) in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
- Returns:
Dictionary containing coefficients for each component.
- Return type:
- coefficients_components_dask(components)
Find normalization coefficients for all components.
Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.
Coefficients are computed over the full range of data contained in the input parameter components.
- Parameters:
components (Sequence[Array]) – Three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
- Returns:
Dictionary containing coefficients for each component.
- Return type:
- coefficients_components_numpy(components)
Find normalization coefficients for all components.
Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.
Coefficients are computed over the full range of data contained in input parameter
components.
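One simple way to make components contribute equally to the variance of their sum is to scale each one to unit variance. The sketch below assumes that the coefficient is the inverse of the component's standard deviation over all the data and that the coefficient is applied by multiplication; neither is confirmed by this page, and the function name is hypothetical.

```python
import numpy as np

COMPONENTS_NAMES = ["stdev", "skew", "bimod"]


def coefficients_components_sketch(components):
    """Return one coefficient per component without modifying the inputs."""
    coefficients = {}
    for name, comp in zip(COMPONENTS_NAMES, components):
        # Assumption: coefficient = 1 / standard deviation, ignoring NaNs.
        coefficients[name] = 1.0 / float(np.nanstd(comp))
    return coefficients


rng = np.random.default_rng(0)
components = [rng.normal(scale=s, size=1000) for s in (0.5, 2.0, 10.0)]
coefs = coefficients_components_sketch(components)

# After scaling, every component has unit standard deviation.
scaled_std = [
    float(np.nanstd(coefs[name] * comp))
    for name, comp in zip(COMPONENTS_NAMES, components)
]
```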
- coefficients_components_xarray(components)
Find normalization coefficients for all components.
Coefficients are defined such that components contribute equally to the final HI variance. This function does not modify components, only returns the coefficients.
Coefficients are computed over the full range of data contained in the input parameter components.
- Parameters:
components (Dataset | Sequence[DataArray]) – Either an xarray.Dataset containing the three components, such as returned from components_xarray(), or three arrays in the order defined by COMPONENTS_NAMES (by default, stdev, skew, bimod).
- Returns:
Dictionary containing coefficients for each component.
- Return type:
- components_dask(input_field, window_size, bins_width=0.1, bins_shift=0.0, axes=None, **kwargs)
Compute components from a Dask array.
- Parameters:
input_field (Array) – Array of the input field from which to compute the heterogeneity index.
window_size (int | Sequence[int]) – Total size of the moving window, in pixels. If an integer, the size is taken identical for both axes. Otherwise it must be a sequence of 2 integers specifying the window size along both axes. The order must then follow that of the data. For instance, for data arranged as (‘time’, ‘lat’, ‘lon’), if we specify window_size=[3, 5] the window will be of size 3 along latitude and size 5 along longitude.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality.
bins_shift (float) – If non-zero, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data.
axes (Sequence[int] | None) – Indices of the y/lat and x/lon axes on which to work. If None (default), the last two axes are used.
kwargs – See available kwargs for universal functions at Generalized universal function API.
- Returns:
Tuple of components, in the order of COMPONENTS_NAMES.
- Return type:
- components_numpy(input_field, window_size, bins_width=0.1, bins_shift=0.0, axes=None, **kwargs)
Compute components from a Numpy array.
- Parameters:
input_field (ndarray[tuple[int, ...], _DT]) – Array of the input field from which to compute the heterogeneity index.
window_size (int | Sequence[int]) – Total size of the moving window, in pixels. If an integer, the size is taken identical for both axes. Otherwise it must be a sequence of 2 integers specifying the window size along both axes. The order must then follow that of the data. For instance, for data arranged as (‘time’, ‘lat’, ‘lon’), if we specify window_size=[3, 5] the window will be of size 3 along latitude and size 5 along longitude.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality.
bins_shift (float) – If non-zero, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data.
axes (Sequence[int] | None) – Indices of the y/lat and x/lon axes on which to work. If None (default), the last two axes are used.
kwargs – See available kwargs for universal functions at Generalized universal function API.
- Returns:
Tuple of components, in the order of COMPONENTS_NAMES.
- Return type:
tuple[ndarray[tuple[int, …], _DT], ndarray[tuple[int, …], _DT], ndarray[tuple[int, …], _DT]]
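To make the window_size and axes conventions concrete, here is a small sketch using NumPy's sliding_window_view on data arranged as (time, lat, lon). It is only an illustration of the windowing: it uses a plain standard deviation as a stand-in for the stdev component, keeps only fully valid windows, and says nothing about how the library treats edges or missing values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic field arranged as (time, lat, lon).
field = np.random.default_rng(0).normal(size=(4, 10, 12))

window_size = (3, 5)  # 3 pixels along lat, 5 along lon (order follows the data)
axes = (-2, -1)       # the last two axes, as when axes=None

# One (3, 5) window per valid position, appended as two trailing dimensions.
windows = sliding_window_view(field, window_shape=window_size, axis=axes)

# Standard deviation inside each window (stand-in for the stdev component).
local_std = windows.std(axis=(-2, -1))
```

Here `windows` has shape (4, 8, 8, 3, 5): the time axis is untouched, and each windowed axis shrinks by the window size minus one.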
- components_xarray(input_field, window_size, bins_width=0.1, bins_shift=True, dims=None)
Compute components from Xarray data.
- Parameters:
input_field (DataArray) – Array of the input field from which to compute the heterogeneity index.
window_size (int | Mapping[Hashable, int]) – Total size of the moving window, in pixels. If a single integer, the size is taken identical for both axes. Otherwise it can be a mapping of the dimension names to the window size along each axis.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality.
bins_shift – If a non-zero float, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data. If True (default), whether to shift and by which amount is determined using the input metadata. Set to 0 or False to not shift bins.
axes – Indices of the y/lat and x/lon axes on which to work. If None (default), the last two axes are used.
kwargs – See available kwargs for universal functions at Generalized universal function API.
dims (Collection[Hashable] | None) – Names of the dimensions along which to apply the algorithm. Order is irrelevant, no reordering will be made between the two dimensions. If the window_size argument is given as a mapping, its keys are used instead. If not specified, it is taken from the module-wide variable DEFAULT_DIMS, which defaults to ['lat', 'lon'].
- Returns:
Tuple of components, in the order of COMPONENTS_NAMES.
- Return type:
- get_components_from_values(values, bins_width, bins_shift)
Compiled with Numba
Options: nopython: True, nogil: True
- Signatures:
(Array(float32, 1, ‘A’, False, aligned=True), float64, float64) -> array(float32, 1d, A)
(Array(float64, 1, ‘A’, False, aligned=True), float64, float64) -> array(float64, 1d, A)
Compute components from sequence of values (in the sliding window).
- Parameters:
values (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – Array of values from the sliding window. Should only contain valid (finite) values.
bins_width (float) – Width of the bins used to construct the histogram when computing the bimodality. Must have same units and same data type as the input array.
bins_shift (float) – If non-zero, shift the leftmost and rightmost edges of the bins by this amount to avoid artefacts caused by the discretization of the input field data.
kwargs – See available kwargs for universal functions at Generalized universal function API.
- Returns:
Tuple of the three components (scalar values): standard deviation, skewness, and bimodality, in this order.
- Return type:
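The first two components can be illustrated from the window values alone. The sketch below assumes population moments (no bias correction), which may differ from what the compiled function does, and it leaves out the bimodality, whose histogram-based definition is not given on this page. The function name is hypothetical.

```python
import numpy as np


def stdev_and_skewness(values):
    """Standard deviation and skewness from population central moments."""
    values = np.asarray(values, dtype=np.float64)
    deviations = values - values.mean()
    m2 = np.mean(deviations**2)  # second central moment (variance)
    m3 = np.mean(deviations**3)  # third central moment
    stdev = np.sqrt(m2)
    skewness = m3 / m2**1.5
    return stdev, skewness


# A window with one large outlier is skewed towards high values.
stdev, skewness = stdev_and_skewness([1.0, 2.0, 3.0, 4.0, 10.0])

# A symmetric window has zero skewness.
_, skewness_symmetric = stdev_and_skewness([1.0, 2.0, 3.0, 4.0, 5.0])
```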