bluemath_tk.interpolation package

Submodules

bluemath_tk.interpolation.analogs module

bluemath_tk.interpolation.gps module

bluemath_tk.interpolation.rbf module

class bluemath_tk.interpolation.rbf.RBF(sigma_min: float = 0.001, sigma_max: float = 0.1, sigma_diff: float = 0.0001, sigma_opt: float = None, kernel: str = 'gaussian', smooth: float = 1e-05)[source]

Bases: BaseInterpolation

Radial Basis Function (RBF) interpolation model.

sigma_min

The minimum value for the sigma parameter. This value might change in the optimization process.

Type:: float

sigma_max

The maximum value for the sigma parameter. This value might change in the optimization process.

Type:: float

sigma_diff

The threshold on the distance between the optimal bounded sigma and the sigma_min and sigma_max bounds. If the distance is less than this value, the optimization process continues.

Type:: float

kernel

The kernel to use for the RBF model. The available kernels are:

gaussian : exp(-1/2 * (r / const)**2)

multiquadratic : sqrt(1 + (r / const)**2)

inverse : 1 / sqrt(1 + (r / const)**2)

cubic : r**3

thin_plate : r**2 * log(r / const)

Type:: str
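The kernel formulas listed above can be written out directly with NumPy. The following is a minimal sketch for illustration only: the function names mirror the kernel keys, but the bodies are re-implementations of the listed formulas, not the library's own code. Returning 0 for thin_plate at r = 0 (where the logarithm is undefined) is an assumption of this sketch.

```python
import numpy as np

# Illustrative re-implementations of the documented formulas.
# These are NOT the functions shipped in bluemath_tk.interpolation.rbf.

def gaussian(r, const):
    return np.exp(-0.5 * (r / const) ** 2)

def multiquadratic(r, const):
    return np.sqrt(1 + (r / const) ** 2)

def inverse(r, const):
    return 1 / np.sqrt(1 + (r / const) ** 2)

def cubic(r, const):
    # const is unused; it is kept so that all kernels share one signature.
    return np.asarray(r, dtype=float) ** 3

def thin_plate(r, const):
    r = np.asarray(r, dtype=float)
    # Assumption: use the limit value 0 at r = 0, where log(r / const) is undefined.
    safe_r = np.where(r > 0, r, 1.0)
    return np.where(r > 0, r ** 2 * np.log(safe_r / const), 0.0)

distances = np.array([0.0, 0.5, 1.0])
for name, kernel in [
    ("gaussian", gaussian),
    ("multiquadratic", multiquadratic),
    ("inverse", inverse),
    ("cubic", cubic),
    ("thin_plate", thin_plate),
]:
    print(name, kernel(distances, 0.5))
```

All five functions take the same (r, const) arguments, which is what allows a single sigma value to be passed to whichever kernel is selected.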

kernel_func

The kernel function to use for the RBF model.

Type:: function

smooth

The smoothness parameter.

Type:: float

subset_data

The subset data used to fit the model.

Type:: pd.DataFrame

normalized_subset_data

The normalized subset data used to fit the model.

Type:: pd.DataFrame

target_data

The target data used to fit the model.

Type:: pd.DataFrame

normalized_target_data

The normalized target data used to fit the model. This attribute is only set if normalize_target_data is True in the fit method.

Type:: pd.DataFrame

subset_directional_variables

The subset directional variables.

Type:: List[str]

target_directional_variables

The target directional variables.

Type:: List[str]

subset_processed_variables

The subset processed variables.

Type:: List[str]

target_processed_variables

The target processed variables.

Type:: List[str]

subset_custom_scale_factor

The custom scale factor for the subset data.

Type:: dict

target_custom_scale_factor

The custom scale factor for the target data.

Type:: dict

subset_scale_factor

The scale factor for the subset data.

Type:: dict

target_scale_factor

The scale factor for the target data.

Type:: dict

rbf_coeffs

The RBF coefficients for the target variables.

Type:: pd.DataFrame

opt_sigmas

The optimal sigmas for the target variables.

Type:: dict

fit -> None: Fits the model to the data.

predict -> pd.DataFrame: Predicts the data for the provided dataset.

fit_predict -> pd.DataFrame: Fits the model to the subset and predicts the interpolated dataset.

Notes

TODO: For the moment, this class only supports optimization for one-parameter kernels. For this reason, sigma is the only parameter to optimize. This sigma refers to the sigma parameter in the Gaussian kernel (but is used for all kernels).

Main reference for sigma optimization: https://link.springer.com/article/10.1023/A:1018975909870
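A minimal sketch of how a single sigma could be selected inside [sigma_min, sigma_max], assuming the cost being minimized is a leave-one-out cross-validation error of the kind used in the linked reference. The names loocv_cost and select_sigma, the use of scipy.optimize.minimize_scalar, and the bound-widening rule (halving or doubling a bound) are hypothetical choices made for this illustration; the class's actual optimizer may differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def loocv_cost(sigma, x, y, smooth=1e-5):
    """Norm of the leave-one-out residuals of a Gaussian RBF fit (illustrative)."""
    # Pairwise distances between the fitting points (rows of x).
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    a = np.exp(-0.5 * (r / sigma) ** 2) + smooth * np.eye(len(x))
    a_inv = np.linalg.inv(a)
    coeffs = a_inv @ y
    # Closed form for the leave-one-out residuals: e_k = c_k / (A^-1)_kk.
    return np.linalg.norm(coeffs / np.diag(a_inv))

def select_sigma(x, y, sigma_min=0.001, sigma_max=0.1, sigma_diff=1e-4, max_rounds=10):
    """Bounded search that widens a bound when the optimum lies within sigma_diff of it."""
    for _ in range(max_rounds):
        result = minimize_scalar(
            loocv_cost, bounds=(sigma_min, sigma_max), args=(x, y), method="bounded"
        )
        sigma = result.x
        if sigma - sigma_min < sigma_diff:
            sigma_min /= 2  # hypothetical widening rule
        elif sigma_max - sigma < sigma_diff:
            sigma_max *= 2  # hypothetical widening rule
        else:
            break
    return sigma, (sigma_min, sigma_max)

rng = np.random.default_rng(0)
points = rng.random((40, 2))          # already scaled to [0, 1]
values = np.sin(3 * points[:, 0]) + points[:, 1]
sigma, bounds = select_sigma(points, values)
print(sigma, bounds)
```

The loop shows one way the documented behavior could arise, where sigma_min and sigma_max change during optimization and the search continues while the optimum sits closer than sigma_diff to a bound.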

Examples

import numpy as np
import pandas as pd
from bluemath_tk.interpolation.rbf import RBF

dataset = pd.DataFrame(
    {
        "Hs": np.random.rand(1000) * 7,
        "Tp": np.random.rand(1000) * 20,
        "Dir": np.random.rand(1000) * 360,
    }
)
subset = dataset.sample(frac=0.25)
target = pd.DataFrame(
    {
        "HsPred": subset["Hs"] * 2 + subset["Tp"] * 3,
        "DirPred": - subset["Dir"],
    }
)

rbf = RBF()

predictions = rbf.fit_predict(
    subset_data=subset,
    subset_directional_variables=["Dir"],
    target_data=target,
    target_directional_variables=["DirPred"],
    normalize_target_data=True,
    dataset=dataset,
    num_workers=4,
    iteratively_update_sigma=True,
)
print(predictions.head())

        ---------------------------------------------------------------------------------
        | Initializing RBF interpolation model with the following parameters:
        |    - sigma_min: 0.001
        |    - sigma_max: 0.1
        |    - sigma_diff: 0.0001
        |    - sigma_opt: None
        |    - kernel: gaussian
        |    - smooth: 1e-05
        | For more information, please refer to the documentation.
        | Recommended lecture: https://link.springer.com/article/10.1023/A:1018975909870
        ---------------------------------------------------------------------------------
        

      HsPred  DirPred_u  DirPred_v     DirPred
48.242093  -0.958856  -0.283893  253.507330
12.606319  -0.469256  -0.883062  207.986035
60.773883  -0.845550  -0.533897  237.730863
19.626148   0.083734   0.996488    4.803215
59.707966  -0.924939   0.380115  292.340822

References

[1] https://en.wikipedia.org/wiki/Radial_basis_function
[2] https://en.wikipedia.org/wiki/Gaussian_function
[3] https://link.springer.com/article/10.1023/A:1018975909870

fit(subset_data: DataFrame, target_data: DataFrame, subset_directional_variables: List[str] = [], target_directional_variables: List[str] = [], subset_custom_scale_factor: dict = {}, normalize_target_data: bool = True, target_custom_scale_factor: dict = {}, num_workers: int = None, iteratively_update_sigma: bool = False) → None[source]

Fits the model to the data.

Parameters:

subset_data (pd.DataFrame) – The subset data used to fit the model.
target_data (pd.DataFrame) – The target data used to fit the model.
subset_directional_variables (List[str], optional) – The subset directional variables. Default is [].
target_directional_variables (List[str], optional) – The target directional variables. Default is [].
subset_custom_scale_factor (dict, optional) – The custom scale factor for the subset data. Default is {}.
normalize_target_data (bool, optional) – Whether to normalize the target data. Default is True.
target_custom_scale_factor (dict, optional) – The custom scale factor for the target data. Default is {}.
num_workers (int, optional) – The number of workers to use for the optimization. Default is None.
iteratively_update_sigma (bool, optional) – Whether to iteratively update the sigma parameter. Default is False.

Notes

This function fits the RBF model to the data by:
1. Preprocessing the subset and target data.
2. Calculating the optimal sigma for the target variables.
3. Storing the RBF coefficients and optimal sigmas.
The number of threads used for the optimization can be set with num_workers.
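The fitting steps above can be sketched with plain NumPy. This is a simplified illustration under stated assumptions: min-max scaling stands in for the preprocessing, a fixed sigma replaces the optimization step, and the helper names fit_rbf and evaluate_rbf are hypothetical. It is not the class's implementation.

```python
import numpy as np

def fit_rbf(x, y, sigma, smooth=1e-5):
    """Solve (K + smooth * I) c = y for the Gaussian RBF coefficients c."""
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    a = np.exp(-0.5 * (r / sigma) ** 2) + smooth * np.eye(len(x))
    return np.linalg.solve(a, y)

def evaluate_rbf(x_new, x, coeffs, sigma):
    """Evaluate the fitted RBF expansion at the rows of x_new."""
    r = np.linalg.norm(x_new[:, None, :] - x[None, :, :], axis=-1)
    return np.exp(-0.5 * (r / sigma) ** 2) @ coeffs

rng = np.random.default_rng(0)
raw = rng.random((50, 2)) * [7.0, 20.0]      # two inputs, e.g. Hs and Tp
target = 2 * raw[:, 0] + 3 * raw[:, 1]

# Step 1 (assumed form of preprocessing): min-max scale inputs and target.
lo, hi = raw.min(axis=0), raw.max(axis=0)
x = (raw - lo) / (hi - lo)
t_lo, t_hi = target.min(), target.max()
y = (target - t_lo) / (t_hi - t_lo)

# Step 2 is replaced here by a fixed, hand-picked sigma.
sigma = 0.3

# Step 3: compute and keep the coefficients together with sigma.
coeffs = fit_rbf(x, y, sigma)

# Reconstruct the training targets and undo the scaling.
recon = evaluate_rbf(x, x, coeffs, sigma) * (t_hi - t_lo) + t_lo
print(np.max(np.abs(recon - target)))
```

The smooth term on the diagonal keeps the linear system well conditioned, at the cost of no longer reproducing the training targets exactly.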

fit_predict(subset_data: DataFrame, target_data: DataFrame, dataset: DataFrame, subset_directional_variables: List[str] = [], target_directional_variables: List[str] = [], subset_custom_scale_factor: dict = {}, normalize_target_data: bool = True, target_custom_scale_factor: dict = {}, num_workers: int = None, iteratively_update_sigma: bool = False) → DataFrame[source]

Fits the model to the subset and predicts the interpolated dataset.

Parameters:

subset_data (pd.DataFrame) – The subset data used to fit the model.
target_data (pd.DataFrame) – The target data used to fit the model.
dataset (pd.DataFrame) – The dataset to predict (must have the same variables as the subset).
subset_directional_variables (List[str], optional) – The subset directional variables. Default is [].
target_directional_variables (List[str], optional) – The target directional variables. Default is [].
subset_custom_scale_factor (dict, optional) – The custom scale factor for the subset data. Default is {}.
normalize_target_data (bool, optional) – Whether to normalize the target data. Default is True.
target_custom_scale_factor (dict, optional) – The custom scale factor for the target data. Default is {}.
num_workers (int, optional) – The number of workers to use for the optimization. Default is None.
iteratively_update_sigma (bool, optional) – Whether to iteratively update the sigma parameter. Default is False.

Returns:

The interpolated dataset.

Return type:

pd.DataFrame

Notes

This function fits the model to the subset and predicts the interpolated dataset.

property kernel: str

property kernel_func: Callable

property normalized_subset_data: DataFrame

property normalized_target_data: DataFrame

property opt_sigmas: dict

predict(dataset: DataFrame, num_workers: int = None) → DataFrame[source]

Predicts the data for the provided dataset.

Parameters:

dataset (pd.DataFrame) – The dataset to predict (must have the same variables as the subset).
num_workers (int, optional) – The number of workers to use for the interpolation. Default is None.

Returns:

The interpolated dataset.

Return type:

pd.DataFrame

Raises:

ValueError – If the model is not fitted.

Notes

This function predicts the data by:
1. Reconstructing the data using the fitted coefficients.
2. Denormalizing the target data if normalize_target_data is True.
3. Calculating the degrees for the target directional variables.
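Step 3 can be illustrated with the values printed in the class example, where each directional target appears as a pair of _u and _v columns plus a column in degrees. The sketch below assumes u is the sine and v is the cosine of the direction, which is consistent with those printed rows; the helper names encode_direction and decode_direction are hypothetical.

```python
import numpy as np
import pandas as pd

def encode_direction(deg):
    """Split a direction in degrees into (u, v) = (sin, cos) components."""
    rad = np.deg2rad(deg)
    return np.sin(rad), np.cos(rad)

def decode_direction(u, v):
    """Recover the direction in degrees, wrapped to [0, 360)."""
    return np.rad2deg(np.arctan2(u, v)) % 360

# Two (u, v) pairs copied from the example output of the class documentation.
pred = pd.DataFrame(
    {
        "DirPred_u": [-0.958856, 0.083734],
        "DirPred_v": [-0.283893, 0.996488],
    }
)
pred["DirPred"] = decode_direction(pred["DirPred_u"], pred["DirPred_v"])
print(pred)
```

Working with the two components avoids the discontinuity between 359 and 0 degrees, and arctan2 still returns a valid angle when the interpolated (u, v) pair does not have unit length.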

property rbf_coeffs: DataFrame

rbf_kernels = {'cubic': <function cubic_kernel>, 'gaussian': <function gaussian_kernel>, 'inverse': <function inverse_kernel>, 'multiquadratic': <function multiquadratic_kernel>, 'thin_plate': <function thin_plate_kernel>}

property sigma_diff: float

property sigma_max: float

property sigma_min: float

property sigma_opt: float

property smooth: float

property subset_custom_scale_factor: dict

property subset_data: DataFrame

property subset_directional_variables: List[str]

property subset_processed_variables: List[str]

property subset_scale_factor: dict

property target_custom_scale_factor: dict

property target_data: DataFrame

property target_directional_variables: List[str]

property target_processed_variables: List[str]

property target_scale_factor: dict

exception bluemath_tk.interpolation.rbf.RBFError(message: str = 'RBF error occurred.')[source]

Bases: Exception

Custom exception for RBF class.

bluemath_tk.interpolation.rbf.cubic_kernel(r, const)[source]

bluemath_tk.interpolation.rbf.gaussian_kernel(r: float, const: float) → float[source]

This function calculates the Gaussian kernel for the given distance and constant.

Parameters:

r (float) – The distance.
const (float) – The constant (usually called sigma for the Gaussian kernel).

Returns:

The Gaussian kernel value.

Return type:

float

Notes

The Gaussian kernel is defined as: K(r) = exp(-r^2 / (2 * const^2)) (https://en.wikipedia.org/wiki/Gaussian_function)
Here, the mean is assumed to be 0.

bluemath_tk.interpolation.rbf.inverse_kernel(r, const)[source]

bluemath_tk.interpolation.rbf.multiquadratic_kernel(r, const)[source]

bluemath_tk.interpolation.rbf.thin_plate_kernel(r, const)[source]

bluemath_tk.interpolation.rbf_scipy module

class bluemath_tk.interpolation.rbf_scipy.RBF(sigma_min: float = 0.001, sigma_max: float = 1.0, sigma_opt: float = None, kernel: str = 'thin_plate_spline', smoothing: float = 0.0, degree: int = None, neighbors: int = None)[source]

Bases: BaseInterpolation

Radial Basis Function (RBF) interpolation model.

Here, scipy’s RBFInterpolator is used to interpolate the data. https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RBFInterpolator.html

Warning

This class is in beta; results may not be accurate.
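For orientation, this is how scipy's RBFInterpolator is called on its own. The kernel and smoothing values below match the defaults in the class signature above, but how this class forwards its arguments to scipy is not shown in this page, so treat the sketch as a standalone scipy example rather than a description of the wrapper.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# 200 scattered points in two dimensions and a known target function.
points = rng.random((200, 2))
values = 2 * points[:, 0] + 3 * points[:, 1]

# Build the interpolator; neighbors=None uses all points for every query.
interp = RBFInterpolator(
    points, values, kernel="thin_plate_spline", smoothing=0.0, neighbors=None
)

# Evaluate at new locations; the input must be shaped (n_queries, n_dims).
query = rng.random((5, 2))
estimates = interp(query)
print(estimates)
```

With the thin_plate_spline kernel, scipy adds a linear polynomial term by default, so a linear target such as this one is recovered up to numerical precision.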

Module contents

Project: BlueMath_tk
Sub-Module: interpolation
Author: GeoOcean Research Group, Universidad de Cantabria
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)

class bluemath_tk.interpolation.RBF(sigma_min: float = 0.001, sigma_max: float = 0.1, sigma_diff: float = 0.0001, sigma_opt: float = None, kernel: str = 'gaussian', smooth: float = 1e-05)[source]

Bases: BaseInterpolation

Radial Basis Function (RBF) interpolation model.

sigma_min

The minimum value for the sigma parameter. This value might change in the optimization process.

Type:: float

sigma_max

The maximum value for the sigma parameter. This value might change in the optimization process.

Type:: float

sigma_diff

The threshold on the distance between the optimal bounded sigma and the sigma_min and sigma_max bounds. If the distance is less than this value, the optimization process continues.

Type:: float

kernel

The kernel to use for the RBF model. The available kernels are:

gaussian : exp(-1/2 * (r / const)**2)

multiquadratic : sqrt(1 + (r / const)**2)

inverse : 1 / sqrt(1 + (r / const)**2)

cubic : r**3

thin_plate : r**2 * log(r / const)

Type:: str

kernel_func

The kernel function to use for the RBF model.

Type:: function

smooth

The smoothness parameter.

Type:: float

subset_data

The subset data used to fit the model.

Type:: pd.DataFrame

normalized_subset_data

The normalized subset data used to fit the model.

Type:: pd.DataFrame

target_data

The target data used to fit the model.

Type:: pd.DataFrame

normalized_target_data

The normalized target data used to fit the model. This attribute is only set if normalize_target_data is True in the fit method.

Type:: pd.DataFrame

subset_directional_variables

The subset directional variables.

Type:: List[str]

target_directional_variables

The target directional variables.

Type:: List[str]

subset_processed_variables

The subset processed variables.

Type:: List[str]

target_processed_variables

The target processed variables.

Type:: List[str]

subset_custom_scale_factor

The custom scale factor for the subset data.

Type:: dict

target_custom_scale_factor

The custom scale factor for the target data.

Type:: dict

subset_scale_factor

The scale factor for the subset data.

Type:: dict

target_scale_factor

The scale factor for the target data.

Type:: dict

rbf_coeffs

The RBF coefficients for the target variables.

Type:: pd.DataFrame

opt_sigmas

The optimal sigmas for the target variables.

Type:: dict

fit -> None: Fits the model to the data.

predict -> pd.DataFrame: Predicts the data for the provided dataset.

fit_predict -> pd.DataFrame: Fits the model to the subset and predicts the interpolated dataset.

Notes

TODO: For the moment, this class only supports optimization for one-parameter kernels. For this reason, sigma is the only parameter to optimize. This sigma refers to the sigma parameter in the Gaussian kernel (but is used for all kernels).

Main reference for sigma optimization: https://link.springer.com/article/10.1023/A:1018975909870

Examples

import numpy as np
import pandas as pd
from bluemath_tk.interpolation.rbf import RBF

dataset = pd.DataFrame(
    {
        "Hs": np.random.rand(1000) * 7,
        "Tp": np.random.rand(1000) * 20,
        "Dir": np.random.rand(1000) * 360,
    }
)
subset = dataset.sample(frac=0.25)
target = pd.DataFrame(
    {
        "HsPred": subset["Hs"] * 2 + subset["Tp"] * 3,
        "DirPred": - subset["Dir"],
    }
)

rbf = RBF()

predictions = rbf.fit_predict(
    subset_data=subset,
    subset_directional_variables=["Dir"],
    target_data=target,
    target_directional_variables=["DirPred"],
    normalize_target_data=True,
    dataset=dataset,
    num_workers=4,
    iteratively_update_sigma=True,
)
print(predictions.head())

        ---------------------------------------------------------------------------------
        | Initializing RBF interpolation model with the following parameters:
        |    - sigma_min: 0.001
        |    - sigma_max: 0.1
        |    - sigma_diff: 0.0001
        |    - sigma_opt: None
        |    - kernel: gaussian
        |    - smooth: 1e-05
        | For more information, please refer to the documentation.
        | Recommended lecture: https://link.springer.com/article/10.1023/A:1018975909870
        ---------------------------------------------------------------------------------
        

      HsPred  DirPred_u  DirPred_v     DirPred
30.979464   0.157348   0.987543    9.053025
45.205554   0.956569  -0.291507  106.948215
56.210565  -0.963312   0.268385  285.568214
21.300834   0.695310  -0.718710  135.948109
25.424005   0.164326  -0.986406  170.541909

References

[1] https://en.wikipedia.org/wiki/Radial_basis_function
[2] https://en.wikipedia.org/wiki/Gaussian_function
[3] https://link.springer.com/article/10.1023/A:1018975909870

fit(subset_data: DataFrame, target_data: DataFrame, subset_directional_variables: List[str] = [], target_directional_variables: List[str] = [], subset_custom_scale_factor: dict = {}, normalize_target_data: bool = True, target_custom_scale_factor: dict = {}, num_workers: int = None, iteratively_update_sigma: bool = False) → None[source]

Fits the model to the data.

Parameters:

subset_data (pd.DataFrame) – The subset data used to fit the model.
target_data (pd.DataFrame) – The target data used to fit the model.
subset_directional_variables (List[str], optional) – The subset directional variables. Default is [].
target_directional_variables (List[str], optional) – The target directional variables. Default is [].
subset_custom_scale_factor (dict, optional) – The custom scale factor for the subset data. Default is {}.
normalize_target_data (bool, optional) – Whether to normalize the target data. Default is True.
target_custom_scale_factor (dict, optional) – The custom scale factor for the target data. Default is {}.
num_workers (int, optional) – The number of workers to use for the optimization. Default is None.
iteratively_update_sigma (bool, optional) – Whether to iteratively update the sigma parameter. Default is False.

Notes

This function fits the RBF model to the data by:
1. Preprocessing the subset and target data.
2. Calculating the optimal sigma for the target variables.
3. Storing the RBF coefficients and optimal sigmas.
The number of threads used for the optimization can be set with num_workers.

fit_predict(subset_data: DataFrame, target_data: DataFrame, dataset: DataFrame, subset_directional_variables: List[str] = [], target_directional_variables: List[str] = [], subset_custom_scale_factor: dict = {}, normalize_target_data: bool = True, target_custom_scale_factor: dict = {}, num_workers: int = None, iteratively_update_sigma: bool = False) → DataFrame[source]

Fits the model to the subset and predicts the interpolated dataset.

Parameters:

subset_data (pd.DataFrame) – The subset data used to fit the model.
target_data (pd.DataFrame) – The target data used to fit the model.
dataset (pd.DataFrame) – The dataset to predict (must have the same variables as the subset).
subset_directional_variables (List[str], optional) – The subset directional variables. Default is [].
target_directional_variables (List[str], optional) – The target directional variables. Default is [].
subset_custom_scale_factor (dict, optional) – The custom scale factor for the subset data. Default is {}.
normalize_target_data (bool, optional) – Whether to normalize the target data. Default is True.
target_custom_scale_factor (dict, optional) – The custom scale factor for the target data. Default is {}.
num_workers (int, optional) – The number of workers to use for the optimization. Default is None.
iteratively_update_sigma (bool, optional) – Whether to iteratively update the sigma parameter. Default is False.

Returns:

The interpolated dataset.

Return type:

pd.DataFrame

Notes

This function fits the model to the subset and predicts the interpolated dataset.

is_fitted: bool

is_target_normalized: bool

property kernel: str

property kernel_func: Callable

property normalized_subset_data: DataFrame

property normalized_target_data: DataFrame

num_workers: int

property opt_sigmas: dict

predict(dataset: DataFrame, num_workers: int = None) → DataFrame[source]

Predicts the data for the provided dataset.

Parameters:

dataset (pd.DataFrame) – The dataset to predict (must have the same variables as the subset).
num_workers (int, optional) – The number of workers to use for the interpolation. Default is None.

Returns:

The interpolated dataset.

Return type:

pd.DataFrame

Raises:

ValueError – If the model is not fitted.

Notes

This function predicts the data by:
1. Reconstructing the data using the fitted coefficients.
2. Denormalizing the target data if normalize_target_data is True.
3. Calculating the degrees for the target directional variables.

property rbf_coeffs: DataFrame

rbf_kernels = {'cubic': <function cubic_kernel>, 'gaussian': <function gaussian_kernel>, 'inverse': <function inverse_kernel>, 'multiquadratic': <function multiquadratic_kernel>, 'thin_plate': <function thin_plate_kernel>}

row_chunks: int

property sigma_diff: float

property sigma_max: float

property sigma_min: float

property sigma_opt: float

property smooth: float

property subset_custom_scale_factor: dict

property subset_data: DataFrame

property subset_directional_variables: List[str]

property subset_processed_variables: List[str]

property subset_scale_factor: dict

property target_custom_scale_factor: dict

property target_data: DataFrame

property target_directional_variables: List[str]

property target_processed_variables: List[str]

property target_scale_factor: dict