bluemath_tk.core package

Submodules

bluemath_tk.core.decorators module

bluemath_tk.core.decorators.validate_data_kma(func)[source]

Decorator to validate data in KMA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_lhs(func)[source]

Decorator to validate data in LHS class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_mda(func)[source]

Decorator to validate data in MDA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_pca(func)[source]

Decorator to validate data in PCA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_rbf(func)[source]

Decorator to validate data in RBF class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_som(func)[source]

Decorator to validate data in SOM class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_xwt(func)[source]

Decorator to validate data in XWT class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable
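
Examples

All of the validators above are applied the same way; a minimal sketch (the class below is a hypothetical stand-in, since the real fit methods and their signatures live in the corresponding BlueMath classes):

>>> import pandas as pd
>>> from bluemath_tk.core.decorators import validate_data_mda
>>> class MyMDA:
...     @validate_data_mda
...     def fit(self, data: pd.DataFrame):
...         # `data` has already been validated by the decorator here
...         self.data = data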

bluemath_tk.core.logging module

bluemath_tk.core.logging.get_file_logger(name: str, logs_path: str = None, level: int | str = 'INFO', console: bool = True, console_level: int | str = 'WARNING') Logger[source]

Creates and returns a logger that writes log messages to a file.

Parameters:
  • name (str) – The name of the logger.

  • logs_path (str, optional) – The file path where the log messages will be written. Default is None.

  • level (Union[int, str], optional) – The logging level. Default is “INFO”.

  • console (bool) – Whether to also log to the console / terminal. Default is True.

  • console_level (Union[int, str], optional) – The logging level for console / terminal logs. Default is “WARNING”.

Returns:

Configured logger instance.

Return type:

logging.Logger

Examples

>>> from bluemath_tk.core.logging import get_file_logger
>>> # Create a logger that writes to "app.log"
>>> logger = get_file_logger("my_app_logger", "app.log")
>>> # Log messages
>>> logger.info("This is an info message.")
>>> logger.warning("This is a warning message.")
>>> logger.error("This is an error message.")
>>> # The output will be saved in "app.log" with the format:
>>> # 2023-10-22 14:55:23,456 - my_app_logger - INFO - This is an info message.
>>> # 2023-10-22 14:55:23,457 - my_app_logger - WARNING - This is a warning message.
>>> # 2023-10-22 14:55:23,458 - my_app_logger - ERROR - This is an error message.

bluemath_tk.core.models module

class bluemath_tk.core.models.BlueMathModel[source]

Bases: ABC

Abstract base class for handling default functionalities across the project.

check_nans(data: ndarray | Series | DataFrame | DataArray | Dataset, replace_value: float | callable = None, raise_error: bool = False) ndarray | Series | DataFrame | DataArray | Dataset[source]

Checks for NaNs in the data and optionally replaces them.

Parameters:
  • data (np.ndarray, pd.Series, pd.DataFrame, xr.DataArray or xr.Dataset) – The data to check for NaNs.

  • replace_value (float or callable, optional) – The value to replace NaNs with. If None, NaNs will not be replaced. If a callable is provided, it will be called and the result will be returned. Default is None.

  • raise_error (bool, optional) – Whether to raise an error if NaNs are found. Default is False.

Returns:

data – The data with NaNs optionally replaced.

Return type:

np.ndarray, pd.Series, pd.DataFrame, xr.DataArray or xr.Dataset

Raises:

ValueError – If NaNs are found and raise_error is True.

Notes

  • This method is intended to be used in classes that inherit from the BlueMathModel class.

  • The method checks for NaNs in the data and optionally replaces them with the specified value.
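
Examples

A minimal sketch, assuming the subclass below satisfies BlueMathModel's constructor and abstract requirements (the class name is hypothetical):

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.models import BlueMathModel
>>> class MyModel(BlueMathModel):
...     pass
>>> model = MyModel()
>>> df = pd.DataFrame({"Hs": [1.0, np.nan, 2.0]})
>>> clean = model.check_nans(data=df, replace_value=0.0)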

denormalize(normalized_data: DataFrame, scale_factor: dict) DataFrame[source]

Denormalize data using provided scale_factor. More info in bluemath_tk.core.operations.denormalize.

Parameters:
  • normalized_data (pd.DataFrame) – The normalized data to denormalize.

  • scale_factor (dict) – The scale factors used for denormalization.

Returns:

data – The denormalized data.

Return type:

pd.DataFrame

destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler. More info in bluemath_tk.core.operations.destandarize.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

data – Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

static get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the degrees from the u and v components.

Here, we assume u and v are the components of a direction measured in degrees, where 0° is North and angles increase clockwise:

            (u=0, v=1)
                |
(u=-1, v=0) <———+———> (u=1, v=0)
                |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

static get_metrics(data1: DataFrame | Dataset, data2: DataFrame | Dataset) DataFrame[source]

Gets the metrics of the model.

Parameters:
  • data1 (pd.DataFrame or xr.Dataset) – The first dataset.

  • data2 (pd.DataFrame or xr.Dataset) – The second dataset.

Returns:

metrics – The metrics of the model.

Return type:

pd.DataFrame

Raises:
  • ValueError – If the DataFrames or Datasets have different shapes.

  • TypeError – If the inputs are not both DataFrames or both xarray Datasets.
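
Examples

A minimal call sketch (values illustrative; the specific metrics returned are defined by the implementation):

>>> import pandas as pd
>>> from bluemath_tk.core.models import BlueMathModel
>>> df1 = pd.DataFrame({"Hs": [1.0, 2.0, 3.0]})
>>> df2 = pd.DataFrame({"Hs": [1.1, 1.9, 3.2]})
>>> metrics = BlueMathModel.get_metrics(data1=df1, data2=df2)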

get_num_processors_available() int[source]

Gets the number of processors available.

Returns:

The number of processors available.

Return type:

int

static get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° at North and angles increasing clockwise:

          0° N
           |
270° W <———+———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]

gravity = 9.80665

list_class_attributes() list[source]

Lists the attributes of the class.

Returns:

The attributes of the class.

Return type:

list

list_class_methods() list[source]

Lists the methods of the class.

Returns:

The methods of the class.

Return type:

list

load_model(model_path: str) BlueMathModel[source]

Loads the model from a file.

property logger: Logger

normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using min max scaler approach. More info in bluemath_tk.core.operations.normalize.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – The data to normalize.

  • custom_scale_factor (dict, optional) – Custom scale factors for normalization.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – The normalized data.

  • scale_factor (dict) – The scale factors used for normalization.

parallel_execute(func: Callable, items: List[Any], num_workers: int, cpu_intensive: bool = False, **kwargs) Dict[int, Any][source]

Execute a function in parallel using concurrent.futures.

Parameters:
  • func (Callable) – Function to execute for each item.

  • items (List[Any]) – List of items to process.

  • num_workers (int) – Number of parallel workers.

  • cpu_intensive (bool, optional) – Whether the function is CPU intensive. Default is False.

  • **kwargs (dict) – Additional keyword arguments for func.

Returns:

Dictionary mapping each item's index in the original list to the result of executing func on that item.

Return type:

Dict[int, Any]

Warning

  • When using ThreadPoolExecutor, the function sometimes fails when reading from or writing to the same or different files; this might be due to Python's GIL (Global Interpreter Lock).

  • cpu_intensive = True does not work with non-picklable objects (under development).
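
Examples

A hedged sketch, where model is an instance of a BlueMathModel subclass (as in the check_nans example above) and the worker function is trivial:

>>> def square(x):
...     return x ** 2
>>> results = model.parallel_execute(func=square, items=[1, 2, 3, 4], num_workers=2)
>>> # Keys are item indices, e.g. {0: 1, 1: 4, 2: 9, 3: 16}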

save_model(model_path: str, exclude_attributes: List[str] = None) None[source]

Saves the model to a file.
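
Examples

A minimal save / load round-trip sketch (file name and extension illustrative; model is a BlueMathModel subclass instance as above):

>>> model.save_model(model_path="my_model.pkl")
>>> restored = model.load_model(model_path="my_model.pkl")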

set_logger_name(name: str, level: str = 'INFO', console: bool = True) None[source]

Sets the name of the logger.

set_num_processors_to_use(num_processors: int) None[source]

Sets the number of processors to use for parallel processing.

Parameters:

num_processors (int) – The number of processors to use. If -1, all available processors will be used.

set_omp_num_threads(num_threads: int) None[source]

Sets the number of threads for OpenMP.

Parameters:

num_threads (int) – The number of threads.

Warning

  • This method is under development.

standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data using StandardScaler. More info in bluemath_tk.core.operations.standarize.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to just transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

bluemath_tk.core.operations module

bluemath_tk.core.operations.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts Longitude and Latitude to UTM coordinates.

Parameters:
  • lon (np.ndarray) – The longitude values.

  • lat (np.ndarray) – The latitude values.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The x and y coordinates in UTM.

Return type:

Tuple[np.ndarray, np.ndarray]
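
Examples

A minimal sketch using an EPSG code as the projection (EPSG:32630 is WGS 84 / UTM zone 30N; coordinates illustrative):

>>> import numpy as np
>>> from bluemath_tk.core.operations import convert_lonlat_to_utm
>>> lon = np.array([-3.80, -3.79])
>>> lat = np.array([43.46, 43.47])
>>> utm_x, utm_y = convert_lonlat_to_utm(lon=lon, lat=lat, projection=32630)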

bluemath_tk.core.operations.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts UTM coordinates to Longitude and Latitude.

Parameters:
  • utm_x (np.ndarray) – The x values in UTM.

  • utm_y (np.ndarray) – The y values in UTM.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The longitude and latitude values.

Return type:

Tuple[np.ndarray, np.ndarray]

bluemath_tk.core.operations.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) DataFrame | Dataset[source]

Denormalize data using provided scale_factor.

Parameters:
  • normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.

Returns:

data – Denormalized data.

Return type:

pd.DataFrame or xr.Dataset

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.

  • The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.

  • Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)

bluemath_tk.core.operations.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)

bluemath_tk.core.operations.get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the degrees from the u and v components.

Here, we assume u and v are the components of a direction measured in degrees, where 0° is North and angles increase clockwise:

            (u=0, v=1)
                |
(u=-1, v=0) <———+———> (u=1, v=0)
                |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

bluemath_tk.core.operations.get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° at North and angles increasing clockwise:

          0° N
           |
270° W <———+———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]
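
Examples

A round-trip sketch: converting nautical directions to u / v components and back should recover the original angles (modulo 360):

>>> import numpy as np
>>> from bluemath_tk.core.operations import get_uv_components, get_degrees_from_uv
>>> directions = np.array([0.0, 90.0, 180.0, 270.0])
>>> xu, xv = get_uv_components(directions)
>>> recovered = get_degrees_from_uv(xu=xu, xv=xv)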

bluemath_tk.core.operations.mathematical_to_nautical(math_degrees: ndarray) ndarray[source]

Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise)

Parameters:

math_degrees (float or array-like) – Directional angle in mathematical convention

Returns:

Directional angle in nautical convention

Return type:

np.ndarray

bluemath_tk.core.operations.nautical_to_mathematical(nautical_degrees: ndarray) ndarray[source]

Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise)

Parameters:

nautical_degrees (np.ndarray) – Directional angle in nautical convention

Returns:

Directional angle in mathematical convention

Return type:

np.ndarray
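
Examples

Each conversion is a reflection, so applying it twice returns the input; a numeric sketch assuming the standard (90° − θ) mod 360 relation between the two conventions:

>>> import numpy as np
>>> from bluemath_tk.core.operations import (
...     mathematical_to_nautical,
...     nautical_to_mathematical,
... )
>>> nautical = np.array([0.0, 90.0, 180.0, 270.0])
>>> math_deg = nautical_to_mathematical(nautical)  # expected [90, 0, 270, 180]
>>> back = mathematical_to_nautical(math_deg)      # expected to equal nautical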

bluemath_tk.core.operations.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using min max scaler approach.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – Input data to be normalized.

  • custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.

  • logger (logging.Logger, optional) – Logger object to log warnings if the custom min or max is bigger or lower than the datapoints.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.

  • The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.

  • If custom min or max is bigger or lower than the datapoints, it will be changed to the minimum or maximum of the datapoints and a warning will be logged.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)

bluemath_tk.core.operations.spatial_gradient(data: DataArray) DataArray[source]

Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).

Parameters:

data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).

Returns:

Gradient magnitude with same dimensions as input.

Return type:

xr.DataArray

Notes

The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.
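
Examples

A minimal sketch on a small synthetic grid (sizes and coordinates illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import spatial_gradient
>>> data = xr.DataArray(
...     np.random.rand(10, 5, 5),
...     dims=("time", "latitude", "longitude"),
...     coords={
...         "time": pd.date_range("2000-01-01", periods=10),
...         "latitude": np.linspace(43.0, 44.0, 5),
...         "longitude": np.linspace(-4.0, -3.0, 5),
...     },
... )
>>> gradient = spatial_gradient(data=data)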

bluemath_tk.core.operations.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data to have mean 0 and std 1.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to just transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)

Module contents

Project: BlueMath_tk
Sub-Module: core
Author: GeoOcean Research Group, Universidad de Cantabria
Creation Date: 9 December 2024
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)

bluemath_tk.core.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts Longitude and Latitude to UTM coordinates.

Parameters:
  • lon (np.ndarray) – The longitude values.

  • lat (np.ndarray) – The latitude values.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The x and y coordinates in UTM.

Return type:

Tuple[np.ndarray, np.ndarray]

bluemath_tk.core.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts UTM coordinates to Longitude and Latitude.

Parameters:
  • utm_x (np.ndarray) – The x values in UTM.

  • utm_y (np.ndarray) – The y values in UTM.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The longitude and latitude values.

Return type:

Tuple[np.ndarray, np.ndarray]

bluemath_tk.core.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) DataFrame | Dataset[source]

Denormalize data using provided scale_factor.

Parameters:
  • normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.

Returns:

data – Denormalized data.

Return type:

pd.DataFrame or xr.Dataset

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.

  • The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.

  • Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)

bluemath_tk.core.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)

bluemath_tk.core.get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the degrees from the u and v components.

Here, we assume u and v are the components of a direction measured in degrees, where 0° is North and angles increase clockwise:

            (u=0, v=1)
                |
(u=-1, v=0) <———+———> (u=1, v=0)
                |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

bluemath_tk.core.get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° at North and angles increasing clockwise:

          0° N
           |
270° W <———+———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]

bluemath_tk.core.mathematical_to_nautical(math_degrees: ndarray) ndarray[source]

Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise)

Parameters:

math_degrees (float or array-like) – Directional angle in mathematical convention

Returns:

Directional angle in nautical convention

Return type:

np.ndarray

bluemath_tk.core.nautical_to_mathematical(nautical_degrees: ndarray) ndarray[source]

Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise)

Parameters:

nautical_degrees (np.ndarray) – Directional angle in nautical convention

Returns:

Directional angle in mathematical convention

Return type:

np.ndarray

bluemath_tk.core.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using min max scaler approach.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – Input data to be normalized.

  • custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.

  • logger (logging.Logger, optional) – Logger object to log warnings if the custom min or max is bigger or lower than the datapoints.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.

  • The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.

  • If custom min or max is bigger or lower than the datapoints, it will be changed to the minimum or maximum of the datapoints and a warning will be logged.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)

bluemath_tk.core.setup_dask_client(n_workers: int = None, memory_limit: str = 0.5)[source]

Setup a Dask client with controlled resources.

Parameters:
  • n_workers (int, optional) – Number of workers. Default is None.

  • memory_limit (str, optional) – Memory limit per worker (a string such as "2GB", or a float interpreted as a fraction of total memory). Default is 0.5.

Returns:

Dask distributed client

Return type:

Client

Notes

  • Resources might vary depending on the hardware and the current load of the machine. Be very careful when setting the number of workers and the memory limit, as these can affect the performance of the machine or, in the worst-case scenario, the performance of other users on the same machine (cluster case).
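
Examples

A minimal sketch (worker count and memory limit illustrative; close the client when finished to release resources):

>>> from bluemath_tk.core import setup_dask_client
>>> client = setup_dask_client(n_workers=2, memory_limit="2GB")
>>> # ... run dask-backed computations here ...
>>> client.close()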

bluemath_tk.core.spatial_gradient(data: DataArray) DataArray[source]

Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).

Parameters:

data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).

Returns:

Gradient magnitude with same dimensions as input.

Return type:

xr.DataArray

Notes

The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.

bluemath_tk.core.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data to have mean 0 and std 1.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to just transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)