bluemath_tk.core package

Subpackages

Submodules

bluemath_tk.core.constants module

bluemath_tk.core.dask module

bluemath_tk.core.dask.get_available_ram() int[source]

Get the available RAM in the system.

Returns:

The available RAM in bytes.

Return type:

int

bluemath_tk.core.dask.get_total_ram() int[source]

Get the total RAM in the system.

Returns:

The total RAM in bytes.

Return type:

int
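
Example

A minimal usage sketch for the two RAM helpers (exact values depend on the machine, so no concrete numbers are shown):

>>> from bluemath_tk.core.dask import get_available_ram, get_total_ram
>>> 0 < get_available_ram() <= get_total_ram()
True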

bluemath_tk.core.dask.setup_dask_client(n_workers: int = None, memory_limit: str = 0.5)[source]

Set up a Dask client with controlled resources.

Parameters:
  • n_workers (int, optional) – Number of workers. Default is None.

  • memory_limit (str, optional) – Memory limit per worker. Default is 0.5.

Returns:

Dask distributed client

Return type:

Client

Notes

  • Resources might vary depending on the hardware and the load of the machine. Be very careful when setting the number of workers and the memory limit, as they can affect the performance of the machine or, in the worst-case scenario, the performance of other users on the same machine (cluster case).
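
Example

A minimal sketch of a typical session. The memory_limit value "4GB" is an assumption for illustration; the 0.5 default suggests a fraction of total RAM is also accepted:

>>> from bluemath_tk.core.dask import setup_dask_client
>>> client = setup_dask_client(n_workers=2, memory_limit="4GB")
>>> # ... submit work through the returned Dask client ...
>>> client.close()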

bluemath_tk.core.decorators module

bluemath_tk.core.decorators.validate_data_calval(func)[source]

Decorator to validate data in CalVal class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_kma(func)[source]

Decorator to validate data in KMA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_lhs(func)[source]

Decorator to validate data in LHS class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_mda(func)[source]

Decorator to validate data in MDA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_pca(func)[source]

Decorator to validate data in PCA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_rbf(func)[source]

Decorator to validate data in RBF class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_som(func)[source]

Decorator to validate data in SOM class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_xwt(func)[source]

Decorator to validate data in XWT class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable
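
Example

A minimal sketch of how these validators are typically attached. The MyKMA class and the fit signature below are hypothetical; the decorator is assumed to validate the data before the wrapped fit body runs:

>>> from bluemath_tk.core.decorators import validate_data_kma
>>> class MyKMA:
...     @validate_data_kma
...     def fit(self, data):
...         # data has already been validated at this point
...         return self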

bluemath_tk.core.geo module

bluemath_tk.core.geo.buffer_area_for_polygon(polygon: Polygon, area_factor: float) Polygon[source]

Buffer the polygon by a factor of its area divided by its length. This is a heuristic to ensure that the buffer is proportional to the size of the polygon.

Parameters:
  • polygon (Polygon) – The polygon to be buffered.

  • area_factor (float) – The buffer factor.

Returns:

The buffered polygon.

Return type:

Polygon

Example

>>> from shapely.geometry import Polygon
>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> area_factor = 0.1
>>> buffered_polygon = buffer_area_for_polygon(polygon, area_factor)
>>> print(buffered_polygon)
POLYGON ((-0.1 -0.1, 1.1 -0.1, 1.1 1.1, -0.1 1.1, -0.1 -0.1))
bluemath_tk.core.geo.convert_to_radians(*args: float | ndarray) tuple[source]

Convert degree inputs to radians.

Parameters:

*args (Union[float, np.ndarray]) – Variable number of inputs in degrees to convert to radians. Can be either scalar floats or numpy arrays.

Returns:

Tuple of input values converted to radians, preserving input types.

Return type:

tuple

Examples

>>> convert_to_radians(90.0)
(1.5707963267948966,)
>>> convert_to_radians(90.0, 180.0)
(1.5707963267948966, 3.141592653589793)
bluemath_tk.core.geo.create_polygon(coordinates: List[Tuple[float, float]]) Polygon[source]

Create a polygon from a list of (longitude, latitude) coordinates.

Parameters:

coordinates (List[Tuple[float, float]]) – List of (longitude, latitude) coordinate pairs that define the polygon vertices. The first and last points should be the same to close the polygon.

Returns:

A shapely Polygon object.

Return type:

Polygon

Examples

>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
bluemath_tk.core.geo.filter_points_in_polygon(lon: List[float] | ndarray, lat: List[float] | ndarray, polygon: Polygon) Tuple[ndarray, ndarray][source]

Filter points to keep only those inside a polygon.

Parameters:
  • lon (Union[List[float], np.ndarray]) – Array or list of longitude values.

  • lat (Union[List[float], np.ndarray]) – Array or list of latitude values. Must have the same shape as lon.

  • polygon (Polygon) – A shapely Polygon object.

Returns:

Tuple containing:
  • filtered_lon : Array of longitudes inside the polygon

  • filtered_lat : Array of latitudes inside the polygon

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

ValueError – If lon and lat arrays have different shapes.

Examples

>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
>>> lon = [0.5, 2.0]
>>> lat = [0.5, 2.0]
>>> filtered_lon, filtered_lat = filter_points_in_polygon(lon, lat, poly)
>>> print(filtered_lon)  # [0.5]
>>> print(filtered_lat)  # [0.5]
bluemath_tk.core.geo.geo_distance_cartesian(y_matrix: float | ndarray, x_matrix: float | ndarray, y_point: float | ndarray, x_point: float | ndarray) ndarray[source]

Returns the Cartesian distance between a (y, x) matrix and a (y, x) point. Optimized using vectorized operations.

Parameters:
  • y_matrix (Union[float, np.ndarray]) – 2D array of y-coordinates (latitude or y in Cartesian).

  • x_matrix (Union[float, np.ndarray]) – 2D array of x-coordinates (longitude or x in Cartesian).

  • y_point (Union[float, np.ndarray]) – y-coordinate of the point (latitude or y in Cartesian).

  • x_point (Union[float, np.ndarray]) – x-coordinate of the point (longitude or x in Cartesian).

Returns:

Array of distances in the same units as x_matrix and y_matrix.

Return type:

np.ndarray
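
Example

A short sketch, assuming the function computes the element-wise Euclidean distance sqrt((y_matrix - y_point)**2 + (x_matrix - x_point)**2):

>>> import numpy as np
>>> from bluemath_tk.core.geo import geo_distance_cartesian
>>> y_matrix = np.array([[0.0, 0.0], [3.0, 3.0]])
>>> x_matrix = np.array([[0.0, 4.0], [0.0, 4.0]])
>>> geo_distance_cartesian(y_matrix, x_matrix, 0.0, 0.0)
array([[0., 4.],
       [3., 5.]])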

bluemath_tk.core.geo.geodesic_azimuth(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) float | ndarray[source]

Calculate azimuth between two points on Earth.

Parameters:
  • lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees

  • lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees

  • lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees

  • lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees

Returns:

Azimuth(s) in degrees from North

Return type:

Union[float, np.ndarray]

Notes

The azimuth is the angle between true north and the direction to the second point, measured clockwise from north. Special cases are handled for points at the poles.

Examples

>>> geodesic_azimuth(0, 0, 0, 90)
90.0
>>> geodesic_azimuth([0, 45], [0, -90], [0, -45], [90, 90])
array([90., 90.])
bluemath_tk.core.geo.geodesic_distance(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) float | ndarray[source]

Calculate great circle distance between two points on Earth.

Parameters:
  • lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees

  • lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees

  • lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees

  • lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees

Returns:

Great circle distance(s) in degrees

Return type:

Union[float, np.ndarray]

Notes

Uses the haversine formula to calculate great circle distance. The result is in degrees of arc on a sphere.

Examples

>>> geodesic_distance(0, 0, 0, 90)
90.0
>>> geodesic_distance([0, 45], [0, -90], [0, -45], [90, 90])
array([90., 180.])
bluemath_tk.core.geo.geodesic_distance_azimuth(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) Tuple[float | ndarray, float | ndarray][source]

Calculate both great circle distance and azimuth between two points.

Parameters:
  • lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees

  • lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees

  • lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees

  • lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees

Returns:

Tuple containing:
  • distance(s) : Great circle distance(s) in degrees

  • azimuth(s) : Azimuth(s) in degrees from North

Return type:

Tuple[Union[float, np.ndarray], Union[float, np.ndarray]]

See also

geodesic_distance

Calculate only the great circle distance

geodesic_azimuth

Calculate only the azimuth

Examples

>>> dist, az = geodesic_distance_azimuth(0, 0, 0, 90)
>>> dist
90.0
>>> az
90.0
bluemath_tk.core.geo.mask_points_outside_polygon(elements: ndarray, node_coords: ndarray, poly: Polygon) ndarray[source]

Returns a boolean mask indicating which triangle elements have at least two vertices outside the polygon.

This version uses matplotlib.path.Path for high-performance point-in-polygon testing.

Parameters:
  • elements ((n_elements, 3) np.ndarray) – Array containing indices of triangle vertices.

  • node_coords ((n_nodes, 2) np.ndarray) – Array of node coordinates as (x, y) pairs.

  • poly (shapely.geometry.Polygon) – Polygon used for containment checks.

Returns:

mask – Boolean array where True means at least two vertices of the triangle lie outside the polygon.

Return type:

(n_elements,) np.ndarray

Example

>>> import numpy as np
>>> from shapely.geometry import Polygon
>>> elements = np.array([[0, 1, 2], [1, 2, 3]])
>>> node_coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
>>> poly = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> mask = mask_points_outside_polygon(elements, node_coords, poly)
>>> print(mask)
[False False]
bluemath_tk.core.geo.points_in_polygon(lon: List[float] | ndarray, lat: List[float] | ndarray, polygon: Polygon) ndarray[source]

Check which points are inside a polygon.

Parameters:
  • lon (Union[List[float], np.ndarray]) – Array or list of longitude values.

  • lat (Union[List[float], np.ndarray]) – Array or list of latitude values. Must have the same shape as lon.

  • polygon (Polygon) – A shapely Polygon object.

Returns:

Boolean array indicating which points are inside the polygon.

Return type:

np.ndarray

Raises:

ValueError – If lon and lat arrays have different shapes.

Examples

>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
>>> lon = [0.5, 2.0]
>>> lat = [0.5, 2.0]
>>> mask = points_in_polygon(lon, lat, poly)
>>> print(mask)  # [True, False]
bluemath_tk.core.geo.shoot(lon: float | ndarray, lat: float | ndarray, azimuth: float | ndarray, maxdist: float | ndarray) Tuple[float | ndarray, float | ndarray, float | ndarray][source]

Calculate endpoint given starting point, azimuth and distance.

Parameters:
  • lon (Union[float, np.ndarray]) – Starting longitude(s) in degrees

  • lat (Union[float, np.ndarray]) – Starting latitude(s) in degrees

  • azimuth (Union[float, np.ndarray]) – Initial azimuth(s) in degrees

  • maxdist (Union[float, np.ndarray]) – Distance(s) to travel in kilometers

Returns:

Tuple containing:
  • final_lon : Final longitude(s) in degrees

  • final_lat : Final latitude(s) in degrees

  • back_azimuth : Back azimuth(s) in degrees

Return type:

Tuple[Union[float, np.ndarray], Union[float, np.ndarray], Union[float, np.ndarray]]

Notes

This function implements a geodesic shooting algorithm based on T. Vincenty’s method. It accounts for the Earth’s ellipsoidal shape.

Raises:

ValueError – If attempting to shoot from a pole in a direction not along a meridian.

Examples

>>> lon_f, lat_f, baz = shoot(0, 0, 90, 111.195)  # ~1 degree at equator
>>> round(lon_f, 6)
1.0
>>> round(lat_f, 6)
0.0
>>> round(baz, 6)
270.0

bluemath_tk.core.io module

bluemath_tk.core.io.load_model(model_path: str) BlueMathModel[source]

Loads a BlueMathModel from a file.

Parameters:

model_path (str) – The path to the model file.

Returns:

The loaded BlueMathModel.

Return type:

BlueMathModel
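
Example

A minimal usage sketch. The path below is hypothetical and should point to a file previously written with BlueMathModel.save_model:

>>> from bluemath_tk.core.io import load_model
>>> model = load_model("path/to/model.pkl")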

bluemath_tk.core.logging module

bluemath_tk.core.logging.get_file_logger(name: str, logs_path: str = None, level: int | str = 'INFO', console: bool = True, console_level: int | str = 'WARNING') Logger[source]

Creates and returns a logger that writes log messages to a file.

Parameters:
  • name (str) – The name of the logger.

  • logs_path (str, optional) – The file path where the log messages will be written. Default is None.

  • level (Union[int, str], optional) – The logging level. Default is “INFO”.

  • console (bool) – Whether to also log to the console / terminal. Default is True.

  • console_level (Union[int, str], optional) – The logging level for console / terminal logs. Default is “WARNING”.

Returns:

Configured logger instance.

Return type:

logging.Logger

Examples

>>> from bluemath_tk.core.logging import get_file_logger
>>> # Create a logger that writes to "app.log"
>>> logger = get_file_logger("my_app_logger", "app.log")
>>> # Log messages
>>> logger.info("This is an info message.")
>>> logger.warning("This is a warning message.")
>>> logger.error("This is an error message.")
>>> # The output will be saved in "app.log" with the format:
>>> # 2023-10-22 14:55:23,456 - my_app_logger - INFO - This is an info message.
>>> # 2023-10-22 14:55:23,457 - my_app_logger - WARNING - This is a warning message.
>>> # 2023-10-22 14:55:23,458 - my_app_logger - ERROR - This is an error message.

bluemath_tk.core.models module

class bluemath_tk.core.models.BlueMathModel[source]

Bases: ABC

Abstract base class for handling default functionalities across the project.

This class provides core functionality used by all BlueMath models, including:
  • Model saving and loading

  • Data normalization and denormalization

  • Parallel processing capabilities

  • Logging functionality

  • NaN handling

  • Directional data processing

gravity

Gravitational constant from scipy.constants.

Type:

float

earth_radius

Earth radius in km.

Type:

float

num_workers

Number of parallel workers to use for processing.

Type:

int

logger

Logger instance for the model.

Type:

logging.Logger

Notes

All BlueMath models should inherit from this class to ensure consistent behavior and functionality across the project.

check_nans(data: ndarray | Series | DataFrame | DataArray | Dataset, replace_value: float | callable = None, raise_error: bool = False) ndarray | Series | DataFrame | DataArray | Dataset[source]

Check for NaNs in the data and optionally replace them.

Parameters:
  • data (Union[np.ndarray, pd.Series, pd.DataFrame, xr.DataArray, xr.Dataset]) – The data to check for NaNs.

  • replace_value (Union[float, callable], optional) – Value to replace NaNs with. If callable, the function will be called on the data. Default is None (no replacement).

  • raise_error (bool, optional) – Whether to raise an error if NaNs are found. Default is False.

Returns:

data – The data with NaNs optionally replaced.

Return type:

Union[np.ndarray, pd.Series, pd.DataFrame, xr.DataArray, xr.Dataset]

Raises:

ValueError – If NaNs are found and raise_error is True.

Notes

  • For numpy arrays, uses np.isnan() to check for NaNs

  • For pandas objects, uses isnull() to check for NaNs

  • For xarray objects, uses isnull() to check for NaNs

  • If replace_value is callable, it takes precedence over other options

Examples

>>> import numpy as np
>>> import pandas as pd
>>> model = BlueMathModel()
>>> df = pd.DataFrame({'a': [1, np.nan, 3]})
>>> cleaned_df = model.check_nans(df, replace_value=0)
>>> print(cleaned_df)
     a
0  1.0
1  0.0
2  3.0
denormalize(normalized_data: DataFrame, scale_factor: dict) DataFrame[source]

Denormalize data using provided scale_factor. More info in bluemath_tk.core.operations.denormalize.

Parameters:
  • normalized_data (pd.DataFrame) – The normalized data to denormalize.

  • scale_factor (dict) – The scale factors used for denormalization.

Returns:

data – The denormalized data.

Return type:

pd.DataFrame

destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler. More info in bluemath_tk.core.operations.destandarize.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

data – Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

earth_radius = 6378.135
static get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the direction in degrees from the u and v components.

The resulting directions are angles between 0 and 360 degrees, where 0° is the North direction, increasing clockwise.

            (u=0, v=1)
                 |
(u=-1, v=0) <———> (u=1, v=0)
                 |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

static get_metrics(data1: DataFrame | Dataset, data2: DataFrame | Dataset) DataFrame[source]

Gets the metrics of the model.

Parameters:
  • data1 (pd.DataFrame or xr.Dataset) – The first dataset.

  • data2 (pd.DataFrame or xr.Dataset) – The second dataset.

Returns:

metrics – The metrics of the model.

Return type:

pd.DataFrame

Raises:
  • ValueError – If the DataFrames or Datasets have different shapes.

  • TypeError – If the inputs are not both DataFrames or both xarray Datasets.
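
Example

A minimal sketch comparing two aligned DataFrames (the specific metrics contained in the returned DataFrame are not listed here):

>>> import pandas as pd
>>> data1 = pd.DataFrame({"Hs": [1.0, 2.0, 3.0]})
>>> data2 = pd.DataFrame({"Hs": [1.1, 1.9, 3.2]})
>>> metrics = BlueMathModel.get_metrics(data1, data2)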

get_num_processors_available() int[source]

Gets the number of processors available.

Returns:

The number of processors available.

Return type:

int

static get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° being the North direction, increasing clockwise.

          0° N
           |
270° W <———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]

gravity = 9.80665
list_class_attributes() list[source]

List all non-callable attributes of the class.

Returns:

Names of all non-callable, non-private attributes.

Return type:

list

Notes

  • Excludes methods and private attributes (starting with __)

  • Includes properties and class variables

  • Useful for introspection and debugging

Examples

>>> model = BlueMathModel()
>>> attrs = model.list_class_attributes()
>>> print(attrs)
['gravity', 'num_workers', '_logger']
list_class_methods() list[source]

List all callable methods of the class.

Returns:

Names of all callable, non-private methods.

Return type:

list

Notes

  • Excludes attributes and private methods (starting with __)

  • Includes instance methods and properties

  • Useful for introspection and debugging

Examples

>>> model = BlueMathModel()
>>> methods = model.list_class_methods()
>>> print(methods)
['normalize', 'denormalize', 'check_nans']
load_model(model_path: str) BlueMathModel[source]

Loads the model from a file.

property logger: Logger

Get the logger instance for this model.

Returns:

The logger instance. Creates a new file logger if none exists.

Return type:

logging.Logger

Notes

  • Lazily instantiates logger on first access

  • Uses class name as default logger name

  • Thread-safe logger creation

normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using a min-max scaler approach. More info in bluemath_tk.core.operations.normalize.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – The data to normalize.

  • custom_scale_factor (dict, optional) – Custom scale factors for normalization.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – The normalized data.

  • scale_factor (dict) – The scale factors used for normalization.

parallel_execute(func: Callable, items: List[Any], num_workers: int, cpu_intensive: bool = False, **kwargs) Dict[int, Any][source]

Execute a function in parallel across multiple items.

Parameters:
  • func (Callable) – The function to execute. Should accept single item and **kwargs.

  • items (List[Any]) – List of items to process in parallel.

  • num_workers (int) – Number of parallel workers to use.

  • cpu_intensive (bool, optional) – If True, uses ProcessPoolExecutor, otherwise ThreadPoolExecutor. Default is False.

  • **kwargs (dict) – Additional keyword arguments passed to func.

Returns:

Dictionary mapping item indices to function results.

Return type:

Dict[int, Any]

Raises:

Exception – Any exception raised by func is logged and the job continues.

Notes

  • Uses ThreadPoolExecutor for I/O-bound tasks

  • Uses ProcessPoolExecutor for CPU-bound tasks

  • Results maintain original item order via index mapping

  • Failed jobs are logged but don’t stop execution

Warning

  • ThreadPoolExecutor may have GIL limitations

  • ProcessPoolExecutor doesn’t work with non-picklable objects

  • File operations may fail with ThreadPoolExecutor

Examples

>>> def square(x):
...     return x * x
>>> model = BlueMathModel()
>>> results = model.parallel_execute(square, [1, 2, 3], num_workers=2)
>>> print(results)
{0: 1, 1: 4, 2: 9}
save_model(model_path: str, exclude_attributes: List[str] = None) None[source]

Save the model to a file using pickle.

Parameters:
  • model_path (str) – Path where the model will be saved.

  • exclude_attributes (List[str], optional) – List of attribute names to exclude from saving. Default is None. If provided, it will override the default _exclude_attributes.

Notes

  • Uses pickle for serialization

  • Warns if any xarray Datasets/DataArrays are being pickled

  • Creates parent directories if they don’t exist

  • Excludes specified attributes from serialization

Warning

  • Pickle files can be security risks if loaded from untrusted sources

  • xarray objects in the model will be pickled and may be large

Examples

>>> model = MyBlueMathModel()
>>> model.save_model('model.pkl', exclude_attributes=['_logger'])
set_logger_name(name: str, level: str = 'INFO', console: bool = True) None[source]

Configure the model’s logger with a new name and settings.

Parameters:
  • name (str) – The name to give to the logger.

  • level (str, optional) – The logging level to use. Default is “INFO”. Valid values are: “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”

  • console (bool, optional) – Whether to output logs to console. Default is True.

Notes

  • Creates a new file logger with specified settings

  • Previous logger settings are overwritten

  • Log files are created in the default logging directory

Examples

>>> model = BlueMathModel()
>>> model.set_logger_name("my_model", level="DEBUG", console=False)
set_num_processors_to_use(num_processors: int) None[source]

Set the number of processors to use for parallel processing.

Parameters:

num_processors (int) – Number of processors to use. If -1, uses all available processors minus one for system processes.

Raises:

ValueError – If num_processors is <= 0 (except -1).

Notes

  • Automatically adjusts if requesting too many processors

  • Sets the num_workers attribute used by parallel processing methods

  • Takes into account system resources to avoid overload

See also

get_num_processors_available

Get number of available processors

parallel_execute

Execute functions in parallel

set_omp_num_threads(num_threads: int) None[source]

Set the number of OpenMP threads for parallel operations.

Parameters:

num_threads (int) – Number of OpenMP threads to use.

Notes

  • Sets the OMP_NUM_THREADS environment variable

  • Reloads numpy to ensure new thread settings take effect

  • May affect other libraries using OpenMP

Warning

  • This method is under development and behavior may change

  • Reloading numpy may have side effects in running calculations

See also

set_num_processors_to_use

Set number of processors for BlueMath parallel processing

standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data using StandardScaler. More info in bluemath_tk.core.operations.standarize.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to only transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

bluemath_tk.core.operations module

bluemath_tk.core.operations.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts Longitude and Latitude to UTM coordinates.

Parameters:
  • lon (np.ndarray) – The longitude values.

  • lat (np.ndarray) – The latitude values.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The x and y coordinates in UTM.

Return type:

Tuple[np.ndarray, np.ndarray]

bluemath_tk.core.operations.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts UTM coordinates to Longitude and Latitude.

Parameters:
  • utm_x (np.ndarray) – The x values in UTM.

  • utm_y (np.ndarray) – The y values in UTM.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The longitude and latitude values.

Return type:

Tuple[np.ndarray, np.ndarray]
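
Example

A round-trip sketch, assuming projection accepts an EPSG code such as 32630 (UTM zone 30N):

>>> import numpy as np
>>> from bluemath_tk.core.operations import (
...     convert_lonlat_to_utm,
...     convert_utm_to_lonlat,
... )
>>> lon, lat = np.array([-3.8]), np.array([43.46])
>>> utm_x, utm_y = convert_lonlat_to_utm(lon=lon, lat=lat, projection=32630)
>>> lon2, lat2 = convert_utm_to_lonlat(utm_x=utm_x, utm_y=utm_y, projection=32630)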

bluemath_tk.core.operations.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) DataFrame | Dataset[source]

Denormalize data using provided scale_factor.

Parameters:
  • normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.

Returns:

data – Denormalized data.

Return type:

pd.DataFrame or xr.Dataset

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.

  • The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.

  • Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)
bluemath_tk.core.operations.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)
bluemath_tk.core.operations.get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the direction in degrees from the u and v components.

The resulting directions are angles between 0 and 360 degrees, where 0° is the North direction, increasing clockwise.

            (u=0, v=1)
                 |
(u=-1, v=0) <———> (u=1, v=0)
                 |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

bluemath_tk.core.operations.get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° being the North direction, increasing clockwise.

          0° N
           |
270° W <———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]
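
Example

A round-trip sketch under the nautical convention described above. Since the exact sign convention of the components is implementation-defined, only the round trip is shown:

>>> import numpy as np
>>> from bluemath_tk.core.operations import get_degrees_from_uv, get_uv_components
>>> u, v = get_uv_components(np.array([0.0, 90.0, 180.0]))
>>> degrees = get_degrees_from_uv(u, v)  # recovers the original directions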

bluemath_tk.core.operations.mathematical_to_nautical(math_degrees: ndarray) ndarray[source]

Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise)

Parameters:

math_degrees (float or array-like) – Directional angle in mathematical convention

Returns:

Directional angle in nautical convention

Return type:

np.ndarray

bluemath_tk.core.operations.nautical_to_mathematical(nautical_degrees: ndarray) ndarray[source]

Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise)

Parameters:

nautical_degrees (np.ndarray) – Directional angle in nautical convention

Returns:

Directional angle in mathematical convention

Return type:

np.ndarray
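
Example

A short sketch, assuming the usual conversion mathematical = (90 - nautical) mod 360 and its inverse:

>>> import numpy as np
>>> from bluemath_tk.core.operations import (
...     mathematical_to_nautical,
...     nautical_to_mathematical,
... )
>>> nautical_to_mathematical(np.array([0.0, 90.0, 180.0]))
array([ 90.,   0., 270.])
>>> mathematical_to_nautical(np.array([90.0, 0.0, 270.0]))
array([  0.,  90., 180.])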

bluemath_tk.core.operations.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using a min-max scaler approach.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – Input data to be normalized.

  • custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.

  • logger (logging.Logger, optional) – Logger object to log warnings if the custom min or max is bigger or lower than the datapoints.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.

  • The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.

  • If custom min or max is bigger or lower than the datapoints, it will be changed to the minimum or maximum of the datapoints and a warning will be logged.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)
bluemath_tk.core.operations.spatial_gradient(data: DataArray) DataArray[source]

Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).

Parameters:

data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).

Returns:

Gradient magnitude with same dimensions as input.

Return type:

xr.DataArray

Notes

The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.
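
Example

A minimal construction sketch (coordinate values are purely illustrative):

>>> import numpy as np
>>> import xarray as xr
>>> from bluemath_tk.core.operations import spatial_gradient
>>> data = xr.DataArray(
...     np.random.rand(3, 10, 20),
...     dims=("time", "latitude", "longitude"),
...     coords={
...         "time": np.arange(3),
...         "latitude": np.linspace(40.0, 49.0, 10),
...         "longitude": np.linspace(-10.0, 9.0, 20),
...     },
... )
>>> gradient = spatial_gradient(data)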

bluemath_tk.core.operations.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data to have mean 0 and std 1.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to only transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)

bluemath_tk.core.pipeline module

class bluemath_tk.core.pipeline.BlueMathPipeline(steps: List[Dict[str, Any]])[source]

Bases: object

A flexible, modular pipeline for chaining together BlueMath models and data processing steps.

This class allows you to define a sequence of steps, where each step must be a BlueMathModel. Each step is defined by a dictionary specifying:

  • ‘name’: str, a unique identifier for the step.

  • ‘model’: the model instance to use (or will be created via ‘model_init’ and ‘model_init_params’).

  • ‘model_init’: (optional) a callable/class to instantiate the model.

  • ‘model_init_params’: (optional) dict of parameters for model initialization.

  • ‘fit_method’: (optional) str, the method name to call for fitting (default is based on model type).

  • ‘fit_params’: (optional) dict, parameters for the fit method.

  • ‘pipeline_attributes_to_store’: (optional) list of attribute names to store for later use.

The pipeline supports advanced parameter passing, including referencing outputs from previous steps and using callables for dynamic parameter computation.
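
Example

A minimal construction sketch. MySampler, MyReducer and my_dataframe are hypothetical; any BlueMathModel subclasses with compatible fit methods would do:

>>> from bluemath_tk.core.pipeline import BlueMathPipeline
>>> steps = [
...     {
...         "name": "sampler",
...         "model_init": MySampler,  # hypothetical model class
...         "model_init_params": {"num_samples": 100},
...     },
...     {
...         "name": "reducer",
...         "model": MyReducer(),  # an already-instantiated model
...         "fit_params": {"normalize_data": True},
...     },
... ]
>>> pipeline = BlueMathPipeline(steps=steps)
>>> output = pipeline.fit(data=my_dataframe)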

fit(data: ndarray | DataFrame | Dataset = None)[source]

Fit all models in the pipeline sequentially, passing the output of each step as input to the next.

For each step, the model is (optionally) initialized, then fit using the specified method and parameters. Parameters and model initialization arguments can be dynamically computed using callables or references to previous pipeline attributes.

Parameters:

data (Union[np.ndarray, pd.DataFrame, xr.Dataset], optional) – The input data to fit the models. If None, the pipeline expects each step to handle its own data.

Return type:

The output of the final step in the pipeline (could be transformed data, predictions, etc.).

Perform a grid search over all possible parameter combinations for all steps in the pipeline.

This method evaluates every possible combination of parameters (from the provided grids) for each step, fits the pipeline, and scores the result using the provided metric or the last model’s score method. The best parameter set (lowest score) is selected and the pipeline is updated accordingly.

Parameters:
  • data (Union[np.ndarray, pd.DataFrame]) – The input data to fit the models.

  • param_grid (List[Dict[str, Any]]) – List of parameter grids for each step in the pipeline. Each grid is a dict mapping parameter names to lists of values to try. Parameters can be for either model_init_params or fit_params.

  • metric (Callable, optional) – Function to evaluate the final output. Should take (y_true, y_pred) as arguments. If None, will use the last model’s built-in score method if available.

  • target_data (Union[np.ndarray, pd.DataFrame], optional) – Target data to evaluate against if using a custom metric. Required if metric is provided.

  • plot (bool, optional) – If True, plot the score for each parameter combination after grid search. Default is False.

Returns:

Dictionary containing:
  • 'best_params': the best parameter set for each step

  • 'best_score': the best score achieved

  • 'best_output': the output of the pipeline for the best parameters

  • 'all_results': a list of all parameter sets and their scores/outputs

Return type:

Dict[str, Any]

Raises:

ValueError – If the number of parameter grids does not match the number of pipeline steps, or if a metric is provided but no target_data is given.

property pipeline_attributes: Dict[str, Dict[str, Any]]

Get the stored model attributes from each pipeline step.

Returns:

A dictionary mapping step names to dictionaries of stored attributes.

Return type:

Dict[str, Dict[str, Any]]

Raises:

ValueError – If the pipeline has not been fit yet and no attributes are stored.

Module contents

Project: BlueMath_tk
Sub-Module: core
Author: GeoOcean Research Group, Universidad de Cantabria
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)