bluemath_tk.core package

Subpackages

Submodules

bluemath_tk.core.constants module

bluemath_tk.core.dask module

bluemath_tk.core.dask.get_available_ram() int[source]

Get the available RAM in the system.

Returns:

The available RAM in bytes.

Return type:

int

bluemath_tk.core.dask.get_total_ram() int[source]

Get the total RAM in the system.

Returns:

The total RAM in bytes.

Return type:

int
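
Example

A minimal usage sketch for the two RAM helpers (exact values depend on the machine, so no concrete numbers are shown):

>>> from bluemath_tk.core.dask import get_available_ram, get_total_ram
>>> 0 < get_available_ram() <= get_total_ram()
True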

bluemath_tk.core.dask.setup_dask_client(n_workers: int = None, memory_limit: str = 0.5)[source]

Set up a Dask client with controlled resources.

Parameters:
  • n_workers (int, optional) – Number of workers. Default is None.

  • memory_limit (str, optional) – Memory limit per worker. Default is 0.5.

Returns:

Dask distributed client

Return type:

Client

Notes

  • Resources might vary depending on the hardware and the load of the machine. Be very careful when setting the number of workers and the memory limit, as they can affect the performance of the machine or, in the worst-case scenario, the performance of other users on the same machine (cluster case).
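
Example

A minimal sketch of a typical session. The memory_limit value "4GB" is an assumption for illustration; the 0.5 default suggests a fraction of total RAM is also accepted:

>>> from bluemath_tk.core.dask import setup_dask_client
>>> client = setup_dask_client(n_workers=2, memory_limit="4GB")
>>> # ... submit work through the returned Dask client ...
>>> client.close()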

bluemath_tk.core.decorators module

bluemath_tk.core.decorators.validate_data_calval(func)[source]

Decorator to validate data in CalVal class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_kma(func)[source]

Decorator to validate data in KMA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_lhs(func)[source]

Decorator to validate data in LHS class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_mda(func)[source]

Decorator to validate data in MDA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_pca(func)[source]

Decorator to validate data in PCA class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_rbf(func)[source]

Decorator to validate data in RBF class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_som(func)[source]

Decorator to validate data in SOM class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable

bluemath_tk.core.decorators.validate_data_xwt(func)[source]

Decorator to validate data in XWT class fit method.

Parameters:

func (callable) – The function to be decorated

Returns:

The decorated function

Return type:

callable
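
Example

A minimal sketch of how these validators are typically attached. The MyKMA class and the fit signature below are hypothetical; the decorator is assumed to validate the data before the wrapped fit body runs:

>>> from bluemath_tk.core.decorators import validate_data_kma
>>> class MyKMA:
...     @validate_data_kma
...     def fit(self, data):
...         # data has already been validated at this point
...         return self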

bluemath_tk.core.geo module

bluemath_tk.core.geo.buffer_area_for_polygon(polygon: Polygon, area_factor: float) Polygon[source]

Buffer the polygon by a factor of its area divided by its length. This is a heuristic to ensure that the buffer is proportional to the size of the polygon.

Parameters:
  • polygon (Polygon) – The polygon to be buffered.

  • area_factor (float) – The buffer factor.

Returns:

The buffered polygon.

Return type:

Polygon

Example

>>> from shapely.geometry import Polygon
>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> area_factor = 0.1
>>> buffered_polygon = buffer_area_for_polygon(polygon, area_factor)
>>> print(buffered_polygon)
POLYGON ((-0.1 -0.1, 1.1 -0.1, 1.1 1.1, -0.1 1.1, -0.1 -0.1))
bluemath_tk.core.geo.convert_to_radians(*args: float | ndarray) tuple[source]

Convert degree inputs to radians.

Parameters:

*args (Union[float, np.ndarray]) – Variable number of inputs in degrees to convert to radians. Can be either scalar floats or numpy arrays.

Returns:

Tuple of input values converted to radians, preserving input types.

Return type:

tuple

Examples

>>> convert_to_radians(90.0)
(1.5707963267948966,)
>>> convert_to_radians(90.0, 180.0)
(1.5707963267948966, 3.141592653589793)
bluemath_tk.core.geo.create_polygon(coordinates: List[Tuple[float, float]]) Polygon[source]

Create a polygon from a list of (longitude, latitude) coordinates.

Parameters:

coordinates (List[Tuple[float, float]]) – List of (longitude, latitude) coordinate pairs that define the polygon vertices. The first and last points should be the same to close the polygon.

Returns:

A shapely Polygon object.

Return type:

Polygon

Examples

>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
bluemath_tk.core.geo.filter_points_in_polygon(lon: List[float] | ndarray, lat: List[float] | ndarray, polygon: Polygon) Tuple[ndarray, ndarray][source]

Filter points to keep only those inside a polygon.

Parameters:
  • lon (Union[List[float], np.ndarray]) – Array or list of longitude values.

  • lat (Union[List[float], np.ndarray]) – Array or list of latitude values. Must have the same shape as lon.

  • polygon (Polygon) – A shapely Polygon object.

Returns:

Tuple containing:
  • filtered_lon : Array of longitudes inside the polygon

  • filtered_lat : Array of latitudes inside the polygon

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

ValueError – If lon and lat arrays have different shapes.

Examples

>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
>>> lon = [0.5, 2.0]
>>> lat = [0.5, 2.0]
>>> filtered_lon, filtered_lat = filter_points_in_polygon(lon, lat, poly)
>>> print(filtered_lon)  # [0.5]
>>> print(filtered_lat)  # [0.5]
bluemath_tk.core.geo.geo_distance_cartesian(y_matrix: float | ndarray, x_matrix: float | ndarray, y_point: float | ndarray, x_point: float | ndarray) ndarray[source]

Returns the Cartesian distance between a (y, x) matrix and a (y, x) point. Optimized using vectorized operations.

Parameters:
  • y_matrix (Union[float, np.ndarray]) – 2D array of y-coordinates (latitude or y in Cartesian).

  • x_matrix (Union[float, np.ndarray]) – 2D array of x-coordinates (longitude or x in Cartesian).

  • y_point (Union[float, np.ndarray]) – y-coordinate of the point (latitude or y in Cartesian).

  • x_point (Union[float, np.ndarray]) – x-coordinate of the point (longitude or x in Cartesian).

Returns:

Array of distances in the same units as x_matrix and y_matrix.

Return type:

np.ndarray
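
Example

A short sketch, assuming the function computes the element-wise Euclidean distance sqrt((y_matrix - y_point)**2 + (x_matrix - x_point)**2):

>>> import numpy as np
>>> from bluemath_tk.core.geo import geo_distance_cartesian
>>> y_matrix = np.array([[0.0, 0.0], [3.0, 3.0]])
>>> x_matrix = np.array([[0.0, 4.0], [0.0, 4.0]])
>>> geo_distance_cartesian(y_matrix, x_matrix, 0.0, 0.0)
array([[0., 4.],
       [3., 5.]])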

bluemath_tk.core.geo.geodesic_azimuth(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) float | ndarray[source]

Calculate azimuth between two points on Earth.

Parameters:
  • lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees

  • lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees

  • lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees

  • lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees

Returns:

Azimuth(s) in degrees from North

Return type:

Union[float, np.ndarray]

Notes

The azimuth is the angle between true north and the direction to the second point, measured clockwise from north. Special cases are handled for points at the poles.

Examples

>>> geodesic_azimuth(0, 0, 0, 90)
90.0
>>> geodesic_azimuth([0, 45], [0, -90], [0, -45], [90, 90])
array([90., 90.])
bluemath_tk.core.geo.geodesic_distance(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) float | ndarray[source]

Calculate great circle distance between two points on Earth.

Parameters:
  • lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees

  • lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees

  • lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees

  • lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees

Returns:

Great circle distance(s) in degrees

Return type:

Union[float, np.ndarray]

Notes

Uses the haversine formula to calculate great circle distance. The result is in degrees of arc on a sphere.

Examples

>>> geodesic_distance(0, 0, 0, 90)
90.0
>>> geodesic_distance([0, 45], [0, -90], [0, -45], [90, 90])
array([90., 180.])
bluemath_tk.core.geo.geodesic_distance_azimuth(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) Tuple[float | ndarray, float | ndarray][source]

Calculate both great circle distance and azimuth between two points.

Parameters:
  • lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees

  • lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees

  • lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees

  • lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees

Returns:

Tuple containing:
  • distance(s) : Great circle distance(s) in degrees

  • azimuth(s) : Azimuth(s) in degrees from North

Return type:

Tuple[Union[float, np.ndarray], Union[float, np.ndarray]]

See also

geodesic_distance

Calculate only the great circle distance

geodesic_azimuth

Calculate only the azimuth

Examples

>>> dist, az = geodesic_distance_azimuth(0, 0, 0, 90)
>>> dist
90.0
>>> az
90.0
bluemath_tk.core.geo.mask_points_outside_polygon(elements: ndarray, node_coords: ndarray, poly: Polygon) ndarray[source]

Returns a boolean mask indicating which triangle elements have at least two vertices outside the polygon.

This version uses matplotlib.path.Path for high-performance point-in-polygon testing.

Parameters:
  • elements ((n_elements, 3) np.ndarray) – Array containing indices of triangle vertices.

  • node_coords ((n_nodes, 2) np.ndarray) – Array of node coordinates as (x, y) pairs.

  • poly (shapely.geometry.Polygon) – Polygon used for containment checks.

Returns:

mask – Boolean array where True means at least two vertices of the triangle lie outside the polygon.

Return type:

(n_elements,) np.ndarray

Example

>>> import numpy as np
>>> from shapely.geometry import Polygon
>>> elements = np.array([[0, 1, 2], [1, 2, 3]])
>>> node_coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
>>> poly = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> mask = mask_points_outside_polygon(elements, node_coords, poly)
>>> print(mask)
[False False]
bluemath_tk.core.geo.points_in_polygon(lon: List[float] | ndarray, lat: List[float] | ndarray, polygon: Polygon) ndarray[source]

Check which points are inside a polygon.

Parameters:
  • lon (Union[List[float], np.ndarray]) – Array or list of longitude values.

  • lat (Union[List[float], np.ndarray]) – Array or list of latitude values. Must have the same shape as lon.

  • polygon (Polygon) – A shapely Polygon object.

Returns:

Boolean array indicating which points are inside the polygon.

Return type:

np.ndarray

Raises:

ValueError – If lon and lat arrays have different shapes.

Examples

>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
>>> lon = [0.5, 2.0]
>>> lat = [0.5, 2.0]
>>> mask = points_in_polygon(lon, lat, poly)
>>> print(mask)  # [True, False]
bluemath_tk.core.geo.shoot(lon: float | ndarray, lat: float | ndarray, azimuth: float | ndarray, maxdist: float | ndarray) Tuple[float | ndarray, float | ndarray, float | ndarray][source]

Calculate endpoint given starting point, azimuth and distance.

Parameters:
  • lon (Union[float, np.ndarray]) – Starting longitude(s) in degrees

  • lat (Union[float, np.ndarray]) – Starting latitude(s) in degrees

  • azimuth (Union[float, np.ndarray]) – Initial azimuth(s) in degrees

  • maxdist (Union[float, np.ndarray]) – Distance(s) to travel in kilometers

Returns:

Tuple containing:
  • final_lon : Final longitude(s) in degrees

  • final_lat : Final latitude(s) in degrees

  • back_azimuth : Back azimuth(s) in degrees

Return type:

Tuple[Union[float, np.ndarray], Union[float, np.ndarray], Union[float, np.ndarray]]

Notes

This function implements a geodesic shooting algorithm based on T. Vincenty’s method. It accounts for the Earth’s ellipsoidal shape.

Raises:

ValueError – If attempting to shoot from a pole in a direction not along a meridian.

Examples

>>> lon_f, lat_f, baz = shoot(0, 0, 90, 111.195)  # ~1 degree at equator
>>> round(lon_f, 6)
1.0
>>> round(lat_f, 6)
0.0
>>> round(baz, 6)
270.0

bluemath_tk.core.io module

bluemath_tk.core.io.load_model(model_path: str) BlueMathModel[source]

Loads a BlueMathModel from a file.

Parameters:

model_path (str) – The path to the model file.

Returns:

The loaded BlueMathModel.

Return type:

BlueMathModel
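
Example

A minimal usage sketch. The path below is hypothetical and should point to a file previously written with BlueMathModel.save_model:

>>> from bluemath_tk.core.io import load_model
>>> model = load_model("path/to/model.pkl")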

bluemath_tk.core.logging module

bluemath_tk.core.logging.get_file_logger(name: str, logs_path: str = None, level: int | str = 'INFO', console: bool = True, console_level: int | str = 'WARNING') Logger[source]

Creates and returns a logger that writes log messages to a file.

Parameters:
  • name (str) – The name of the logger.

  • logs_path (str, optional) – The file path where the log messages will be written. Default is None.

  • level (Union[int, str], optional) – The logging level. Default is “INFO”.

  • console (bool) – Whether to also log to the console / terminal. Default is True.

  • console_level (Union[int, str], optional) – The logging level for console / terminal logs. Default is “WARNING”.

Returns:

Configured logger instance.

Return type:

logging.Logger

Examples

>>> from bluemath_tk.core.logging import get_file_logger
>>> # Create a logger that writes to "app.log"
>>> logger = get_file_logger("my_app_logger", "app.log")
>>> # Log messages
>>> logger.info("This is an info message.")
>>> logger.warning("This is a warning message.")
>>> logger.error("This is an error message.")
>>> # The output will be saved in "app.log" with the format:
>>> # 2023-10-22 14:55:23,456 - my_app_logger - INFO - This is an info message.
>>> # 2023-10-22 14:55:23,457 - my_app_logger - WARNING - This is a warning message.
>>> # 2023-10-22 14:55:23,458 - my_app_logger - ERROR - This is an error message.

bluemath_tk.core.models module

class bluemath_tk.core.models.BlueMathModel[source]

Bases: ABC

Abstract base class for handling default functionalities across the project.

This class provides core functionality used by all BlueMath models, including:
  • Model saving and loading

  • Data normalization and denormalization

  • Parallel processing capabilities

  • Logging functionality

  • NaN handling

  • Directional data processing

gravity

Gravitational constant from scipy.constants.

Type:

float

earth_radius

Earth radius in km.

Type:

float

num_workers

Number of parallel workers to use for processing.

Type:

int

logger

Logger instance for the model.

Type:

logging.Logger

Notes

All BlueMath models should inherit from this class to ensure consistent behavior and functionality across the project.

check_nans(data: ndarray | Series | DataFrame | DataArray | Dataset, replace_value: float | callable = None, raise_error: bool = False) ndarray | Series | DataFrame | DataArray | Dataset[source]

Check for NaNs in the data and optionally replace them.

Parameters:
  • data (Union[np.ndarray, pd.Series, pd.DataFrame, xr.DataArray, xr.Dataset]) – The data to check for NaNs.

  • replace_value (Union[float, callable], optional) – Value to replace NaNs with. If callable, the function will be called on the data. Default is None (no replacement).

  • raise_error (bool, optional) – Whether to raise an error if NaNs are found. Default is False.

Returns:

data – The data with NaNs optionally replaced.

Return type:

Union[np.ndarray, pd.Series, pd.DataFrame, xr.DataArray, xr.Dataset]

Raises:

ValueError – If NaNs are found and raise_error is True.

Notes

  • For numpy arrays, uses np.isnan() to check for NaNs

  • For pandas objects, uses isnull() to check for NaNs

  • For xarray objects, uses isnull() to check for NaNs

  • If replace_value is callable, it takes precedence over other options

Examples

>>> import numpy as np
>>> import pandas as pd
>>> model = BlueMathModel()
>>> df = pd.DataFrame({'a': [1, np.nan, 3]})
>>> cleaned_df = model.check_nans(df, replace_value=0)
>>> print(cleaned_df)
     a
0  1.0
1  0.0
2  3.0
denormalize(normalized_data: DataFrame, scale_factor: dict) DataFrame[source]

Denormalize data using provided scale_factor. More info in bluemath_tk.core.operations.denormalize.

Parameters:
  • normalized_data (pd.DataFrame) – The normalized data to denormalize.

  • scale_factor (dict) – The scale factors used for denormalization.

Returns:

data – The denormalized data.

Return type:

pd.DataFrame

destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler. More info in bluemath_tk.core.operations.destandarize.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

data – Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

earth_radius = 6378.135
static get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the direction in degrees from the u and v components.

The resulting directions are angles between 0 and 360 degrees, where 0° is the North direction, increasing clockwise.

            (u=0, v=1)
                 |
(u=-1, v=0) <———> (u=1, v=0)
                 |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

static get_metrics(data1: DataFrame | Dataset, data2: DataFrame | Dataset) DataFrame[source]

Gets the metrics of the model.

Parameters:
  • data1 (pd.DataFrame or xr.Dataset) – The first dataset.

  • data2 (pd.DataFrame or xr.Dataset) – The second dataset.

Returns:

metrics – The metrics of the model.

Return type:

pd.DataFrame

Raises:
  • ValueError – If the DataFrames or Datasets have different shapes.

  • TypeError – If the inputs are not both DataFrames or both xarray Datasets.
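
Example

A minimal sketch comparing two aligned DataFrames (the specific metrics contained in the returned DataFrame are not listed here):

>>> import pandas as pd
>>> data1 = pd.DataFrame({"Hs": [1.0, 2.0, 3.0]})
>>> data2 = pd.DataFrame({"Hs": [1.1, 1.9, 3.2]})
>>> metrics = BlueMathModel.get_metrics(data1, data2)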

get_num_processors_available() int[source]

Gets the number of processors available.

Returns:

The number of processors available.

Return type:

int

static get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° being the North direction, increasing clockwise.

          0° N
           |
270° W <———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]

gravity = 9.80665
list_class_attributes() list[source]

List all non-callable attributes of the class.

Returns:

Names of all non-callable, non-private attributes.

Return type:

list

Notes

  • Excludes methods and private attributes (starting with __)

  • Includes properties and class variables

  • Useful for introspection and debugging

Examples

>>> model = BlueMathModel()
>>> attrs = model.list_class_attributes()
>>> print(attrs)
['gravity', 'num_workers', '_logger']
list_class_methods() list[source]

List all callable methods of the class.

Returns:

Names of all callable, non-private methods.

Return type:

list

Notes

  • Excludes attributes and private methods (starting with __)

  • Includes instance methods and properties

  • Useful for introspection and debugging

Examples

>>> model = BlueMathModel()
>>> methods = model.list_class_methods()
>>> print(methods)
['normalize', 'denormalize', 'check_nans']
load_model(model_path: str) BlueMathModel[source]

Loads the model from a file.

property logger: Logger

Get the logger instance for this model.

Returns:

The logger instance. Creates a new file logger if none exists.

Return type:

logging.Logger

Notes

  • Lazily instantiates logger on first access

  • Uses class name as default logger name

  • Thread-safe logger creation

normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using a min-max scaler approach. More info in bluemath_tk.core.operations.normalize.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – The data to normalize.

  • custom_scale_factor (dict, optional) – Custom scale factors for normalization.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – The normalized data.

  • scale_factor (dict) – The scale factors used for normalization.

parallel_execute(func: Callable, items: List[Any], num_workers: int, cpu_intensive: bool = False, **kwargs) Dict[int, Any][source]

Execute a function in parallel across multiple items.

Parameters:
  • func (Callable) – The function to execute. Should accept single item and **kwargs.

  • items (List[Any]) – List of items to process in parallel.

  • num_workers (int) – Number of parallel workers to use.

  • cpu_intensive (bool, optional) – If True, uses ProcessPoolExecutor, otherwise ThreadPoolExecutor. Default is False.

  • **kwargs (dict) – Additional keyword arguments passed to func.

Returns:

Dictionary mapping item indices to function results.

Return type:

Dict[int, Any]

Raises:

Exception – Any exception raised by func is logged and the job continues.

Notes

  • Uses ThreadPoolExecutor for I/O-bound tasks

  • Uses ProcessPoolExecutor for CPU-bound tasks

  • Results maintain original item order via index mapping

  • Failed jobs are logged but don’t stop execution

Warning

  • ThreadPoolExecutor may have GIL limitations

  • ProcessPoolExecutor doesn’t work with non-picklable objects

  • File operations may fail with ThreadPoolExecutor

Examples

>>> def square(x):
...     return x * x
>>> model = BlueMathModel()
>>> results = model.parallel_execute(square, [1, 2, 3], num_workers=2)
>>> print(results)
{0: 1, 1: 4, 2: 9}
save_model(model_path: str, exclude_attributes: List[str] = None) None[source]

Save the model to a file using pickle.

Parameters:
  • model_path (str) – Path where the model will be saved.

  • exclude_attributes (List[str], optional) – List of attribute names to exclude from saving. Default is None. If provided, it will override the default _exclude_attributes.

Notes

  • Uses pickle for serialization

  • Warns if any xarray Datasets/DataArrays are being pickled

  • Creates parent directories if they don’t exist

  • Excludes specified attributes from serialization

Warning

  • Pickle files can be security risks if loaded from untrusted sources

  • xarray objects in the model will be pickled and may be large

Examples

>>> model = MyBlueMathModel()
>>> model.save_model('model.pkl', exclude_attributes=['_logger'])
set_logger_name(name: str, level: str = 'INFO', console: bool = True) None[source]

Configure the model’s logger with a new name and settings.

Parameters:
  • name (str) – The name to give to the logger.

  • level (str, optional) – The logging level to use. Default is “INFO”. Valid values are: “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”

  • console (bool, optional) – Whether to output logs to console. Default is True.

Notes

  • Creates a new file logger with specified settings

  • Previous logger settings are overwritten

  • Log files are created in the default logging directory

Examples

>>> model = BlueMathModel()
>>> model.set_logger_name("my_model", level="DEBUG", console=False)
set_num_processors_to_use(num_processors: int) None[source]

Set the number of processors to use for parallel processing.

Parameters:

num_processors (int) – Number of processors to use. If -1, uses all available processors minus one for system processes.

Raises:

ValueError – If num_processors is <= 0 (except -1).

Notes

  • Automatically adjusts if requesting too many processors

  • Sets the num_workers attribute used by parallel processing methods

  • Takes into account system resources to avoid overload

See also

get_num_processors_available

Get number of available processors

parallel_execute

Execute functions in parallel

set_omp_num_threads(num_threads: int) None[source]

Set the number of OpenMP threads for parallel operations.

Parameters:

num_threads (int) – Number of OpenMP threads to use.

Notes

  • Sets the OMP_NUM_THREADS environment variable

  • Reloads numpy to ensure new thread settings take effect

  • May affect other libraries using OpenMP

Warning

  • This method is under development and behavior may change

  • Reloading numpy may have side effects in running calculations

See also

set_num_processors_to_use

Set number of processors for BlueMath parallel processing

standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data using StandardScaler. More info in bluemath_tk.core.operations.standarize.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to only transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

bluemath_tk.core.operations module

bluemath_tk.core.operations.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts Longitude and Latitude to UTM coordinates.

Parameters:
  • lon (np.ndarray) – The longitude values.

  • lat (np.ndarray) – The latitude values.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The x and y coordinates in UTM.

Return type:

Tuple[np.ndarray, np.ndarray]

bluemath_tk.core.operations.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray][source]

This method converts UTM coordinates to Longitude and Latitude.

Parameters:
  • utm_x (np.ndarray) – The x values in UTM.

  • utm_y (np.ndarray) – The y values in UTM.

  • projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.

Returns:

The longitude and latitude values.

Return type:

Tuple[np.ndarray, np.ndarray]
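
Example

A round-trip sketch, assuming projection accepts an EPSG code such as 32630 (UTM zone 30N):

>>> import numpy as np
>>> from bluemath_tk.core.operations import (
...     convert_lonlat_to_utm,
...     convert_utm_to_lonlat,
... )
>>> lon, lat = np.array([-3.8]), np.array([43.46])
>>> utm_x, utm_y = convert_lonlat_to_utm(lon=lon, lat=lat, projection=32630)
>>> lon2, lat2 = convert_utm_to_lonlat(utm_x=utm_x, utm_y=utm_y, projection=32630)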

bluemath_tk.core.operations.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) DataFrame | Dataset[source]

Denormalize data using provided scale_factor.

Parameters:
  • normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.

Returns:

data – Denormalized data.

Return type:

pd.DataFrame or xr.Dataset

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.

  • The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.

  • Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)
bluemath_tk.core.operations.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset[source]

Destandarize data using provided scaler.

Parameters:
  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.

  • scaler (StandardScaler) – Scaler object used for standarization.

Returns:

Destandarized data.

Return type:

np.ndarray, pd.DataFrame or xr.Dataset

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)
bluemath_tk.core.operations.get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray[source]

This method calculates the direction in degrees from the u and v components.

The resulting directions are angles between 0 and 360 degrees, where 0° is the North direction, increasing clockwise.

            (u=0, v=1)
                 |
(u=-1, v=0) <———> (u=1, v=0)
                 |
            (u=0, v=-1)

Parameters:
  • xu (np.ndarray) – The u component.

  • xv (np.ndarray) – The v component.

Returns:

The degrees.

Return type:

np.ndarray

bluemath_tk.core.operations.get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray][source]

This method calculates the u and v components for the given directional data.

Here, we assume that the directional data is in degrees, with 0° being the North direction, increasing clockwise.

          0° N
           |
270° W <———> 90° E
           |
         180° S

Parameters:

x_deg (np.ndarray) – The directional data in degrees.

Returns:

The u and v components.

Return type:

Tuple[np.ndarray, np.ndarray]
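
Example

A round-trip sketch under the nautical convention described above. Since the exact sign convention of the components is implementation-defined, only the round trip is shown:

>>> import numpy as np
>>> from bluemath_tk.core.operations import get_degrees_from_uv, get_uv_components
>>> u, v = get_uv_components(np.array([0.0, 90.0, 180.0]))
>>> degrees = get_degrees_from_uv(u, v)  # recovers the original directions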

bluemath_tk.core.operations.mathematical_to_nautical(math_degrees: ndarray) ndarray[source]

Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise)

Parameters:

math_degrees (float or array-like) – Directional angle in mathematical convention

Returns:

Directional angle in nautical convention

Return type:

np.ndarray

bluemath_tk.core.operations.nautical_to_mathematical(nautical_degrees: ndarray) ndarray[source]

Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise)

Parameters:

nautical_degrees (np.ndarray) – Directional angle in nautical convention

Returns:

Directional angle in mathematical convention

Return type:

np.ndarray
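
Example

A short sketch, assuming the usual conversion mathematical = (90 - nautical) mod 360 and its inverse:

>>> import numpy as np
>>> from bluemath_tk.core.operations import (
...     mathematical_to_nautical,
...     nautical_to_mathematical,
... )
>>> nautical_to_mathematical(np.array([0.0, 90.0, 180.0]))
array([ 90.,   0., 270.])
>>> mathematical_to_nautical(np.array([90.0, 0.0, 270.0]))
array([  0.,  90., 180.])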

bluemath_tk.core.operations.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) Tuple[DataFrame | Dataset, dict][source]

Normalize data to 0-1 using a min-max scaler approach.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – Input data to be normalized.

  • custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.

  • logger (logging.Logger, optional) – Logger object to log warnings if the custom min or max is bigger or lower than the datapoints.

Returns:

  • normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.

  • scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.

Notes

  • This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.

  • The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.

  • If custom min or max is bigger or lower than the datapoints, it will be changed to the minimum or maximum of the datapoints and a warning will be logged.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)
bluemath_tk.core.operations.spatial_gradient(data: DataArray) DataArray[source]

Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).

Parameters:

data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).

Returns:

Gradient magnitude with same dimensions as input.

Return type:

xr.DataArray

Notes

The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.
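
Example

A minimal construction sketch (coordinate values are purely illustrative):

>>> import numpy as np
>>> import xarray as xr
>>> from bluemath_tk.core.operations import spatial_gradient
>>> data = xr.DataArray(
...     np.random.rand(3, 10, 20),
...     dims=("time", "latitude", "longitude"),
...     coords={
...         "time": np.arange(3),
...         "latitude": np.linspace(40.0, 49.0, 10),
...         "longitude": np.linspace(-10.0, 9.0, 20),
...     },
... )
>>> gradient = spatial_gradient(data)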

bluemath_tk.core.operations.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler][source]

Standarize data to have mean 0 and std 1.

Parameters:
  • data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.

  • scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.

  • transform (bool) – Whether to only transform the data. Default is False.

Returns:

  • standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.

  • scaler (StandardScaler) – Scaler object used for standarization.

Examples

>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)

bluemath_tk.core.pipeline module

class bluemath_tk.core.pipeline.BlueMathPipeline(steps: List[Dict[str, Any]])[source]

Bases: object

A flexible, modular pipeline for chaining together BlueMath models and data processing steps.

This class allows you to define a sequence of steps, where each step must be a BlueMathModel. Each step is defined by a dictionary specifying:

  • ‘name’: str, a unique identifier for the step.

  • ‘model’: the model instance to use (or will be created via ‘model_init’ and ‘model_init_params’).

  • ‘model_init’: (optional) a callable/class to instantiate the model.

  • ‘model_init_params’: (optional) dict of parameters for model initialization.

  • ‘fit_method’: (optional) str, the method name to call for fitting (default is based on model type).

  • ‘fit_params’: (optional) dict, parameters for the fit method.

  • ‘pipeline_attributes_to_store’: (optional) list of attribute names to store for later use.

The pipeline supports advanced parameter passing, including referencing outputs from previous steps and using callables for dynamic parameter computation.
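
Example

A minimal construction sketch. MySampler, MyReducer and my_dataframe are hypothetical; any BlueMathModel subclasses with compatible fit methods would do:

>>> from bluemath_tk.core.pipeline import BlueMathPipeline
>>> steps = [
...     {
...         "name": "sampler",
...         "model_init": MySampler,  # hypothetical model class
...         "model_init_params": {"num_samples": 100},
...     },
...     {
...         "name": "reducer",
...         "model": MyReducer(),  # an already-instantiated model
...         "fit_params": {"normalize_data": True},
...     },
... ]
>>> pipeline = BlueMathPipeline(steps=steps)
>>> output = pipeline.fit(data=my_dataframe)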

fit(data: ndarray | DataFrame | Dataset = None)[source]

Fit all models in the pipeline sequentially, passing the output of each step as input to the next.

For each step, the model is (optionally) initialized, then fit using the specified method and parameters. Parameters and model initialization arguments can be dynamically computed using callables or references to previous pipeline attributes.

Parameters:

data (Union[np.ndarray, pd.DataFrame, xr.Dataset], optional) – The input data to fit the models. If None, the pipeline expects each step to handle its own data.

Return type:

The output of the final step in the pipeline (could be transformed data, predictions, etc.).

Perform a grid search over all possible parameter combinations for all steps in the pipeline.

This method evaluates every possible combination of parameters (from the provided grids) for each step, fits the pipeline, and scores the result using the provided metric or the last model’s score method. The best parameter set (lowest score) is selected and the pipeline is updated accordingly.

Parameters:
  • data (Union[np.ndarray, pd.DataFrame]) – The input data to fit the models.

  • param_grid (List[Dict[str, Any]]) – List of parameter grids for each step in the pipeline. Each grid is a dict mapping parameter names to lists of values to try. Parameters can be for either model_init_params or fit_params.

  • metric (Callable, optional) – Function to evaluate the final output. Should take (y_true, y_pred) as arguments. If None, will use the last model’s built-in score method if available.

  • target_data (Union[np.ndarray, pd.DataFrame], optional) – Target data to evaluate against if using a custom metric. Required if metric is provided.

  • plot (bool, optional) – If True, plot the score for each parameter combination after grid search. Default is False.

Returns:

Dictionary containing:
  • 'best_params': the best parameter set for each step

  • 'best_score': the best score achieved

  • 'best_output': the output of the pipeline for the best parameters

  • 'all_results': a list of all parameter sets and their scores/outputs

Return type:

Dict[str, Any]

Raises:

ValueError – If the number of parameter grids does not match the number of pipeline steps, or if a metric is provided but no target_data is given.

property pipeline_attributes: Dict[str, Dict[str, Any]]

Get the stored model attributes from each pipeline step.

Returns:

A dictionary mapping step names to dictionaries of stored attributes.

Return type:

Dict[str, Dict[str, Any]]

Raises:

ValueError – If the pipeline has not been fit yet and no attributes are stored.

Module contents

Project: BlueMath_tk
Sub-Module: core
Author: GeoOcean Research Group, Universidad de Cantabria
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)