bluemath_tk.core package
Subpackages
- bluemath_tk.core.data package
- bluemath_tk.core.plotting package
- Submodules
- bluemath_tk.core.plotting.base_plotting module
BasePlotting
DefaultInteractivePlotting
DefaultStaticPlotting
DefaultStaticPlotting.get_subplot()
DefaultStaticPlotting.get_subplots()
DefaultStaticPlotting.plot_line()
DefaultStaticPlotting.plot_map()
DefaultStaticPlotting.plot_pie()
DefaultStaticPlotting.plot_scatter()
DefaultStaticPlotting.set_grid()
DefaultStaticPlotting.set_title()
DefaultStaticPlotting.set_xlabel()
DefaultStaticPlotting.set_xlim()
DefaultStaticPlotting.set_ylabel()
DefaultStaticPlotting.set_ylim()
DefaultStaticPlotting.templates
- bluemath_tk.core.plotting.colors module
- bluemath_tk.core.plotting.scatter module
- bluemath_tk.core.plotting.utils module
- Module contents
Submodules
bluemath_tk.core.constants module
bluemath_tk.core.dask module
- bluemath_tk.core.dask.get_available_ram() → int [source]
Get the available RAM in the system.
- Returns:
The available RAM in bytes.
- Return type:
int
- bluemath_tk.core.dask.get_total_ram() → int [source]
Get the total RAM in the system.
- Returns:
The total RAM in bytes.
- Return type:
int
- bluemath_tk.core.dask.setup_dask_client(n_workers: int = None, memory_limit: str = 0.5)[source]
Set up a Dask client with controlled resources.
- Parameters:
n_workers (int, optional) – Number of workers. Default is None.
memory_limit (str, optional) – Memory limit per worker. Default is 0.5.
- Returns:
Dask distributed client
- Return type:
Client
Notes
Available resources vary with the hardware and the current load of the machine. Be very careful when setting the number of workers and the memory limit, as a poor choice can degrade the performance of the machine or, in the worst case, the performance of other users on the same machine (cluster case).
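Examples
A minimal usage sketch (the worker count and the "4GB" memory string are illustrative; Dask also accepts a float fraction such as the default 0.5):
>>> from bluemath_tk.core.dask import setup_dask_client
>>> client = setup_dask_client(n_workers=4, memory_limit="4GB")
>>> # ... run Dask computations here ...
>>> client.close()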
bluemath_tk.core.decorators module
- bluemath_tk.core.decorators.validate_data_calval(func)[source]
Decorator to validate data in CalVal class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_kma(func)[source]
Decorator to validate data in KMA class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_lhs(func)[source]
Decorator to validate data in LHS class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_mda(func)[source]
Decorator to validate data in MDA class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_pca(func)[source]
Decorator to validate data in PCA class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_rbf(func)[source]
Decorator to validate data in RBF class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
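Examples
A minimal sketch of how these decorators are applied (the MyMDA class and its fit signature are hypothetical; each decorator wraps a fit method so the data argument is validated before fitting):
>>> from bluemath_tk.core.decorators import validate_data_mda
>>> class MyMDA:
...     @validate_data_mda
...     def fit(self, data):
...         return data  # data has already been validated at this point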
bluemath_tk.core.geo module
- bluemath_tk.core.geo.buffer_area_for_polygon(polygon: Polygon, area_factor: float) → Polygon [source]
Buffer the polygon by a factor of its area divided by its length. This is a heuristic to ensure that the buffer is proportional to the size of the polygon.
- Parameters:
polygon (Polygon) – The polygon to be buffered.
area_factor (float) – The buffer factor.
- Returns:
The buffered polygon.
- Return type:
Polygon
Example
>>> from shapely.geometry import Polygon
>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> area_factor = 0.1
>>> buffered_polygon = buffer_area_for_polygon(polygon, area_factor)
>>> print(buffered_polygon)
POLYGON ((-0.1 -0.1, 1.1 -0.1, 1.1 1.1, -0.1 1.1, -0.1 -0.1))
- bluemath_tk.core.geo.convert_to_radians(*args: float | ndarray) → tuple [source]
Convert degree inputs to radians.
- Parameters:
*args (Union[float, np.ndarray]) – Variable number of inputs in degrees to convert to radians. Can be either scalar floats or numpy arrays.
- Returns:
Tuple of input values converted to radians, preserving input types.
- Return type:
tuple
Examples
>>> convert_to_radians(90.0)
(1.5707963267948966,)
>>> convert_to_radians(90.0, 180.0)
(1.5707963267948966, 3.141592653589793)
- bluemath_tk.core.geo.create_polygon(coordinates: List[Tuple[float, float]]) → Polygon [source]
Create a polygon from a list of (longitude, latitude) coordinates.
- Parameters:
coordinates (List[Tuple[float, float]]) – List of (longitude, latitude) coordinate pairs that define the polygon vertices. The first and last points should be the same to close the polygon.
- Returns:
A shapely Polygon object.
- Return type:
Polygon
Examples
>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
- bluemath_tk.core.geo.filter_points_in_polygon(lon: List[float] | ndarray, lat: List[float] | ndarray, polygon: Polygon) → Tuple[ndarray, ndarray] [source]
Filter points to keep only those inside a polygon.
- Parameters:
lon (Union[List[float], np.ndarray]) – Array or list of longitude values.
lat (Union[List[float], np.ndarray]) – Array or list of latitude values. Must have the same shape as lon.
polygon (Polygon) – A shapely Polygon object.
- Returns:
Tuple containing:
- filtered_lon : Array of longitudes inside the polygon
- filtered_lat : Array of latitudes inside the polygon
- Return type:
Tuple[np.ndarray, np.ndarray]
- Raises:
ValueError – If lon and lat arrays have different shapes.
Examples
>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
>>> lon = [0.5, 2.0]
>>> lat = [0.5, 2.0]
>>> filtered_lon, filtered_lat = filter_points_in_polygon(lon, lat, poly)
>>> print(filtered_lon)  # [0.5]
>>> print(filtered_lat)  # [0.5]
- bluemath_tk.core.geo.geo_distance_cartesian(y_matrix: float | ndarray, x_matrix: float | ndarray, y_point: float | ndarray, x_point: float | ndarray) → ndarray [source]
Returns the Cartesian distance between a (y, x) matrix and a (y, x) point, optimized using vectorized operations.
- Parameters:
y_matrix (Union[float, np.ndarray]) – 2D array of y-coordinates (latitude or y in Cartesian).
x_matrix (Union[float, np.ndarray]) – 2D array of x-coordinates (longitude or x in Cartesian).
y_point (Union[float, np.ndarray]) – y-coordinate of the point (latitude or y in Cartesian).
x_point (Union[float, np.ndarray]) – x-coordinate of the point (longitude or x in Cartesian).
- Returns:
Array of distances in the same units as x_matrix and y_matrix.
- Return type:
np.ndarray
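Examples
A minimal sketch, assuming plain Euclidean distances in the units of the input coordinates (the 3-4-5 values below follow from that assumption):
>>> import numpy as np
>>> from bluemath_tk.core.geo import geo_distance_cartesian
>>> y_matrix = np.array([[0.0, 0.0], [3.0, 3.0]])
>>> x_matrix = np.array([[0.0, 4.0], [0.0, 4.0]])
>>> geo_distance_cartesian(y_matrix, x_matrix, 0.0, 0.0)
array([[0., 4.],
       [3., 5.]])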
- bluemath_tk.core.geo.geodesic_azimuth(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) → float | ndarray [source]
Calculate azimuth between two points on Earth.
- Parameters:
lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees
lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees
lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees
lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees
- Returns:
Azimuth(s) in degrees from North
- Return type:
Union[float, np.ndarray]
Notes
The azimuth is the angle between true north and the direction to the second point, measured clockwise from north. Special cases are handled for points at the poles.
Examples
>>> geodesic_azimuth(0, 0, 0, 90)
90.0
>>> geodesic_azimuth([0, 45], [0, -90], [0, -45], [90, 90])
array([90., 90.])
- bluemath_tk.core.geo.geodesic_distance(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) → float | ndarray [source]
Calculate great circle distance between two points on Earth.
- Parameters:
lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees
lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees
lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees
lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees
- Returns:
Great circle distance(s) in degrees
- Return type:
Union[float, np.ndarray]
Notes
Uses the haversine formula to calculate great circle distance. The result is in degrees of arc on a sphere.
Examples
>>> geodesic_distance(0, 0, 0, 90)
90.0
>>> geodesic_distance([0, 45], [0, -90], [0, -45], [90, 90])
array([90., 180.])
- bluemath_tk.core.geo.geodesic_distance_azimuth(lat1: float | ndarray, lon1: float | ndarray, lat2: float | ndarray, lon2: float | ndarray) → Tuple[float | ndarray, float | ndarray] [source]
Calculate both great circle distance and azimuth between two points.
- Parameters:
lat1 (Union[float, np.ndarray]) – Latitude of first point(s) in degrees
lon1 (Union[float, np.ndarray]) – Longitude of first point(s) in degrees
lat2 (Union[float, np.ndarray]) – Latitude of second point(s) in degrees
lon2 (Union[float, np.ndarray]) – Longitude of second point(s) in degrees
- Returns:
Tuple containing:
- distance(s) : Great circle distance(s) in degrees
- azimuth(s) : Azimuth(s) in degrees from North
- Return type:
Tuple[Union[float, np.ndarray], Union[float, np.ndarray]]
See also
geodesic_distance
Calculate only the great circle distance
geodesic_azimuth
Calculate only the azimuth
Examples
>>> dist, az = geodesic_distance_azimuth(0, 0, 0, 90)
>>> dist
90.0
>>> az
90.0
- bluemath_tk.core.geo.mask_points_outside_polygon(elements: ndarray, node_coords: ndarray, poly: Polygon) → ndarray [source]
Returns a boolean mask indicating which triangle elements have at least two vertices outside the polygon.
This version uses matplotlib.path.Path for high-performance point-in-polygon testing.
- Parameters:
elements ((n_elements, 3) np.ndarray) – Array containing indices of triangle vertices.
node_coords ((n_nodes, 2) np.ndarray) – Array of node coordinates as (x, y) pairs.
poly (shapely.geometry.Polygon) – Polygon used for containment checks.
- Returns:
mask – Boolean array where True means at least two vertices of the triangle lie outside the polygon.
- Return type:
(n_elements,) np.ndarray
Example
>>> import numpy as np
>>> from shapely.geometry import Polygon
>>> elements = np.array([[0, 1, 2], [1, 2, 3]])
>>> node_coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
>>> poly = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> mask = mask_points_outside_polygon(elements, node_coords, poly)
>>> print(mask)
[False False]
- bluemath_tk.core.geo.points_in_polygon(lon: List[float] | ndarray, lat: List[float] | ndarray, polygon: Polygon) → ndarray [source]
Check which points are inside a polygon.
- Parameters:
lon (Union[List[float], np.ndarray]) – Array or list of longitude values.
lat (Union[List[float], np.ndarray]) – Array or list of latitude values. Must have the same shape as lon.
polygon (Polygon) – A shapely Polygon object.
- Returns:
Boolean array indicating which points are inside the polygon.
- Return type:
np.ndarray
- Raises:
ValueError – If lon and lat arrays have different shapes.
Examples
>>> coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # Square
>>> poly = create_polygon(coords)
>>> lon = [0.5, 2.0]
>>> lat = [0.5, 2.0]
>>> mask = points_in_polygon(lon, lat, poly)
>>> print(mask)  # [True, False]
- bluemath_tk.core.geo.shoot(lon: float | ndarray, lat: float | ndarray, azimuth: float | ndarray, maxdist: float | ndarray) → Tuple[float | ndarray, float | ndarray, float | ndarray] [source]
Calculate endpoint given starting point, azimuth and distance.
- Parameters:
lon (Union[float, np.ndarray]) – Starting longitude(s) in degrees
lat (Union[float, np.ndarray]) – Starting latitude(s) in degrees
azimuth (Union[float, np.ndarray]) – Initial azimuth(s) in degrees
maxdist (Union[float, np.ndarray]) – Distance(s) to travel in kilometers
- Returns:
Tuple containing:
- final_lon : Final longitude(s) in degrees
- final_lat : Final latitude(s) in degrees
- back_azimuth : Back azimuth(s) in degrees
- Return type:
Tuple[Union[float, np.ndarray], Union[float, np.ndarray], Union[float, np.ndarray]]
Notes
This function implements a geodesic shooting algorithm based on T. Vincenty’s method. It accounts for the Earth’s ellipsoidal shape.
- Raises:
ValueError – If attempting to shoot from a pole in a direction not along a meridian.
Examples
>>> lon_f, lat_f, baz = shoot(0, 0, 90, 111.195)  # ~1 degree at equator
>>> round(lon_f, 6)
1.0
>>> round(lat_f, 6)
0.0
>>> round(baz, 6)
270.0
bluemath_tk.core.io module
- bluemath_tk.core.io.load_model(model_path: str) → BlueMathModel [source]
Loads a BlueMathModel from a file.
- Parameters:
model_path (str) – The path to the model file.
- Returns:
The loaded BlueMathModel.
- Return type:
BlueMathModel
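Examples
A minimal usage sketch (the file path is hypothetical):
>>> from bluemath_tk.core.io import load_model
>>> model = load_model("my_model.pkl")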
bluemath_tk.core.logging module
- bluemath_tk.core.logging.get_file_logger(name: str, logs_path: str = None, level: int | str = 'INFO', console: bool = True, console_level: int | str = 'WARNING') → Logger [source]
Creates and returns a logger that writes log messages to a file.
- Parameters:
name (str) – The name of the logger.
logs_path (str, optional) – The file path where the log messages will be written. Default is None.
level (Union[int, str], optional) – The logging level. Default is “INFO”.
console (bool) – Whether to also log to the console / terminal. Default is True.
console_level (Union[int, str], optional) – The logging level for console / terminal logs. Default is “WARNING”.
- Returns:
Configured logger instance.
- Return type:
logging.Logger
Examples
>>> from bluemath_tk.core.logging import get_file_logger
>>> # Create a logger that writes to "app.log"
>>> logger = get_file_logger("my_app_logger", "app.log")
>>> # Log messages
>>> logger.info("This is an info message.")
>>> logger.warning("This is a warning message.")
>>> logger.error("This is an error message.")
>>> # The output will be saved in "app.log" with the format:
>>> # 2023-10-22 14:55:23,456 - my_app_logger - INFO - This is an info message.
>>> # 2023-10-22 14:55:23,457 - my_app_logger - WARNING - This is a warning message.
>>> # 2023-10-22 14:55:23,458 - my_app_logger - ERROR - This is an error message.
bluemath_tk.core.models module
- class bluemath_tk.core.models.BlueMathModel[source]
Bases:
ABC
Abstract base class for handling default functionalities across the project.
This class provides core functionality used by all BlueMath models, including:
- Model saving and loading
- Data normalization and denormalization
- Parallel processing capabilities
- Logging functionality
- NaN handling
- Directional data processing
- gravity
Gravitational constant from scipy.constants.
- Type:
float
- earth_radius
Earth radius in km.
- Type:
float
- num_workers
Number of parallel workers to use for processing.
- Type:
int
- logger
Logger instance for the model.
- Type:
logging.Logger
Notes
All BlueMath models should inherit from this class to ensure consistent behavior and functionality across the project.
- check_nans(data: ndarray | Series | DataFrame | DataArray | Dataset, replace_value: float | callable = None, raise_error: bool = False) → ndarray | Series | DataFrame | DataArray | Dataset [source]
Check for NaNs in the data and optionally replace them.
- Parameters:
data (Union[np.ndarray, pd.Series, pd.DataFrame, xr.DataArray, xr.Dataset]) – The data to check for NaNs.
replace_value (Union[float, callable], optional) – Value to replace NaNs with. If callable, the function will be called on the data. Default is None (no replacement).
raise_error (bool, optional) – Whether to raise an error if NaNs are found. Default is False.
- Returns:
data – The data with NaNs optionally replaced.
- Return type:
Union[np.ndarray, pd.Series, pd.DataFrame, xr.DataArray, xr.Dataset]
- Raises:
ValueError – If NaNs are found and raise_error is True.
Notes
For numpy arrays, uses np.isnan() to check for NaNs
For pandas objects, uses isnull() to check for NaNs
For xarray objects, uses isnull() to check for NaNs
If replace_value is callable, it takes precedence over other options
Examples
>>> import numpy as np
>>> import pandas as pd
>>> model = BlueMathModel()
>>> df = pd.DataFrame({'a': [1, np.nan, 3]})
>>> cleaned_df = model.check_nans(df, replace_value=0)
>>> print(cleaned_df)
   a
0  1
1  0
2  3
- denormalize(normalized_data: DataFrame, scale_factor: dict) → DataFrame [source]
Denormalize data using provided scale_factor. More info in bluemath_tk.core.operations.denormalize.
- Parameters:
normalized_data (pd.DataFrame) – The normalized data to denormalize.
scale_factor (dict) – The scale factors used for denormalization.
- Returns:
data – The denormalized data.
- Return type:
pd.DataFrame
- destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) → ndarray | DataFrame | Dataset [source]
Destandarize data using provided scaler. More info in bluemath_tk.core.operations.destandarize.
- Parameters:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.
scaler (StandardScaler) – Scaler object used for standarization.
- Returns:
data – Destandarized data.
- Return type:
np.ndarray, pd.DataFrame or xr.Dataset
- earth_radius = 6378.135
- static get_degrees_from_uv(xu: ndarray, xv: ndarray) → ndarray [source]
Calculates the direction in degrees from the u and v components.
Here, we assume the u and v components encode directions between 0° and 360°, where 0° is the North direction and angles increase clockwise:
            (u=0, v=1)
(u=-1, v=0) <———> (u=1, v=0)
            (u=0, v=-1)
- Parameters:
xu (np.ndarray) – The u component.
xv (np.ndarray) – The v component.
- Returns:
The degrees.
- Return type:
np.ndarray
- static get_metrics(data1: DataFrame | Dataset, data2: DataFrame | Dataset) → DataFrame [source]
Computes comparison metrics between two datasets.
- Parameters:
data1 (pd.DataFrame or xr.Dataset) – The first dataset.
data2 (pd.DataFrame or xr.Dataset) – The second dataset.
- Returns:
metrics – The metrics of the model.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the DataFrames or Datasets have different shapes.
TypeError – If the inputs are not both DataFrames or both xarray Datasets.
- get_num_processors_available() → int [source]
Gets the number of processors available.
- Returns:
The number of processors available.
- Return type:
int
- static get_uv_components(x_deg: ndarray) → Tuple[ndarray, ndarray] [source]
Calculates the u and v components for the given directional data.
Here, we assume that the directional data is in degrees, with 0° at the North direction and angles increasing clockwise:
          0° N
270° W <———> 90° E
         180° S
- Parameters:
x_deg (np.ndarray) – The directional data in degrees.
- Returns:
The u and v components.
- Return type:
Tuple[np.ndarray, np.ndarray]
- gravity = 9.80665
- list_class_attributes() → list [source]
List all non-callable attributes of the class.
- Returns:
Names of all non-callable, non-private attributes.
- Return type:
list
Notes
Excludes methods and private attributes (starting with __)
Includes properties and class variables
Useful for introspection and debugging
Examples
>>> model = BlueMathModel()
>>> attrs = model.list_class_attributes()
>>> print(attrs)
['gravity', 'num_workers', '_logger']
- list_class_methods() → list [source]
List all callable methods of the class.
- Returns:
Names of all callable, non-private methods.
- Return type:
list
Notes
Excludes attributes and private methods (starting with __)
Includes instance methods and properties
Useful for introspection and debugging
Examples
>>> model = BlueMathModel()
>>> methods = model.list_class_methods()
>>> print(methods)
['normalize', 'denormalize', 'check_nans']
- load_model(model_path: str) → BlueMathModel [source]
Loads the model from a file.
- property logger: Logger
Get the logger instance for this model.
- Returns:
The logger instance. Creates a new file logger if none exists.
- Return type:
logging.Logger
Notes
Lazily instantiates logger on first access
Uses class name as default logger name
Thread-safe logger creation
- normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}) → Tuple[DataFrame | Dataset, dict] [source]
Normalize data to the 0-1 range using a min-max scaling approach. More info in bluemath_tk.core.operations.normalize.
- Parameters:
data (pd.DataFrame or xr.Dataset) – The data to normalize.
custom_scale_factor (dict, optional) – Custom scale factors for normalization.
- Returns:
normalized_data (pd.DataFrame or xr.Dataset) – The normalized data.
scale_factor (dict) – The scale factors used for normalization.
- parallel_execute(func: Callable, items: List[Any], num_workers: int, cpu_intensive: bool = False, **kwargs) → Dict[int, Any] [source]
Execute a function in parallel across multiple items.
- Parameters:
func (Callable) – The function to execute. Should accept single item and **kwargs.
items (List[Any]) – List of items to process in parallel.
num_workers (int) – Number of parallel workers to use.
cpu_intensive (bool, optional) – If True, uses ProcessPoolExecutor, otherwise ThreadPoolExecutor. Default is False.
**kwargs (dict) – Additional keyword arguments passed to func.
- Returns:
Dictionary mapping item indices to function results.
- Return type:
Dict[int, Any]
- Raises:
Exception – Any exception raised by func is logged and the job continues.
Notes
Uses ThreadPoolExecutor for I/O-bound tasks
Uses ProcessPoolExecutor for CPU-bound tasks
Results maintain original item order via index mapping
Failed jobs are logged but don’t stop execution
Warning
ThreadPoolExecutor may have GIL limitations
ProcessPoolExecutor doesn’t work with non-picklable objects
File operations may fail with ThreadPoolExecutor
Examples
>>> def square(x):
...     return x * x
>>> model = BlueMathModel()
>>> results = model.parallel_execute(square, [1, 2, 3], num_workers=2)
>>> print(results)
{0: 1, 1: 4, 2: 9}
- save_model(model_path: str, exclude_attributes: List[str] = None) → None [source]
Save the model to a file using pickle.
- Parameters:
model_path (str) – Path where the model will be saved.
exclude_attributes (List[str], optional) – List of attribute names to exclude from saving. Default is None. If provided, it will override the default _exclude_attributes.
Notes
Uses pickle for serialization
Warns if any xarray Datasets/DataArrays are being pickled
Creates parent directories if they don’t exist
Excludes specified attributes from serialization
Warning
Pickle files can be security risks if loaded from untrusted sources
xarray objects in the model will be pickled and may be large
Examples
>>> model = MyBlueMathModel() >>> model.save_model('model.pkl', exclude_attributes=['_logger'])
- set_logger_name(name: str, level: str = 'INFO', console: bool = True) → None [source]
Configure the model’s logger with a new name and settings.
- Parameters:
name (str) – The name to give to the logger.
level (str, optional) – The logging level to use. Default is “INFO”. Valid values are: “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”
console (bool, optional) – Whether to output logs to console. Default is True.
Notes
Creates a new file logger with specified settings
Previous logger settings are overwritten
Log files are created in the default logging directory
Examples
>>> model = BlueMathModel() >>> model.set_logger_name("my_model", level="DEBUG", console=False)
- set_num_processors_to_use(num_processors: int) → None [source]
Set the number of processors to use for parallel processing.
- Parameters:
num_processors (int) – Number of processors to use. If -1, uses all available processors minus one for system processes.
- Raises:
ValueError – If num_processors is <= 0 (except -1).
Notes
Automatically adjusts if requesting too many processors
Sets the num_workers attribute used by parallel processing methods
Takes into account system resources to avoid overload
See also
get_num_processors_available
Get number of available processors
parallel_execute
Execute functions in parallel
- set_omp_num_threads(num_threads: int) → None [source]
Set the number of OpenMP threads for parallel operations.
- Parameters:
num_threads (int) – Number of OpenMP threads to use.
Notes
Sets the OMP_NUM_THREADS environment variable
Reloads numpy to ensure new thread settings take effect
May affect other libraries using OpenMP
Warning
This method is under development and behavior may change
Reloading numpy may have side effects in running calculations
See also
set_num_processors_to_use
Set number of processors for BlueMath parallel processing
- standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) → Tuple[ndarray | DataFrame | Dataset, StandardScaler] [source]
Standarize data using StandardScaler. More info in bluemath_tk.core.operations.standarize.
- Parameters:
data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.
scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.
transform (bool) – Whether to just transform the data. Default to False.
- Returns:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.
scaler (StandardScaler) – Scaler object used for standarization.
bluemath_tk.core.operations module
- bluemath_tk.core.operations.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) → Tuple[ndarray, ndarray] [source]
This method converts Longitude and Latitude to UTM coordinates.
- Parameters:
lon (np.ndarray) – The longitude values.
lat (np.ndarray) – The latitude values.
projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.
- Returns:
The x and y coordinates in UTM.
- Return type:
Tuple[np.ndarray, np.ndarray]
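Examples
A minimal sketch; "EPSG:32630" (UTM zone 30N) is an assumed example projection:
>>> import numpy as np
>>> from bluemath_tk.core.operations import convert_lonlat_to_utm
>>> utm_x, utm_y = convert_lonlat_to_utm(
...     lon=np.array([-3.80]), lat=np.array([43.46]), projection="EPSG:32630"
... )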
- bluemath_tk.core.operations.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) → Tuple[ndarray, ndarray] [source]
This method converts UTM coordinates to Longitude and Latitude.
- Parameters:
utm_x (np.ndarray) – The x values in UTM.
utm_y (np.ndarray) – The y values in UTM.
projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.
- Returns:
The longitude and latitude values.
- Return type:
Tuple[np.ndarray, np.ndarray]
- bluemath_tk.core.operations.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) → DataFrame | Dataset [source]
Denormalize data using provided scale_factor.
- Parameters:
normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.
scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.
- Returns:
data – Denormalized data.
- Return type:
pd.DataFrame or xr.Dataset
Notes
This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.
The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.
Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)
- bluemath_tk.core.operations.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) → ndarray | DataFrame | Dataset [source]
Destandarize data using provided scaler.
- Parameters:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.
scaler (StandardScaler) – Scaler object used for standarization.
- Returns:
Destandarized data.
- Return type:
np.ndarray, pd.DataFrame or xr.Dataset
Examples
>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)
- bluemath_tk.core.operations.get_degrees_from_uv(xu: ndarray, xv: ndarray) → ndarray [source]
Calculates the direction in degrees from the u and v components.
Here, we assume the u and v components encode directions between 0° and 360°, where 0° is the North direction and angles increase clockwise:
            (u=0, v=1)
(u=-1, v=0) <———> (u=1, v=0)
            (u=0, v=-1)
- Parameters:
xu (np.ndarray) – The u component.
xv (np.ndarray) – The v component.
- Returns:
The degrees.
- Return type:
np.ndarray
- bluemath_tk.core.operations.get_uv_components(x_deg: ndarray) → Tuple[ndarray, ndarray] [source]
Calculates the u and v components for the given directional data.
Here, we assume that the directional data is in degrees, with 0° at the North direction and angles increasing clockwise:
          0° N
270° W <———> 90° E
         180° S
- Parameters:
x_deg (np.ndarray) – The directional data in degrees.
- Returns:
The u and v components.
- Return type:
Tuple[np.ndarray, np.ndarray]
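Examples
A minimal sketch of the round trip with get_degrees_from_uv (the sign convention follows the diagram above; the key property is that the round trip recovers the original directions):
>>> import numpy as np
>>> from bluemath_tk.core.operations import get_uv_components, get_degrees_from_uv
>>> xu, xv = get_uv_components(np.array([0.0, 90.0, 180.0]))
>>> np.round(get_degrees_from_uv(xu, xv), 6)
array([  0.,  90., 180.])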
- bluemath_tk.core.operations.mathematical_to_nautical(math_degrees: ndarray) → ndarray [source]
Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise).
- Parameters:
math_degrees (float or array-like) – Directional angle in mathematical convention
- Returns:
Directional angle in nautical convention
- Return type:
np.ndarray
- bluemath_tk.core.operations.nautical_to_mathematical(nautical_degrees: ndarray) → ndarray [source]
Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise).
- Parameters:
nautical_degrees (np.ndarray) – Directional angle in nautical convention
- Returns:
Directional angle in mathematical convention
- Return type:
np.ndarray
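Examples
A minimal sketch; under the conventions above the two systems are related by math = (90 - nautical) % 360, so 0° nautical (North) maps to 90° mathematical:
>>> import numpy as np
>>> from bluemath_tk.core.operations import nautical_to_mathematical
>>> nautical_to_mathematical(np.array([0.0, 90.0, 180.0]))
array([ 90.,   0., 270.])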
- bluemath_tk.core.operations.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) → Tuple[DataFrame | Dataset, dict] [source]
Normalize data to the 0-1 range using a min-max scaling approach.
- Parameters:
data (pd.DataFrame or xr.Dataset) – Input data to be normalized.
custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.
logger (logging.Logger, optional) – Logger object to log warnings if the custom min or max is bigger or lower than the datapoints.
- Returns:
normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.
scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.
Notes
This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.
The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.
If custom min or max is bigger or lower than the datapoints, it will be changed to the minimum or maximum of the datapoints and a warning will be logged.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)
- bluemath_tk.core.operations.spatial_gradient(data: DataArray) → DataArray [source]
Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).
- Parameters:
data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).
- Returns:
Gradient magnitude with same dimensions as input.
- Return type:
xr.DataArray
Notes
The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.
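Examples
A minimal sketch on a synthetic (time, latitude, longitude) grid; the coordinate values are illustrative:
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import spatial_gradient
>>> data = xr.DataArray(
...     np.random.rand(4, 10, 20),
...     dims=("time", "latitude", "longitude"),
...     coords={
...         "time": pd.date_range("2000-01-01", periods=4),
...         "latitude": np.linspace(-45, 45, 10),
...         "longitude": np.linspace(0, 95, 20),
...     },
... )
>>> gradient = spatial_gradient(data)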
- bluemath_tk.core.operations.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) → Tuple[ndarray | DataFrame | Dataset, StandardScaler] [source]
Standarize data to have mean 0 and std 1.
- Parameters:
data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.
scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.
transform (bool) – Whether to just transform the data. Default to False.
- Returns:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.
scaler (StandardScaler) – Scaler object used for standarization.
Examples
>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
bluemath_tk.core.pipeline module
- class bluemath_tk.core.pipeline.BlueMathPipeline(steps: List[Dict[str, Any]])[source]
Bases:
object
A flexible, modular pipeline for chaining together BlueMath models and data processing steps.
This class allows you to define a sequence of steps, where each step must be a BlueMathModel. Each step is defined by a dictionary specifying:
‘name’: str, a unique identifier for the step.
‘model’: the model instance to use (or will be created via ‘model_init’ and ‘model_init_params’).
‘model_init’: (optional) a callable/class to instantiate the model.
‘model_init_params’: (optional) dict of parameters for model initialization.
‘fit_method’: (optional) str, the method name to call for fitting (default is based on model type).
‘fit_params’: (optional) dict, parameters for the fit method.
‘pipeline_attributes_to_store’: (optional) list of attribute names to store for later use.
The pipeline supports advanced parameter passing, including referencing outputs from previous steps and using callables for dynamic parameter computation.
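Examples
A minimal construction sketch (the step name, model class, import path, and parameters are hypothetical; any BlueMathModel exposing a fit method can be used as a step):
>>> from bluemath_tk.core.pipeline import BlueMathPipeline
>>> from bluemath_tk.datamining.pca import PCA  # hypothetical import path
>>> pipeline = BlueMathPipeline(
...     steps=[
...         {
...             "name": "pca",
...             "model_init": PCA,
...             "model_init_params": {"n_components": 3},
...         },
...     ]
... )
>>> output = pipeline.fit(data)  # data: np.ndarray, pd.DataFrame or xr.Dataset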
- fit(data: ndarray | DataFrame | Dataset = None)[source]
Fit all models in the pipeline sequentially, passing the output of each step as input to the next.
For each step, the model is (optionally) initialized, then fit using the specified method and parameters. Parameters and model initialization arguments can be dynamically computed using callables or references to previous pipeline attributes.
- Parameters:
data (Union[np.ndarray, pd.DataFrame, xr.Dataset], optional) – The input data to fit the models. If None, the pipeline expects each step to handle its own data.
- Returns:
The output of the final step in the pipeline (could be transformed data, predictions, etc.).
- grid_search(data: ndarray | DataFrame, param_grid: List[Dict[str, Any]], metric: Callable = None, target_data: ndarray | DataFrame = None, plot: bool = False) → Dict[str, Any] [source]
Perform a grid search over all possible parameter combinations for all steps in the pipeline.
This method evaluates every possible combination of parameters (from the provided grids) for each step, fits the pipeline, and scores the result using the provided metric or the last model’s score method. The best parameter set (lowest score) is selected and the pipeline is updated accordingly.
- Parameters:
data (Union[np.ndarray, pd.DataFrame]) – The input data to fit the models.
param_grid (List[Dict[str, Any]]) – List of parameter grids for each step in the pipeline. Each grid is a dict mapping parameter names to lists of values to try. Parameters can be for either model_init_params or fit_params.
metric (Callable, optional) – Function to evaluate the final output. Should take (y_true, y_pred) as arguments. If None, will use the last model’s built-in score method if available.
target_data (Union[np.ndarray, pd.DataFrame], optional) – Target data to evaluate against if using a custom metric. Required if metric is provided.
plot (bool, optional) – If True, plot the score for each parameter combination after grid search. Default is False.
- Returns:
Dictionary containing:
- 'best_params': the best parameter set for each step
- 'best_score': the best score achieved
- 'best_output': the output of the pipeline for the best parameters
- 'all_results': a list of all parameter sets and their scores/outputs
- Return type:
Dict[str, Any]
- Raises:
ValueError – If the number of parameter grids does not match the number of pipeline steps, or if a metric is provided but no target_data is given.
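Examples
A minimal sketch continuing the pipeline above (the grid values are illustrative; one parameter grid is required per pipeline step, and no metric is passed here so the last model's own score method would be used):
>>> param_grid = [
...     {"n_components": [2, 3, 4]},
... ]
>>> results = pipeline.grid_search(data, param_grid=param_grid)
>>> results["best_params"]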
- property pipeline_attributes: Dict[str, Dict[str, Any]]
Get the stored model attributes from each pipeline step.
- Returns:
A dictionary mapping step names to dictionaries of stored attributes.
- Return type:
Dict[str, Dict[str, Any]]
- Raises:
ValueError – If the pipeline has not been fit yet and no attributes are stored.
Module contents
Project: BlueMath_tk
Sub-Module: core
Author: GeoOcean Research Group, Universidad de Cantabria
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)