bluemath_tk.core package
Subpackages
- bluemath_tk.core.data package
- bluemath_tk.core.plotting package
- Submodules
- bluemath_tk.core.plotting.base_plotting module
BasePlotting
DefaultInteractivePlotting
DefaultStaticPlotting
DefaultStaticPlotting.get_subplot()
DefaultStaticPlotting.get_subplots()
DefaultStaticPlotting.plot_line()
DefaultStaticPlotting.plot_map()
DefaultStaticPlotting.plot_pie()
DefaultStaticPlotting.plot_scatter()
DefaultStaticPlotting.set_grid()
DefaultStaticPlotting.set_title()
DefaultStaticPlotting.set_xlabel()
DefaultStaticPlotting.set_xlim()
DefaultStaticPlotting.set_ylabel()
DefaultStaticPlotting.set_ylim()
DefaultStaticPlotting.templates
- bluemath_tk.core.plotting.colors module
- Module contents
Submodules
bluemath_tk.core.decorators module
- bluemath_tk.core.decorators.validate_data_kma(func)[source]
Decorator to validate data in KMA class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_lhs(func)[source]
Decorator to validate data in LHS class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_mda(func)[source]
Decorator to validate data in MDA class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_pca(func)[source]
Decorator to validate data in PCA class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
- bluemath_tk.core.decorators.validate_data_rbf(func)[source]
Decorator to validate data in RBF class fit method.
- Parameters:
func (callable) – The function to be decorated
- Returns:
The decorated function
- Return type:
callable
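As an illustrative sketch (not from the source), each decorator wraps a fit method so that its input data is validated before fitting. The MyKMA class and its fit signature below are hypothetical:
>>> from bluemath_tk.core.decorators import validate_data_kma
>>> class MyKMA:
...     @validate_data_kma
...     def fit(self, data):
...         # By this point, `data` has passed the KMA validation checks
...         self.data = data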
bluemath_tk.core.logging module
- bluemath_tk.core.logging.get_file_logger(name: str, logs_path: str = None, level: int | str = 'INFO', console: bool = True, console_level: int | str = 'WARNING') Logger [source]
Creates and returns a logger that writes log messages to a file.
- Parameters:
name (str) – The name of the logger.
logs_path (str, optional) – The file path where the log messages will be written. Default is None.
level (Union[int, str], optional) – The logging level. Default is “INFO”.
console (bool) – Whether to also log to the console / terminal. Default is True.
console_level (Union[int, str], optional) – The logging level for console / terminal logs. Default is “WARNING”.
- Returns:
Configured logger instance.
- Return type:
logging.Logger
Examples
>>> from bluemath_tk.core.logging import get_file_logger
>>> # Create a logger that writes to "app.log"
>>> logger = get_file_logger("my_app_logger", "app.log")
>>> # Log messages
>>> logger.info("This is an info message.")
>>> logger.warning("This is a warning message.")
>>> logger.error("This is an error message.")
>>> # The output will be saved in "app.log" with the format:
>>> # 2023-10-22 14:55:23,456 - my_app_logger - INFO - This is an info message.
>>> # 2023-10-22 14:55:23,457 - my_app_logger - WARNING - This is a warning message.
>>> # 2023-10-22 14:55:23,458 - my_app_logger - ERROR - This is an error message.
bluemath_tk.core.models module
- class bluemath_tk.core.models.BlueMathModel[source]
Bases:
ABC
Abstract base class for handling default functionalities across the project.
- check_nans(data: ndarray | Series | DataFrame | DataArray | Dataset, replace_value: float | callable = None, raise_error: bool = False) ndarray | Series | DataFrame | DataArray | Dataset [source]
Checks for NaNs in the data and optionally replaces them.
- Parameters:
data (np.ndarray, pd.Series, pd.DataFrame, xr.DataArray or xr.Dataset) – The data to check for NaNs.
replace_value (float or callable, optional) – The value to replace NaNs with. If None, NaNs will not be replaced. If a callable is provided, it will be called and the result will be returned. Default is None.
raise_error (bool, optional) – Whether to raise an error if NaNs are found. Default is False.
- Returns:
data – The data with NaNs optionally replaced.
- Return type:
np.ndarray, pd.Series, pd.DataFrame, xr.DataArray or xr.Dataset
- Raises:
ValueError – If NaNs are found and raise_error is True.
Notes
This method is intended to be used in classes that inherit from the BlueMathModel class.
The method checks for NaNs in the data and optionally replaces them with the specified value.
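A minimal usage sketch (illustrative; model is assumed to be an instance of a BlueMathModel subclass):
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"Hs": [1.0, np.nan, 2.0]})
>>> clean = model.check_nans(data=df, replace_value=0.0)  # NaNs replaced with 0.0
>>> model.check_nans(data=df, raise_error=True)  # raises ValueError, since df contains NaNs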
- denormalize(normalized_data: DataFrame, scale_factor: dict) DataFrame [source]
Denormalize data using provided scale_factor. More info in bluemath_tk.core.operations.denormalize.
- Parameters:
normalized_data (pd.DataFrame) – The normalized data to denormalize.
scale_factor (dict) – The scale factors used for denormalization.
- Returns:
data – The denormalized data.
- Return type:
pd.DataFrame
- destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset [source]
Destandarize data using provided scaler. More info in bluemath_tk.core.operations.destandarize.
- Parameters:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.
scaler (StandardScaler) – Scaler object used for standarization.
- Returns:
data – Destandarized data.
- Return type:
np.ndarray, pd.DataFrame or xr.Dataset
- static get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray [source]
This method calculates the direction in degrees from the u and v components.
The returned angles lie between 0° and 360°, where 0° is the North direction and angles increase clockwise:

              (u=0, v=1)
                   |
(u=-1, v=0) <------+------> (u=1, v=0)
                   |
              (u=0, v=-1)
- Parameters:
xu (np.ndarray) – The u component.
xv (np.ndarray) – The v component.
- Returns:
The degrees.
- Return type:
np.ndarray
- static get_metrics(data1: DataFrame | Dataset, data2: DataFrame | Dataset) DataFrame [source]
Gets the metrics of the model.
- Parameters:
data1 (pd.DataFrame or xr.Dataset) – The first dataset.
data2 (pd.DataFrame or xr.Dataset) – The second dataset.
- Returns:
metrics – The metrics of the model.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the DataFrames or Datasets have different shapes.
TypeError – If the inputs are not both DataFrames or both xarray Datasets.
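A minimal sketch (illustrative; which metrics are computed is implementation-dependent):
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.models import BlueMathModel
>>> observed = pd.DataFrame({"Hs": np.random.rand(100)})
>>> predicted = observed + 0.1  # same shape, slightly shifted values
>>> metrics = BlueMathModel.get_metrics(data1=observed, data2=predicted)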
- get_num_processors_available() int [source]
Gets the number of processors available.
- Returns:
The number of processors available.
- Return type:
int
- static get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray] [source]
This method calculates the u and v components for the given directional data.
The directional data is assumed to be in degrees, with 0° at the North direction and angles increasing clockwise:

           0° N
            |
270° W <----+----> 90° E
            |
         180° S
- Parameters:
x_deg (np.ndarray) – The directional data in degrees.
- Returns:
The u and v components.
- Return type:
Tuple[np.ndarray, np.ndarray]
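A round-trip sketch combining the two static methods (illustrative):
>>> import numpy as np
>>> from bluemath_tk.core.models import BlueMathModel
>>> directions = np.array([0.0, 90.0, 180.0, 270.0])  # nautical degrees
>>> xu, xv = BlueMathModel.get_uv_components(directions)
>>> recovered = BlueMathModel.get_degrees_from_uv(xu=xu, xv=xv)  # approximately `directions`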
- gravity = 9.80665
- list_class_attributes() list [source]
Lists the attributes of the class.
- Returns:
The attributes of the class.
- Return type:
list
- list_class_methods() list [source]
Lists the methods of the class.
- Returns:
The methods of the class.
- Return type:
list
- load_model(model_path: str) BlueMathModel [source]
Loads the model from a file.
- property logger: Logger
- normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}) Tuple[DataFrame | Dataset, dict] [source]
Normalize data to 0-1 using min max scaler approach. More info in bluemath_tk.core.operations.normalize.
- Parameters:
data (pd.DataFrame or xr.Dataset) – The data to normalize.
custom_scale_factor (dict, optional) – Custom scale factors for normalization.
- Returns:
normalized_data (pd.DataFrame or xr.Dataset) – The normalized data.
scale_factor (dict) – The scale factors used for normalization.
- parallel_execute(func: Callable, items: List[Any], num_workers: int, cpu_intensive: bool = False, **kwargs) Dict[int, Any] [source]
Execute a function in parallel using concurrent.futures.
- Parameters:
func (Callable) – Function to execute for each item.
items (List[Any]) – List of items to process.
num_workers (int) – Number of parallel workers.
cpu_intensive (bool, optional) – Whether the function is CPU intensive. Default is False.
**kwargs (dict) – Additional keyword arguments for func.
- Returns:
Dictionary with the results of the function execution. The keys are the indices of the items in the original list. The values are the results of the function execution.
- Return type:
Dict[int, Any]
Warning
When using ThreadPoolExecutor, the function sometimes fails when reading from or writing to files, possibly due to Python's GIL (Global Interpreter Lock).
cpu_intensive=True does not work with non-picklable objects (under development).
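A minimal sketch (illustrative; model is assumed to be an instance of a BlueMathModel subclass):
>>> def square(x):
...     return x * x
>>> results = model.parallel_execute(func=square, items=[1, 2, 3, 4], num_workers=2)
>>> # results maps item indices to outputs, e.g. {0: 1, 1: 4, 2: 9, 3: 16}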
- save_model(model_path: str, exclude_attributes: List[str] = None) None [source]
Saves the model to a file.
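A save / load sketch (illustrative; the file name is a hypothetical choice and the on-disk format is not documented here):
>>> model.save_model(model_path="my_model.pkl")  # hypothetical path
>>> restored = model.load_model(model_path="my_model.pkl")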
- set_logger_name(name: str, level: str = 'INFO', console: bool = True) None [source]
Sets the name of the logger.
- set_num_processors_to_use(num_processors: int) None [source]
Sets the number of processors to use for parallel processing.
- Parameters:
num_processors (int) – The number of processors to use. If -1, all available processors will be used.
- set_omp_num_threads(num_threads: int) None [source]
Sets the number of threads for OpenMP.
- Parameters:
num_threads (int) – The number of threads.
Warning
This method is under development.
- standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler] [source]
Standarize data using StandardScaler. More info in bluemath_tk.core.operations.standarize.
- Parameters:
data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.
scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.
transform (bool) – Whether to only transform the data. Default is False.
- Returns:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.
scaler (StandardScaler) – Scaler object used for standarization.
bluemath_tk.core.operations module
- bluemath_tk.core.operations.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray] [source]
This method converts Longitude and Latitude to UTM coordinates.
- Parameters:
lon (np.ndarray) – The longitude values.
lat (np.ndarray) – The latitude values.
projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.
- Returns:
The x and y coordinates in UTM.
- Return type:
Tuple[np.ndarray, np.ndarray]
- bluemath_tk.core.operations.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray] [source]
This method converts UTM coordinates to Longitude and Latitude.
- Parameters:
utm_x (np.ndarray) – The x values in UTM.
utm_y (np.ndarray) – The y values in UTM.
projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.
- Returns:
The longitude and latitude values.
- Return type:
Tuple[np.ndarray, np.ndarray]
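An illustrative round trip between geographic and UTM coordinates, assuming (from the int type hint) that an EPSG code is accepted as the projection; 32630 is WGS 84 / UTM zone 30N:
>>> import numpy as np
>>> from bluemath_tk.core.operations import convert_lonlat_to_utm, convert_utm_to_lonlat
>>> lon, lat = np.array([-3.80]), np.array([43.46])
>>> utm_x, utm_y = convert_lonlat_to_utm(lon=lon, lat=lat, projection=32630)
>>> lon2, lat2 = convert_utm_to_lonlat(utm_x=utm_x, utm_y=utm_y, projection=32630)  # recovers lon, lat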
- bluemath_tk.core.operations.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) DataFrame | Dataset [source]
Denormalize data using provided scale_factor.
- Parameters:
normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.
scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.
- Returns:
data – Denormalized data.
- Return type:
pd.DataFrame or xr.Dataset
Notes
This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.
The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.
Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)
- bluemath_tk.core.operations.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset [source]
Destandarize data using provided scaler.
- Parameters:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.
scaler (StandardScaler) – Scaler object used for standarization.
- Returns:
Destandarized data.
- Return type:
np.ndarray, pd.DataFrame or xr.Dataset
Examples
>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)
- bluemath_tk.core.operations.get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray [source]
This method calculates the direction in degrees from the u and v components.
The returned angles lie between 0° and 360°, where 0° is the North direction and angles increase clockwise:

              (u=0, v=1)
                   |
(u=-1, v=0) <------+------> (u=1, v=0)
                   |
              (u=0, v=-1)
- Parameters:
xu (np.ndarray) – The u component.
xv (np.ndarray) – The v component.
- Returns:
The degrees.
- Return type:
np.ndarray
- bluemath_tk.core.operations.get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray] [source]
This method calculates the u and v components for the given directional data.
The directional data is assumed to be in degrees, with 0° at the North direction and angles increasing clockwise:

           0° N
            |
270° W <----+----> 90° E
            |
         180° S
- Parameters:
x_deg (np.ndarray) – The directional data in degrees.
- Returns:
The u and v components.
- Return type:
Tuple[np.ndarray, np.ndarray]
- bluemath_tk.core.operations.mathematical_to_nautical(math_degrees: ndarray) ndarray [source]
Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise)
- Parameters:
math_degrees (float or array-like) – Directional angle in mathematical convention
- Returns:
Directional angle in nautical convention
- Return type:
np.ndarray
- bluemath_tk.core.operations.nautical_to_mathematical(nautical_degrees: ndarray) ndarray [source]
Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise)
- Parameters:
nautical_degrees (np.ndarray) – Directional angle in nautical convention
- Returns:
Directional angle in mathematical convention
- Return type:
np.ndarray
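Both conversions reduce to the same relation, degrees_out = (90 - degrees_in) mod 360, so each function is the inverse of the other. A minimal sketch:
>>> import numpy as np
>>> from bluemath_tk.core.operations import mathematical_to_nautical, nautical_to_mathematical
>>> math_deg = np.array([0.0, 90.0, 180.0])  # East, North, West (mathematical convention)
>>> mathematical_to_nautical(math_deg)  # expected: [90., 0., 270.]
>>> nautical_to_mathematical(mathematical_to_nautical(math_deg))  # recovers math_deg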
- bluemath_tk.core.operations.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) Tuple[DataFrame | Dataset, dict] [source]
Normalize data to 0-1 using min max scaler approach.
- Parameters:
data (pd.DataFrame or xr.Dataset) – Input data to be normalized.
custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.
logger (logging.Logger, optional) – Logger object used to log a warning when the custom minimum or maximum does not cover the datapoints.
- Returns:
normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.
scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.
Notes
This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.
The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.
If the custom minimum lies above, or the custom maximum lies below, some of the datapoints, it is replaced by the datapoints' minimum or maximum and a warning is logged.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)
- bluemath_tk.core.operations.spatial_gradient(data: DataArray) DataArray [source]
Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).
- Parameters:
data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).
- Returns:
Gradient magnitude with same dimensions as input.
- Return type:
xr.DataArray
Notes
The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.
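A minimal sketch with illustrative coordinates:
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import spatial_gradient
>>> data = xr.DataArray(
...     np.random.rand(10, 18, 36),
...     dims=("time", "latitude", "longitude"),
...     coords={
...         "time": pd.date_range("2000-01-01", periods=10),
...         "latitude": np.linspace(-85.0, 85.0, 18),
...         "longitude": np.linspace(0.0, 350.0, 36),
...     },
... )
>>> gradient = spatial_gradient(data)  # same dimensions as the input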
- bluemath_tk.core.operations.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler] [source]
Standarize data to have mean 0 and std 1.
- Parameters:
data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.
scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.
transform (bool) – Whether to only transform the data. Default is False.
- Returns:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.
scaler (StandardScaler) – Scaler object used for standarization.
Examples
>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
Module contents
Project: BlueMath_tk
Sub-Module: core
Author: GeoOcean Research Group, Universidad de Cantabria
Creation Date: 9 December 2024
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)
- bluemath_tk.core.convert_lonlat_to_utm(lon: ndarray, lat: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray] [source]
This method converts Longitude and Latitude to UTM coordinates.
- Parameters:
lon (np.ndarray) – The longitude values.
lat (np.ndarray) – The latitude values.
projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.
- Returns:
The x and y coordinates in UTM.
- Return type:
Tuple[np.ndarray, np.ndarray]
- bluemath_tk.core.convert_utm_to_lonlat(utm_x: ndarray, utm_y: ndarray, projection: int | str | dict | CRS) Tuple[ndarray, ndarray] [source]
This method converts UTM coordinates to Longitude and Latitude.
- Parameters:
utm_x (np.ndarray) – The x values in UTM.
utm_y (np.ndarray) – The y values in UTM.
projection (int, str, dict, pyproj.CRS) – The projection to use for the transformation.
- Returns:
The longitude and latitude values.
- Return type:
Tuple[np.ndarray, np.ndarray]
- bluemath_tk.core.denormalize(normalized_data: DataFrame | Dataset, scale_factor: dict) DataFrame | Dataset [source]
Denormalize data using provided scale_factor.
- Parameters:
normalized_data (pd.DataFrame or xr.Dataset) – Input data that has been normalized and needs to be denormalized.
scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to denormalize the variable.
- Returns:
data – Denormalized data.
- Return type:
pd.DataFrame or xr.Dataset
Notes
This method does not modify the input data, it creates a copy of the dataframe / dataset and denormalizes it.
The denormalization is done variable by variable, i.e. the minimum and maximum values are used to scale the data back to its original range.
Assumes that the scale_factor dictionary contains appropriate min and max values for each variable in the normalized_data.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import denormalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000),
...         "Tp": np.random.rand(1000),
...         "Dir": np.random.rand(1000),
...     }
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=df, scale_factor=scale_factor)

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import denormalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000)),
...         "Tp": (("time",), np.random.rand(1000)),
...         "Dir": (("time",), np.random.rand(1000)),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> scale_factor = {
...     "Hs": [0, 7],
...     "Tp": [0, 20],
...     "Dir": [0, 360],
... }
>>> denormalized_data = denormalize(normalized_data=ds, scale_factor=scale_factor)
- bluemath_tk.core.destandarize(standarized_data: ndarray | DataFrame | Dataset, scaler: StandardScaler) ndarray | DataFrame | Dataset [source]
Destandarize data using provided scaler.
- Parameters:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data to be destandarized.
scaler (StandardScaler) – Scaler object used for standarization.
- Returns:
Destandarized data.
- Return type:
np.ndarray, pd.DataFrame or xr.Dataset
Examples
>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize, destandarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)
>>> data = destandarize(standarized_data=standarized_data, scaler=scaler)
- bluemath_tk.core.get_degrees_from_uv(xu: ndarray, xv: ndarray) ndarray [source]
This method calculates the direction in degrees from the u and v components.
The returned angles lie between 0° and 360°, where 0° is the North direction and angles increase clockwise:

              (u=0, v=1)
                   |
(u=-1, v=0) <------+------> (u=1, v=0)
                   |
              (u=0, v=-1)
- Parameters:
xu (np.ndarray) – The u component.
xv (np.ndarray) – The v component.
- Returns:
The degrees.
- Return type:
np.ndarray
- bluemath_tk.core.get_uv_components(x_deg: ndarray) Tuple[ndarray, ndarray] [source]
This method calculates the u and v components for the given directional data.
The directional data is assumed to be in degrees, with 0° at the North direction and angles increasing clockwise:

           0° N
            |
270° W <----+----> 90° E
            |
         180° S
- Parameters:
x_deg (np.ndarray) – The directional data in degrees.
- Returns:
The u and v components.
- Return type:
Tuple[np.ndarray, np.ndarray]
- bluemath_tk.core.mathematical_to_nautical(math_degrees: ndarray) ndarray [source]
Convert mathematical degrees (0° at East, counterclockwise) to nautical degrees (0° at North, clockwise)
- Parameters:
math_degrees (float or array-like) – Directional angle in mathematical convention
- Returns:
Directional angle in nautical convention
- Return type:
np.ndarray
- bluemath_tk.core.nautical_to_mathematical(nautical_degrees: ndarray) ndarray [source]
Convert nautical degrees (0° at North, clockwise) to mathematical degrees (0° at East, counterclockwise)
- Parameters:
nautical_degrees (np.ndarray) – Directional angle in nautical convention
- Returns:
Directional angle in mathematical convention
- Return type:
np.ndarray
- bluemath_tk.core.normalize(data: DataFrame | Dataset, custom_scale_factor: dict = {}, logger: Logger = None) Tuple[DataFrame | Dataset, dict] [source]
Normalize data to 0-1 using min max scaler approach.
- Parameters:
data (pd.DataFrame or xr.Dataset) – Input data to be normalized.
custom_scale_factor (dict, optional) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable. If not provided, the minimum and maximum values of the variable are used.
logger (logging.Logger, optional) – Logger object used to log a warning when the custom minimum or maximum does not cover the datapoints.
- Returns:
normalized_data (pd.DataFrame or xr.Dataset) – Normalized data.
scale_factor (dict) – Dictionary with variables as keys and a list with two values as values. The first value is the minimum and the second value is the maximum used to normalize the variable.
Notes
This method does not modify the input data, it creates a copy of the dataframe / dataset and normalizes it.
The normalization is done variable by variable, i.e. the minimum and maximum values are calculated for each variable.
If the custom minimum lies above, or the custom maximum lies below, some of the datapoints, it is replaced by the datapoints' minimum or maximum and a warning is logged.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.core.operations import normalize
>>> df = pd.DataFrame(
...     {
...         "Hs": np.random.rand(1000) * 7,
...         "Tp": np.random.rand(1000) * 20,
...         "Dir": np.random.rand(1000) * 360,
...     }
... )
>>> normalized_data, scale_factor = normalize(data=df)

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from bluemath_tk.core.operations import normalize
>>> ds = xr.Dataset(
...     {
...         "Hs": (("time",), np.random.rand(1000) * 7),
...         "Tp": (("time",), np.random.rand(1000) * 20),
...         "Dir": (("time",), np.random.rand(1000) * 360),
...     },
...     coords={"time": pd.date_range("2000-01-01", periods=1000)},
... )
>>> normalized_data, scale_factor = normalize(data=ds)
- bluemath_tk.core.setup_dask_client(n_workers: int = None, memory_limit: str = 0.5)[source]
Setup a Dask client with controlled resources.
- Parameters:
n_workers (int, optional) – Number of workers. Default is None.
memory_limit (str or float, optional) – Memory limit per worker (for example, a fraction of total memory such as 0.5). Default is 0.5.
- Returns:
Dask distributed client
- Return type:
Client
Notes
Available resources vary with the hardware and the current load of the machine. Be very careful when setting the number of workers and the memory limit, as a poor choice might degrade the performance of the machine or, in the worst case, affect other users on the same machine (in a cluster setting).
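A minimal sketch (illustrative resource settings):
>>> from bluemath_tk.core import setup_dask_client
>>> client = setup_dask_client(n_workers=2, memory_limit=0.25)
>>> # ... run Dask-backed computations here ...
>>> client.close()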
- bluemath_tk.core.spatial_gradient(data: DataArray) DataArray [source]
Calculate spatial gradient of a DataArray with dimensions (time, latitude, longitude).
- Parameters:
data (xr.DataArray) – Input data with dimensions (time, latitude, longitude).
- Returns:
Gradient magnitude with same dimensions as input.
- Return type:
xr.DataArray
Notes
The gradient is calculated using central differences, accounting for latitude-dependent grid spacing in spherical coordinates.
- bluemath_tk.core.standarize(data: ndarray | DataFrame | Dataset, scaler: StandardScaler = None, transform: bool = False) Tuple[ndarray | DataFrame | Dataset, StandardScaler] [source]
Standarize data to have mean 0 and std 1.
- Parameters:
data (np.ndarray, pd.DataFrame or xr.Dataset) – Input data to be standarized.
scaler (StandardScaler, optional) – Scaler object to use for standarization. Default is None.
transform (bool) – Whether to only transform the data. Default is False.
- Returns:
standarized_data (np.ndarray, pd.DataFrame or xr.Dataset) – Standarized data.
scaler (StandardScaler) – Scaler object used for standarization.
Examples
>>> import numpy as np
>>> from bluemath_tk.core.operations import standarize
>>> data = np.random.rand(1000, 3) * 10.0
>>> standarized_data, scaler = standarize(data=data)