bluemath_tk.downloaders package

Module contents

Project: BlueMath_tk
Sub-Module: downloaders
Author: GeoOcean Research Group, Universidad de Cantabria
Repository: https://github.com/GeoOcean/BlueMath_tk.git
Status: Under development (Working)

class bluemath_tk.downloaders.CopernicusDownloader(product: str, base_path_to_download: str, token: str = None, debug: bool = True, check: bool = True)

Bases: BaseDownloader

This is the main class to download data from the Copernicus Climate Data Store.

product

The product to download data from. Currently only ERA5 is supported.

Type:

str

product_config

The configuration for the product to download data from.

Type:

dict

client

The client to interact with the Copernicus Climate Data Store API.

Type:

cdsapi.Client

Examples

from bluemath_tk.downloaders.copernicus.copernicus_downloader import CopernicusDownloader

copernicus_downloader = CopernicusDownloader(
    product="ERA5",
    base_path_to_download="/path/to/Copernicus/",  # Will be created if not available
    token=None,
    check=True,
)
result = copernicus_downloader.download_data_era5(
    variables=["swh"],
    years=["2020"],
    months=["01", "03"],
)
print(result)

Fully downloaded files:

Not fully downloaded files:
/path/to/Copernicus/ERA5/reanalysis-era5-single-levels/ocean/reanalysis/significant_height_of_combined_wind_waves_and_swell/swh_2020_01_03.nc
Error files:

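The returned message can be split back into its three file lists. The following is a minimal sketch that assumes the message always uses the three section headers shown in the example output above; parse_download_message is a hypothetical helper, not part of bluemath_tk.

```python
from typing import Dict, List

# Section headers as they appear in the example output above (assumed stable).
SECTION_HEADERS = {
    "Fully downloaded files:": "fully_downloaded",
    "Not fully downloaded files:": "not_fully_downloaded",
    "Error files:": "errors",
}


def parse_download_message(message: str) -> Dict[str, List[str]]:
    """Hypothetical helper: group the file paths in a status message by section."""
    sections: Dict[str, List[str]] = {key: [] for key in SECTION_HEADERS.values()}
    current = None
    for raw_line in message.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no information
        if line in SECTION_HEADERS:
            current = SECTION_HEADERS[line]  # switch to the new section
        elif current is not None:
            sections[current].append(line)  # a file path under the current header
    return sections


example_message = """
Fully downloaded files:

Not fully downloaded files:
/path/to/Copernicus/ERA5/swh_2020_01_03.nc
Error files:
"""
parsed = parse_download_message(example_message)
print(parsed["not_fully_downloaded"])
```

The path in example_message is shortened for readability; any line that is not a header is treated as a file path.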
property client: Client
download_data(*args, **kwargs) → str

Downloads the data for the product.

Parameters:
  • *args – The arguments to pass to the download function.

  • **kwargs – The keyword arguments to pass to the download function.

Returns:

A message listing the fully downloaded files and the not fully downloaded files.

Return type:

str

Raises:

ValueError – If the product is not supported.

download_data_era5(variables: List[str], years: List[str], months: List[str], days: List[str] = None, times: List[str] = None, area: List[float] = None, product_type: str = 'reanalysis', data_format: str = 'netcdf', download_format: str = 'unarchived', force: bool = False) → str

Downloads the data for the ERA5 product.

Parameters:
  • variables (List[str]) – The variables to download. If not provided, all variables in self.product_config will be downloaded.

  • years (List[str]) – The years to download. Years are downloaded one by one.

  • months (List[str]) – The months to download. Months are downloaded together.

  • days (List[str], optional) – The days to download. If None, all days in the month will be downloaded. Default is None.

  • times (List[str], optional) – The times to download. If None, all times in the day will be downloaded. Default is None.

  • area (List[float], optional) – The area to download. If None, the whole globe will be downloaded. Default is None.

  • product_type (str, optional) – The product type to download. Default is “reanalysis”.

  • data_format (str, optional) – The data format to download. Default is “netcdf”.

  • download_format (str, optional) – The download format to use. Default is “unarchived”.

  • force (bool, optional) – Whether to force the download. Default is False.

Returns:

A message listing the fully downloaded files, the not fully downloaded files, and the error files.

Return type:

str
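The parameter rules above (years one by one, months together, None meaning "everything") can be sketched as a request builder. This is an illustration only: build_era5_requests is a hypothetical helper, the field names are taken from the reanalysis-era5-single-levels template in products_configs, and the real method may assemble its requests differently.

```python
from typing import Dict, List, Optional


def build_era5_requests(
    variables: List[str],
    years: List[str],
    months: List[str],
    days: Optional[List[str]] = None,
    times: Optional[List[str]] = None,
    area: Optional[List[float]] = None,
    product_type: str = "reanalysis",
    data_format: str = "netcdf",
    download_format: str = "unarchived",
) -> List[Dict]:
    """Hypothetical helper: build one request dict per (variable, year) pair."""
    # None means "everything": all days of the month, all hours of the day.
    days = days or [f"{d:02d}" for d in range(1, 32)]
    times = times or [f"{h:02d}:00" for h in range(24)]
    era5_requests = []
    for variable in variables:
        for year in years:  # years are requested one by one
            request = {
                "product_type": [product_type],
                "variable": [variable],
                "year": [year],
                "month": list(months),  # months are requested together
                "day": days,
                "time": times,
                "data_format": data_format,
                "download_format": download_format,
            }
            if area is not None:  # leaving the key out requests the whole globe
                request["area"] = area
            era5_requests.append(request)
    return era5_requests


era5_requests = build_era5_requests(
    variables=["significant_height_of_combined_wind_waves_and_swell"],
    years=["2020", "2021"],
    months=["01", "03"],
)
print(len(era5_requests))
print(era5_requests[0]["year"], era5_requests[0]["month"])
```

Splitting by year keeps each request small, while grouping months matches the single swh_2020_01_03.nc file shown in the class example.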

list_datasets() → List[str]

Lists the datasets available for the product.

Returns:

The list of datasets available for the product.

Return type:

List[str]

list_variables(type: str = None) → List[str]

Lists the variables available for the product, optionally filtered by type.

Parameters:

type (str, optional) – The type of variables to list. Default is None.

Returns:

The list of variables available for the product.

Return type:

List[str]
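The type filter can be pictured as a lookup on the 'type' key of each entry in products_configs. The sketch below works on a trimmed excerpt of that dictionary; filter_variables and var_type are hypothetical names used for illustration, and the actual implementation may differ.

```python
from typing import Dict, List, Optional

# Excerpt of products_configs["ERA5"]["variables"], trimmed to the keys used here.
VARIABLES_EXCERPT: Dict[str, Dict[str, str]] = {
    "swh": {"dataset": "reanalysis-era5-single-levels", "type": "ocean"},
    "mwp": {"dataset": "reanalysis-era5-single-levels", "type": "ocean"},
    "spectra": {"dataset": "reanalysis-era5-complete", "type": "ocean"},
}


def filter_variables(
    variables: Dict[str, Dict[str, str]], var_type: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: return variable names, keeping one type if given."""
    return [
        name
        for name, info in variables.items()
        if var_type is None or info["type"] == var_type
    ]


print(filter_variables(VARIABLES_EXCERPT))
print(filter_variables(VARIABLES_EXCERPT, var_type="atmosphere"))
```

Every variable currently listed in products_configs has type 'ocean', so filtering by any other type yields an empty list in this sketch.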

property product: str
property product_config: dict
products_configs = {'ERA5': {'datasets': {'reanalysis-era5-complete': {'description': 'ERA5 Complete Dataset', 'mandatory_fields': ['class', 'date', 'direction', 'domain', 'expver', 'frequency', 'param', 'stream', 'time', 'type', 'area', 'format', 'grid'], 'optional_fields': [], 'template': {'class': 'ea', 'direction': '1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24', 'domain': 'g', 'expver': '1', 'format': 'netcdf', 'frequency': '1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30', 'grid': '0.5/0.5', 'param': '251.140', 'stream': 'wave', 'time': '00:00:00/01:00:00/02:00:00/03:00:00/04:00:00/05:00:00/06:00:00/07:00:00/08:00:00/09:00:00/10:00:00/11:00:00/12:00:00/13:00:00/14:00:00/15:00:00/16:00:00/17:00:00/18:00:00/19:00:00/20:00:00/21:00:00/22:00:00/23:00:00', 'type': 'an'}, 'types': ['ocean'], 'url': 'https://cds.climate.copernicus.eu/datasets/reanalysis-era5-complete?tab=overview'}, 'reanalysis-era5-pressure-levels': {'description': 'ERA5 Pressure Levels', 'mandatory_fields': ['product_type', 'variable', 'year', 'month', 'pressure_level', 'data_format', 'download_format'], 'optional_fields': ['day', 'time', 'area'], 'template': {'data_format': 'netcdf', 'day': ['01'], 'download_format': 'unarchived', 'month': ['01'], 'pressure_level': ['500'], 'product_type': ['reanalysis'], 'time': ['00:00'], 'variable': ['geopotential'], 'year': ['2019']}, 'types': ['pressure'], 'url': 'https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=overview'}, 'reanalysis-era5-single-levels': {'description': 'ERA5 Single Levels', 'mandatory_fields': ['product_type', 'variable', 'year', 'month', 'data_format', 'download_format'], 'optional_fields': ['day', 'time', 'area'], 'template': {'data_format': 'netcdf', 'day': ['01'], 'download_format': 'unarchived', 'month': ['01'], 'product_type': ['reanalysis'], 'time': ['00:00'], 'variable': ['significant_wave_height_of_first_swell_partition'], 'year': ['2019']}, 'types': 
['atmosphere', 'ocean'], 'url': 'https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form'}}, 'variables': {'dwww': {'cds_name': 'wave_spectral_directional_width_for_wind_waves', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Wave spectral directional width for wind waves', 'nc_name': 'dwww', 'type': 'ocean', 'units': 'degrees'}, 'mdww': {'cds_name': 'mean_direction_of_wind_waves', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean direction of wind waves', 'nc_name': 'mdww', 'type': 'ocean', 'units': 'degrees'}, 'mpww': {'cds_name': 'mean_period_of_wind_waves', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean period of wind waves', 'nc_name': 'mpww', 'type': 'ocean', 'units': 's'}, 'mwd': {'cds_name': 'mean_wave_direction', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave direction', 'nc_name': 'mwd', 'type': 'ocean', 'units': 'degrees'}, 'mwp': {'cds_name': 'mean_wave_period', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave period', 'nc_name': 'mwp', 'type': 'ocean', 'units': 's'}, 'p140121': {'cds_name': 'significant_wave_height_of_first_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Significant wave height of first swell partition', 'nc_name': 'p140121', 'type': 'ocean', 'units': 'm'}, 'p140122': {'cds_name': 'mean_wave_direction_of_first_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave direction of first swell partition', 'nc_name': 'p140122', 'type': 'ocean', 'units': 'degrees'}, 'p140123': {'cds_name': 'mean_wave_period_of_first_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave period of first swell partition', 'nc_name': 'p140123', 'type': 'ocean', 'units': 's'}, 'p140124': {'cds_name': 'significant_wave_height_of_second_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Significant wave height of second swell partition', 
'nc_name': 'p140124', 'type': 'ocean', 'units': 'm'}, 'p140125': {'cds_name': 'mean_wave_direction_of_second_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave direction of second swell partition', 'nc_name': 'p140125', 'type': 'ocean', 'units': 'degrees'}, 'p140126': {'cds_name': 'mean_wave_period_of_second_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave period of second swell partition', 'nc_name': 'p140126', 'type': 'ocean', 'units': 's'}, 'p140127': {'cds_name': 'significant_wave_height_of_third_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Significant wave height of third swell partition', 'nc_name': 'p140127', 'type': 'ocean', 'units': 'm'}, 'p140128': {'cds_name': 'mean_wave_direction_of_third_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave direction of third swell partition', 'nc_name': 'p140128', 'type': 'ocean', 'units': 'degrees'}, 'p140129': {'cds_name': 'mean_wave_period_of_third_swell_partition', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Mean wave period of third swell partition', 'nc_name': 'p140129', 'type': 'ocean', 'units': 's'}, 'pp1d': {'cds_name': 'peak_wave_period', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Peak wave period', 'nc_name': 'pp1d', 'type': 'ocean', 'units': 's'}, 'shww': {'cds_name': 'significant_height_of_wind_waves', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Significant height of wind waves', 'nc_name': 'shww', 'type': 'ocean', 'units': 'm'}, 'spectra': {'cds_name': 'full_wave_spectra', 'dataset': 'reanalysis-era5-complete', 'long_name': 'Full wave spectra', 'nc_name': 'spectra', 'type': 'ocean', 'units': 'm^2/Hz·radian'}, 'swh': {'cds_name': 'significant_height_of_combined_wind_waves_and_swell', 'dataset': 'reanalysis-era5-single-levels', 'long_name': 'Significant height of combined wind waves and swell', 'nc_name': 'swh', 'type': 'ocean', 
'units': 'm'}}}}
show_markdown_table() → None

Create a Markdown table from the configuration dictionary and print it.
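This page does not show the layout of the printed table, so the following is only a sketch of how a configuration dictionary can be turned into a Markdown table. config_to_markdown and the chosen columns are assumptions made for illustration.

```python
from typing import Dict, List

# Excerpt of products_configs["ERA5"]["variables"] as listed on this page.
VARIABLES_EXCERPT: Dict[str, Dict[str, str]] = {
    "swh": {
        "long_name": "Significant height of combined wind waves and swell",
        "units": "m",
    },
    "pp1d": {"long_name": "Peak wave period", "units": "s"},
}


def config_to_markdown(variables: Dict[str, Dict[str, str]], columns: List[str]) -> str:
    """Hypothetical helper: render one Markdown row per variable."""
    header = "| name | " + " | ".join(columns) + " |"
    separator = "| --- | " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + name + " | " + " | ".join(info.get(col, "") for col in columns) + " |"
        for name, info in variables.items()
    ]
    return "\n".join([header, separator, *rows])


table = config_to_markdown(VARIABLES_EXCERPT, columns=["long_name", "units"])
print(table)
```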

class bluemath_tk.downloaders.NOAADownloader(base_path_to_download: str, debug: bool = True, check: bool = False)

Bases: BaseDownloader

This is the main class to download and read data from NOAA.

config

The configuration for NOAA data sources loaded from JSON file.

Type:

dict

base_path_to_download

Base path where the data is stored.

Type:

Path

debug

Whether to run in debug mode.

Type:

bool

Examples

from bluemath_tk.downloaders.noaa.noaa_downloader import NOAADownloader

noaa_downloader = NOAADownloader(
    base_path_to_download="/path/to/NOAA/",  # Will be created if not available
    debug=True,
    check=True,
)

# Download buoy bulk parameters and load DataFrame
result = noaa_downloader.download_data(
    data_type="bulk_parameters",
    buoy_id="41001",
    years=[2020, 2021, 2022],
    load_df=True
)
print(result)
None
config = {'data_types': {'bulk_parameters': {'columns': ['YYYY', 'MM', 'DD', 'hh', 'mm', 'WD', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'BAR', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'], 'dataset': 'buoy_data', 'description': 'Wind, wave, temperature, and pressure measurements', 'fallback_urls': ['view_text_file.php?filename={buoy_id}h{year}.txt.gz&dir=data/historical/stdmet/', 'stdmet/{year}/{buoy_id}h{year}.txt.gz'], 'file_format': 'txt.gz', 'long_name': 'Standard Meteorological Data', 'name': 'bulk_parameters', 'url_pattern': 'historical/stdmet/{buoy_id}h{year}.txt.gz'}, 'directional_spectra': {'coefficients': {'d': {'name': 'alpha1', 'url_pattern': 'historical/swdir/{buoy_id}d{year}.txt.gz'}, 'i': {'name': 'alpha2', 'url_pattern': 'historical/swdir2/{buoy_id}i{year}.txt.gz'}, 'j': {'name': 'r1', 'url_pattern': 'historical/swr1/{buoy_id}j{year}.txt.gz'}, 'k': {'name': 'r2', 'url_pattern': 'historical/swr2/{buoy_id}k{year}.txt.gz'}, 'w': {'name': 'c11', 'url_pattern': 'historical/swden/{buoy_id}w{year}.txt.gz'}}, 'dataset': 'buoy_data', 'description': 'Fourier coefficients for directional wave spectra', 'file_format': 'txt.gz', 'long_name': 'Directional Wave Spectra Coefficients', 'name': 'directional_spectra'}, 'wave_spectra': {'dataset': 'buoy_data', 'description': 'Wave energy density spectra', 'file_format': 'txt.gz', 'long_name': 'Wave Spectral Density', 'name': 'wave_spectra', 'url_pattern': 'historical/swden/{buoy_id}w{year}.txt.gz'}, 'wind_forecast': {'dataset': 'forecast_data', 'description': 'Wind speed and direction forecast from GFS model', 'file_format': 'netcdf', 'long_name': 'GFS Wind Forecast', 'name': 'wind_forecast', 'output_variables': {'u10': 'ugrd10m', 'v10': 'vgrd10m'}, 'variables': ['ugrd10m', 'vgrd10m']}}, 'datasets': {'buoy_data': {'base_url': 'https://www.ndbc.noaa.gov/data', 'description': 'Historical buoy measurements from NDBC', 'mandatory_fields': ['buoy_id', 'year'], 'name': 'NOAA Buoy Data', 'template': {'buoy_id': None, 
'data_type': 'bulk_parameters', 'year': None}}, 'forecast_data': {'base_url': 'https://nomads.ncep.noaa.gov/dods/gfs_0p25_1hr', 'description': 'GFS 0.25 degree forecast data', 'mandatory_fields': ['date'], 'name': 'NOAA GFS Forecast Data', 'template': {'date': None}}}}
property data_types: dict
property datasets: dict
download_data(data_type: str, load_df: bool = False, **kwargs) → DataFrame | Dataset | str

Downloads the data for the specified data type.

Parameters:
  • data_type (str) – The data type to download. One of ‘bulk_parameters’, ‘wave_spectra’, ‘directional_spectra’, or ‘wind_forecast’.

  • load_df (bool, optional) – Whether to load and return the DataFrame after downloading. Default is False. If True and multiple years are specified, all years will be combined into a single DataFrame.

  • **kwargs – Additional keyword arguments specific to each data type.

Returns:

Downloaded data or status message.

Return type:

Union[pd.DataFrame, xr.Dataset, str]

Raises:

ValueError – If the data type is not supported.
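For buoy data types, the config attribute above pairs a base_url with a url_pattern and, for bulk parameters, a list of fallback_urls. A minimal sketch of how those pieces could be combined follows; joining them with "/" and trying the fallbacks in listed order are assumptions, and candidate_urls is a hypothetical helper.

```python
from typing import List

# Values copied from the config attribute shown on this page.
BASE_URL = "https://www.ndbc.noaa.gov/data"
URL_PATTERN = "historical/stdmet/{buoy_id}h{year}.txt.gz"
FALLBACK_URLS = [
    "view_text_file.php?filename={buoy_id}h{year}.txt.gz&dir=data/historical/stdmet/",
    "stdmet/{year}/{buoy_id}h{year}.txt.gz",
]


def candidate_urls(buoy_id: str, year: int) -> List[str]:
    """Hypothetical helper: primary URL first, then each fallback in order."""
    patterns = [URL_PATTERN, *FALLBACK_URLS]
    # Assumption: patterns are relative to BASE_URL and joined with a slash.
    return [f"{BASE_URL}/{p.format(buoy_id=buoy_id, year=year)}" for p in patterns]


for url in candidate_urls("41001", 2020):
    print(url)
```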

list_data_types() → List[str]

Lists the available data types.

Returns:

The list of available data types.

Return type:

List[str]

list_datasets() → List[str]

Lists the available datasets.

Returns:

The list of available datasets.

Return type:

List[str]

read_bulk_parameters(buoy_id: str, years: int | List[int]) → DataFrame | None

Read bulk parameters for a specific buoy and year(s).

Parameters:
  • buoy_id (str) – The buoy ID.

  • years (Union[int, List[int]]) – The year(s) to read data for. Can be a single year or a list of years.

Returns:

DataFrame containing the bulk parameters, or None if data not found.

Return type:

Optional[pd.DataFrame]

read_directional_spectra(buoy_id: str, years: int | List[int]) → Tuple[DataFrame | None, ...]

Read directional spectra data for a specific buoy and year(s).

Parameters:
  • buoy_id (str) – The buoy ID.

  • years (Union[int, List[int]]) – The year(s) to read data for. Can be a single year or a list of years.

Returns:

Tuple containing DataFrames for alpha1, alpha2, r1, r2, and c11. Each entry is None if its data is not found.

Return type:

Tuple[Optional[pd.DataFrame], …]
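Because any entry of the returned tuple may be None, callers usually need to check each coefficient before using it. The sketch below assumes the tuple order matches the order given in the Returns description (alpha1, alpha2, r1, r2, c11); label_coefficients is a hypothetical helper, and plain strings stand in for DataFrames so the example runs without pandas.

```python
from typing import Dict, Optional, Tuple

# Assumed tuple order, taken from the Returns description above.
COEFFICIENT_NAMES = ("alpha1", "alpha2", "r1", "r2", "c11")


def label_coefficients(result: Tuple[Optional[object], ...]) -> Dict[str, object]:
    """Hypothetical helper: map coefficient names to the entries that were found."""
    return {
        name: data
        for name, data in zip(COEFFICIENT_NAMES, result)
        if data is not None  # skip coefficients whose data was not found
    }


# Stand-in values; a real call would return DataFrames or None.
fake_result = ("alpha1-data", None, "r1-data", None, "c11-data")
available = label_coefficients(fake_result)
print(sorted(available))
```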

read_wave_spectra(buoy_id: str, years: int | List[int]) → DataFrame | None

Read wave spectra data for a specific buoy and year(s).

Parameters:
  • buoy_id (str) – The buoy ID.

  • years (Union[int, List[int]]) – The year(s) to read data for. Can be a single year or a list of years.

Returns:

DataFrame containing the wave spectra, or None if data not found.

Return type:

Optional[pd.DataFrame]

show_markdown_table() → None

Create a Markdown table from the configuration dictionary and print it.