MDA
Bases: BaseClustering
Maximum Dissimilarity Algorithm (MDA) class.
This class performs the MDA algorithm on a given dataframe.
Attributes: |
|
---|
Methods:
Name | Description |
---|---|
fit | Fit the MDA algorithm to the provided data. |
predict | Predict the nearest centroid for the provided data. |
fit_predict | Fits the MDA model to the data and predicts the nearest centroids. |
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.datamining.mda import MDA
>>> data = pd.DataFrame(
... {
... 'Hs': np.random.rand(1000) * 7,
... 'Tp': np.random.rand(1000) * 20,
... 'Dir': np.random.rand(1000) * 360
... }
... )
>>> mda = MDA(num_centers=10)
>>> nearest_centroids_idxs, nearest_centroids_df = mda.fit_predict(
... data=data,
... directional_variables=['Dir'],
... )
__init__(num_centers)
Initializes the MDA class.
Parameters: |
|
---|
Raises: |
|
---|
fit(data, directional_variables=[], custom_scale_factor={}, first_centroid_seed=None)
Fit the Maximum Dissimilarity Algorithm (MDA) to the provided data.
This method initializes centroids for the MDA algorithm using the provided dataframe, directional variables, and custom scale factor. It normalizes the data, iteratively selects centroids based on maximum dissimilarity, and denormalizes the centroids before returning them.
Parameters: |
|
---|
Notes
- The function assumes that the data is validated by the validate_data_mda decorator before execution.
- When first_centroid_seed is not provided, max value centroid is used.
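The selection loop described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the function name `mda_select` is hypothetical, directional variables and custom scale factors are ignored, and "max value centroid" is read here as the row holding the largest value of the first column, which is an assumption.

```python
import numpy as np


def mda_select(data, num_centers, first_centroid_seed=None):
    """Greedy max-dissimilarity selection on an (n_samples, n_features) array."""
    data = np.asarray(data, dtype=float)
    # Min-max normalize each column so no variable dominates the distance.
    mins, maxs = data.min(axis=0), data.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    norm = (data - mins) / span

    # Assumption: without a seed, start from the row with the largest value
    # in the first column.
    if first_centroid_seed is None:
        seed = int(np.argmax(norm[:, 0]))
    else:
        seed = int(first_centroid_seed)
    chosen = [seed]
    # Distance from every point to its closest already-chosen centroid.
    min_dist = np.linalg.norm(norm - norm[seed], axis=1)

    while len(chosen) < num_centers:
        nxt = int(np.argmax(min_dist))  # most dissimilar remaining point
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(norm - norm[nxt], axis=1))

    # Indexing the original array is equivalent to denormalizing the centroids.
    return np.array(chosen), data[chosen]


rng = np.random.default_rng(0)
sample = rng.random((200, 2)) * [7.0, 20.0]
idxs, centroids = mda_select(sample, num_centers=10)
```

Because each new centroid is the point farthest from all centroids chosen so far, the selected subset tends to cover the edges of the data cloud, which is the usual reason for preferring MDA over random sampling.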
fit_predict(data, directional_variables=[], custom_scale_factor={}, first_centroid_seed=None)
Fits the MDA model to the data and predicts the nearest centroids.
Parameters: |
|
---|
Returns: |
|
---|
predict(data)
Predict the nearest centroid for the provided data.
Parameters: |
|
---|
Returns: |
|
---|
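As a rough illustration of what "nearest centroid" means here, the lookup can be written as a pairwise distance computation followed by an argmin. The helper name `nearest_centroid` is hypothetical, and the sketch measures Euclidean distance in raw units, whereas the class presumably works on normalized data.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Return the index of the closest centroid for each point, and that centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distances, shape (n_points, n_centroids).
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    idxs = dists.argmin(axis=1)
    return idxs, centroids[idxs]


centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
points = np.array([[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]])
idxs, nearest = nearest_centroid(points, centroids)
```

The two return values mirror the pair unpacked from fit_predict in the example above: one array of centroid indices and one collection of the matching centroid rows.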
MDAError
Bases: Exception
Custom exception for MDA class.
KMA
Bases: BaseClustering
K-Means Algorithm (KMA) class.
This class performs the K-Means algorithm on a given dataframe.
Attributes: |
|
---|
Methods:
Name | Description |
---|---|
fit | Fit the K-Means algorithm to the provided data. |
predict | Predict the nearest centroid for the provided data. |
fit_predict | Fit the K-Means algorithm to the provided data and predict the nearest centroid for each data point. |
Notes
- The K-Means algorithm is used to cluster data points into k clusters.
- The K-Means algorithm is sensitive to the initial centroids.
- The K-Means algorithm can become costly on very large datasets, because every iteration compares each point with every centroid.
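The sensitivity to initial centroids can be seen with a bare-bones Lloyd iteration. The sketch below is an assumption-laden illustration, not the code behind KMA: `lloyd_kmeans` and `assign` are hypothetical names, and the initial centroids are drawn at random rather than with 'k-means++'.

```python
import numpy as np


def assign(data, centroids):
    """Index of the nearest centroid for every row of data."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(data, num_clusters, seed=None, num_iter=50):
    """Plain Lloyd iterations starting from randomly chosen data points."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    centroids = data[rng.choice(len(data), size=num_clusters, replace=False)]
    for _ in range(num_iter):
        labels = assign(data, centroids)
        # Move each centroid to the mean of its points; keep the old
        # position if a cluster ends up empty.
        new_centroids = np.array([
            data[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(num_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign(data, centroids)
    inertia = float(((data - centroids[labels]) ** 2).sum())
    return centroids, labels, inertia


rng = np.random.default_rng(0)
sample = rng.random((300, 2))
# Different seeds may converge to different local minima of the inertia.
inertias = [lloyd_kmeans(sample, num_clusters=5, seed=s)[2] for s in range(3)]
```

Comparing the entries of `inertias` shows how much the result depends on the starting points; running several initializations and keeping the best one is the usual remedy.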
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.datamining.kma import KMA
>>> data = pd.DataFrame(
... {
... 'Hs': np.random.rand(1000) * 7,
... 'Tp': np.random.rand(1000) * 20,
... 'Dir': np.random.rand(1000) * 360
... }
... )
>>> kma = KMA(num_clusters=5)
>>> nearest_centroids_idxs, nearest_centroids_df = kma.fit_predict(
... data=data,
... directional_variables=['Dir'],
... )
TODO
- Add customization for the K-Means algorithm.
__init__(num_clusters, seed=None, init='k-means++', n_init='auto', algorithm='lloyd')
Initializes the KMA class.
Parameters: |
|
---|
Raises: |
|
---|
fit(data, directional_variables=[], custom_scale_factor={})
Fit the K-Means algorithm to the provided data.
This method initializes centroids for the K-Means algorithm using the provided dataframe and custom scale factor. It normalizes the data and returns the calculated centroids.
Parameters: |
|
---|
Notes
- The function assumes that the data is validated by the validate_data_kma decorator before execution.
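The normalization step can be sketched as min-max scaling with optional per-variable overrides. The format of custom_scale_factor is not documented on this page; the sketch assumes a mapping from variable name to a (min, max) pair, and the helper names `normalize` and `denormalize` are hypothetical.

```python
import numpy as np


def normalize(columns, custom_scale_factor=None):
    """Min-max scale each named column; returns (normalized, scale_factor)."""
    custom_scale_factor = custom_scale_factor or {}
    normalized, scale_factor = {}, {}
    for name, values in columns.items():
        values = np.asarray(values, dtype=float)
        # Assumed format: {name: (min, max)}; fall back to the observed range.
        # A constant column (max == min) would need special handling.
        lo, hi = custom_scale_factor.get(name, (values.min(), values.max()))
        scale_factor[name] = (lo, hi)
        normalized[name] = (values - lo) / (hi - lo)
    return normalized, scale_factor


def denormalize(normalized, scale_factor):
    """Invert normalize() using the stored (min, max) pairs."""
    restored = {}
    for name, values in normalized.items():
        lo, hi = scale_factor[name]
        restored[name] = np.asarray(values) * (hi - lo) + lo
    return restored


columns = {"Hs": np.array([0.5, 2.0, 3.5]), "Tp": np.array([4.0, 10.0, 16.0])}
norm, factors = normalize(columns, custom_scale_factor={"Hs": (0.0, 7.0)})
restored = denormalize(norm, factors)
```

Fixing the range of a variable by hand, as done for Hs here, keeps the scaling comparable across datasets whose observed extremes differ.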
fit_predict(data, directional_variables=[], custom_scale_factor={})
Fit the K-Means algorithm to the provided data and predict the nearest centroid for each data point.
Parameters: |
|
---|
Returns: |
|
---|
predict(data)
Predict the nearest centroid for the provided data.
Parameters: |
|
---|
Returns: |
|
---|
KMAError
Bases: Exception
Custom exception for KMA class.
SOM
Bases: BaseClustering
Self-Organizing Map (SOM) class.
This class performs the Self-Organizing Map algorithm on a given dataframe.
Attributes: |
|
---|
Methods:
Name | Description |
---|---|
activation_response | Returns the activation response of the given data. |
get_centroids_probs_for_labels | Returns the labels map of the given data. |
plot_centroids_probs_for_labels | Plots the labels map of the given data. |
fit | Fits the SOM model to the provided data. |
predict | Predicts the nearest centroid for the provided data. |
fit_predict | Fit the SOM algorithm to the provided data and predict the nearest centroid for each data point. |
Notes
- Check MiniSom documentation for more information: https://github.com/JustGlowing/minisom
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from bluemath_tk.datamining.som import SOM
>>> data = pd.DataFrame(
... {
... 'Hs': np.random.rand(1000) * 7,
... 'Tp': np.random.rand(1000) * 20,
... 'Dir': np.random.rand(1000) * 360
... }
... )
>>> som = SOM(som_shape=(3, 3), num_dimensions=4)
>>> nearest_centroids_idxs, nearest_centroids_df = som.fit_predict(
... data=data,
... directional_variables=['Dir'],
... )
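The example passes num_dimensions=4 although the dataframe has three columns. One plausible reading, which this page does not confirm, is that each directional variable is replaced by two unit-circle components before fitting, so Hs, Tp and Dir become four inputs. The sketch below shows such an encoding; the helper names and the cos/sin ordering are assumptions.

```python
import numpy as np


def to_uv(deg):
    """Encode angles in degrees as unit-circle components (u, v)."""
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    return np.cos(rad), np.sin(rad)


def to_degrees(u, v):
    """Decode (u, v) components back to angles in [0, 360)."""
    return np.rad2deg(np.arctan2(v, u)) % 360.0


direction = np.array([350.0, 10.0, 180.0])
u, v = to_uv(direction)
# 350 and 10 degrees are 340 apart numerically but only 20 apart on the
# circle; their (u, v) points are correspondingly close.
gap = float(np.hypot(u[0] - u[1], v[0] - v[1]))
restored = to_degrees(u, v)
```

An encoding of this kind avoids the artificial jump between 359 and 0 degrees that a Euclidean distance on raw angles would introduce.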
TODO
- Add option to normalize data?
distance_map
property
Returns the distance map of the SOM.
__init__(som_shape, num_dimensions, sigma=1, learning_rate=0.5, decay_function='asymptotic_decay', neighborhood_function='gaussian', topology='rectangular', activation_distance='euclidean', random_seed=None, sigma_decay_function='asymptotic_decay')
Initializes a Self-Organizing Map.
A rule of thumb for sizing the grid in a dimensionality reduction task is that it should contain about 5*sqrt(N) neurons, where N is the number of samples in the dataset to analyze.
E.g. if your dataset has 150 samples, 5*sqrt(150) ≈ 61.24, so an 8-by-8 map (64 neurons) should perform well.
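The rule of thumb translates directly into a small helper. The function name `suggested_som_side` is hypothetical, and rounding up to a square grid is one reasonable choice among several.

```python
import math


def suggested_som_side(num_samples):
    """Side length of a square grid holding about 5 * sqrt(N) neurons."""
    target_neurons = 5 * math.sqrt(num_samples)
    return math.ceil(math.sqrt(target_neurons))


side = suggested_som_side(150)
som_shape = (side, side)
```

For 150 samples this gives a side of 8, matching the 8-by-8 map mentioned above.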
Parameters: |
|
---|
Raises: |
|
---|
activation_response(data=None)
Returns the activation response of the given data.
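In MiniSom, the activation response is a grid-shaped matrix whose entry (i, j) counts how many samples have neuron (i, j) as their winner. Assuming this method exposes the same quantity, a NumPy sketch with a hypothetical weight grid looks like this:

```python
import numpy as np


def activation_response(weights, data):
    """Count how often each neuron of a (rows, cols, dim) grid is the winner."""
    rows, cols, _ = weights.shape
    flat = weights.reshape(rows * cols, -1)
    # Best matching unit: the neuron whose weight vector is closest to the sample.
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    winners = dists.argmin(axis=1)
    counts = np.bincount(winners, minlength=rows * cols)
    return counts.reshape(rows, cols)


rng = np.random.default_rng(0)
weights = rng.random((3, 3, 4))  # 3x3 grid, 4 input dimensions
data = rng.random((100, 4))
response = activation_response(weights, data)
```

Neurons with a count of zero never win for the given data, which is a quick way to spot unused regions of the map.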
fit(data, directional_variables=[], num_iteration=1000)
Fits the SOM model to the provided data.
Parameters: |
|
---|
Notes
- The function assumes that the data is validated by the validate_data_som decorator before execution.
fit_predict(data, directional_variables=[], num_iteration=1000)
Fit the SOM algorithm to the provided data and predict the nearest centroid for each data point.
Parameters: |
|
---|
Returns: |
|
---|
get_centroids_probs_for_labels(data, labels)
Returns the labels map of the given data.
plot_centroids_probs_for_labels(probs_data)
Plots the labels map of the given data.
predict(data)
Predicts the nearest centroid for the provided data.
Parameters: |
|
---|
Returns: |
|
---|
SOMError
Bases: Exception
Custom exception for SOM class.