Commit 5679e49e authored by Nicolas Barthes

Beginning of Documentation via DocStrings

parent d9f3b203
# Clustering Methods
## K-Means clustering
::: src.Class_Mod.KMEANS_.Sk_Kmeans
## HDBSCAN clustering
::: src.Class_Mod.HDBSCAN_Clustering.Hdbscan
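Each `:::` line is a mkdocstrings identifier. When the site is built, it is replaced by the documentation extracted from the docstring of the named Python object, which requires the `mkdocstrings` plugin enabled in `mkdocs.yml`. The mkdocstrings Python handler parses Google-style sections (`Args:`, `Returns:`, `Examples:`) by default, and that is the style used by the docstrings in this commit. The following is a minimal sketch of such a docstring on a hypothetical `cluster_sizes` helper that is not part of the repository. Section items are indented one level under their header, because the Google-style parser relies on that indentation to tell items apart.

```python
from collections import Counter


def cluster_sizes(labels: list[int]) -> dict[int, int]:
    """Count how many samples belong to each cluster label.

    Args:
        labels (list[int]): cluster label of each sample.

    Returns:
        dict[int, int]: number of samples per label, sorted by label.

    Examples:
        >>> cluster_sizes([0, 1, 1, -1])
        {-1: 1, 0: 1, 1: 2}
    """
    counts = Counter(labels)
    # Sort by label so the returned order is deterministic
    return {label: counts[label] for label in sorted(counts)}
```

The `>>>` lines are rendered as an example block and can also be checked with `doctest`.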
# Dimensionality Reduction methods
## PCA
::: src.Class_Mod.PCA_
## UMAP
::: src.Class_Mod.UMAP_
# Welcome to NIRS Workflow documentation
This workflow aims at ...
## Samples Selection
## Dimension Reduction
## Clustering
[K-Means](Clustering.md#k-means-clustering)
[HDBSCAN](Clustering.md#hdbscan-clustering)
\ No newline at end of file
site_name: NIRS Workflow
nav:
- Home: 'index.md'
- Dimensionality Reduction: 'Dimensionality_Reduction.md'
- Clustering Methods: 'Clustering.md'
 theme:
-  name: NIRS Workflow
-  locale: en
-  # custom_dir: my_theme_customizations/
-  static_templates:
-    - sitemap.html
-  include_sidebar: false
\ No newline at end of file
+  name: material
+  features:
+    - navigation.tabs
+    - navigation.sections
+    - toc.integrate
+    - navigation.top
+    - search.suggest
+    - search.highlight
+    - content.tabs.link
+    - content.code.annotation
+    - content.code.copy
+  language: en
+  palette:
+    - scheme: default
+      toggle:
+        icon: material/toggle-switch-off-outline
+        name: switch to dark mode
+      primary: teal
+      accent: purple
+    - scheme: slate
+      toggle:
+        icon: material/toggle-switch
+        name: Switch to light mode
+      primary: teal
+      accent: lime
+plugins:
+  - mkdocstrings
+copyright:
+  <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CNRS/UM CC BY 2024</a>
+extra:
+  social:
+    - icon: fontawesome/brands/gitlab
+      link: https://src.koda.cnrs.fr/cefe/pace/nirs_workflow
\ No newline at end of file
 from Packages import *
 class Hdbscan:
-    """Runs an automatic optimized sklearn.HDBSCAN clustering on Dimensionality reduced space.
-    Vars:
-        data: the Dimensionality reduced space, raw result of the UMAP.fit()
-        param_dist: the HDBSCAN optimization parameters to test
-    Density-Based Clustering Validation - DBCV (https://github.com/christopherjenness/DBCV/tree/master ;
-    Moulavi, Davoud, et al. "Density-based clustering validation." Proceedings of the 2014 SIAM
-    International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2014.)
-    is used as a metric to optimize HDBSCAN algorithm.
-    Functions DBCV, _core_dist, _mutual_reachability_dist, _mutual_reach_dist_graph, _mutual_reach_dist_graph,
-    _mutual_reach_dist_MST, _cluster_density_sparseness, _cluster_density_separation, _cluster_validity_index,
-    _clustering_validity_index and _get_label_members aim at DBCV computing.
-    _score is a dataframe with the DBCV value for each combination of param_dist. We search for the higher value and
-    compute an HDBSCAN with the best parameters.
-    The HDBSCAN_scores_ @property return the cluster number of each sample (_labels) and the DBCV best score.
+    """Runs an automatically optimized sklearn.HDBSCAN clustering on dimensionality reduced space.
+    The HDBSCAN_scores_ @Property returns the cluster number of each sample (_labels) and the DBCV best score.
+    Returns:
+        _labels (pd.DataFrame): DataFrame with the cluster belonging number for each sample
+        _hdbscan_score (float): a float with the best DBCV score after optimization
     Examples:
-        clustering = HDBSCAN((data)
-        scores = clustering.HDBSCAN_scores_
+        - clustering = HDBSCAN((data)
+        - scores = clustering.HDBSCAN_scores_
     """
     def __init__(self, data):
+        """Initiate the HDBSCAN calculation
+        Args:
+            data (pd.DataFrame): the Dimensionality reduced space, raw result of the UMAP.fit()
+            param_dist (dictionary): the HDBSCAN optimization parameters to test
+            _score (pd.DataFrame): is a dataframe with the DBCV value for each combination of param_dist. We search for the higher value to then compute an HDBSCAN with the best parameters.
+        """
         # Really fast
         # self._param_dist = {'min_samples': [1],
         # 'min_cluster_size':[5],
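The `Hdbscan` docstrings describe the optimization as follows. Every combination of values in `param_dist` is scored with DBCV, and the highest-scoring combination is used for the final HDBSCAN run. The following is a stdlib-only sketch of that exhaustive search. It is not the repository's code. `score_fn` is a hypothetical stand-in for "fit HDBSCAN with these parameters and return its DBCV score", and `toy_score` is an invented function used only to make the example runnable.

```python
from itertools import product
from typing import Callable


def grid_search(param_dist: dict[str, list], score_fn: Callable[..., float]):
    """Score every combination of param_dist; return (best_params, best_score, all_scores)."""
    names = list(param_dist)
    all_scores = []
    # product(*values) enumerates the full Cartesian grid of candidate values
    for values in product(*(param_dist[name] for name in names)):
        params = dict(zip(names, values))
        all_scores.append((params, score_fn(**params)))
    # Keep the combination with the highest score (for DBCV, higher is better)
    best_params, best_score = max(all_scores, key=lambda item: item[1])
    return best_params, best_score, all_scores


# Hypothetical stand-in for "fit HDBSCAN and compute its DBCV score"
def toy_score(min_samples: int, min_cluster_size: int) -> float:
    return -abs(min_samples - 2) - abs(min_cluster_size - 10) / 10


param_dist = {"min_samples": [1, 2, 3], "min_cluster_size": [5, 10, 15]}
best, score, table = grid_search(param_dist, toy_score)
print(best, score)  # {'min_samples': 2, 'min_cluster_size': 10} 0.0
```

The number of HDBSCAN fits equals the product of the list lengths (here 3 × 3 = 9), so a grid with one value per parameter needs a single fit.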
@@ -66,8 +67,7 @@ class Hdbscan:
         """
         Implimentation of Density-Based Clustering Validation "DBCV"
-        Citation:
-        Moulavi, Davoud, et al. "Density-based clustering validation."
+        Citation: Moulavi, Davoud, et al. "Density-based clustering validation."
         Proceedings of the 2014 SIAM International Conference on Data Mining.
         Society for Industrial and Applied Mathematics, 2014.
@@ -80,8 +80,8 @@ class Hdbscan:
             dist_dunction (func): function to determine distance between objects
                 func args must be [np.array, np.array] where each array is a point
-        Returns: cluster_validity (float)
-            score in range[-1, 1] indicating validity of clustering assignments
+        Returns:
+            cluster_validity (float): score in range[-1, 1] indicating validity of clustering assignments
         """
         graph = self._mutual_reach_dist_graph(X, labels, dist_function)
         mst = self._mutual_reach_dist_MST(graph)
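In the DBCV paper (Moulavi et al., 2014), each cluster C_i receives a validity index `V_C(C_i) = (min_j DSPC(C_i, C_j) - DSC(C_i)) / max(min_j DSPC(C_i, C_j), DSC(C_i))`. Here DSC is the density sparseness of the cluster, and DSPC is the density separation between two clusters. The overall score is the sum of `|C_i| / |O| * V_C(C_i)` over all clusters, where |O| counts every object, noise included, so the result stays in [-1, 1]. The following sketch covers only that final combination step. It assumes the sparseness and separation values have already been computed, which the `_cluster_density_sparseness` and `_cluster_density_separation` helpers named in the old docstring presumably do. All numbers are invented for illustration.

```python
def cluster_validity(sparseness: float, separations: list[float]) -> float:
    """V_C of one cluster, from its sparseness DSC and its separations DSPC to every other cluster."""
    min_sep = min(separations)  # needs at least one other cluster
    denom = max(min_sep, sparseness)
    # Assumed convention (not from the paper): a degenerate cluster with both terms 0 scores 0
    if denom == 0:
        return 0.0
    return (min_sep - sparseness) / denom


def dbcv(sizes: dict[int, int], sparseness: dict[int, float],
         separations: dict[int, list[float]], n_objects: int) -> float:
    """Size-weighted sum of V_C; noise objects count in n_objects but belong to no cluster."""
    return sum(
        sizes[c] / n_objects * cluster_validity(sparseness[c], separations[c])
        for c in sizes
    )


# Invented values: two clusters of 40 and 50 objects plus 10 noise objects
sizes = {0: 40, 1: 50}
sparseness = {0: 1.0, 1: 2.0}
separations = {0: [4.0], 1: [4.0]}
score = dbcv(sizes, sparseness, separations, n_objects=100)
print(round(score, 6))  # 0.55
```

Because the weights use the total object count, the 10 noise objects lower the score even though they contribute no V_C term.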
@@ -129,7 +129,7 @@ class Hdbscan:
                 array of all other points in object class of point i
             neighbors_j (np.ndarray): array of dims (n_neighbors, n_features):
                 array of all other points in object class of point j
-            dist_dunction (func): function to determine distance between objects
+            dist_function (func): function to determine distance between objects
                 func args must be [np.array, np.array] where each array is a point
         Returns: mutual_reachability (float)
......
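The docstring in the last hunk describes the inputs of the mutual reachability distance: `neighbors_i` and `neighbors_j`, the other points in each point's cluster, and `dist_function`. It also names the return value, `mutual_reachability`. In the DBCV paper, this distance is the maximum of the two points' "all-points core distances" and their direct distance. For a point o in a d-dimensional cluster with n_i members, the core distance is `(sum_j (1 / dist(o, x_j))**d / (n_i - 1)) ** (-1 / d)`, summed over the other members x_j. The following stdlib sketch implements those two formulas. It is not the repository's code, and it assumes Euclidean distance (`math.dist`), neighbor lists that exclude the point itself, and no duplicate points.

```python
import math
from typing import Sequence

Point = Sequence[float]


def core_dist(point: Point, neighbors: list[Point]) -> float:
    """All-points core distance of `point` given the other members of its cluster.

    Assumes `neighbors` excludes `point` and holds no duplicate of it
    (a zero distance would make 1/distance undefined).
    """
    d = len(point)  # number of features
    inv_powers = [(1.0 / math.dist(point, other)) ** d for other in neighbors]
    # Mean of (1/distance)^d over the n_i - 1 other members, raised to the power -1/d
    return (sum(inv_powers) / len(neighbors)) ** (-1.0 / d)


def mutual_reachability(p_i: Point, p_j: Point,
                        neighbors_i: list[Point], neighbors_j: list[Point]) -> float:
    """Maximum of the two core distances and the direct Euclidean distance."""
    return max(core_dist(p_i, neighbors_i),
               core_dist(p_j, neighbors_j),
               math.dist(p_i, p_j))


# Two points 0.5 apart, each with two neighbors at distance 1: the core distances win
print(mutual_reachability((0.0, 0.0), (0.5, 0.0),
                          [(1.0, 0.0), (0.0, 1.0)], [(1.5, 0.0), (0.5, 1.0)]))  # 1.0
```

Taking the maximum pushes points in sparse regions apart, while distances between points in dense regions stay close to their direct distance. The minimum spanning tree built in `_mutual_reach_dist_MST` operates on this distance.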
 from Packages import *
 class Sk_Kmeans:
+    """K-Means clustering for Samples selection.
+    Returns:
+        inertia_ (pd.DataFrame): DataFrame with ...
+        x (pd.DataFrame): Initial data
+        clu (pd.DataFrame): Cluster name for each sample
+        model.cluster_centers_ (pd.DataFrame): Coordinates of the center of each cluster
+    """
     def __init__(self, x, max_clusters):
+        """Initiate the KMeans class.
+        Args:
+            x (pd.DataFrame): the original reduced data to cluster
+            max_cluster (Int): the max number of desired clusters.
+        """
         self.x = x
         self.max_clusters = max_clusters
......
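`Sk_Kmeans` documents an `inertia_` result and takes `max_clusters`, "the max number of desired clusters". In scikit-learn, `KMeans.inertia_` is the sum of squared distances from samples to their closest cluster center. The following stdlib sketch computes that quantity for a given set of centers, assuming Euclidean distance. Comparing it across k = 1 to `max_clusters` (an "elbow" plot) is a common way to choose k. The elided docstring does not say exactly what `Sk_Kmeans` stores in `inertia_`, so treat this as an illustration rather than the class's implementation.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (no square root, so integer-valued inputs stay exact)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(samples: list[Point], centers: list[Point]) -> float:
    """Sum over samples of the squared distance to the closest center."""
    return float(sum(min(sq_dist(x, c) for c in centers) for x in samples))


samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_center = [(5.0, 1.0)]
two_centers = [(0.0, 1.0), (10.0, 1.0)]
print(inertia(samples, one_center), inertia(samples, two_centers))  # 104.0 4.0
```

With optimal centers, inertia never increases as k grows, so the heuristic looks for the k where the decrease flattens out, not for the minimum.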
+"""Here are all the classes to perform your analysis
+"""
 from .PCA_ import *
 from .KMEANS_ import Sk_Kmeans
 from .UMAP_ import Umap
......
@@ -3,6 +3,7 @@
 This is a webapp with Streamlit.
 GUI shows whatever is needed for Samples Selection based on NIRS spectra and then, to compute a model to predict
 chemical values on your samples.
 Examples:
 streamlit run ./app.py
 """
\ No newline at end of file