Unsupervised Learning Module

Overview

The Unsupervised Learning module in Neurenix provides a comprehensive suite of algorithms and tools for learning patterns and structures from unlabeled data. Built on Neurenix's high-performance multi-language architecture, this module delivers efficient implementations of classical and state-of-the-art unsupervised learning techniques.

The module features a unified interface that works seamlessly with Neurenix's tensor and neural network modules, enabling users to easily switch between different unsupervised learning algorithms without changing their data processing code. It supports various unsupervised learning paradigms, including dimensionality reduction, clustering, density estimation, generative modeling, and self-supervised learning.

Implemented with a combination of Rust and C++ for performance-critical components and Python for the user-friendly interface, the Unsupervised Learning module ensures both computational efficiency and ease of use. It provides native support for various hardware accelerators, including GPUs, TPUs, and specialized AI hardware, with automatic optimization for the available hardware.

Key Concepts

Dimensionality Reduction

The module includes dimensionality reduction techniques for visualizing and preprocessing high-dimensional data, among them PCA, t-SNE, UMAP, and autoencoders.
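To make the idea concrete, here is a minimal standard-library sketch of what a linear technique such as PCA computes: the direction of greatest variance and the projection of each point onto it. The function name `first_principal_component` and the power-iteration approach are illustrative assumptions, not Neurenix's implementation.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    # Center each feature at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered data onto the component to get 1-D coordinates.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections


points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
direction, scores = first_principal_component(points)
```

For this toy data, which lies close to the diagonal, the recovered direction has two nearly equal entries. A production implementation would typically use an SVD rather than power iteration.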

Clustering

The module includes clustering algorithms for grouping similar data points, among them K-Means, DBSCAN, spectral clustering, and Gaussian mixture models.
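As an illustration of how a centroid-based method groups points, the following is a small pure-Python sketch of Lloyd's algorithm, the standard K-Means iteration. The function names and the simple random initialization are assumptions made for this example; they do not describe Neurenix's internals.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=42):
    """Cluster points with Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(data, k=2)
```

On these two well-separated groups, the first three points end up in one cluster and the last three in the other.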

Generative Models

The module includes generative models for learning data distributions and generating new samples, among them variational autoencoders (VAEs), GANs, normalizing flows, and diffusion models.
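The phrase "learning a distribution and generating new samples" can be shown with the simplest possible generative model, a one-dimensional Gaussian fitted by maximum likelihood. This is only a conceptual sketch with hypothetical helper names; the models in this module are far more expressive.

```python
import random
import statistics


def fit_gaussian(data):
    """Maximum-likelihood estimate of a 1-D Gaussian: (mean, std)."""
    return statistics.fmean(data), statistics.pstdev(data)


def sample_gaussian(mu, sigma, n_samples, seed=0):
    """Draw new samples from the fitted distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]


mu, sigma = fit_gaussian([4.0, 5.0, 6.0])
new_samples = sample_gaussian(mu, sigma, n_samples=100)
```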

Self-Supervised Learning

The module includes self-supervised learning techniques for learning representations without labeled data.
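One common way to learn without labels is a pretext task in which the data supplies its own targets. The sketch below builds masked-feature reconstruction pairs: some features are hidden, and the original row becomes the training target. The function `make_masked_pairs` is hypothetical and is not claimed to be part of the Neurenix API.

```python
import random


def make_masked_pairs(rows, mask_prob=0.3, seed=0):
    """Build (corrupted_input, target, mask) triples from unlabeled rows."""
    rng = random.Random(seed)
    pairs = []
    for row in rows:
        # True marks a feature that is hidden from the model.
        mask = [rng.random() < mask_prob for _ in row]
        corrupted = [0.0 if hidden else value for value, hidden in zip(row, mask)]
        pairs.append((corrupted, list(row), mask))
    return pairs


unlabeled = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
pairs = make_masked_pairs(unlabeled)
```

A model trained to reconstruct the target from the corrupted input has to capture dependencies between features, which is what makes the learned representation useful.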

API Reference

Dimensionality Reduction

import neurenix
from neurenix.unsupervised import PCA, TSNE, UMAP, Autoencoder

# Principal Component Analysis
pca = PCA(
    n_components=2,                    # Number of components to keep
    whiten=False,                      # Whether to whiten the data
    svd_solver="auto",                 # SVD solver to use
    random_state=42                    # Random state for reproducibility
)

# Fit PCA to data
pca.fit(X_train)

# Transform data
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# t-SNE
tsne = TSNE(
    n_components=2,                    # Number of components to keep
    perplexity=30.0,                   # Perplexity parameter
    early_exaggeration=12.0,           # Early exaggeration factor
    learning_rate="auto",              # Learning rate
    n_iter=1000,                       # Number of iterations
    random_state=42                    # Random state for reproducibility
)

# Fit and transform data with t-SNE
X_tsne = tsne.fit_transform(X)
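The `perplexity` parameter can be read as the effective number of neighbors each point considers. In standard t-SNE, a per-point Gaussian bandwidth is tuned so that the perplexity 2^H of the neighbor distribution matches the requested value. The sketch below illustrates that calculation in plain Python; the helper names are assumptions for this example, and Neurenix's own search procedure may differ.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian-kernel neighbor probabilities from squared distances."""
    # Subtracting the smallest distance avoids underflow; the result is unchanged.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy_bits


def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, n_steps=100):
    """Bisect (on a log scale) for the bandwidth that hits the target perplexity."""
    for _ in range(n_steps):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists, mid)) < target:
            lo = mid  # distribution too peaked: widen the kernel
        else:
            hi = mid  # distribution too flat: narrow the kernel
    return math.sqrt(lo * hi)


sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = find_sigma(sq_dists, target=3.0)
```

Larger perplexity values widen the kernel, so each point takes more of its surroundings into account.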

Clustering

from neurenix.unsupervised import KMeans, DBSCAN, SpectralClustering, GaussianMixture

# K-Means
kmeans = KMeans(
    n_clusters=5,                      # Number of clusters
    init="k-means++",                  # Initialization method
    n_init=10,                         # Number of initializations
    max_iter=300,                      # Maximum number of iterations
    tol=1e-4,                          # Tolerance for convergence
    random_state=42                    # Random state for reproducibility
)

# Fit K-Means to data
kmeans.fit(X)

# Get cluster assignments
labels = kmeans.predict(X)

# Get cluster centers
centers = kmeans.cluster_centers_

# DBSCAN
dbscan = DBSCAN(
    eps=0.5,                           # Maximum distance between points
    min_samples=5,                     # Minimum number of samples in a neighborhood
    metric="euclidean",                # Distance metric
    algorithm="auto",                  # Algorithm to use
    n_jobs=-1                          # Number of jobs to run in parallel
)

# Fit DBSCAN to data
dbscan.fit(X)

# Get cluster assignments
labels = dbscan.labels_
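To show how `eps` and `min_samples` interact, here is a compact pure-Python sketch of the classic DBSCAN procedure. It assumes that a point counts as its own neighbor and that noise is labeled `-1`, as in scikit-learn; whether Neurenix's `labels_` follows the same conventions is an assumption.

```python
import math


def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise."""
    n = len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
    return labels


pts = [(0, 0), (0, 0.3), (0.3, 0), (5, 5), (5, 5.3), (5.3, 5), (20, 20)]
result = dbscan(pts, eps=0.5, min_samples=3)
# result == [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-Means, the number of clusters is not specified up front, and the isolated point is reported as noise rather than forced into a cluster.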

Generative Models

from neurenix.unsupervised import VAE, GAN, NormalizingFlow, DiffusionModel

# Variational Autoencoder
vae = VAE(
    input_dim=X.shape[1],              # Input dimension
    hidden_dims=[128, 64],             # Hidden layer dimensions
    latent_dim=10,                     # Latent dimension
    activation="relu",                 # Activation function
    dropout_rate=0.2,                  # Dropout rate
    learning_rate=0.001,               # Learning rate
    weight_decay=1e-5,                 # Weight decay
    optimizer="adam",                  # Optimizer
    reconstruction_loss="mse",         # Reconstruction loss function
    kl_weight=1.0,                     # Weight for KL divergence term
    device="cuda" if neurenix.cuda.is_available() else "cpu"  # Device to use
)

# Fit VAE to data
vae.fit(
    X_train=X_train,
    X_val=X_val,
    batch_size=32,
    epochs=100,
    callbacks=[early_stopping, model_checkpoint]
)

# Generate samples
samples = vae.generate(n_samples=100)
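The `reconstruction_loss` and `kl_weight` arguments correspond to the two terms of the usual VAE objective. The sketch below writes that objective out for a single example, assuming a diagonal Gaussian encoder and a standard normal prior; the function names are illustrative, and the exact loss Neurenix computes is not confirmed here.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def vae_loss(x, x_hat, mu, log_var, kl_weight=1.0):
    """Reconstruction term plus weighted KL regularizer."""
    return mse(x, x_hat) + kl_weight * kl_to_standard_normal(mu, log_var)


loss = vae_loss(
    x=[1.0, 2.0], x_hat=[1.0, 2.5], mu=[1.0, 0.0], log_var=[0.0, 0.0]
)
```

Raising `kl_weight` pushes the latent codes toward the prior at the cost of reconstruction quality, while lowering it does the opposite.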

Framework Comparison

Neurenix Unsupervised Learning vs. TensorFlow Unsupervised Learning

| Feature | Neurenix Unsupervised Learning | TensorFlow Unsupervised Learning |
|---|---|---|
| Performance | Multi-language implementation with Rust/C++ backends | Python implementation with TensorFlow backend |
| Hardware Support | Comprehensive support for various hardware | Primarily focused on TPUs and GPUs |
| Algorithm Variety | Extensive collection of classical and modern algorithms | Good selection of common algorithms |
| Dimensionality Reduction | Comprehensive support for various techniques | Limited built-in support |
| Clustering | Comprehensive support for various algorithms | Limited built-in support |
| Generative Models | Extensive collection of generative models | Good support for common generative models |
| Self-Supervised Learning | Comprehensive support for various techniques | Limited built-in support |
| Edge Device Support | Native support for edge devices | Limited through TensorFlow Lite |

Neurenix's Unsupervised Learning module provides better performance through its multi-language implementation and offers more comprehensive hardware support, especially for edge devices. It also provides a wider variety of unsupervised learning algorithms, more advanced dimensionality reduction and clustering techniques, and better support for self-supervised learning and anomaly detection.

Neurenix Unsupervised Learning vs. PyTorch Unsupervised Learning

| Feature | Neurenix Unsupervised Learning | PyTorch Unsupervised Learning |
|---|---|---|
| Performance | Multi-language implementation with Rust/C++ backends | Python implementation with PyTorch backend |
| Hardware Support | Comprehensive support for various hardware | Primarily focused on CUDA devices |
| Algorithm Variety | Extensive collection of classical and modern algorithms | Good selection through separate libraries |
| Dimensionality Reduction | Comprehensive support for various techniques | Limited built-in support |
| Clustering | Comprehensive support for various algorithms | Limited built-in support |
| Generative Models | Extensive collection of generative models | Good support through separate libraries |
| Self-Supervised Learning | Comprehensive support for various techniques | Good support through separate libraries |
| Edge Device Support | Native support for edge devices | Limited through separate tools |

While PyTorch has good unsupervised learning capabilities through various libraries, Neurenix's Unsupervised Learning module offers better performance through its multi-language implementation and provides more comprehensive hardware support, especially for edge devices. It also offers more integrated support for dimensionality reduction, clustering, and anomaly detection.

Neurenix Unsupervised Learning vs. Scikit-Learn Unsupervised Learning

| Feature | Neurenix Unsupervised Learning | Scikit-Learn Unsupervised Learning |
|---|---|---|
| Performance | Multi-language implementation with Rust/C++ backends | Python implementation with limited C++ backends |
| Hardware Acceleration | Native support for various hardware accelerators | Limited hardware acceleration |
| Algorithm Variety | Extensive collection of classical and modern algorithms | Good selection of classical algorithms |
| Deep Learning Integration | Seamless integration with deep learning | Limited deep learning integration |
| Dimensionality Reduction | Comprehensive support for various techniques | Good support for classical techniques |
| Clustering | Comprehensive support for various algorithms | Good support for classical algorithms |
| Generative Models | Extensive collection of generative models | Limited support for generative models |
| Edge Device Support | Native support for edge devices | Limited edge support |

Scikit-Learn has good support for classical unsupervised learning algorithms, but Neurenix's Unsupervised Learning module offers better performance through its multi-language implementation and provides more comprehensive hardware support. It also offers better integration with deep learning, more advanced generative models, and better support for self-supervised learning.

Best Practices

Choosing the Right Algorithm

  1. Consider Data Characteristics: Choose algorithms based on data characteristics.
# For high-dimensional data, use dimensionality reduction first
if X.shape[1] > 100:
    # Use PCA for linear relationships
    if linear_relationships:
        pca = neurenix.unsupervised.PCA(n_components=min(50, X.shape[1]))
        X_reduced = pca.fit_transform(X)
    # Use t-SNE or UMAP for non-linear relationships
    else:
        umap = neurenix.unsupervised.UMAP(n_components=2)
        X_reduced = umap.fit_transform(X)

    # Use the reduced data for further analysis
    X = X_reduced
  2. Consider Task Requirements: Choose algorithms based on task requirements.
# For visualization, use dimensionality reduction
if task == "visualization":
    # For preserving global structure
    if preserve_global_structure:
        pca = neurenix.unsupervised.PCA(n_components=2)
        X_viz = pca.fit_transform(X)
    # For preserving local structure
    else:
        tsne = neurenix.unsupervised.TSNE(n_components=2)
        X_viz = tsne.fit_transform(X)
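The two rules above rely on flags such as `linear_relationships`, `task`, and `preserve_global_structure` that the caller has to supply. A small helper can make those inputs explicit. The function below is a hypothetical sketch that only returns the name of the suggested technique and a component count; it does not call Neurenix.

```python
def suggest_reducer(
    n_features,
    linear=True,
    task="preprocessing",
    preserve_global_structure=True,
):
    """Return (technique_name, n_components) following the rules above."""
    if task == "visualization":
        # PCA keeps global structure; t-SNE favors local neighborhoods.
        return ("PCA", 2) if preserve_global_structure else ("TSNE", 2)
    if n_features > 100:
        # High-dimensional input: reduce before further analysis.
        return ("PCA", 50) if linear else ("UMAP", 2)
    # Few features: no reduction suggested.
    return (None, n_features)


choice = suggest_reducer(n_features=300, linear=False)
# choice == ("UMAP", 2)
```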

Data Preprocessing

  1. Scale Features: Scale features to ensure equal contribution.
from neurenix.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Option A: use StandardScaler for approximately normally distributed data
standard_scaler = StandardScaler()
X_standardized = standard_scaler.fit_transform(X)

# Option B: use MinMaxScaler for bounded data
minmax_scaler = MinMaxScaler()
X_minmax = minmax_scaler.fit_transform(X)
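For reference, the two transformations can be written out for a single feature column. This sketch assumes the conventional definitions (z-score with the population standard deviation, and rescaling to the range [0, 1]); Neurenix's scalers may expose different defaults.

```python
import statistics


def standardize(column):
    """Z-score scaling: zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0.0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(v - mean) / std for v in column]


def min_max(column):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


feature = [2.0, 4.0, 6.0]
z_scores = standardize(feature)
unit_range = min_max(feature)
# unit_range == [0.0, 0.5, 1.0]
```

Because min-max scaling depends on the extreme values, a single outlier compresses all other points into a narrow band, which is one reason to consider a robust scaler for data with heavy tails.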

Tutorials

Dimensionality Reduction and Visualization

import neurenix
import numpy as np
import matplotlib.pyplot as plt
from neurenix.datasets import load_digits
from neurenix.unsupervised import PCA, TSNE, UMAP
from neurenix.preprocessing import StandardScaler

# Load the digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X_scaled)

# Apply t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)

# Apply UMAP
umap = UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
X_umap = umap.fit_transform(X_scaled)

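# Create a figure with one panel per method
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
embeddings = [
    ("PCA", X_pca, "Principal Component"),
    ("t-SNE", X_tsne, "t-SNE Dimension"),
    ("UMAP", X_umap, "UMAP Dimension"),
]

# Plot each 2-D embedding, colored by digit label
for ax, (title, X_2d, axis_name) in zip(axes, embeddings):
    scatter = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", alpha=0.7)
    fig.colorbar(scatter, ax=ax, label="Digit")
    ax.set_title(title)
    ax.set_xlabel(f"{axis_name} 1")
    ax.set_ylabel(f"{axis_name} 2")
    ax.grid(True, linestyle="--", alpha=0.7)

plt.tight_layout()
plt.show()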