Optimization | API - Neurenix Documentation

Overview¶

The Optimization module provides algorithms for training machine learning models in the Neurenix framework. These optimizers update model parameters based on gradients to minimize the loss function, enabling the model to learn from data.

Neurenix's optimization algorithms use a multi-language architecture: performance-critical operations run in the Rust/C++ Phynexus engine, while the Python API provides a user-friendly interface. This design targets efficient execution across a wide range of devices, from edge devices to multi-GPU clusters.

Key Concepts¶

Gradient-Based Optimization¶

Neurenix's optimizers use gradient-based methods to update model parameters. The gradient of the loss function with respect to each parameter indicates the direction of steepest ascent, so optimizers move parameters in the opposite direction to minimize the loss.
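To make the idea concrete, here is a minimal sketch of plain gradient descent in pure Python, independent of Neurenix. The one-dimensional loss f(w) = (w - 3)^2 and the function names are illustrative assumptions, not part of the Neurenix API.

```python
# Minimal sketch: gradient descent on f(w) = (w - 3)^2 (illustrative only).

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Analytic derivative: df/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

def gradient_descent(w0, lr=0.1, steps=50):
    """Repeatedly step against the gradient, recording the loss."""
    w = w0
    history = [loss(w)]
    for _ in range(steps):
        w -= lr * grad(w)  # move opposite to the direction of steepest ascent
        history.append(loss(w))
    return w, history

w_final, history = gradient_descent(w0=0.0)
print(f"w = {w_final:.5f}, final loss = {history[-1]:.2e}")
```

Because each step moves against the gradient, the recorded loss shrinks toward zero and w approaches the minimizer at 3.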

Parameter Groups¶

Optimizers in Neurenix support parameter groups, allowing different parts of a model to use different hyperparameters (e.g., learning rates). This is particularly useful for fine-tuning pre-trained models or implementing learning rate schedules.
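As a rough illustration, a parameter group can be thought of as a dict that holds a list of parameters plus any hyperparameter overrides, with missing values filled in from the optimizer's defaults. The sketch below shows that merging logic in pure Python; the helper name build_param_groups and the string placeholders standing in for tensors are hypothetical, and whether Neurenix normalizes groups in exactly this way is an assumption.

```python
def build_param_groups(params, defaults):
    """Normalize `params` into a list of group dicts, filling in defaults.

    `params` is either a flat iterable of parameters or an iterable of
    dicts that each contain a "params" entry plus optional overrides.
    """
    params = list(params)
    if params and not isinstance(params[0], dict):
        # A flat list of parameters becomes a single group.
        params = [{"params": params}]
    groups = []
    for group in params:
        merged = dict(defaults)   # start from the optimizer-wide defaults
        merged.update(group)      # per-group settings take precedence
        merged["params"] = list(group["params"])
        groups.append(merged)
    return groups

# Strings stand in for tensors in this sketch.
groups = build_param_groups(
    [
        {"params": ["backbone.weight", "backbone.bias"], "lr": 1e-4},
        {"params": ["head.weight", "head.bias"]},
    ],
    defaults={"lr": 1e-3, "weight_decay": 0.0},
)
for g in groups:
    print(g["params"], "lr =", g["lr"])
```

The first group keeps its own learning rate, while the second inherits the default.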

Edge Device Optimization¶

Neurenix's optimizers are designed with edge devices in mind, with efficient implementations that minimize memory usage and computational requirements while maintaining high performance. This makes Neurenix particularly well-suited for AI agent applications on resource-constrained hardware.

Multi-Device Support¶

Optimizers in Neurenix can work with parameters stored on different devices (CPU, CUDA, ROCm, WebGPU), enabling efficient training across a wide range of hardware configurations.

API Reference¶

Base Optimizer¶

neurenix.optim.Optimizer(params, defaults)

Base class for all optimizers.

Parameters:
- params: An iterable of tensors to optimize.
- defaults: Default hyperparameters for the optimizer.

Methods:
- zero_grad(): Reset the gradients of all optimized tensors.
- step(): Update the parameters based on the current gradients.
- add_param_group(param_group): Add a parameter group to the optimizer.

Example:

import neurenix
from neurenix.optim import Optimizer

# Create a custom optimizer
class MyOptimizer(Optimizer):
    def __init__(self, params, lr=0.01):
        defaults = {"lr": lr}
        super().__init__(params, defaults)

    def step(self):
        for group in self._parameter_groups:
            for param in group["params"]:
                if param.grad is not None:
                    # Simple gradient descent update
                    param._numpy_data -= group["lr"] * param.grad.numpy()

# Create a model
model = neurenix.nn.Sequential(
    neurenix.nn.Linear(10, 5),
    neurenix.nn.ReLU(),
    neurenix.nn.Linear(5, 1)
)

# Create an optimizer
optimizer = MyOptimizer(model.parameters(), lr=0.01)

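# Dummy data and a loss function for the training loop
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))
loss_function = neurenix.nn.MSELoss()

# Training loop
for epoch in range(10):
    # Forward pass
    output = model(input_data)
    loss = loss_function(output, target_data)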

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Stochastic Gradient Descent (SGD)¶

neurenix.optim.SGD(params, lr=0.01, momentum=0, dampening=0, weight_decay=0, nesterov=False)

Implements stochastic gradient descent (optionally with momentum).

Parameters:
- params: Iterable of parameters to optimize or dicts defining parameter groups.
- lr: Learning rate.
- momentum: Momentum factor.
- dampening: Dampening for momentum.
- weight_decay: Weight decay (L2 penalty).
- nesterov: Enables Nesterov momentum.

Example:

import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD

# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Create an SGD optimizer
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

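# Dummy data and a loss function for the training loop
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))
loss_function = neurenix.nn.MSELoss()

# Training loop
for epoch in range(10):
    # Forward pass
    output = model(input_data)
    loss = loss_function(output, target_data)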

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
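For readers who want to see how the SGD hyperparameters interact, the following pure-Python sketch applies one common formulation of the update rule to plain floats. It follows the PyTorch-style convention (weight decay added to the gradient, momentum buffer initialized to the first gradient); that Neurenix uses exactly the same convention is an assumption, and sgd_step is a hypothetical helper rather than a Neurenix function.

```python
def sgd_step(params, grads, bufs, lr=0.01, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False):
    """Update `params` in place.

    `bufs` holds one momentum buffer per parameter (None until first use).
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        if weight_decay != 0.0:
            g = g + weight_decay * p              # L2 penalty folded into the gradient
        if momentum != 0.0:
            if bufs[i] is None:
                bufs[i] = g                       # first step: buffer starts as the gradient
            else:
                bufs[i] = momentum * bufs[i] + (1.0 - dampening) * g
            # Nesterov looks ahead along the buffer; classic momentum uses the buffer directly.
            g = g + momentum * bufs[i] if nesterov else bufs[i]
        params[i] = p - lr * g
    return params

params = [1.0, -2.0]
bufs = [None, None]
for _ in range(2):
    sgd_step(params, [0.5, -0.5], bufs, lr=0.1, momentum=0.9)
print(params)
```

With a constant gradient, the second step is larger than the first because the momentum buffer has accumulated the previous gradient.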

Adam¶

neurenix.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0)

Implements the Adam algorithm.

Parameters:
- params: Iterable of parameters to optimize or dicts defining parameter groups.
- lr: Learning rate.
- betas: Coefficients used for computing running averages of gradient and its square.
- eps: Term added to the denominator to improve numerical stability.
- weight_decay: Weight decay (L2 penalty).

Example:

import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import Adam

# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Create an Adam optimizer
optimizer = Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

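# Dummy data and a loss function for the training loop
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))
loss_function = neurenix.nn.MSELoss()

# Training loop
for epoch in range(10):
    # Forward pass
    output = model(input_data)
    loss = loss_function(output, target_data)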

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
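The roles of betas and eps are easier to see in code. The sketch below implements the standard Adam update with bias correction on plain Python floats; adam_step is a hypothetical helper written for illustration, and it is not taken from the Neurenix source.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0):
    """One Adam update in place. `t` is the 1-based step count."""
    beta1, beta2 = betas
    for i, (p, g) in enumerate(zip(params, grads)):
        if weight_decay != 0.0:
            g = g + weight_decay * p                  # L2 penalty folded into the gradient
        m[i] = beta1 * m[i] + (1 - beta1) * g         # running average of the gradient
        v[i] = beta2 * v[i] + (1 - beta2) * g * g     # running average of its square
        m_hat = m[i] / (1 - beta1 ** t)               # bias correction for zero initialization
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

params = [1.0, -2.0]
m = [0.0, 0.0]   # first-moment estimates, one per parameter
v = [0.0, 0.0]   # second-moment estimates, one per parameter
adam_step(params, [0.5, -4.0], m, v, t=1)
print(params)
```

On the first step each parameter moves by roughly lr regardless of the gradient's magnitude, because the bias-corrected ratio reduces to approximately the sign of the gradient. The two state lists m and v are also why Adam needs more memory than SGD.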

Framework Comparison¶

Neurenix vs. TensorFlow¶

Feature	Neurenix	TensorFlow
Optimizer API	Object-oriented, similar to PyTorch	Functional and object-oriented APIs
Parameter Groups	Supported through add_param_group	Limited support through variable collections
Edge Optimization	Native optimization for edge devices	TensorFlow Lite for edge devices
Custom Optimizers	Easy to create through Optimizer subclassing	Requires custom optimizer implementation through tf.keras.optimizers.Optimizer
Multi-Device Support	CPU, CUDA, ROCm, WebGPU	CPU, CUDA, TPU
Multi-Language Architecture	Rust/C++ core with Python interface	C++ core with Python interface

Neurenix's optimization algorithms offer a PyTorch-like API, which can make it easier for researchers and developers to create custom optimizers than with TensorFlow's optimizer API. Edge device optimization is built into the Neurenix core rather than delivered as a separate component such as TensorFlow Lite, a design intended to improve performance on resource-constrained hardware.

Neurenix vs. PyTorch¶

Feature	Neurenix	PyTorch
Optimizer API	Similar to PyTorch	Object-oriented, intuitive
Parameter Groups	Supported through add_param_group	Supported through parameter groups
Edge Optimization	Native optimization for edge devices	PyTorch Mobile / ExecuTorch for edge devices
Hardware Support	CPU, CUDA, ROCm, WebGPU	CPU, CUDA, ROCm, MPS
Multi-Language Architecture	Rust/C++ core with Python interface	C++ core with Python interface
Browser Support	Native WebGPU support	No official browser runtime

Neurenix's optimization algorithms are very similar to PyTorch's in API design and functionality, making it easy for PyTorch users to transition to Neurenix. Both frameworks cover CPU, CUDA, and ROCm; Neurenix additionally targets WebGPU, which makes it suitable for browser-based deployment. Neurenix also builds edge device optimization into its core rather than providing it as a separate runtime, a design aimed particularly at AI agent applications.

Neurenix vs. Scikit-Learn¶

Feature	Neurenix	Scikit-Learn
Optimizer Types	Gradient-based optimizers for neural networks	Various optimizers for different algorithms
GPU Acceleration	Native support for multiple GPU types	No native GPU support
Edge Device Support	Native optimization for edge devices	No specific edge device support
API Design	Object-oriented, focused on neural networks	Estimator-based (fit/predict), focused on specific algorithms
Parameter Groups	Supported through add_param_group	Not applicable (different optimization paradigm)
Multi-Language Architecture	Rust/C++ core with Python interface	Python with Cython/C/C++ extensions

Neurenix provides optimizers specifically designed for neural networks, while Scikit-Learn offers a variety of optimization algorithms for different machine learning models. While Scikit-Learn's optimizers are integrated with specific algorithms, Neurenix's optimizers are more general-purpose and can be used with any differentiable model. Additionally, Neurenix's optimizers support GPU acceleration and edge device optimization, making them more suitable for deep learning and AI agent applications.

Best Practices¶

Choosing an Optimizer¶

When choosing an optimizer for your model, consider these factors:

Task Complexity: For simple tasks, SGD with momentum is often sufficient. For more complex tasks, Adam is generally a good default choice.
Convergence Speed: Adam typically converges faster than SGD, but SGD may reach better final solutions in some cases.
Memory Requirements: Adam requires more memory than SGD due to its moment estimates.
Hardware Constraints: On edge devices with limited memory, SGD may be preferable to Adam.

import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD, Adam

# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# For simple tasks or memory-constrained devices
optimizer_sgd = SGD(model.parameters(), lr=0.01, momentum=0.9)

# For complex tasks or faster convergence
optimizer_adam = Adam(model.parameters(), lr=0.001)

Setting Learning Rates¶

Choosing an appropriate learning rate is crucial for effective training:

Start with a Reasonable Default: 0.01 for SGD, 0.001 for Adam.
Learning Rate Schedules: Decrease the learning rate over time to fine-tune the model.
Learning Rate Warmup: Gradually increase the learning rate at the beginning of training.
Different Learning Rates for Different Layers: Use parameter groups to set different learning rates for different parts of the model.

import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD

# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Create an optimizer with different learning rates for different layers
optimizer = SGD([
    {"params": model[0].parameters(), "lr": 0.01},  # First layer
    {"params": model[2].parameters(), "lr": 0.001}  # Last layer
], lr=0.005)  # Default learning rate, used only by groups that do not set "lr"
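Warmup and step decay from the list above can be combined into a single scale factor that is applied to each group's base learning rate, which keeps per-layer learning rates in proportion. The sketch below is pure Python and works on plain dicts; lr_scale and apply_schedule are hypothetical helpers, and the warmup length, milestones, and decay factor are illustrative values rather than Neurenix defaults.

```python
def lr_scale(epoch, warmup_epochs=5, milestones=(30, 60), gamma=0.1):
    """Multiplier for the base learning rate at a given (0-based) epoch."""
    if epoch < warmup_epochs:
        # Linear warmup: ramp from 1/warmup_epochs up to 1.0.
        return (epoch + 1) / warmup_epochs
    # Step decay: multiply by gamma once for every milestone already reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return gamma ** drops

def apply_schedule(param_groups, base_lrs, epoch):
    """Set each group's "lr" to its base value times the current scale."""
    scale = lr_scale(epoch)
    for group, base_lr in zip(param_groups, base_lrs):
        group["lr"] = base_lr * scale

# Plain dicts stand in for an optimizer's parameter groups.
param_groups = [{"params": [], "lr": 0.01}, {"params": [], "lr": 0.001}]
base_lrs = [g["lr"] for g in param_groups]  # remember the starting values

for epoch in (0, 4, 30, 60):
    apply_schedule(param_groups, base_lrs, epoch)
    print(epoch, [g["lr"] for g in param_groups])
```

Storing the base learning rates separately avoids compounding the scale factor when the schedule is applied every epoch.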

Using Weight Decay¶

Weight decay (L2 regularization) can help prevent overfitting:

Choose an Appropriate Value: Typical values range from 1e-6 to 1e-4.
Different Weight Decay for Different Layers: Use parameter groups to set different weight decay values for different parts of the model.
No Weight Decay for Bias Terms: It's common practice to apply weight decay only to weights, not biases.

import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import Adam

# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Create an optimizer with weight decay
optimizer = Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
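Excluding biases from weight decay is usually done by splitting parameters into two groups by name. The sketch below shows that split in pure Python using (name, parameter) pairs; the names and string placeholders are hypothetical, and whether Neurenix modules expose an iterator of named parameters is an assumption. The resulting list has the same shape as the parameter-group dicts that the optimizers accept.

```python
def split_weight_decay(named_params, weight_decay=1e-4):
    """Build two parameter groups: weights with decay, biases without."""
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith("bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Strings stand in for tensors; the names mimic a Sequential(Linear, ReLU, Linear).
named_params = [
    ("0.weight", "W0"), ("0.bias", "b0"),
    ("2.weight", "W2"), ("2.bias", "b2"),
]
groups = split_weight_decay(named_params)
for g in groups:
    print(g["params"], "weight_decay =", g["weight_decay"])
```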

Optimizing for Edge Devices¶

When deploying models to edge devices, consider these optimizations:

Choose Memory-Efficient Optimizers: SGD requires less memory than Adam.
Reduce Precision: Use lower precision data types when possible.
Minimize Parameter Count: Use smaller models or techniques like pruning and quantization.
Batch Size Adjustment: Use smaller batch sizes to reduce memory requirements.

import neurenix
from neurenix import Device, DeviceType
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD

# Create a small model for edge devices
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Select a target device from those available
devices = neurenix.get_available_devices()
edge_device = None

# Prefer WebGPU when present (e.g., browser-based edge deployments)
for device in devices:
    if device.type == DeviceType.WEBGPU:
        edge_device = device
        break

# Fall back to CPU if no WebGPU device was found
if edge_device is None:
    edge_device = Device(DeviceType.CPU)

# Move model to the selected device
for param in model.parameters():
    param.to(edge_device, inplace=True)

# Create a memory-efficient optimizer
optimizer = SGD(model.parameters(), lr=0.01)
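A back-of-the-envelope estimate shows how optimizer choice and precision affect training memory for the small model above. The sketch counts parameters, gradients, and optimizer state only; it ignores activations and assumes optimizer state is stored in the same data type as the parameters, which may not hold for a given Neurenix backend. All names here are hypothetical.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}

# Extra per-parameter state buffers kept by each optimizer.
STATE_BUFFERS = {"sgd": 0, "sgd_momentum": 1, "adam": 2}

def linear_param_count(in_features, out_features):
    # Weight matrix plus bias vector.
    return in_features * out_features + out_features

def training_memory_bytes(param_count, optimizer, dtype="float32"):
    # One copy each for parameters and gradients, plus optimizer state.
    tensors = 2 + STATE_BUFFERS[optimizer]
    return param_count * tensors * BYTES_PER_ELEMENT[dtype]

# Linear(10, 5) followed by Linear(5, 1)
n_params = linear_param_count(10, 5) + linear_param_count(5, 1)

for opt in ("sgd", "sgd_momentum", "adam"):
    for dtype in ("float32", "float16"):
        print(opt, dtype, training_memory_bytes(n_params, opt, dtype), "bytes")
```

Under these assumptions Adam needs twice the memory of plain SGD for the same model, and halving the precision halves every figure.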

Tutorials¶

Basic Optimization with SGD¶

import neurenix
from neurenix.nn import Linear, Sequential, ReLU, MSELoss
from neurenix.optim import SGD

# Create a simple model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Generate some dummy data
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))

# Create loss function and optimizer
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

# Training loop
model.train()
for epoch in range(100):
    # Forward pass
    output = model(input_data)
    loss = criterion(output, target_data)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")

# Evaluation
model.eval()
with neurenix.no_grad():
    test_input = neurenix.Tensor.randn((20, 10))
    predictions = model(test_input)
    print(f"Predictions shape: {predictions.shape}")

Using Adam with Parameter Groups¶

import neurenix
from neurenix.nn import Module, Linear, ReLU
from neurenix.optim import Adam

# Create a custom model
class TwoPartModel(Module):
    def __init__(self):
        super().__init__()
        self.feature_extractor = Linear(10, 5)
        self.activation = ReLU()  # Create the activation once, not on every forward pass
        self.classifier = Linear(5, 1)

    def forward(self, x):
        features = self.activation(self.feature_extractor(x))
        return self.classifier(features)

# Create the model
model = TwoPartModel()

# Generate some dummy data
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))

# Create an optimizer with different parameter groups
optimizer = Adam([
    {"params": model.feature_extractor.parameters(), "lr": 0.0001},  # Lower learning rate for feature extractor
    {"params": model.classifier.parameters(), "lr": 0.001}           # Higher learning rate for classifier
])

# Create the loss function once, outside the training loop
criterion = neurenix.nn.MSELoss()

# Training loop
model.train()
for epoch in range(100):
    # Forward pass
    output = model(input_data)
    loss = criterion(output, target_data)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")

Implementing a Learning Rate Schedule¶

import neurenix
from neurenix.nn import Linear, Sequential, ReLU, MSELoss
from neurenix.optim import SGD

# Create a simple model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)

# Generate some dummy data
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))

# Create loss function and optimizer
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.1)

# Epochs at which the learning rate changes, and the new value
lr_milestones = {30: 0.01, 60: 0.001}

# Training loop with learning rate schedule
model.train()
for epoch in range(100):
    # Adjust learning rate when a milestone epoch is reached
    if epoch in lr_milestones:
        for param_group in optimizer._parameter_groups:
            param_group["lr"] = lr_milestones[epoch]

    # Forward pass
    output = model(input_data)
    loss = criterion(output, target_data)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        current_lr = optimizer._parameter_groups[0]["lr"]
        print(f"Epoch {epoch+1}, Loss: {loss.item()}, LR: {current_lr}")

Conclusion¶

The Optimization module of Neurenix provides a comprehensive set of algorithms for training machine learning models. Its multi-language architecture with a high-performance Rust/C++ core enables optimal performance across a wide range of devices, from edge devices to multi-GPU clusters.

Compared to other frameworks like TensorFlow, PyTorch, and Scikit-Learn, Neurenix's Optimization module offers advantages in terms of API design, hardware support, and edge device optimization. These features make Neurenix particularly well-suited for AI agent development, edge computing, and browser-based applications.
