The Optimization module provides algorithms for training machine learning models in the Neurenix framework. These optimizers update model parameters based on gradients to minimize the loss function, enabling the model to learn from data.
Neurenix's optimization algorithms use a multi-language architecture: performance-critical operations run in the Rust/C++ Phynexus engine, while the Python API provides the user-facing interface. This design targets hardware ranging from edge devices to multi-GPU clusters.
Neurenix's optimizers use gradient-based methods to update model parameters. The gradient of the loss function with respect to each parameter indicates the direction of steepest ascent, so optimizers move parameters in the opposite direction to minimize the loss.
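The idea of stepping against the gradient can be illustrated without any framework. The following is a minimal sketch in plain Python, not Neurenix code; the function name `gradient_descent` and its arguments are hypothetical and chosen only for this illustration.

```python
def gradient_descent(grad_fn, x0, lr=0.1, steps=50):
    """Minimize a scalar function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        # The gradient points uphill, so subtract it to move downhill.
        x = x - lr * grad_fn(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

With this learning rate the distance to the minimum shrinks by a constant factor on every step, so `x_min` ends up very close to 3.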
Optimizers in Neurenix support parameter groups, allowing different parts of a model to use different hyperparameters (e.g., learning rates). This is particularly useful for fine-tuning pre-trained models or implementing learning rate schedules.
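How per-group hyperparameters might be resolved against optimizer-wide defaults can be sketched in plain Python. This assumes PyTorch-style semantics, where a value set in a group overrides the default; the source does not show how Neurenix implements this internally, and `build_param_groups` is a hypothetical helper. Strings stand in for tensors.

```python
def build_param_groups(params, defaults):
    """Normalize `params` into a list of group dicts, filling in missing defaults."""
    params = list(params)
    # A flat iterable of parameters is treated as a single group.
    if not params or not isinstance(params[0], dict):
        params = [{"params": params}]
    groups = []
    for group in params:
        merged = dict(defaults)   # start from the optimizer-wide defaults
        merged.update(group)      # values set on the group take precedence
        merged["params"] = list(merged["params"])
        groups.append(merged)
    return groups


groups = build_param_groups(
    [{"params": ["w1", "b1"], "lr": 0.01}, {"params": ["w2", "b2"]}],
    defaults={"lr": 0.005, "momentum": 0.9},
)
for g in groups:
    print(g["params"], g["lr"], g["momentum"])
```

In this sketch the first group keeps its own learning rate, while the second inherits `lr` from the defaults; both inherit `momentum`.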
Neurenix's optimizers are designed with edge devices in mind, with efficient implementations that minimize memory usage and computational requirements while maintaining high performance. This makes Neurenix particularly well-suited for AI agent applications on resource-constrained hardware.
Optimizers in Neurenix can work with parameters stored on different devices (CPU, CUDA, ROCm, WebGPU), enabling efficient training across a wide range of hardware configurations.
neurenix.optim.Optimizer(params, defaults)
Base class for all optimizers.
Parameters:
- params: An iterable of tensors to optimize.
- defaults: Default hyperparameters for the optimizer.
Methods:
- zero_grad(): Reset the gradients of all optimized tensors.
- step(): Update the parameters based on the current gradients.
- add_param_group(param_group): Add a parameter group to the optimizer.
Example:
import neurenix
from neurenix.optim import Optimizer
# Create a custom optimizer
class MyOptimizer(Optimizer):
    def __init__(self, params, lr=0.01):
        defaults = {"lr": lr}
        super().__init__(params, defaults)

    def step(self):
        for group in self._parameter_groups:
            for param in group["params"]:
                if param.grad is not None:
                    # Simple gradient descent update
                    param._numpy_data -= group["lr"] * param.grad.numpy()

# Create a model
model = neurenix.nn.Sequential(
    neurenix.nn.Linear(10, 5),
    neurenix.nn.ReLU(),
    neurenix.nn.Linear(5, 1)
)
# Create an optimizer
optimizer = MyOptimizer(model.parameters(), lr=0.01)
# Training loop (input_data, target_data, and loss_function are assumed to be defined elsewhere)
for epoch in range(10):
    # Forward pass
    output = model(input_data)
    loss = loss_function(output, target_data)
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
neurenix.optim.SGD(params, lr=0.01, momentum=0, dampening=0, weight_decay=0, nesterov=False)
Implements stochastic gradient descent (optionally with momentum).
Parameters:
- params: Iterable of parameters to optimize or dicts defining parameter groups.
- lr: Learning rate.
- momentum: Momentum factor.
- dampening: Dampening for momentum.
- weight_decay: Weight decay (L2 penalty).
- nesterov: Enables Nesterov momentum.
Example:
import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD
# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Create an SGD optimizer
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
# Training loop (input_data, target_data, and loss_function are assumed to be defined elsewhere)
for epoch in range(10):
    # Forward pass
    output = model(input_data)
    loss = loss_function(output, target_data)
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
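The roles of `momentum`, `dampening`, `weight_decay`, and `nesterov` can be sketched on a single scalar parameter. The update below follows the PyTorch convention, which is an assumption based on the matching signature; the source does not state Neurenix's exact formula. `sgd_step` is a hypothetical helper written for this illustration.

```python
def sgd_step(param, grad, state, lr=0.01, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False):
    """Return the updated scalar parameter; `state` holds the momentum buffer."""
    if weight_decay != 0.0:
        # L2 penalty folded into the gradient
        grad = grad + weight_decay * param
    if momentum != 0.0:
        if "buf" not in state:
            # First step: the buffer starts as the raw gradient
            state["buf"] = grad
        else:
            state["buf"] = momentum * state["buf"] + (1.0 - dampening) * grad
        # Nesterov looks ahead along the buffer; classic momentum uses the buffer itself
        grad = grad + momentum * state["buf"] if nesterov else state["buf"]
    return param - lr * grad


# Minimize f(p) = p^2 (gradient 2 * p) for three steps with momentum.
state = {}
p = 1.0
for _ in range(3):
    p = sgd_step(p, grad=2.0 * p, state=state, lr=0.1, momentum=0.9)
print(p)
```

With `momentum=0` the function reduces to plain gradient descent and keeps no state, which is why plain SGD is the lightest option in memory.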
neurenix.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0)
Implements the Adam algorithm.
Parameters:
- params: Iterable of parameters to optimize or dicts defining parameter groups.
- lr: Learning rate.
- betas: Coefficients used for computing running averages of gradient and its square.
- eps: Term added to the denominator to improve numerical stability.
- weight_decay: Weight decay (L2 penalty).
Example:
import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import Adam
# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Create an Adam optimizer
optimizer = Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
# Training loop (input_data, target_data, and loss_function are assumed to be defined elsewhere)
for epoch in range(10):
    # Forward pass
    output = model(input_data)
    loss = loss_function(output, target_data)
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
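To make the roles of `betas` and `eps` concrete, here is a minimal scalar sketch of the standard Adam update with bias correction. It assumes Neurenix follows the standard formulation, which the source does not confirm; `adam_step` is a hypothetical helper and not part of the Neurenix API.

```python
import math


def adam_step(param, grad, state, lr=0.001, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0):
    """Return the updated scalar parameter; `state` holds step count and averages."""
    beta1, beta2 = betas
    if weight_decay != 0.0:
        # L2 penalty folded into the gradient
        grad = grad + weight_decay * param
    state["step"] = state.get("step", 0) + 1
    # Running averages of the gradient and of its square
    state["m"] = beta1 * state.get("m", 0.0) + (1.0 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1.0 - beta2) * grad * grad
    # Bias correction compensates for both averages starting at zero
    m_hat = state["m"] / (1.0 - beta1 ** state["step"])
    v_hat = state["v"] / (1.0 - beta2 ** state["step"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {}
p1 = adam_step(1.0, grad=2.0, state=state)
print(p1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude. The two averages per parameter are also why Adam needs more memory than SGD.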
| Feature | Neurenix | TensorFlow |
|---|---|---|
| Optimizer API | Object-oriented, similar to PyTorch | Functional and object-oriented APIs |
| Parameter Groups | Supported through add_param_group | Limited support through variable collections |
| Edge Optimization | Native optimization for edge devices | TensorFlow Lite for edge devices |
| Custom Optimizers | Easy to create through Optimizer subclassing | Requires custom optimizer implementation through tf.keras.optimizers.Optimizer |
| Multi-Device Support | CPU, CUDA, ROCm, WebGPU | CPU, CUDA, TPU |
| Multi-Language Architecture | Rust/C++ core with Python interface | C++ core with Python interface |
Neurenix's optimizer API is PyTorch-like, which makes writing custom optimizers more direct than with TensorFlow's optimizer API. Edge optimization is built into Neurenix's core architecture, whereas TensorFlow relies on TensorFlow Lite, a separate add-on component; the integrated approach is intended to give better performance on resource-constrained hardware.
| Feature | Neurenix | PyTorch |
|---|---|---|
| Optimizer API | Similar to PyTorch | Object-oriented, intuitive |
| Parameter Groups | Supported through add_param_group | Supported through parameter groups |
| Edge Optimization | Native optimization for edge devices | PyTorch Mobile for edge devices |
| Hardware Support | CPU, CUDA, ROCm, WebGPU | CPU, CUDA, ROCm, MPS |
| Multi-Language Architecture | Rust/C++ core with Python interface | C++ core with Python interface |
| Browser Support | Native WebGPU support | No official browser runtime; models are typically exported to another format |
Neurenix's optimization algorithms are very similar to PyTorch's in API design and functionality, which makes it easy for PyTorch users to transition to Neurenix. Neurenix additionally supports WebGPU, which targets browser-based deployment. Its native edge device optimization is also intended to offer advantages over PyTorch Mobile, particularly for AI agent applications.
| Feature | Neurenix | Scikit-Learn |
|---|---|---|
| Optimizer Types | Gradient-based optimizers for neural networks | Various optimizers for different algorithms |
| GPU Acceleration | Native support for multiple GPU types | No native GPU support |
| Edge Device Support | Native optimization for edge devices | No specific edge device support |
| API Design | Object-oriented, focused on neural networks | Estimator-based (fit/predict), focused on specific algorithms |
| Parameter Groups | Supported through add_param_group | Not applicable (different optimization paradigm) |
| Multi-Language Architecture | Rust/C++ core with Python interface | Python with Cython/C/C++ extensions |
Neurenix provides optimizers designed specifically for neural networks, while Scikit-Learn offers a variety of optimization algorithms for different machine learning models. Scikit-Learn's optimizers are built into specific estimators, whereas Neurenix's optimizers are general-purpose and can be used with any differentiable model. Neurenix's optimizers also support GPU acceleration and edge device optimization, which makes them better suited to deep learning and AI agent applications.
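When choosing an optimizer for your model, weigh convergence speed against memory use: SGD keeps at most one momentum buffer per parameter, while Adam stores two running averages per parameter: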
import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD, Adam
# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# For simple tasks or memory-constrained devices
optimizer_sgd = SGD(model.parameters(), lr=0.01, momentum=0.9)
# For complex tasks or faster convergence
optimizer_adam = Adam(model.parameters(), lr=0.001)
Choosing an appropriate learning rate is crucial for effective training:
import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD
# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Create an optimizer with different learning rates for different layers
optimizer = SGD([
    {"params": model[0].parameters(), "lr": 0.01},   # First layer
    {"params": model[2].parameters(), "lr": 0.001}   # Last layer
], lr=0.005)  # Default learning rate for any group that does not set "lr"
Weight decay (L2 regularization) can help prevent overfitting:
import neurenix
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import Adam
# Create a model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Create an optimizer with weight decay
optimizer = Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
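The effect of an L2 penalty is easiest to see when the loss gradient is zero, so that only the decay term acts. The sketch below uses plain SGD on one scalar weight; `decay_only_sgd` is a hypothetical helper for illustration and not part of Neurenix.

```python
def decay_only_sgd(param, lr, weight_decay, steps):
    """Apply SGD steps with a zero loss gradient, so only weight decay acts."""
    for _ in range(steps):
        # Gradient of the penalty (weight_decay / 2) * param**2
        grad = weight_decay * param
        param = param - lr * grad
    return param


w = decay_only_sgd(1.0, lr=0.1, weight_decay=0.5, steps=10)
print(w)
```

Each step multiplies the weight by `1 - lr * weight_decay`, pulling it toward zero. With adaptive optimizers such as Adam, the penalty gradient passes through the same rescaling as the loss gradient, so this simple shrink factor does not hold exactly.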
When deploying models to edge devices, consider these optimizations:
import neurenix
from neurenix import Device, DeviceType
from neurenix.nn import Linear, Sequential, ReLU
from neurenix.optim import SGD
# Create a small model for edge devices
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Use the most efficient available device
devices = neurenix.get_available_devices()
edge_device = None
# Prioritize WebGPU for browser-based edge devices
for device in devices:
    if device.type == DeviceType.WEBGPU:
        edge_device = device
        break
# Fall back to CPU if no WebGPU device is found
if edge_device is None:
    edge_device = Device(DeviceType.CPU)
# Move model to the selected device
for param in model.parameters():
    param.to(edge_device, inplace=True)
# Create a memory-efficient optimizer
optimizer = SGD(model.parameters(), lr=0.01)
import neurenix
from neurenix.nn import Linear, Sequential, ReLU, MSELoss
from neurenix.optim import SGD
# Create a simple model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Generate some dummy data
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))
# Create loss function and optimizer
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
# Training loop
model.train()
for epoch in range(100):
    # Forward pass
    output = model(input_data)
    loss = criterion(output, target_data)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")
# Evaluation
model.eval()
with neurenix.no_grad():
    test_input = neurenix.Tensor.randn((20, 10))
    predictions = model(test_input)
    print(f"Predictions shape: {predictions.shape}")
import neurenix
from neurenix.nn import Module, Linear, ReLU
from neurenix.optim import Adam
# Create a custom model
class TwoPartModel(Module):
    def __init__(self):
        super().__init__()
        self.feature_extractor = Linear(10, 5)
        self.classifier = Linear(5, 1)

    def forward(self, x):
        features = self.feature_extractor(x)
        features = ReLU()(features)
        return self.classifier(features)
# Create the model
model = TwoPartModel()
# Generate some dummy data
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))
# Create an optimizer with different parameter groups
optimizer = Adam([
    {"params": model.feature_extractor.parameters(), "lr": 0.0001},  # Lower learning rate for feature extractor
    {"params": model.classifier.parameters(), "lr": 0.001}           # Higher learning rate for classifier
])
# Create the loss function once, outside the training loop
criterion = neurenix.nn.MSELoss()
# Training loop
model.train()
for epoch in range(100):
    # Forward pass
    output = model(input_data)
    loss = criterion(output, target_data)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")
import neurenix
from neurenix.nn import Linear, Sequential, ReLU, MSELoss
from neurenix.optim import SGD
# Create a simple model
model = Sequential(
    Linear(10, 5),
    ReLU(),
    Linear(5, 1)
)
# Generate some dummy data
input_data = neurenix.Tensor.randn((100, 10))
target_data = neurenix.Tensor.randn((100, 1))
# Create loss function and optimizer
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.1)
# Training loop with learning rate schedule
model.train()
for epoch in range(100):
    # Adjust learning rate
    if epoch == 30:
        for param_group in optimizer._parameter_groups:
            param_group["lr"] = 0.01
    elif epoch == 60:
        for param_group in optimizer._parameter_groups:
            param_group["lr"] = 0.001
    # Forward pass
    output = model(input_data)
    loss = criterion(output, target_data)
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        current_lr = optimizer._parameter_groups[0]["lr"]
        print(f"Epoch {epoch+1}, Loss: {loss.item()}, LR: {current_lr}")
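The hand-written epoch checks above amount to a step schedule: the learning rate is multiplied by a fixed factor each time a milestone epoch is reached. One way to factor that out is a small pure function. This is a sketch only; `step_lr` is a hypothetical helper and not part of the Neurenix API.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60), gamma=0.1):
    """Learning rate after multiplying by `gamma` once per milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Show the rate just before and at each milestone.
for epoch in (0, 29, 30, 59, 60, 99):
    print(epoch, step_lr(epoch))
```

Inside a training loop, the result could be assigned to each group's `"lr"` entry at the start of every epoch, which would replace the `if`/`elif` chain and keep the schedule in one place.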
The Optimization module of Neurenix provides gradient-based algorithms, including SGD and Adam, for training machine learning models. Its multi-language architecture, with a high-performance Rust/C++ core, targets devices ranging from edge hardware to multi-GPU clusters.
Compared to TensorFlow, PyTorch, and Scikit-Learn, Neurenix's Optimization module emphasizes a PyTorch-like API, WebGPU support, and native edge device optimization. These features make Neurenix particularly well-suited for AI agent development, edge computing, and browser-based applications.