This document describes the Intel oneDNN (Deep Neural Network Library) support in the Phynexus framework.
Intel oneDNN (formerly DNNL and MKL-DNN) is an open-source performance library for deep learning applications. It includes highly optimized implementations of basic operations like convolution, matrix multiplication, and activation functions specifically tuned for Intel architectures. Phynexus integrates with oneDNN to provide accelerated deep learning operations on Intel CPUs and GPUs.
The oneDNN backend in Phynexus leverages Intel's optimized primitives to significantly improve performance for both training and inference workloads on Intel hardware. This integration allows models to run efficiently on a wide range of Intel platforms, from laptops to high-performance computing systems.
The oneDNN backend in Phynexus provides:
oneDNN support in Phynexus requires:
Compatible hardware includes:

- Intel Xeon processors
- Intel Core processors
- Intel Atom processors
- Intel Iris, UHD, and Arc GPUs
# Python
from neurenix.hardware.onednn import is_onednn_available
if is_onednn_available():
    print("oneDNN is available")
else:
    print("oneDNN is not available")
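The availability check can also be wrapped so that code keeps working on machines where the oneDNN module is not installed. The sketch below is one way to do that. `onednn_available` and `pick_backend_name` are hypothetical helpers, not part of the documented API, and the `"cpu"` label is an assumed fallback name.

```python
def onednn_available() -> bool:
    """Return True only if the oneDNN module imports and reports itself available."""
    try:
        # Import path taken from the example above; guarded in case the
        # package (or its oneDNN extension) is not installed.
        from neurenix.hardware.onednn import is_onednn_available
    except ImportError:
        return False
    return bool(is_onednn_available())


def pick_backend_name() -> str:
    """Hypothetical helper: choose a backend label for logging or configuration."""
    return "onednn" if onednn_available() else "cpu"


print(f"Selected backend: {pick_backend_name()}")
```

Keeping the import inside the function means the rest of a script can be loaded and tested on hardware without oneDNN support.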
# Python
from neurenix.hardware.onednn import OneDNNBackend
# Create the backend
try:
    onednn = OneDNNBackend()
    # Initialize the backend
    if onednn.initialize():
        print("oneDNN backend initialized successfully")
    else:
        print("Failed to initialize oneDNN backend")
except RuntimeError as e:
    print(f"oneDNN error: {e}")
# Python
from neurenix.hardware.onednn import OneDNNBackend
# Create and initialize the backend
onednn = OneDNNBackend()
onednn.initialize()
# Get the number of available devices
device_count = onednn.get_device_count()
print(f"Available oneDNN devices: {device_count}")
# Get information about a specific device
device_info = onednn.get_device_info(0) # First device
print(f"Device info: {device_info}")
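To inspect every device rather than only the first, the two calls above can be combined in a loop. This is a minimal sketch that relies only on `get_device_count` and `get_device_info` as shown above. `list_devices`, `DeviceBackend`, and `FakeBackend` are names invented here, and the device descriptions are made-up placeholders so the snippet runs without oneDNN installed.

```python
from typing import Any, List, Protocol


class DeviceBackend(Protocol):
    """Structural type covering the two calls used in the example above."""

    def get_device_count(self) -> int: ...

    def get_device_info(self, index: int) -> Any: ...


def list_devices(backend: DeviceBackend) -> List[Any]:
    """Collect the info object for every device the backend reports."""
    return [backend.get_device_info(i) for i in range(backend.get_device_count())]


class FakeBackend:
    """Stand-in backend (hypothetical) used only to exercise list_devices."""

    def __init__(self, devices):
        self._devices = list(devices)

    def get_device_count(self) -> int:
        return len(self._devices)

    def get_device_info(self, index: int):
        return self._devices[index]


# Placeholder device descriptions, not real oneDNN output.
fake = FakeBackend([{"name": "placeholder-cpu"}, {"name": "placeholder-gpu"}])
for index, info in enumerate(list_devices(fake)):
    print(f"Device {index}: {info}")
```

With a real backend, an initialized `OneDNNBackend` instance would be passed in place of `fake`.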
# Python
import neurenix as nx
from neurenix.hardware.onednn import OneDNNBackend
# Create tensors
a = nx.Tensor([[1, 2], [3, 4]])
b = nx.Tensor([[5, 6], [7, 8]])
# Create and initialize the backend
onednn = OneDNNBackend()
onednn.initialize()
# Perform matrix multiplication using oneDNN
c = onednn.matmul(a, b)
print(f"Result: {c}")
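A plain-Python reference is a handy way to sanity-check the values an accelerated matrix multiplication should produce. The function below is a textbook implementation written for this document, not part of the framework; `matmul_reference` is a hypothetical name.

```python
def matmul_reference(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Same operands as the example above.
expected = matmul_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(expected)  # [[19, 22], [43, 50]]
```

For instance, the top-left entry is 1*5 + 2*7 = 19.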
# Python
import neurenix as nx
from neurenix.hardware.onednn import OneDNNBackend
# Create input and weight tensors
input = nx.random.randn(1, 3, 32, 32) # Batch, Channels, Height, Width
weight = nx.random.randn(16, 3, 3, 3) # Out channels, In channels, Kernel H, Kernel W
# Create and initialize the backend
onednn = OneDNNBackend()
onednn.initialize()
# Perform 2D convolution using oneDNN
output = onednn.conv2d(
    input=input,
    weight=weight,
    bias=None,
    stride=(1, 1),
    padding=(1, 1)
)
print(f"Output shape: {output.shape}")
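The output shape of this convolution can be worked out ahead of time with the standard formula `out = (in + 2 * padding - kernel) // stride + 1`. The helper below is a sketch that assumes no dilation and a single group; `conv2d_output_shape` is a hypothetical name, not a framework function.

```python
def conv2d_output_shape(input_shape, weight_shape, stride=(1, 1), padding=(0, 0)):
    """Compute the NCHW output shape of a 2D convolution (no dilation, groups=1)."""
    batch, in_channels, height, width = input_shape
    out_channels, weight_in_channels, kernel_h, kernel_w = weight_shape
    if in_channels != weight_in_channels:
        raise ValueError("input channels and weight channels differ")
    out_h = (height + 2 * padding[0] - kernel_h) // stride[0] + 1
    out_w = (width + 2 * padding[1] - kernel_w) // stride[1] + 1
    return (batch, out_channels, out_h, out_w)


# Shapes from the example above: 3x3 kernel, stride 1, padding 1.
print(conv2d_output_shape((1, 3, 32, 32), (16, 3, 3, 3), stride=(1, 1), padding=(1, 1)))
# (1, 16, 32, 32)
```

With a 3x3 kernel, stride 1, and padding 1, the spatial size is preserved: (32 + 2 - 3) // 1 + 1 = 32.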
# Python
import neurenix as nx
from neurenix.hardware.onednn import OneDNNBackend
# Create input and hidden state tensors
batch_size = 1
seq_length = 10
input_size = 20
hidden_size = 30
input = nx.random.randn(seq_length, batch_size, input_size)
hidden = nx.random.randn(batch_size, hidden_size)
weight_ih = nx.random.randn(hidden_size, input_size)
weight_hh = nx.random.randn(hidden_size, hidden_size)
bias_ih = nx.random.randn(hidden_size)
bias_hh = nx.random.randn(hidden_size)
# Create and initialize the backend
onednn = OneDNNBackend()
onednn.initialize()
# Perform RNN operation using oneDNN
output, new_hidden = onednn.rnn(
    input=input,
    hidden=hidden,
    weight_ih=weight_ih,
    weight_hh=weight_hh,
    bias_ih=bias_ih,
    bias_hh=bias_hh
)
print(f"Output shape: {output.shape}")
print(f"New hidden shape: {new_hidden.shape}")
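The argument shapes above match a single-layer recurrence of the form `h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)`. The source does not say which cell type or activation `onednn.rnn` uses, so the tanh cell below is an assumption, and `rnn_tanh_reference` is a hypothetical pure-Python reference meant only to show the data flow and the resulting shapes.

```python
import math
import random


def rnn_tanh_reference(inputs, hidden, weight_ih, weight_hh, bias_ih, bias_hh):
    """Run a tanh RNN over nested lists.

    inputs: seq_length x batch x input_size, hidden: batch x hidden_size.
    Returns (outputs, last_hidden), where outputs holds the hidden state of every step.
    """
    outputs = []
    for x_t in inputs:  # one time step: batch x input_size
        next_hidden = []
        for x, h_prev in zip(x_t, hidden):  # one sample in the batch
            row = []
            for j in range(len(weight_ih)):  # one hidden unit
                pre = (
                    sum(w * v for w, v in zip(weight_ih[j], x)) + bias_ih[j]
                    + sum(w * v for w, v in zip(weight_hh[j], h_prev)) + bias_hh[j]
                )
                row.append(math.tanh(pre))
            next_hidden.append(row)
        hidden = next_hidden
        outputs.append(hidden)
    return outputs, hidden


def randn(*shape):
    """Nested lists of standard normal samples with the given shape."""
    if len(shape) == 1:
        return [random.gauss(0.0, 1.0) for _ in range(shape[0])]
    return [randn(*shape[1:]) for _ in range(shape[0])]


random.seed(0)
seq_length, batch_size, input_size, hidden_size = 10, 1, 20, 30
outputs, last_hidden = rnn_tanh_reference(
    randn(seq_length, batch_size, input_size),
    randn(batch_size, hidden_size),
    randn(hidden_size, input_size),
    randn(hidden_size, hidden_size),
    randn(hidden_size),
    randn(hidden_size),
)
print(len(outputs), len(outputs[0]), len(outputs[0][0]))  # 10 1 30
print(len(last_hidden), len(last_hidden[0]))  # 1 30
```

Under this assumption the output has shape (seq_length, batch_size, hidden_size) and the new hidden state has shape (batch_size, hidden_size).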
The oneDNN backend implementation in Phynexus follows a layered architecture:
The implementation uses oneDNN's engine and stream abstractions:
These abstractions allow for efficient execution and synchronization of operations.
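As a rough mental model (not the oneDNN or Phynexus API), an engine identifies a compute device, and a stream is an ordered queue of work bound to one engine; waiting on the stream is the synchronization point. The `Engine` and `Stream` classes below are illustrative stand-ins written for this document.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass(frozen=True)
class Engine:
    """Toy engine: names a device kind and index."""

    kind: str  # for example "cpu" or "gpu"
    index: int = 0


@dataclass
class Stream:
    """Toy stream: an in-order work queue tied to a single engine."""

    engine: Engine
    pending: Deque[Callable[[], None]] = field(default_factory=deque, init=False)

    def submit(self, work: Callable[[], None]) -> None:
        """Queue work without running it."""
        self.pending.append(work)

    def wait(self) -> None:
        """Run all queued work in submission order, then return."""
        while self.pending:
            self.pending.popleft()()


log = []
stream = Stream(Engine("cpu"))
stream.submit(lambda: log.append("conv"))
stream.submit(lambda: log.append("relu"))
print(log)  # [] (nothing has run yet)
stream.wait()
print(log)  # ['conv', 'relu']
```

The toy defers all work until `wait()` to make the ordering visible; a real runtime may start executing work as soon as it is submitted.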
The implementation includes sophisticated memory management:
oneDNN operations are represented as primitives:
For optimal performance with oneDNN:
# Python
import neurenix as nx
from neurenix.hardware.onednn import OneDNNBackend
# Create and initialize the backend
onednn = OneDNNBackend()
onednn.initialize()
# When creating tensors, let oneDNN choose the optimal memory format
# (In a real implementation, this would be handled internally)
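A memory format determines how a logical tensor index maps to a linear offset in memory. The sketch below compares two common plain layouts, NCHW and NHWC, for the 4D input shape used in the convolution example. The helper names are invented for illustration, and a library may also pick blocked layouts that are not shown here.

```python
def offset_nchw(index, shape):
    """Linear offset of (n, c, h, w) when channels vary slower than pixels."""
    n, c, h, w = index
    _, channels, height, width = shape
    return ((n * channels + c) * height + h) * width + w


def offset_nhwc(index, shape):
    """Linear offset of (n, c, h, w) when channels are the innermost dimension."""
    n, c, h, w = index
    _, channels, height, width = shape
    return ((n * height + h) * width + w) * channels + c


shape = (1, 3, 32, 32)  # batch, channels, height, width
index = (0, 1, 0, 0)    # second channel of the top-left pixel
print(offset_nchw(index, shape))  # 1024
print(offset_nhwc(index, shape))  # 1
```

In NCHW the second channel starts a full 32 x 32 plane later, while in NHWC the channels of one pixel sit next to each other, which is why the preferred layout depends on the operation and hardware.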
Reuse primitives for repeated operations:
# Python
# In a real implementation, the OneDNNBackend class would cache primitives internally
# for repeated operations with the same parameters
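The caching idea can be sketched as a dictionary keyed by the operation name and its parameters, so the expensive setup step runs once per distinct configuration. `PrimitiveCache` and `build_primitive` are hypothetical names; this is not how `OneDNNBackend` is known to be implemented.

```python
class PrimitiveCache:
    """Build a primitive once per (operation, parameters) key and reuse it."""

    def __init__(self, build):
        self._build = build  # callable(op, params) -> primitive
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, op, *params):
        key = (op, params)  # params must be hashable, e.g. tuples of ints
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._build(op, params)
        return self._cache[key]


def build_primitive(op, params):
    # Stand-in for an expensive setup step; returns a callable "primitive".
    return lambda: f"{op}{params}"


cache = PrimitiveCache(build_primitive)
conv_args = ((1, 3, 32, 32), (16, 3, 3, 3), (1, 1), (1, 1))
first = cache.get("conv2d", *conv_args)
second = cache.get("conv2d", *conv_args)
print(first is second)           # True
print(cache.hits, cache.misses)  # 1 1
```

Keying on shapes, strides, and padding means a change in any of them produces a new entry, so a production cache would normally also bound its size.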
Choose appropriate data types for your workload:
# Python
import neurenix as nx
from neurenix.hardware.onednn import OneDNNBackend
# Create and initialize the backend
onednn = OneDNNBackend()
onednn.initialize()
# For inference, consider using lower precision
# (In a real implementation, this would be configurable)
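To get a feel for what lower precision costs, the sketch below rounds a 32-bit float to bfloat16, a reduced-precision type commonly used for deep learning inference. Whether and how Phynexus exposes such types is not stated here, so `to_bfloat16` is purely an illustrative helper; it ignores NaN and infinity handling.

```python
import struct


def to_bfloat16(value: float) -> float:
    """Round a float32 value to bfloat16 precision (round to nearest, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add a rounding bias, then drop the low 16 bits of the float32 pattern.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


original = 3.14159
rounded = to_bfloat16(original)
print(rounded)  # 3.140625
print(abs(rounded - original) / original)  # relative error, below 2**-8
```

bfloat16 keeps the float32 exponent range but only 8 significant bits, so values keep their magnitude while losing fine detail.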
| Feature | Neurenix | TensorFlow |
|---|---|---|
| oneDNN Integration | Native integration | Via oneDNN plugin |
| Intel CPU Optimization | Comprehensive | Limited |
| Intel GPU Support | Native support | Limited support |
| Memory Format Optimization | Automatic | Manual configuration required |
| Primitive Caching | Automatic | Limited |
| API Simplicity | Unified API | Separate API for oneDNN |
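Neurenix integrates oneDNN directly into its backend API. TensorFlow, by comparison, enables oneDNN optimizations through build and runtime settings such as the `TF_ENABLE_ONEDNN_OPTS` environment variable, and relies on a separate Intel plugin for Intel GPU support. The unified API in Neurenix makes it easier to use oneDNN acceleration while maintaining compatibility with other backends.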
| Feature | Neurenix | PyTorch |
|---|---|---|
| oneDNN Integration | Native integration | Via MKLDNN backend |
| Intel CPU Optimization | Comprehensive | Good |
| Intel GPU Support | Native support | Limited support |
| Memory Format Optimization | Automatic | Manual configuration required |
| Primitive Caching | Automatic | Limited |
| API Simplicity | Unified API | Separate API for MKLDNN |
PyTorch has some integration with oneDNN through its MKLDNN backend, but it's not as deeply integrated as Neurenix's native oneDNN support. Neurenix provides a more seamless experience with automatic memory format optimization and primitive caching.
| Feature | Neurenix | Scikit-Learn |
|---|---|---|
| oneDNN Support | Comprehensive support | No oneDNN support |
| Deep Learning Primitives | Optimized primitives | Generic implementations |
| Intel CPU Optimization | Comprehensive | Limited |
| Intel GPU Support | Native support | No GPU support |
| Memory Format Optimization | Automatic | N/A |
| Performance on Intel Hardware | Optimized | Not optimized for deep learning |
Scikit-Learn does not provide any oneDNN integration, focusing on traditional machine learning algorithms rather than deep learning. Neurenix's oneDNN support enables significant performance improvements for deep learning workloads on Intel hardware.
The current oneDNN implementation has the following limitations:
Future development of the oneDNN backend will include: