← Previous Page

MKL-DNN Support in Phynexus¶

This document describes the Intel MKL-DNN (Math Kernel Library for Deep Neural Networks) support in the Phynexus framework.

Overview¶

Intel MKL-DNN is the predecessor to oneDNN, providing optimized primitives for deep learning applications on Intel CPUs. It includes highly optimized implementations of basic operations like convolution, matrix multiplication, and activation functions specifically tuned for Intel architectures. Phynexus maintains support for MKL-DNN to ensure compatibility with systems that have not yet migrated to oneDNN.

The MKL-DNN backend in Phynexus leverages Intel's optimized primitives to significantly improve performance for both training and inference workloads on Intel CPUs. This integration allows models to run efficiently on a wide range of Intel platforms, from laptops to high-performance computing systems.

Features¶

The MKL-DNN backend in Phynexus provides:

Optimized deep learning operations for Intel CPUs
Support for both training and inference workloads
Automatic primitive selection and optimization
Memory format propagation for optimal performance
Integration with Intel's hardware features (AVX, AVX2, AVX-512)
Support for various data types (FP32, FP16, INT8)
Seamless integration with the rest of the framework

Hardware Requirements¶

MKL-DNN support in Phynexus requires:

Intel CPU with SSE4.1 or later (AVX2 or AVX-512 recommended for best performance)
MKL-DNN library installed on the system

Compatible hardware includes: - Intel Xeon processors - Intel Core processors - Intel Atom processors

Usage¶

Checking for MKL-DNN Availability¶

# Python
from neurenix.hardware.mkldnn import is_mkldnn_available

if is_mkldnn_available():
    print("MKL-DNN is available")
else:
    print("MKL-DNN is not available")

Creating a MKL-DNN Backend¶

# Python
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create the backend
try:
    mkldnn = MKLDNNBackend()

    # Initialize the backend
    if mkldnn.initialize():
        print("MKL-DNN backend initialized successfully")
    else:
        print("Failed to initialize MKL-DNN backend")
except RuntimeError as e:
    print(f"MKL-DNN error: {e}")

Getting Device Information¶

# Python
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create and initialize the backend
mkldnn = MKLDNNBackend()
mkldnn.initialize()

# Get the number of available devices
device_count = mkldnn.get_device_count()
print(f"Available MKL-DNN devices: {device_count}")

# Get information about a specific device
device_info = mkldnn.get_device_info(0)  # First device
print(f"Device info: {device_info}")

Using MKL-DNN for Tensor Operations¶

# Python
import neurenix as nx
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create tensors
a = nx.Tensor([[1, 2], [3, 4]])
b = nx.Tensor([[5, 6], [7, 8]])

# Create and initialize the backend
mkldnn = MKLDNNBackend()
mkldnn.initialize()

# Perform matrix multiplication using MKL-DNN
c = mkldnn.matmul(a, b)
print(f"Result: {c}")

Using MKL-DNN for Convolution¶

# Python
import neurenix as nx
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create input and weight tensors
input = nx.random.randn(1, 3, 32, 32)  # Batch, Channels, Height, Width
weight = nx.random.randn(16, 3, 3, 3)  # Out channels, In channels, Kernel H, Kernel W

# Create and initialize the backend
mkldnn = MKLDNNBackend()
mkldnn.initialize()

# Perform 2D convolution using MKL-DNN
output = mkldnn.conv2d(
    input=input,
    weight=weight,
    bias=None,
    stride=(1, 1),
    padding=(1, 1)
)
print(f"Output shape: {output.shape}")

Using MKL-DNN for LSTM Operations¶

# Python
import neurenix as nx
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create input and hidden state tensors
batch_size = 1
seq_length = 10
input_size = 20
hidden_size = 30

input = nx.random.randn(seq_length, batch_size, input_size)
h0 = nx.random.randn(batch_size, hidden_size)
c0 = nx.random.randn(batch_size, hidden_size)
hidden = (h0, c0)
weight_ih = nx.random.randn(4 * hidden_size, input_size)
weight_hh = nx.random.randn(4 * hidden_size, hidden_size)
bias_ih = nx.random.randn(4 * hidden_size)
bias_hh = nx.random.randn(4 * hidden_size)

# Create and initialize the backend
mkldnn = MKLDNNBackend()
mkldnn.initialize()

# Perform LSTM operation using MKL-DNN
output, (h_n, c_n) = mkldnn.lstm(
    input=input,
    hidden=(h0, c0),
    weight_ih=weight_ih,
    weight_hh=weight_hh,
    bias_ih=bias_ih,
    bias_hh=bias_hh
)
print(f"Output shape: {output.shape}")
print(f"Final hidden state shape: {h_n.shape}")
print(f"Final cell state shape: {c_n.shape}")

Implementation Details¶

The MKL-DNN backend implementation in Phynexus follows a layered architecture:

Python API Layer: Provides a user-friendly interface for MKL-DNN operations
Rust Binding Layer: Connects the Python API to the C++ implementation
C++ Implementation Layer: Implements the core MKL-DNN functionality

Engine and Stream Management¶

The implementation uses MKL-DNN's engine and stream abstractions:

Engine: Represents a computational device (CPU)
Stream: Represents a queue of operations to be executed on an engine

These abstractions allow for efficient execution and synchronization of operations.

Memory Management¶

The implementation includes sophisticated memory management:

Memory Descriptors: Define the logical organization of data
Memory Objects: Represent actual data in memory
Memory Format Propagation: Automatically select optimal memory formats
Reorder Primitives: Handle format conversions when necessary

Primitive Management¶

MKL-DNN operations are represented as primitives:

Primitive Descriptors: Define the operation and its parameters
Primitives: Executable objects that perform the operation
Primitive Caching: Reuse primitives for repeated operations

Best Practices¶

Memory Format Optimization¶

For optimal performance with MKL-DNN:

# Python
import neurenix as nx
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create and initialize the backend
mkldnn = MKLDNNBackend()
mkldnn.initialize()

# When creating tensors, let MKL-DNN choose the optimal memory format
# (In a real implementation, this would be handled internally)

Primitive Reuse¶

Reuse primitives for repeated operations:

# Python
# In a real implementation, the MKLDNNBackend class would cache primitives internally
# for repeated operations with the same parameters

Data Type Selection¶

Choose appropriate data types for your workload:

# Python
import neurenix as nx
from neurenix.hardware.mkldnn import MKLDNNBackend

# Create and initialize the backend
mkldnn = MKLDNNBackend()
mkldnn.initialize()

# For inference, consider using lower precision
# (In a real implementation, this would be configurable)

Framework Comparison¶

Neurenix vs. TensorFlow¶

Feature	Neurenix	TensorFlow
MKL-DNN Integration	Native integration	Via MKL-DNN plugin
Intel CPU Optimization	Comprehensive	Limited
Memory Format Optimization	Automatic	Manual configuration required
Primitive Caching	Automatic	Limited
API Simplicity	Unified API	Separate API for MKL-DNN
Transition to oneDNN	Smooth transition path	Requires code changes

Neurenix provides more seamless integration with MKL-DNN compared to TensorFlow, which requires a separate plugin. The unified API in Neurenix makes it easier to use MKL-DNN acceleration while maintaining compatibility with other backends, and provides a smooth transition path to oneDNN.

Neurenix vs. PyTorch¶

Feature	Neurenix	PyTorch
MKL-DNN Integration	Native integration	Via MKLDNN backend
Intel CPU Optimization	Comprehensive	Good
Memory Format Optimization	Automatic	Manual configuration required
Primitive Caching	Automatic	Limited
API Simplicity	Unified API	Separate API for MKLDNN
Transition to oneDNN	Smooth transition path	Requires code changes

PyTorch has some integration with MKL-DNN through its MKLDNN backend, but it's not as deeply integrated as Neurenix's native MKL-DNN support. Neurenix provides a more seamless experience with automatic memory format optimization and primitive caching, as well as a smoother transition path to oneDNN.

Neurenix vs. Scikit-Learn¶

Feature	Neurenix	Scikit-Learn
MKL-DNN Support	Comprehensive support	No MKL-DNN support
Deep Learning Primitives	Optimized primitives	Generic implementations
Intel CPU Optimization	Comprehensive	Limited
Memory Format Optimization	Automatic	N/A
Performance on Intel Hardware	Optimized	Not optimized for deep learning
Transition to oneDNN	Smooth transition path	N/A

Scikit-Learn does not provide any MKL-DNN integration, focusing on traditional machine learning algorithms rather than deep learning. Neurenix's MKL-DNN support enables significant performance improvements for deep learning workloads on Intel hardware.

Limitations¶

The current MKL-DNN implementation has the following limitations:

CPU Only: Only supports Intel CPUs, not GPUs or other accelerators
Legacy Support: MKL-DNN is the predecessor to oneDNN and may not receive new features
Limited Operations: Not all operations are optimized for MKL-DNN
Library Dependency: Requires MKL-DNN library installation
Implementation Status: Some features are still in development

Future Work¶

Future development of the MKL-DNN backend will focus on:

Migration Path: Providing a smooth migration path to oneDNN
Compatibility Layer: Ensuring compatibility with older systems
Performance Optimizations: Further tuning for different Intel architectures
Feature Parity: Ensuring feature parity with the oneDNN backend
Documentation: Improving documentation for migration from MKL-DNN to oneDNN

Page

Table Of Contents