Page

Table Of Contents


Built with 🛠 MkDocs - Theme 🖤 Github.

DirectML Support in Phynexus

This document describes the DirectML support in the Phynexus framework.

Overview

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It enables AI workloads to run on a wide variety of DirectX 12-capable GPUs on Windows systems. Phynexus includes DirectML support to provide hardware acceleration on Windows devices, particularly those without dedicated CUDA or ROCm support.

DirectML leverages the DirectX 12 API to provide cross-vendor GPU acceleration, allowing models to run efficiently on GPUs from NVIDIA, AMD, and Intel on Windows platforms. This makes it an excellent choice for deploying AI applications on Windows systems with diverse hardware configurations.

Features

The DirectML backend in Phynexus provides:

Hardware Requirements

DirectML support in Phynexus requires:

Compatible hardware includes: - NVIDIA GPUs (Kepler architecture or newer) - AMD GPUs (GCN architecture or newer) - Intel GPUs (Gen9 architecture or newer) - Any other DirectX 12 compatible GPU

Usage

Checking for DirectML Availability

# Python
from neurenix.hardware.directml import is_directml_available

if is_directml_available():
    print("DirectML is available")
else:
    print("DirectML is not available")

Creating a DirectML Backend

# Python
from neurenix.hardware.directml import DirectMLBackend

# Create the backend
try:
    directml = DirectMLBackend()

    # Initialize the backend
    if directml.initialize():
        print("DirectML backend initialized successfully")
    else:
        print("Failed to initialize DirectML backend")
except RuntimeError as e:
    print(f"DirectML error: {e}")

Getting Device Information

# Python
from neurenix.hardware.directml import DirectMLBackend

# Create and initialize the backend
directml = DirectMLBackend()
directml.initialize()

# Get the number of available devices
device_count = directml.get_device_count()
print(f"Available DirectML devices: {device_count}")

# Get information about a specific device
device_info = directml.get_device_info(0)  # First device
print(f"Device info: {device_info}")

Using DirectML for Tensor Operations

# Python
import neurenix as nx
from neurenix.hardware.directml import DirectMLBackend

# Create tensors
a = nx.Tensor([[1, 2], [3, 4]])
b = nx.Tensor([[5, 6], [7, 8]])

# Create and initialize the backend
directml = DirectMLBackend()
directml.initialize()

# Perform matrix multiplication using DirectML
c = directml.matmul(a, b)
print(f"Result: {c}")

Using DirectML for Convolution

# Python
import neurenix as nx
from neurenix.hardware.directml import DirectMLBackend

# Create input and weight tensors
input = nx.random.randn(1, 3, 32, 32)  # Batch, Channels, Height, Width
weight = nx.random.randn(16, 3, 3, 3)  # Out channels, In channels, Kernel H, Kernel W

# Create and initialize the backend
directml = DirectMLBackend()
directml.initialize()

# Perform 2D convolution using DirectML
output = directml.conv2d(
    input=input,
    weight=weight,
    bias=None,
    stride=(1, 1),
    padding=(1, 1)
)
print(f"Output shape: {output.shape}")

Implementation Details

The DirectML backend implementation in Phynexus follows a layered architecture:

  1. Python API Layer: Provides a user-friendly interface for DirectML operations
  2. Rust Binding Layer: Connects the Python API to the C++ implementation
  3. C++ Implementation Layer: Implements the core DirectML functionality

DirectML Integration

The implementation uses the DirectML API to:

  1. Enumerate and select DirectX 12 devices
  2. Create device contexts and command queues
  3. Compile and execute operators
  4. Manage memory and resources

Automatic Fallback

The DirectML backend includes automatic fallback to CPU implementations when: - DirectML is not available on the system - The operation is not supported by DirectML - An error occurs during DirectML execution

This ensures that code written to use DirectML will still work on systems without DirectML support, albeit with reduced performance.

Best Practices

Device Selection

When multiple DirectML-compatible devices are available:

# Python
from neurenix.hardware.directml import DirectMLBackend

# Create the backend
directml = DirectMLBackend()
directml.initialize()

# Get the number of available devices
device_count = directml.get_device_count()

# Get information about all devices
for i in range(device_count):
    device_info = directml.get_device_info(i)
    print(f"Device {i}: {device_info}")

# Select the best device based on your requirements
# (In a real implementation, you would choose based on device capabilities)

Memory Management

For optimal performance with DirectML:

  1. Reuse Tensors: Avoid creating new tensors for intermediate results
  2. Batch Operations: Group operations together when possible
  3. Appropriate Tensor Sizes: Use tensor dimensions that align well with GPU architecture

Performance Optimization

For best performance with DirectML:

  1. Use FP16 when possible: Half-precision can significantly improve performance
  2. Optimize Memory Access Patterns: Ensure tensors are accessed in a GPU-friendly manner
  3. Minimize Host-Device Transfers: Keep data on the GPU as much as possible

Framework Comparison

Neurenix vs. TensorFlow

Feature Neurenix TensorFlow
DirectML Support Native integration Via DirectML plugin
Windows Integration Seamless Requires additional setup
Cross-vendor Support NVIDIA, AMD, Intel GPUs Limited cross-vendor support
Fallback Mechanism Automatic CPU fallback Manual fallback required
API Simplicity Unified API Separate API for DirectML
Integration with Framework Fully integrated Plugin-based integration

Neurenix provides more seamless integration with DirectML compared to TensorFlow, which requires a separate plugin. The unified API in Neurenix makes it easier to use DirectML acceleration while maintaining compatibility with other backends.

Neurenix vs. PyTorch

Feature Neurenix PyTorch
DirectML Support Native integration Via DirectML plugin
Windows Integration Seamless Requires additional setup
Cross-vendor Support NVIDIA, AMD, Intel GPUs Limited cross-vendor support
Fallback Mechanism Automatic CPU fallback Manual fallback required
API Simplicity Unified API Separate API for DirectML
Integration with Framework Fully integrated Plugin-based integration

PyTorch's DirectML support is provided through a separate plugin, which introduces additional complexity. Neurenix's native DirectML support provides a more integrated experience with automatic fallback to CPU when needed.

Neurenix vs. Scikit-Learn

Feature Neurenix Scikit-Learn
DirectML Support Comprehensive support No DirectML support
Hardware Acceleration Multiple acceleration options CPU only
Windows GPU Support Native support No GPU support on Windows
Cross-vendor Support NVIDIA, AMD, Intel GPUs No GPU support
API Consistency Consistent API across devices CPU-only API
Performance on Windows Optimized for Windows GPUs Limited to CPU performance

Scikit-Learn does not provide any DirectML or GPU acceleration support, focusing solely on CPU execution. Neurenix's DirectML support enables significant performance improvements on Windows systems with compatible GPUs.

Limitations

The current DirectML implementation has the following limitations:

  1. Windows Only: DirectML is only available on Windows systems
  2. Limited Operations: Not all operations are optimized for DirectML
  3. Performance Variability: Performance can vary significantly across different GPU vendors
  4. DirectX 12 Requirement: Requires DirectX 12 compatible hardware and up-to-date drivers
  5. Implementation Status: Some features are still in development

Future Work

Future development of the DirectML backend will include:

  1. Expanded Operation Support: Implementation of more operations using DirectML
  2. Performance Optimizations: Further tuning for different GPU architectures
  3. Enhanced Debugging: Better error reporting and debugging tools
  4. DirectML 1.9+ Features: Support for newer DirectML features as they become available
  5. Integration with Windows ML: Closer integration with the Windows ML ecosystem