← Previous Page

DirectML Support in Phynexus¶

This document describes the DirectML support in the Phynexus framework.

Overview¶

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It enables AI workloads to run on a wide variety of DirectX 12-capable GPUs on Windows systems. Phynexus includes DirectML support to provide hardware acceleration on Windows devices, particularly those without dedicated CUDA or ROCm support.

DirectML leverages the DirectX 12 API to provide cross-vendor GPU acceleration, allowing models to run efficiently on GPUs from NVIDIA, AMD, and Intel on Windows platforms. This makes it an excellent choice for deploying AI applications on Windows systems with diverse hardware configurations.

Features¶

The DirectML backend in Phynexus provides:

Hardware acceleration on Windows DirectX 12-compatible GPUs
Cross-vendor support (NVIDIA, AMD, Intel)
Integration with the Windows graphics stack
Automatic fallback to CPU when DirectML is unavailable
Support for common deep learning operations
Seamless integration with the rest of the framework

Hardware Requirements¶

DirectML support in Phynexus requires:

Windows 10 version 1903 (May 2019 Update) or newer
DirectX 12 compatible GPU with updated drivers
DirectML runtime (included in Windows or via the DirectML redistributable package)

Compatible hardware includes: - NVIDIA GPUs (Kepler architecture or newer) - AMD GPUs (GCN architecture or newer) - Intel GPUs (Gen9 architecture or newer) - Any other DirectX 12 compatible GPU

Usage¶

Checking for DirectML Availability¶

# Python
from neurenix.hardware.directml import is_directml_available

if is_directml_available():
    print("DirectML is available")
else:
    print("DirectML is not available")

Creating a DirectML Backend¶

# Python
from neurenix.hardware.directml import DirectMLBackend

# Create the backend
try:
    directml = DirectMLBackend()

    # Initialize the backend
    if directml.initialize():
        print("DirectML backend initialized successfully")
    else:
        print("Failed to initialize DirectML backend")
except RuntimeError as e:
    print(f"DirectML error: {e}")

Getting Device Information¶

# Python
from neurenix.hardware.directml import DirectMLBackend

# Create and initialize the backend
directml = DirectMLBackend()
directml.initialize()

# Get the number of available devices
device_count = directml.get_device_count()
print(f"Available DirectML devices: {device_count}")

# Get information about a specific device
device_info = directml.get_device_info(0)  # First device
print(f"Device info: {device_info}")

Using DirectML for Tensor Operations¶

# Python
import neurenix as nx
from neurenix.hardware.directml import DirectMLBackend

# Create tensors
a = nx.Tensor([[1, 2], [3, 4]])
b = nx.Tensor([[5, 6], [7, 8]])

# Create and initialize the backend
directml = DirectMLBackend()
directml.initialize()

# Perform matrix multiplication using DirectML
c = directml.matmul(a, b)
print(f"Result: {c}")

Using DirectML for Convolution¶

# Python
import neurenix as nx
from neurenix.hardware.directml import DirectMLBackend

# Create input and weight tensors
input = nx.random.randn(1, 3, 32, 32)  # Batch, Channels, Height, Width
weight = nx.random.randn(16, 3, 3, 3)  # Out channels, In channels, Kernel H, Kernel W

# Create and initialize the backend
directml = DirectMLBackend()
directml.initialize()

# Perform 2D convolution using DirectML
output = directml.conv2d(
    input=input,
    weight=weight,
    bias=None,
    stride=(1, 1),
    padding=(1, 1)
)
print(f"Output shape: {output.shape}")

Implementation Details¶

The DirectML backend implementation in Phynexus follows a layered architecture:

Python API Layer: Provides a user-friendly interface for DirectML operations
Rust Binding Layer: Connects the Python API to the C++ implementation
C++ Implementation Layer: Implements the core DirectML functionality

DirectML Integration¶

The implementation uses the DirectML API to:

Enumerate and select DirectX 12 devices
Create device contexts and command queues
Compile and execute operators
Manage memory and resources

Automatic Fallback¶

The DirectML backend includes automatic fallback to CPU implementations when: - DirectML is not available on the system - The operation is not supported by DirectML - An error occurs during DirectML execution

This ensures that code written to use DirectML will still work on systems without DirectML support, albeit with reduced performance.

Best Practices¶

Device Selection¶

When multiple DirectML-compatible devices are available:

# Python
from neurenix.hardware.directml import DirectMLBackend

# Create the backend
directml = DirectMLBackend()
directml.initialize()

# Get the number of available devices
device_count = directml.get_device_count()

# Get information about all devices
for i in range(device_count):
    device_info = directml.get_device_info(i)
    print(f"Device {i}: {device_info}")

# Select the best device based on your requirements
# (In a real implementation, you would choose based on device capabilities)

Memory Management¶

For optimal performance with DirectML:

Reuse Tensors: Avoid creating new tensors for intermediate results
Batch Operations: Group operations together when possible
Appropriate Tensor Sizes: Use tensor dimensions that align well with GPU architecture

Performance Optimization¶

For best performance with DirectML:

Use FP16 when possible: Half-precision can significantly improve performance
Optimize Memory Access Patterns: Ensure tensors are accessed in a GPU-friendly manner
Minimize Host-Device Transfers: Keep data on the GPU as much as possible

Framework Comparison¶

Neurenix vs. TensorFlow¶

Feature	Neurenix	TensorFlow
DirectML Support	Native integration	Via DirectML plugin
Windows Integration	Seamless	Requires additional setup
Cross-vendor Support	NVIDIA, AMD, Intel GPUs	Limited cross-vendor support
Fallback Mechanism	Automatic CPU fallback	Manual fallback required
API Simplicity	Unified API	Separate API for DirectML
Integration with Framework	Fully integrated	Plugin-based integration

Neurenix provides more seamless integration with DirectML compared to TensorFlow, which requires a separate plugin. The unified API in Neurenix makes it easier to use DirectML acceleration while maintaining compatibility with other backends.

Neurenix vs. PyTorch¶

Feature	Neurenix	PyTorch
DirectML Support	Native integration	Via DirectML plugin
Windows Integration	Seamless	Requires additional setup
Cross-vendor Support	NVIDIA, AMD, Intel GPUs	Limited cross-vendor support
Fallback Mechanism	Automatic CPU fallback	Manual fallback required
API Simplicity	Unified API	Separate API for DirectML
Integration with Framework	Fully integrated	Plugin-based integration

PyTorch's DirectML support is provided through a separate plugin, which introduces additional complexity. Neurenix's native DirectML support provides a more integrated experience with automatic fallback to CPU when needed.

Neurenix vs. Scikit-Learn¶

Feature	Neurenix	Scikit-Learn
DirectML Support	Comprehensive support	No DirectML support
Hardware Acceleration	Multiple acceleration options	CPU only
Windows GPU Support	Native support	No GPU support on Windows
Cross-vendor Support	NVIDIA, AMD, Intel GPUs	No GPU support
API Consistency	Consistent API across devices	CPU-only API
Performance on Windows	Optimized for Windows GPUs	Limited to CPU performance

Scikit-Learn does not provide any DirectML or GPU acceleration support, focusing solely on CPU execution. Neurenix's DirectML support enables significant performance improvements on Windows systems with compatible GPUs.

Limitations¶

The current DirectML implementation has the following limitations:

Windows Only: DirectML is only available on Windows systems
Limited Operations: Not all operations are optimized for DirectML
Performance Variability: Performance can vary significantly across different GPU vendors
DirectX 12 Requirement: Requires DirectX 12 compatible hardware and up-to-date drivers
Implementation Status: Some features are still in development

Future Work¶

Future development of the DirectML backend will include:

Expanded Operation Support: Implementation of more operations using DirectML
Performance Optimizations: Further tuning for different GPU architectures
Enhanced Debugging: Better error reporting and debugging tools
DirectML 1.9+ Features: Support for newer DirectML features as they become available
Integration with Windows ML: Closer integration with the Windows ML ecosystem

Page

Table Of Contents