Page

Table Of Contents


Built with 🛠 MkDocs - Theme 🖤 Github.

ARM Support in Phynexus

This document describes the ARM architecture support in the Phynexus framework.

Overview

ARM processors are widely used in mobile devices, embedded systems, and increasingly in servers and desktops. Phynexus includes optimized support for ARM architecture to enable efficient AI workloads on these platforms. This support leverages various ARM-specific technologies for maximum performance.

Features

The ARM backend in Phynexus provides:

ARM Technologies

ARM Compute Library

The ARM Compute Library is an open-source collection of low-level functions optimized for ARM processors. Phynexus integrates with this library to provide high-performance implementations of common tensor operations.

Neon SIMD

Neon is ARM's Advanced SIMD (Single Instruction Multiple Data) architecture extension, providing vector processing capabilities. Phynexus leverages Neon instructions for parallel data processing and computation.

Scalable Vector Extensions (SVE)

SVE is a vector extension for the AArch64 execution state of the ARM architecture. It allows for variable vector lengths, enabling more flexible and efficient vector processing. Phynexus supports SVE where available.

Ethos-U/NPU

ARM Ethos-U is a microNPU (Neural Processing Unit) designed for efficient ML inference at the edge. Phynexus can utilize this dedicated hardware when available.

Usage

Creating an ARM Device

# Python
from neurenix.device import Device, DeviceType

# Create an ARM device
arm_device = Device(DeviceType.ARM, 0)

# Check if the device is available
if arm_device.is_available():
    print("ARM device is available")
else:
    print("ARM device is not available")

Creating a Tensor on ARM

# Python
import neurenix as nx
from neurenix.device import Device, DeviceType

# Create an ARM device
arm_device = Device(DeviceType.ARM, 0)

# Create a tensor on the ARM device
tensor = nx.Tensor([1, 2, 3, 4], device=arm_device)

Using ARM Compute Library

# Python
from neurenix.hardware.arm import use_arm_compute_library

# Enable ARM Compute Library
use_arm_compute_library(True)

# Run operations optimized by ARM Compute Library

Checking for Specific ARM Features

# Python
from neurenix.hardware.arm import (
    is_neon_available,
    is_sve_available,
    is_ethos_available
)

# Check for Neon SIMD
if is_neon_available():
    print("Neon SIMD is available")

# Check for SVE
if is_sve_available():
    print("SVE is available")

# Check for Ethos-U/NPU
if is_ethos_available():
    print("Ethos-U/NPU is available")

Implementation Details

The ARM backend implementation in Phynexus follows the same architecture as other hardware backends:

  1. A DeviceType::ARM enum value to identify ARM devices
  2. ARM-specific memory management functions
  3. ARM-specific implementations of tensor operations
  4. Multiple backend implementations leveraging different ARM technologies:
  5. Neon SIMD for vectorized operations
  6. SVE for scalable vector processing
  7. ARM Compute Library for optimized functions
  8. Ethos-U/NPU for dedicated neural processing

Best Practices

Memory Alignment

For optimal performance on ARM devices, ensure data is properly aligned:

# Python
import neurenix as nx
from neurenix.device import Device, DeviceType
from neurenix.hardware.arm import get_optimal_alignment

# Get optimal alignment for the current ARM device
alignment = get_optimal_alignment()

# Create aligned tensor
tensor = nx.Tensor([1, 2, 3, 4], device=Device(DeviceType.ARM, 0), alignment=alignment)

Tensor Dimensions

For Neon SIMD operations, performance is best when: - Tensor dimensions are multiples of 4 (for 32-bit types) - Tensor dimensions are multiples of 8 (for 16-bit types) - Tensor dimensions are multiples of 16 (for 8-bit types)

Power Management

On mobile devices, consider power efficiency:

# Python
from neurenix.hardware.arm import set_power_efficiency_mode

# Enable power efficiency mode (trades some performance for better battery life)
set_power_efficiency_mode(True)

Limitations

The current ARM implementation has the following limitations:

  1. Some advanced operations may fall back to generic CPU implementations
  2. Ethos-U/NPU support is limited to specific hardware configurations
  3. SVE is only available on newer ARM processors

Future Work

Future development of the ARM backend will include:

  1. Expanded support for ARM Mali GPU compute
  2. Improved Ethos-U/NPU integration
  3. Optimization for ARM Cortex-A, Cortex-R, and Cortex-M series processors
  4. Enhanced support for ARM big.LITTLE and DynamIQ architectures