This document describes the ARM architecture support in the Phynexus framework.
ARM processors are widely used in mobile devices, embedded systems, and increasingly in servers and desktops. Phynexus includes optimized ARM support so that AI workloads run efficiently on these platforms. It builds on several ARM-specific technologies: the ARM Compute Library, Neon SIMD, SVE, and the Ethos-U NPU.
The ARM backend in Phynexus provides:
The ARM Compute Library is an open-source collection of low-level functions optimized for ARM processors. Phynexus integrates with this library to provide high-performance implementations of common tensor operations.
Neon is ARM's Advanced SIMD (Single Instruction Multiple Data) architecture extension, providing vector processing capabilities. Phynexus leverages Neon instructions for parallel data processing and computation.
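SVE (Scalable Vector Extension) is a vector extension for the AArch64 execution state of the ARM architecture. Rather than fixing the vector width, it lets each implementation choose a vector length from 128 to 2048 bits in 128-bit increments, so the same code can run unchanged on hardware with different vector widths. Phynexus supports SVE where available.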
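To make the idea of variable vector lengths concrete, the following is a minimal pure-Python sketch of a vector-length-agnostic loop. It performs no real SIMD work and is not how Phynexus implements SVE support; `vla_add` and its `vector_lanes` parameter are hypothetical names that stand in for the hardware vector width.

```python
def vla_add(a, b, vector_lanes):
    """Add two equal-length lists in chunks of `vector_lanes` elements.

    Illustration only: `vector_lanes` plays the role of the hardware
    vector width, which the loop never hard-codes.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_lanes < 1:
        raise ValueError("vector_lanes must be positive")
    out = [0] * len(a)
    for start in range(0, len(a), vector_lanes):
        # The predicate marks which lanes are active. In the final chunk,
        # lanes past the end of the data are switched off.
        predicate = [start + lane < len(a) for lane in range(vector_lanes)]
        for lane, active in enumerate(predicate):
            if active:
                out[start + lane] = a[start + lane] + b[start + lane]
    return out


# The same loop produces the same result for any vector width.
data = list(range(10))
results = {lanes: vla_add(data, data, lanes) for lanes in (4, 8, 16)}
```

Because the tail is handled by the predicate rather than by a separate scalar loop, the code does not need to know the vector width in advance.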
ARM Ethos-U is a microNPU (Neural Processing Unit) designed for efficient ML inference at the edge. Phynexus can utilize this dedicated hardware when available.
# Python
from neurenix.device import Device, DeviceType
# Create an ARM device
arm_device = Device(DeviceType.ARM, 0)
# Check if the device is available
if arm_device.is_available():
    print("ARM device is available")
else:
    print("ARM device is not available")
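When an ARM device may not be present, one option is to walk a preference list and take the first device that reports itself available. The sketch below relies only on the `is_available()` method shown above; `first_available` and `StubDevice` are hypothetical names, used so the example runs without the framework installed.

```python
def first_available(devices):
    """Return the first device whose is_available() is true, else None."""
    for device in devices:
        if device.is_available():
            return device
    return None


class StubDevice:
    """Hypothetical stand-in for neurenix.device.Device (illustration only)."""

    def __init__(self, name, available):
        self.name = name
        self._available = available

    def is_available(self):
        return self._available


# Preference order: try the ARM device first, then a fallback.
candidates = [StubDevice("arm", False), StubDevice("fallback", True)]
chosen = first_available(candidates)
```

In real code the list would hold `Device` objects in order of preference, and a `None` result would need explicit handling.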
# Python
import neurenix as nx
from neurenix.device import Device, DeviceType
# Create an ARM device
arm_device = Device(DeviceType.ARM, 0)
# Create a tensor on the ARM device
tensor = nx.Tensor([1, 2, 3, 4], device=arm_device)
# Python
from neurenix.hardware.arm import use_arm_compute_library
# Enable ARM Compute Library
use_arm_compute_library(True)
# Run operations optimized by ARM Compute Library
# Python
from neurenix.hardware.arm import (
    is_neon_available,
    is_sve_available,
    is_ethos_available,
)
# Check for Neon SIMD
if is_neon_available():
    print("Neon SIMD is available")
# Check for SVE
if is_sve_available():
    print("SVE is available")
# Check for Ethos-U/NPU
if is_ethos_available():
    print("Ethos-U/NPU is available")
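On Linux, the CPU features behind the Neon and SVE checks can be cross-checked independently of the framework by reading the `Features` line of `/proc/cpuinfo`. The sketch below is an assumption about how one might do that, not how Phynexus detects features; `parse_arm_features`, `has_neon`, and `has_sve` are hypothetical helpers. Ethos-U is a separate NPU rather than a CPU feature, so it does not appear in this list.

```python
def parse_arm_features(cpuinfo_text):
    """Collect the flags listed on 'Features' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            flags.update(value.split())
    return flags


def has_neon(flags):
    # AArch64 kernels report Advanced SIMD as "asimd";
    # 32-bit ARM kernels report it as "neon".
    return "asimd" in flags or "neon" in flags


def has_sve(flags):
    return "sve" in flags


# Sample text in the /proc/cpuinfo layout (made up for illustration).
sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm crc32 sve\n"
flags = parse_arm_features(sample)
print(has_neon(flags), has_sve(flags))
```

On a real device, pass the contents of `/proc/cpuinfo` instead of the sample string.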
The ARM backend implementation in Phynexus follows the same architecture as other hardware backends:
- DeviceType::ARM enum value to identify ARM devices

For optimal performance on ARM devices, ensure data is properly aligned:
# Python
import neurenix as nx
from neurenix.device import Device, DeviceType
from neurenix.hardware.arm import get_optimal_alignment
# Get optimal alignment for the current ARM device
alignment = get_optimal_alignment()
# Create aligned tensor
tensor = nx.Tensor([1, 2, 3, 4], device=Device(DeviceType.ARM, 0), alignment=alignment)
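To show what alignment means at the memory level, here is a minimal sketch using only the standard library: it over-allocates a buffer and finds the first address inside it that is a multiple of the requested alignment. `aligned_buffer` is a hypothetical helper, and the 64-byte value is an assumed example, not the value `get_optimal_alignment()` returns.

```python
import ctypes


def aligned_buffer(size, alignment):
    """Return (backing, address) with `address` aligned inside `backing`.

    Over-allocates by `alignment` bytes and skips forward to the next
    aligned address. Keep `backing` alive while `address` is in use.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    backing = ctypes.create_string_buffer(size + alignment)
    base = ctypes.addressof(backing)
    # Bytes to skip so that base + offset is a multiple of alignment.
    offset = (-base) % alignment
    return backing, base + offset


# 64 bytes is an assumed example alignment.
backing, address = aligned_buffer(1024, 64)
assert address % 64 == 0
```

The framework's `alignment` argument is expected to take care of this for tensors; the sketch only illustrates the underlying idea.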
For Neon SIMD operations, performance is best when:
- Tensor dimensions are multiples of 4 (for 32-bit types)
- Tensor dimensions are multiples of 8 (for 16-bit types)
- Tensor dimensions are multiples of 16 (for 8-bit types)
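These multiples follow from Neon's 128-bit vector registers: one register holds 128 divided by the element width in lanes. As a small sketch, the hypothetical helpers below (not part of the Phynexus API) compute the lane count and round a dimension up to the next full register.

```python
NEON_REGISTER_BITS = 128  # Neon vector registers are 128 bits wide


def lanes_per_register(element_bits):
    """Number of elements of the given width that fit in one Neon register."""
    if element_bits not in (8, 16, 32, 64):
        raise ValueError("element_bits must be 8, 16, 32, or 64")
    return NEON_REGISTER_BITS // element_bits


def padded_length(length, element_bits):
    """Round `length` up to a multiple of the lane count."""
    lanes = lanes_per_register(element_bits)
    return -(-length // lanes) * lanes  # ceiling division, then scale back


print(padded_length(10, 32))  # 12: three registers of four 32-bit lanes
print(padded_length(10, 16))  # 16: two registers of eight 16-bit lanes
```

Padding a dimension this way trades a few unused elements for loops that never need a scalar tail.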
On mobile devices, consider power efficiency:
# Python
from neurenix.hardware.arm import set_power_efficiency_mode
# Enable power efficiency mode (trades some performance for better battery life)
set_power_efficiency_mode(True)
The current ARM implementation has the following limitations:
Future development of the ARM backend will include: