This document describes the NVIDIA Tensor Cores support in the Phynexus framework.
NVIDIA Tensor Cores are specialized hardware units designed to accelerate matrix multiply-accumulate operations, the building block of matrix multiplication and convolution, and can substantially speed up deep learning workloads. Phynexus supports Tensor Cores alongside its existing support for standard CUDA operations.
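For reference, the primitive a Tensor Core executes is a small fused matrix multiply-accumulate. NVIDIA's Volta V100 architecture whitepaper describes each Volta Tensor Core as operating on 4×4 tiles, with $A$ and $B$ in FP16 and $C$ and $D$ in FP16 or FP32:

$$
D = A B + C, \qquad d_{ij} = \sum_{k=1}^{4} a_{ik}\, b_{kj} + c_{ij}, \quad i, j \in \{1, \dots, 4\}
$$

That is $4 \times 4 \times 4 = 64$ fused multiply-adds per tile. Later architectures use other tile shapes and data types, but in general a large matrix multiplication or convolution is decomposed by the CUDA libraries into many such tile operations.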
The Tensor Cores backend in Phynexus provides:
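Tensor Cores are available on NVIDIA GPUs with compute capability 7.0 or higher, including:

- Volta architecture (V100, compute capability 7.0)
- Turing architecture (RTX 20-series, compute capability 7.5)
- Ampere architecture (A100 at compute capability 8.0, RTX 30-series at 8.6)
- Hopper architecture (H100, compute capability 9.0)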
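The compute-capability rule above can be expressed as a small check. This is a sketch only: `has_tensor_cores` and `architecture_name` are hypothetical helpers, not part of neurenix, and it assumes you already have the device's major/minor compute capability (for example from the `major` and `minor` fields that the CUDA runtime's `cudaGetDeviceProperties` reports). How Phynexus performs its own detection is not documented here.

```python
from typing import Dict, Optional, Tuple

# Minimum compute capability with Tensor Cores, per the list above.
MIN_TENSOR_CORE_CAPABILITY: Tuple[int, int] = (7, 0)

# Only the architectures listed above; other capabilities return None.
_KNOWN_ARCHITECTURES: Dict[Tuple[int, int], str] = {
    (7, 0): "Volta",   # e.g. V100
    (7, 5): "Turing",  # e.g. RTX 20-series
    (8, 0): "Ampere",  # e.g. A100
    (8, 6): "Ampere",  # e.g. RTX 30-series
    (9, 0): "Hopper",  # e.g. H100
}


def has_tensor_cores(major: int, minor: int) -> bool:
    """Return True if compute capability major.minor is 7.0 or higher."""
    return (major, minor) >= MIN_TENSOR_CORE_CAPABILITY


def architecture_name(major: int, minor: int) -> Optional[str]:
    """Name the architecture for the capabilities listed above, else None."""
    if not has_tensor_cores(major, minor):
        return None
    return _KNOWN_ARCHITECTURES.get((major, minor))


print(has_tensor_cores(6, 1), architecture_name(7, 5))  # prints "False Turing"
```

Tuple comparison handles the "7.0 or higher" rule directly, so a 7.5 device passes and a 6.1 (Pascal) device does not.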
Check whether Tensor Cores are available:

```python
from neurenix.hardware import is_tensor_cores_available

if is_tensor_cores_available():
    print("Tensor Cores are available")
else:
    print("Tensor Cores are not available")
```
Create and initialize the backend:

```python
from neurenix.hardware import TensorCoresBackend

# Create the backend
tensor_cores = TensorCoresBackend()

# Initialize the backend; initialize() reports success as a boolean
if tensor_cores.initialize():
    print("Tensor Cores backend initialized successfully")
else:
    print("Failed to initialize Tensor Cores backend")
```
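Because `initialize()` returns a boolean, an application can fall back to the standard CUDA path when Tensor Cores are unavailable. The sketch below shows one way to structure that fallback. `Backend`, `StubBackend`, and `select_backend` are hypothetical names, and it assumes (based on the note that the Tensor Cores backend follows the same architecture as other hardware backends) that the alternatives expose a similar `initialize()` method; the stubs stand in for the real neurenix classes.

```python
from typing import List, Optional, Protocol


class Backend(Protocol):
    """Hypothetical interface: a name plus an initialize() -> bool method."""

    name: str

    def initialize(self) -> bool: ...


class StubBackend:
    """Stand-in for a real backend; `available` simulates the hardware check."""

    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self.available = available

    def initialize(self) -> bool:
        return self.available


def select_backend(candidates: List[Backend]) -> Optional[Backend]:
    """Return the first backend whose initialize() succeeds, in priority order."""
    for backend in candidates:
        if backend.initialize():
            return backend
    return None


# Prefer Tensor Cores, then standard CUDA, then CPU.
chosen = select_backend([
    StubBackend("tensor_cores", available=False),
    StubBackend("cuda", available=True),
    StubBackend("cpu", available=True),
])
print(chosen.name if chosen else "no backend")  # prints "cuda"
```

Keeping the priority order in a single list makes the preference explicit and means adding or removing a backend does not change the selection logic.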
Set the precision mode after initialization:

```python
from neurenix.hardware import TensorCoresBackend

tensor_cores = TensorCoresBackend()
tensor_cores.initialize()

# Set precision mode
tensor_cores.set_precision("mixed")  # Options: "fp32", "fp16", "mixed"
```
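The difference between an FP16-only mode and a mixed mode is easiest to see in accumulation. Tensor Cores' mixed-precision mode multiplies FP16 operands and can accumulate in FP32; whether neurenix's `"mixed"` option maps exactly to this is an assumption. The sketch below emulates both cases on the CPU using only the standard library (`struct` formats `"e"` and `"f"` round to IEEE half and single precision) and does not touch the GPU. The helper names are illustrative.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_fn):
    """Sum values, rounding the running total with round_fn after every add."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


# 4096 small FP16 inputs, as in the partial products of a long dot product.
inputs = [to_fp16(1e-4)] * 4096

fp16_sum = accumulate(inputs, to_fp16)   # FP16 inputs, FP16 accumulator
mixed_sum = accumulate(inputs, to_fp32)  # FP16 inputs, FP32 accumulator

print(f"FP16 accumulator: {fp16_sum:.4f}")   # stalls near 0.25
print(f"FP32 accumulator: {mixed_sum:.4f}")  # close to 0.4097
```

The FP16 total stops growing once its spacing between representable values exceeds twice the increment, so every further add rounds back to the same value; the FP32 accumulator keeps enough precision to track the true sum.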
Optimize an existing model for Tensor Cores:

```python
from neurenix.hardware import TensorCoresBackend
from neurenix.nn import Sequential, Linear, ReLU

# Create a model
model = Sequential([
    Linear(1024, 1024),
    ReLU(),
    Linear(1024, 1024),
])

# Create and initialize the backend
tensor_cores = TensorCoresBackend()
tensor_cores.initialize()

# Optimize the model for Tensor Cores
optimized_model = tensor_cores.optimize_model(model, precision="mixed")
```
The Tensor Cores backend implementation in Phynexus follows the same architecture as other hardware backends:

- A `TensorCoresBackend` class that manages Tensor Cores resources

The current implementation automatically detects the availability of Tensor Cores and configures the appropriate CUDA libraries to utilize them.
For optimal performance with Tensor Cores, use mixed precision training:
```python
from neurenix.hardware import TensorCoresBackend
from neurenix.nn import Sequential, Linear, ReLU
from neurenix.optim import Adam

# Create a model
model = Sequential([
    Linear(1024, 1024),
    ReLU(),
    Linear(1024, 1024),
])

# Create and initialize the backend
tensor_cores = TensorCoresBackend()
tensor_cores.initialize()

# Set mixed precision
tensor_cores.set_precision("mixed")

# Optimize the model
optimized_model = tensor_cores.optimize_model(model, precision="mixed")

# Create the optimizer for mixed-precision training
optimizer = Adam(optimized_model.parameters(), lr=0.001)
```
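Mixed-precision training commonly pairs FP16 compute with loss scaling, so that small gradients do not underflow to zero in FP16. This document does not say whether `optimize_model` or the neurenix optimizers handle this, so the following is a framework-agnostic sketch of dynamic loss scaling with gradients modeled as plain floats. `DynamicLossScaler` is a hypothetical class; its defaults mirror those of PyTorch's `GradScaler`.

```python
import math
from typing import List, Sequence


class DynamicLossScaler:
    """Hypothetical dynamic loss scaler (not a neurenix API).

    Multiply the loss by `scale` before backpropagation, then divide the
    gradients by `scale` before the optimizer step.
    """

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, grads: Sequence[float]) -> List[float]:
        """Divide scaled gradients by the current scale."""
        return [g / self.scale for g in grads]

    def update(self, grads: Sequence[float]) -> bool:
        """Adjust the scale; return True if this step should be applied."""
        if any(not math.isfinite(g) for g in grads):
            # Overflow (inf/NaN): skip the step and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A run of finite steps: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


scaler = DynamicLossScaler(init_scale=1024.0, growth_interval=2)
print(scaler.update([float("inf")]), scaler.scale)  # False 512.0
print(scaler.update([1.0]), scaler.scale)            # True 512.0
print(scaler.update([1.0]), scaler.scale)            # True 1024.0
print(scaler.unscale([1024.0]))                      # [1.0]
```

Backing off on overflow and growing slowly otherwise keeps the scale near the largest value that does not produce inf or NaN gradients.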
For maximum Tensor Cores utilization:
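One widely cited guideline, taken from NVIDIA's general performance guidance rather than from Phynexus, is that FP16 Tensor Core matrix multiplications run most efficiently when matrix dimensions are multiples of 8. The sketch below checks layer sizes against that rule. Layers are described as hypothetical `(in_features, out_features)` tuples, and `round_up_to_multiple` and `misaligned_layers` are illustrative helpers, not neurenix APIs.

```python
from typing import List, Tuple

Shape = Tuple[int, int]


def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Smallest value >= n that is a multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def misaligned_layers(shapes: List[Shape],
                      multiple: int = 8) -> List[Tuple[int, Shape, Shape]]:
    """Return (index, shape, padded_shape) for each layer not aligned to `multiple`."""
    report = []
    for i, (n_in, n_out) in enumerate(shapes):
        padded = (round_up_to_multiple(n_in, multiple),
                  round_up_to_multiple(n_out, multiple))
        if padded != (n_in, n_out):
            report.append((i, (n_in, n_out), padded))
    return report


# The 1024x1024 layers used in the examples are already aligned;
# a 10-way output head is not.
print(misaligned_layers([(1024, 1024), (1024, 1024), (1024, 10)]))
# prints [(2, (1024, 10), (1024, 16))]
```

Padding a misaligned dimension (for example, an output head from 10 to 16 units, ignoring the extra outputs) trades a little wasted compute for kernels that can use Tensor Cores more fully.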
The current Tensor Cores implementation has the following limitations:
Future development of the Tensor Cores backend will include: