The Hugging Face Integration module in Neurenix provides seamless interoperability with the Hugging Face ecosystem, enabling users to leverage pre-trained models, datasets, and tools from the Hugging Face Hub within the Neurenix framework. This integration allows researchers and developers to combine the performance benefits of Neurenix's multi-language architecture with the vast resources available in the Hugging Face ecosystem.
Built on Neurenix's high-performance Rust and C++ backends with a user-friendly Python interface, this module ensures efficient execution of Hugging Face models while maintaining compatibility with the broader Hugging Face ecosystem. The integration supports a wide range of model architectures, including transformers for natural language processing, vision transformers for computer vision, and multimodal models.
The module provides utilities for model conversion, fine-tuning, inference, and deployment, making it easy to incorporate state-of-the-art pre-trained models into Neurenix-based applications. It also includes tools for dataset access and processing, enabling seamless integration with the Hugging Face datasets library.
The module provides tools for converting models between Hugging Face's format and Neurenix's native format:
This bidirectional conversion ensures seamless integration between the two ecosystems.
The module provides direct access to pre-trained models from the Hugging Face Hub:
These features enable users to leverage the vast collection of pre-trained models available on the Hugging Face Hub.
The module includes tools for fine-tuning Hugging Face models using Neurenix's training infrastructure:
These capabilities enable efficient adaptation of pre-trained models to specific tasks and domains.
The module provides optimized inference for Hugging Face models using Neurenix's hardware acceleration:
These features ensure efficient execution of Hugging Face models in production environments.
The module includes tools for working with Hugging Face datasets:
These capabilities enable seamless integration with the Hugging Face datasets ecosystem.
import neurenix
from neurenix.huggingface import from_pretrained, to_pretrained
# Import a Hugging Face model to Neurenix
model = neurenix.huggingface.from_pretrained(
model_name_or_path="bert-base-uncased",
task="text-classification",
num_labels=2,
device="cuda" if neurenix.cuda.is_available() else "cpu"
)
# Export a Neurenix model to Hugging Face format
neurenix.huggingface.to_pretrained(
model=model,
save_directory="./my_exported_model",
push_to_hub=True,
repository_id="username/model-name",
token="hf_token" # Optional, can also use environment variable
)
from neurenix.huggingface import list_models, get_model_info
# List available models for a specific task
models = neurenix.huggingface.list_models(
task="text-classification",
filter_by_language="en",
sort_by="downloads",
direction="descending",
limit=10
)
# Get information about a specific model
model_info = neurenix.huggingface.get_model_info("bert-base-uncased")
print(f"Model: {model_info.name}")
print(f"Downloads: {model_info.downloads}")
print(f"Task: {model_info.task}")
print(f"Architecture: {model_info.architecture}")
# Download and cache a model for offline use
neurenix.huggingface.cache_model(
model_name_or_path="bert-base-uncased",
force_download=False,
resume_download=True,
proxies=None,
local_files_only=False
)
from neurenix.huggingface import Trainer, TrainingArguments
# Define training arguments
training_args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=16,
per_device_eval_batch_size=64,
warmup_steps=500,
weight_decay=0.01,
logging_dir="./logs",
logging_steps=10,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
metric_for_best_model="accuracy"
)
# Create a trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
compute_metrics=compute_metrics,
optimizers=(optimizer, scheduler)
)
# Fine-tune the model
trainer.train()
# Evaluate the model
results = trainer.evaluate()
# Save the fine-tuned model
trainer.save_model("./fine_tuned_model")
from neurenix.huggingface import LoraConfig, PeftModel
# Configure LoRA for parameter-efficient fine-tuning
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["query", "key", "value"],
lora_dropout=0.05,
bias="none",
task_type="SEQ_CLS"
)
# Create a PEFT model
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
peft_model = PeftModel.from_pretrained(
model=model,
peft_config=lora_config
)
# Fine-tune the PEFT model
trainer = Trainer(
model=peft_model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset
)
trainer.train()
from neurenix.huggingface import Pipeline
# Create an inference pipeline
pipeline = Pipeline(
task="text-classification",
model=model,
tokenizer=tokenizer,
device=0 if neurenix.cuda.is_available() else -1,
batch_size=32
)
# Run inference
results = pipeline(
texts=["I love this product!", "This product is terrible."],
top_k=1
)
# Create a zero-shot classification pipeline
zero_shot_pipeline = Pipeline(
task="zero-shot-classification",
model="facebook/bart-large-mnli",
device=0 if neurenix.cuda.is_available() else -1
)
# Run zero-shot classification
zero_shot_results = zero_shot_pipeline(
texts=["I love this product!"],
candidate_labels=["positive", "negative", "neutral"],
hypothesis_template="This text expresses a {} sentiment."
)
from neurenix.huggingface import OptimizedModel
# Create an optimized model for inference
optimized_model = OptimizedModel.from_pretrained(
model_name_or_path="bert-base-uncased",
optimization_level=3, # 0-3, higher means more aggressive optimization
quantization="int8", # None, "int8", "fp16", "fp8"
device="cuda" if neurenix.cuda.is_available() else "cpu",
batch_size=32,
sequence_length=128,
enable_caching=True
)
# Run inference with the optimized model
outputs = optimized_model(
input_ids=input_ids,
attention_mask=attention_mask
)
from neurenix.huggingface import load_dataset, DatasetDict
# Load a dataset from the Hugging Face Hub
dataset = neurenix.huggingface.load_dataset(
path="glue",
name="mrpc",
split="train",
cache_dir="./cache",
use_auth_token=None
)
# Create a DatasetDict
dataset_dict = DatasetDict({
"train": dataset.select(range(3000)),
"validation": dataset.select(range(3000, 3500)),
"test": dataset.select(range(3500, 4000))
})
# Apply preprocessing to the dataset
def preprocess_function(examples):
return tokenizer(
examples["sentence1"],
examples["sentence2"],
truncation=True,
padding="max_length",
max_length=128
)
processed_dataset = dataset.map(
preprocess_function,
batched=True,
num_proc=4,
remove_columns=["sentence1", "sentence2", "idx"]
)
# Create a Neurenix DataLoader from the dataset
dataloader = neurenix.data.DataLoader(
processed_dataset,
batch_size=32,
shuffle=True,
num_workers=4
)
| Feature | Neurenix Hugging Face Integration | TensorFlow Hugging Face Integration |
|---|---|---|
| Performance | Multi-language implementation with Rust/C++ backends | Python implementation with TensorFlow backend |
| Hardware Support | Comprehensive support for various hardware | Primarily focused on TPUs and GPUs |
| Model Conversion | Bidirectional conversion with weight mapping | One-way conversion with limited mapping |
| Optimization | Advanced optimization techniques for inference | Basic optimization through TensorFlow tools |
| Edge Device Support | Native support for edge devices | Limited through TensorFlow Lite |
| Parameter-Efficient Fine-tuning | Comprehensive support for various techniques | Limited support through separate libraries |
| Integration Depth | Deep integration with Neurenix ecosystem | Separate library with limited integration |
| Deployment Options | Multiple deployment options with hardware optimization | Limited to TensorFlow Serving and TF Lite |
Neurenix's Hugging Face Integration provides better performance through its multi-language implementation and offers more comprehensive hardware support, especially for edge devices. It also provides more advanced optimization techniques and deeper integration with the Neurenix ecosystem.
| Feature | Neurenix Hugging Face Integration | PyTorch Hugging Face Integration |
|---|---|---|
| Performance | Multi-language implementation with Rust/C++ backends | Python implementation with PyTorch backend |
| Hardware Support | Comprehensive support for various hardware | Primarily focused on CUDA devices |
| Model Conversion | Bidirectional conversion with automatic adaptation | Native format with minimal conversion needed |
| Optimization | Integrated optimization pipeline | Separate optimization tools |
| Edge Device Support | Native support for edge devices | Limited through separate tools |
| Parameter-Efficient Fine-tuning | Integrated support for various techniques | Available through separate libraries |
| Integration Depth | Deep integration with Neurenix ecosystem | Native integration with PyTorch |
| Deployment Options | Multiple deployment options with hardware optimization | Limited deployment options |
While PyTorch has native integration with Hugging Face, Neurenix's integration offers better performance through its multi-language implementation and provides more comprehensive hardware support, especially for edge devices. It also offers an integrated optimization pipeline and better support for deployment.
| Feature | Neurenix Hugging Face Integration | Scikit-Learn Hugging Face Integration |
|---|---|---|
| Model Types | Comprehensive support for various model architectures | Limited to specific model types |
| Hardware Acceleration | Native support for various hardware accelerators | Limited hardware acceleration |
| Deep Learning Support | Full support for deep learning models | Limited deep learning capabilities |
| Scalability | Scales to large models and datasets | Limited scalability for large models |
| Integration Depth | Deep integration with Neurenix ecosystem | Limited integration through adapters |
| Deployment Options | Multiple deployment options with hardware optimization | Limited deployment options |
| Parameter-Efficient Fine-tuning | Comprehensive support for various techniques | Limited support for fine-tuning |
| Edge Device Support | Native support for edge devices | Limited edge support |
Scikit-Learn's integration with Hugging Face is limited compared to Neurenix's, especially for deep learning models and hardware acceleration. Neurenix provides more comprehensive support for various model architectures, better scalability, and more deployment options.
# For simple models, use direct conversion
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
# For complex models, use the advanced conversion with custom mapping
model = neurenix.huggingface.from_pretrained(
"t5-base",
weight_mapping={
"encoder.block.0.layer.0.SelfAttention.q": "encoder.layers.0.self_attn.q_proj",
"encoder.block.0.layer.0.SelfAttention.k": "encoder.layers.0.self_attn.k_proj",
"encoder.block.0.layer.0.SelfAttention.v": "encoder.layers.0.self_attn.v_proj",
# ... other mappings
}
)
# Original Hugging Face model
import transformers
hf_model = transformers.AutoModel.from_pretrained("bert-base-uncased")
hf_outputs = hf_model(input_ids, attention_mask)
# Converted Neurenix model
nx_model = neurenix.huggingface.from_pretrained("bert-base-uncased")
nx_outputs = nx_model(input_ids, attention_mask)
# Verify outputs are similar
import neurenix.nn.functional as F
similarity = F.cosine_similarity(
neurenix.tensor(hf_outputs.last_hidden_state.detach().numpy()),
nx_outputs.last_hidden_state
)
print(f"Output similarity: {similarity.mean().item()}")
# Convert and cache a model
neurenix.huggingface.from_pretrained(
"bert-base-uncased",
cache_dir="./model_cache",
force_convert=False # Use cached version if available
)
# For full fine-tuning with sufficient resources
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
trainer = Trainer(model=model, ...)
# For parameter-efficient fine-tuning with limited resources
from neurenix.huggingface import LoraConfig, PeftModel
lora_config = LoraConfig(r=8, lora_alpha=16, ...)
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
peft_model = PeftModel.from_pretrained(model, lora_config)
trainer = Trainer(model=peft_model, ...)
training_args = TrainingArguments(
fp16=True, # Enable mixed precision training
fp16_opt_level="O1", # Optimization level
# ... other arguments
)
training_args = TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=4, # Effective batch size of 16
# ... other arguments
)
def compute_metrics(eval_pred):
logits, labels = eval_pred
predictions = neurenix.argmax(logits, axis=-1)
return {
"accuracy": (predictions == labels).mean().item(),
"f1": f1_score(labels, predictions, average="weighted")
}
trainer = Trainer(
model=model,
args=training_args,
compute_metrics=compute_metrics,
# ... other arguments
)
# Inefficient: Process inputs one by one
results = []
for text in texts:
result = pipeline(text)
results.append(result)
# Efficient: Process inputs in batches
results = pipeline(texts, batch_size=32)
# Create a quantized model for inference
quantized_model = neurenix.huggingface.OptimizedModel.from_pretrained(
"bert-base-uncased",
quantization="int8",
device="cuda" if neurenix.cuda.is_available() else "cpu"
)
# Enable caching for the model
model = neurenix.huggingface.from_pretrained(
"bert-base-uncased",
enable_caching=True,
cache_size=100 # Cache size in MB
)
# Enable tensor core optimizations for NVIDIA GPUs
if neurenix.cuda.is_available() and neurenix.cuda.is_tensor_cores_available():
model = neurenix.huggingface.from_pretrained(
"bert-base-uncased",
enable_tensor_cores=True
)
import neurenix
from neurenix.huggingface import from_pretrained, Trainer, TrainingArguments
from neurenix.data import DataLoader
import numpy as np
# Load a pre-trained model from Hugging Face
model = neurenix.huggingface.from_pretrained(
model_name_or_path="bert-base-uncased",
task="text-classification",
num_labels=2,
device="cuda" if neurenix.cuda.is_available() else "cpu"
)
# Load a tokenizer
tokenizer = neurenix.huggingface.AutoTokenizer.from_pretrained("bert-base-uncased")
# Load and preprocess a dataset
dataset = neurenix.huggingface.load_dataset("glue", "mrpc")
def preprocess_function(examples):
return tokenizer(
examples["sentence1"],
examples["sentence2"],
truncation=True,
padding="max_length",
max_length=128
)
processed_dataset = dataset.map(
preprocess_function,
batched=True,
num_proc=4,
remove_columns=["sentence1", "sentence2", "idx"]
)
# Split the dataset
train_dataset = processed_dataset["train"]
eval_dataset = processed_dataset["validation"]
# Define training arguments
training_args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=16,
per_device_eval_batch_size=64,
warmup_steps=500,
weight_decay=0.01,
logging_dir="./logs",
logging_steps=10,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
metric_for_best_model="accuracy",
fp16=True # Enable mixed precision training
)
# Define metrics computation function
def compute_metrics(eval_pred):
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
accuracy = (predictions == labels).mean()
return {"accuracy": accuracy}
# Create a trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
compute_metrics=compute_metrics
)
# Fine-tune the model
trainer.train()
# Evaluate the model
results = trainer.evaluate()
print(f"Evaluation results: {results}")
# Save the fine-tuned model
trainer.save_model("./fine_tuned_model")
# Export the model to Hugging Face format
neurenix.huggingface.to_pretrained(
model=model,
save_directory="./exported_model",
push_to_hub=True,
repository_id="username/bert-finetuned-mrpc",
token="hf_token" # Optional, can also use environment variable
)
import neurenix
from neurenix.huggingface import from_pretrained, LoraConfig, PeftModel, Trainer, TrainingArguments
# Load a pre-trained model from Hugging Face
base_model = neurenix.huggingface.from_pretrained(
model_name_or_path="gpt2",
device="cuda" if neurenix.cuda.is_available() else "cpu"
)
# Configure LoRA for parameter-efficient fine-tuning
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["c_attn", "c_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
# Create a PEFT model
peft_model = PeftModel.from_pretrained(
model=base_model,
peft_config=lora_config
)
# Load a tokenizer
tokenizer = neurenix.huggingface.AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# Load and preprocess a dataset
dataset = neurenix.huggingface.load_dataset("imdb")
def preprocess_function(examples):
return tokenizer(
examples["text"],
truncation=True,
padding="max_length",
max_length=512
)
processed_dataset = dataset.map(
preprocess_function,
batched=True,
num_proc=4,
remove_columns=["text"]
)
# Split the dataset
train_dataset = processed_dataset["train"].select(range(10000)) # Subsample for faster training
eval_dataset = processed_dataset["test"].select(range(1000))
# Define training arguments
training_args = TrainingArguments(
output_dir="./results_lora",
num_train_epochs=1,
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
per_device_eval_batch_size=16,
warmup_steps=100,
weight_decay=0.01,
logging_dir="./logs_lora",
logging_steps=10,
evaluation_strategy="steps",
eval_steps=500,
save_strategy="steps",
save_steps=500,
load_best_model_at_end=True,
fp16=True
)
# Create a trainer
trainer = Trainer(
model=peft_model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset
)
# Fine-tune the model
trainer.train()
# Evaluate the model
results = trainer.evaluate()
print(f"Evaluation results: {results}")
# Save the fine-tuned LoRA model
peft_model.save_pretrained("./lora_finetuned_model")
# Merge LoRA weights with the base model
merged_model = peft_model.merge_and_unload()
# Generate text with the fine-tuned model
input_text = "This movie was"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(peft_model.device)
with neurenix.no_grad():
output = merged_model.generate(
input_ids=input_ids,
max_length=100,
num_return_sequences=1,
temperature=0.7,
top_p=0.9
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
import neurenix
from neurenix.huggingface import OptimizedModel, Pipeline
import time
# Load and optimize a model for inference
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
optimized_model = neurenix.huggingface.OptimizedModel.from_pretrained(
model_name_or_path=model_name,
optimization_level=3,
quantization="int8",
device="cuda" if neurenix.cuda.is_available() else "cpu",
batch_size=32,
sequence_length=128,
enable_caching=True
)
# Load a tokenizer
tokenizer = neurenix.huggingface.AutoTokenizer.from_pretrained(model_name)
# Create an optimized pipeline
pipeline = Pipeline(
task="text-classification",
model=optimized_model,
tokenizer=tokenizer,
device=0 if neurenix.cuda.is_available() else -1,
batch_size=32
)
# Prepare test data
test_texts = [
"I love this product! It's amazing and works perfectly.",
"This is the worst purchase I've ever made. Completely disappointed.",
"The product is okay, but not worth the price.",
"Exceeded my expectations in every way possible.",
"It's decent, but there are better alternatives available."
] * 100 # Repeat to create a larger test set
# Warm up the pipeline
_ = pipeline(test_texts[:5])
# Benchmark inference
start_time = time.time()
results = pipeline(test_texts)
end_time = time.time()
# Print results
print(f"Processed {len(test_texts)} texts in {end_time - start_time:.2f} seconds")
print(f"Average time per text: {(end_time - start_time) / len(test_texts) * 1000:.2f} ms")
print(f"Throughput: {len(test_texts) / (end_time - start_time):.2f} texts/second")
# Print some example results
for i, (text, result) in enumerate(zip(test_texts[:5], results[:5])):
label = result["label"]
score = result["score"]
print(f"Text: {text}")
print(f"Prediction: {label} (confidence: {score:.4f})")
print()
# Export the optimized model
optimized_model.save("./optimized_model")
# Create a model for edge deployment
edge_model = neurenix.huggingface.OptimizedModel.from_pretrained(
model_name_or_path=model_name,
optimization_level=3,
quantization="int8",
target_device="edge",
export_format="onnx"
)
# Save the edge model
edge_model.save("./edge_model")
The Hugging Face Integration module in Neurenix provides a comprehensive set of tools for leveraging the vast ecosystem of pre-trained models, datasets, and tools available on the Hugging Face Hub. By combining Neurenix's high-performance multi-language architecture with Hugging Face's resources, users can build powerful machine learning applications with state-of-the-art models while benefiting from Neurenix's hardware acceleration and optimization capabilities.
Compared to other frameworks like TensorFlow, PyTorch, and Scikit-Learn, Neurenix's Hugging Face Integration offers advantages in terms of performance, hardware support, and optimization capabilities. Its deep integration with the Neurenix ecosystem ensures seamless interoperability while maintaining compatibility with the broader Hugging Face ecosystem.
By following the best practices and tutorials outlined in this documentation, users can efficiently convert, fine-tune, and deploy Hugging Face models using Neurenix, taking advantage of the strengths of both ecosystems to build high-performance machine learning applications.