Page

Table Of Contents


Built with 🛠 MkDocs - Theme 🖤 Github.

Hugging Face Integration Module

Overview

The Hugging Face Integration module in Neurenix provides seamless interoperability with the Hugging Face ecosystem, enabling users to leverage pre-trained models, datasets, and tools from the Hugging Face Hub within the Neurenix framework. This integration allows researchers and developers to combine the performance benefits of Neurenix's multi-language architecture with the vast resources available in the Hugging Face ecosystem.

Built on Neurenix's high-performance Rust and C++ backends with a user-friendly Python interface, this module ensures efficient execution of Hugging Face models while maintaining compatibility with the broader Hugging Face ecosystem. The integration supports a wide range of model architectures, including transformers for natural language processing, vision transformers for computer vision, and multimodal models.

The module provides utilities for model conversion, fine-tuning, inference, and deployment, making it easy to incorporate state-of-the-art pre-trained models into Neurenix-based applications. It also includes tools for dataset access and processing, enabling seamless integration with the Hugging Face datasets library.

Key Concepts

Model Conversion and Compatibility

The module provides tools for converting models between Hugging Face's format and Neurenix's native format:

This bidirectional conversion ensures seamless integration between the two ecosystems.

Pre-trained Model Access

The module provides direct access to pre-trained models from the Hugging Face Hub:

These features enable users to leverage the vast collection of pre-trained models available on the Hugging Face Hub.

Fine-tuning and Adaptation

The module includes tools for fine-tuning Hugging Face models using Neurenix's training infrastructure:

These capabilities enable efficient adaptation of pre-trained models to specific tasks and domains.

Inference and Deployment

The module provides optimized inference for Hugging Face models using Neurenix's hardware acceleration:

These features ensure efficient execution of Hugging Face models in production environments.

Dataset Integration

The module includes tools for working with Hugging Face datasets:

These capabilities enable seamless integration with the Hugging Face datasets ecosystem.

API Reference

Model Conversion

import neurenix
from neurenix.huggingface import from_pretrained, to_pretrained

# Import a Hugging Face model to Neurenix
model = neurenix.huggingface.from_pretrained(
    model_name_or_path="bert-base-uncased",
    task="text-classification",
    num_labels=2,
    device="cuda" if neurenix.cuda.is_available() else "cpu"
)

# Export a Neurenix model to Hugging Face format
neurenix.huggingface.to_pretrained(
    model=model,
    save_directory="./my_exported_model",
    push_to_hub=True,
    repository_id="username/model-name",
    token="hf_token"  # Optional, can also use environment variable
)

Model Access and Management

from neurenix.huggingface import list_models, get_model_info

# List available models for a specific task
models = neurenix.huggingface.list_models(
    task="text-classification",
    filter_by_language="en",
    sort_by="downloads",
    direction="descending",
    limit=10
)

# Get information about a specific model
model_info = neurenix.huggingface.get_model_info("bert-base-uncased")
print(f"Model: {model_info.name}")
print(f"Downloads: {model_info.downloads}")
print(f"Task: {model_info.task}")
print(f"Architecture: {model_info.architecture}")

# Download and cache a model for offline use
neurenix.huggingface.cache_model(
    model_name_or_path="bert-base-uncased",
    force_download=False,
    resume_download=True,
    proxies=None,
    local_files_only=False
)

Fine-tuning

from neurenix.huggingface import Trainer, TrainingArguments

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy"
)

# Create a trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    optimizers=(optimizer, scheduler)
)

# Fine-tune the model
trainer.train()

# Evaluate the model
results = trainer.evaluate()

# Save the fine-tuned model
trainer.save_model("./fine_tuned_model")

Parameter-Efficient Fine-tuning

from neurenix.huggingface import LoraConfig, PeftModel

# Configure LoRA for parameter-efficient fine-tuning
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query", "key", "value"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS"
)

# Create a PEFT model
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
peft_model = PeftModel.from_pretrained(
    model=model,
    peft_config=lora_config
)

# Fine-tune the PEFT model
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)
trainer.train()

Inference

from neurenix.huggingface import Pipeline

# Create an inference pipeline
pipeline = Pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if neurenix.cuda.is_available() else -1,
    batch_size=32
)

# Run inference
results = pipeline(
    texts=["I love this product!", "This product is terrible."],
    top_k=1
)

# Create a zero-shot classification pipeline
zero_shot_pipeline = Pipeline(
    task="zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=0 if neurenix.cuda.is_available() else -1
)

# Run zero-shot classification
zero_shot_results = zero_shot_pipeline(
    texts=["I love this product!"],
    candidate_labels=["positive", "negative", "neutral"],
    hypothesis_template="This text expresses a {} sentiment."
)

Optimized Inference

from neurenix.huggingface import OptimizedModel

# Create an optimized model for inference
optimized_model = OptimizedModel.from_pretrained(
    model_name_or_path="bert-base-uncased",
    optimization_level=3,  # 0-3, higher means more aggressive optimization
    quantization="int8",  # None, "int8", "fp16", "fp8"
    device="cuda" if neurenix.cuda.is_available() else "cpu",
    batch_size=32,
    sequence_length=128,
    enable_caching=True
)

# Run inference with the optimized model
outputs = optimized_model(
    input_ids=input_ids,
    attention_mask=attention_mask
)

Dataset Integration

from neurenix.huggingface import load_dataset, DatasetDict

# Load a dataset from the Hugging Face Hub
dataset = neurenix.huggingface.load_dataset(
    path="glue",
    name="mrpc",
    split="train",
    cache_dir="./cache",
    use_auth_token=None
)

# Create a DatasetDict
dataset_dict = DatasetDict({
    "train": dataset.select(range(3000)),
    "validation": dataset.select(range(3000, 3500)),
    "test": dataset.select(range(3500, 4000))
})

# Apply preprocessing to the dataset
def preprocess_function(examples):
    return tokenizer(
        examples["sentence1"],
        examples["sentence2"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

processed_dataset = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=["sentence1", "sentence2", "idx"]
)

# Create a Neurenix DataLoader from the dataset
dataloader = neurenix.data.DataLoader(
    processed_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4
)

Framework Comparison

Neurenix Hugging Face Integration vs. TensorFlow Hugging Face Integration

Feature Neurenix Hugging Face Integration TensorFlow Hugging Face Integration
Performance Multi-language implementation with Rust/C++ backends Python implementation with TensorFlow backend
Hardware Support Comprehensive support for various hardware Primarily focused on TPUs and GPUs
Model Conversion Bidirectional conversion with weight mapping One-way conversion with limited mapping
Optimization Advanced optimization techniques for inference Basic optimization through TensorFlow tools
Edge Device Support Native support for edge devices Limited through TensorFlow Lite
Parameter-Efficient Fine-tuning Comprehensive support for various techniques Limited support through separate libraries
Integration Depth Deep integration with Neurenix ecosystem Separate library with limited integration
Deployment Options Multiple deployment options with hardware optimization Limited to TensorFlow Serving and TF Lite

Neurenix's Hugging Face Integration provides better performance through its multi-language implementation and offers more comprehensive hardware support, especially for edge devices. It also provides more advanced optimization techniques and deeper integration with the Neurenix ecosystem.

Neurenix Hugging Face Integration vs. PyTorch Hugging Face Integration

Feature Neurenix Hugging Face Integration PyTorch Hugging Face Integration
Performance Multi-language implementation with Rust/C++ backends Python implementation with PyTorch backend
Hardware Support Comprehensive support for various hardware Primarily focused on CUDA devices
Model Conversion Bidirectional conversion with automatic adaptation Native format with minimal conversion needed
Optimization Integrated optimization pipeline Separate optimization tools
Edge Device Support Native support for edge devices Limited through separate tools
Parameter-Efficient Fine-tuning Integrated support for various techniques Available through separate libraries
Integration Depth Deep integration with Neurenix ecosystem Native integration with PyTorch
Deployment Options Multiple deployment options with hardware optimization Limited deployment options

While PyTorch has native integration with Hugging Face, Neurenix's integration offers better performance through its multi-language implementation and provides more comprehensive hardware support, especially for edge devices. It also offers an integrated optimization pipeline and better support for deployment.

Neurenix Hugging Face Integration vs. Scikit-Learn Hugging Face Integration

Feature Neurenix Hugging Face Integration Scikit-Learn Hugging Face Integration
Model Types Comprehensive support for various model architectures Limited to specific model types
Hardware Acceleration Native support for various hardware accelerators Limited hardware acceleration
Deep Learning Support Full support for deep learning models Limited deep learning capabilities
Scalability Scales to large models and datasets Limited scalability for large models
Integration Depth Deep integration with Neurenix ecosystem Limited integration through adapters
Deployment Options Multiple deployment options with hardware optimization Limited deployment options
Parameter-Efficient Fine-tuning Comprehensive support for various techniques Limited support for fine-tuning
Edge Device Support Native support for edge devices Limited edge support

Scikit-Learn's integration with Hugging Face is limited compared to Neurenix's, especially for deep learning models and hardware acceleration. Neurenix provides more comprehensive support for various model architectures, better scalability, and more deployment options.

Best Practices

Efficient Model Conversion

  1. Use the Right Conversion Strategy: Choose the appropriate conversion strategy based on your needs.
# For simple models, use direct conversion
model = neurenix.huggingface.from_pretrained("bert-base-uncased")

# For complex models, use the advanced conversion with custom mapping
model = neurenix.huggingface.from_pretrained(
    "t5-base",
    weight_mapping={
        "encoder.block.0.layer.0.SelfAttention.q": "encoder.layers.0.self_attn.q_proj",
        "encoder.block.0.layer.0.SelfAttention.k": "encoder.layers.0.self_attn.k_proj",
        "encoder.block.0.layer.0.SelfAttention.v": "encoder.layers.0.self_attn.v_proj",
        # ... other mappings
    }
)
  1. Verify Conversion Accuracy: Always verify that the converted model produces the same outputs as the original.
# Original Hugging Face model
import transformers
hf_model = transformers.AutoModel.from_pretrained("bert-base-uncased")
hf_outputs = hf_model(input_ids, attention_mask)

# Converted Neurenix model
nx_model = neurenix.huggingface.from_pretrained("bert-base-uncased")
nx_outputs = nx_model(input_ids, attention_mask)

# Verify outputs are similar
import neurenix.nn.functional as F
similarity = F.cosine_similarity(
    neurenix.tensor(hf_outputs.last_hidden_state.detach().numpy()),
    nx_outputs.last_hidden_state
)
print(f"Output similarity: {similarity.mean().item()}")
  1. Cache Converted Models: Cache converted models to avoid repeated conversion.
# Convert and cache a model
neurenix.huggingface.from_pretrained(
    "bert-base-uncased",
    cache_dir="./model_cache",
    force_convert=False  # Use cached version if available
)

Optimizing Fine-tuning

  1. Choose the Right Fine-tuning Approach: Select the appropriate fine-tuning approach based on your resources and requirements.
# For full fine-tuning with sufficient resources
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
trainer = Trainer(model=model, ...)

# For parameter-efficient fine-tuning with limited resources
from neurenix.huggingface import LoraConfig, PeftModel
lora_config = LoraConfig(r=8, lora_alpha=16, ...)
model = neurenix.huggingface.from_pretrained("bert-base-uncased")
peft_model = PeftModel.from_pretrained(model, lora_config)
trainer = Trainer(model=peft_model, ...)
  1. Use Mixed Precision Training: Enable mixed precision training for faster fine-tuning.
training_args = TrainingArguments(
    fp16=True,  # Enable mixed precision training
    fp16_opt_level="O1",  # Optimization level
    # ... other arguments
)
  1. Implement Gradient Accumulation: Use gradient accumulation for larger effective batch sizes.
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # Effective batch size of 16
    # ... other arguments
)
  1. Monitor Training Metrics: Track relevant metrics during fine-tuning.
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = neurenix.argmax(logits, axis=-1)
    return {
        "accuracy": (predictions == labels).mean().item(),
        "f1": f1_score(labels, predictions, average="weighted")
    }

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    # ... other arguments
)

Optimizing Inference

  1. Batch Inputs for Efficiency: Process inputs in batches for better throughput.
# Inefficient: Process inputs one by one
results = []
for text in texts:
    result = pipeline(text)
    results.append(result)

# Efficient: Process inputs in batches
results = pipeline(texts, batch_size=32)
  1. Use Quantization for Faster Inference: Apply quantization to reduce model size and improve inference speed.
# Create a quantized model for inference
quantized_model = neurenix.huggingface.OptimizedModel.from_pretrained(
    "bert-base-uncased",
    quantization="int8",
    device="cuda" if neurenix.cuda.is_available() else "cpu"
)
  1. Enable Caching for Repeated Inference: Cache intermediate results for repeated inference with similar inputs.
# Enable caching for the model
model = neurenix.huggingface.from_pretrained(
    "bert-base-uncased",
    enable_caching=True,
    cache_size=100  # Cache size in MB
)
  1. Use Hardware-Specific Optimizations: Enable optimizations for specific hardware.
# Enable tensor core optimizations for NVIDIA GPUs
if neurenix.cuda.is_available() and neurenix.cuda.is_tensor_cores_available():
    model = neurenix.huggingface.from_pretrained(
        "bert-base-uncased",
        enable_tensor_cores=True
    )

Tutorials

Converting and Fine-tuning a Hugging Face Model

import neurenix
from neurenix.huggingface import from_pretrained, Trainer, TrainingArguments
from neurenix.data import DataLoader
import numpy as np

# Load a pre-trained model from Hugging Face
model = neurenix.huggingface.from_pretrained(
    model_name_or_path="bert-base-uncased",
    task="text-classification",
    num_labels=2,
    device="cuda" if neurenix.cuda.is_available() else "cpu"
)

# Load a tokenizer
tokenizer = neurenix.huggingface.AutoTokenizer.from_pretrained("bert-base-uncased")

# Load and preprocess a dataset
dataset = neurenix.huggingface.load_dataset("glue", "mrpc")

def preprocess_function(examples):
    return tokenizer(
        examples["sentence1"],
        examples["sentence2"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

processed_dataset = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=["sentence1", "sentence2", "idx"]
)

# Split the dataset
train_dataset = processed_dataset["train"]
eval_dataset = processed_dataset["validation"]

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    fp16=True  # Enable mixed precision training
)

# Define metrics computation function
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": accuracy}

# Create a trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics
)

# Fine-tune the model
trainer.train()

# Evaluate the model
results = trainer.evaluate()
print(f"Evaluation results: {results}")

# Save the fine-tuned model
trainer.save_model("./fine_tuned_model")

# Export the model to Hugging Face format
neurenix.huggingface.to_pretrained(
    model=model,
    save_directory="./exported_model",
    push_to_hub=True,
    repository_id="username/bert-finetuned-mrpc",
    token="hf_token"  # Optional, can also use environment variable
)

Parameter-Efficient Fine-tuning with LoRA

import neurenix
from neurenix.huggingface import from_pretrained, LoraConfig, PeftModel, Trainer, TrainingArguments

# Load a pre-trained model from Hugging Face
base_model = neurenix.huggingface.from_pretrained(
    model_name_or_path="gpt2",
    device="cuda" if neurenix.cuda.is_available() else "cpu"
)

# Configure LoRA for parameter-efficient fine-tuning
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["c_attn", "c_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Create a PEFT model
peft_model = PeftModel.from_pretrained(
    model=base_model,
    peft_config=lora_config
)

# Load a tokenizer
tokenizer = neurenix.huggingface.AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Load and preprocess a dataset
dataset = neurenix.huggingface.load_dataset("imdb")

def preprocess_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=512
    )

processed_dataset = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=["text"]
)

# Split the dataset
train_dataset = processed_dataset["train"].select(range(10000))  # Subsample for faster training
eval_dataset = processed_dataset["test"].select(range(1000))

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results_lora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir="./logs_lora",
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    fp16=True
)

# Create a trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)

# Fine-tune the model
trainer.train()

# Evaluate the model
results = trainer.evaluate()
print(f"Evaluation results: {results}")

# Save the fine-tuned LoRA model
peft_model.save_pretrained("./lora_finetuned_model")

# Merge LoRA weights with the base model
merged_model = peft_model.merge_and_unload()

# Generate text with the fine-tuned model
input_text = "This movie was"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(peft_model.device)

with neurenix.no_grad():
    output = merged_model.generate(
        input_ids=input_ids,
        max_length=100,
        num_return_sequences=1,
        temperature=0.7,
        top_p=0.9
    )

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")

Creating an Optimized Inference Pipeline

import neurenix
from neurenix.huggingface import OptimizedModel, Pipeline
import time

# Load and optimize a model for inference
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
optimized_model = neurenix.huggingface.OptimizedModel.from_pretrained(
    model_name_or_path=model_name,
    optimization_level=3,
    quantization="int8",
    device="cuda" if neurenix.cuda.is_available() else "cpu",
    batch_size=32,
    sequence_length=128,
    enable_caching=True
)

# Load a tokenizer
tokenizer = neurenix.huggingface.AutoTokenizer.from_pretrained(model_name)

# Create an optimized pipeline
pipeline = Pipeline(
    task="text-classification",
    model=optimized_model,
    tokenizer=tokenizer,
    device=0 if neurenix.cuda.is_available() else -1,
    batch_size=32
)

# Prepare test data
test_texts = [
    "I love this product! It's amazing and works perfectly.",
    "This is the worst purchase I've ever made. Completely disappointed.",
    "The product is okay, but not worth the price.",
    "Exceeded my expectations in every way possible.",
    "It's decent, but there are better alternatives available."
] * 100  # Repeat to create a larger test set

# Warm up the pipeline
_ = pipeline(test_texts[:5])

# Benchmark inference
start_time = time.time()
results = pipeline(test_texts)
end_time = time.time()

# Print results
print(f"Processed {len(test_texts)} texts in {end_time - start_time:.2f} seconds")
print(f"Average time per text: {(end_time - start_time) / len(test_texts) * 1000:.2f} ms")
print(f"Throughput: {len(test_texts) / (end_time - start_time):.2f} texts/second")

# Print some example results
for i, (text, result) in enumerate(zip(test_texts[:5], results[:5])):
    label = result["label"]
    score = result["score"]
    print(f"Text: {text}")
    print(f"Prediction: {label} (confidence: {score:.4f})")
    print()

# Export the optimized model
optimized_model.save("./optimized_model")

# Create a model for edge deployment
edge_model = neurenix.huggingface.OptimizedModel.from_pretrained(
    model_name_or_path=model_name,
    optimization_level=3,
    quantization="int8",
    target_device="edge",
    export_format="onnx"
)

# Save the edge model
edge_model.save("./edge_model")

Conclusion

The Hugging Face Integration module in Neurenix provides a comprehensive set of tools for leveraging the vast ecosystem of pre-trained models, datasets, and tools available on the Hugging Face Hub. By combining Neurenix's high-performance multi-language architecture with Hugging Face's resources, users can build powerful machine learning applications with state-of-the-art models while benefiting from Neurenix's hardware acceleration and optimization capabilities.

Compared to other frameworks like TensorFlow, PyTorch, and Scikit-Learn, Neurenix's Hugging Face Integration offers advantages in terms of performance, hardware support, and optimization capabilities. Its deep integration with the Neurenix ecosystem ensures seamless interoperability while maintaining compatibility with the broader Hugging Face ecosystem.

By following the best practices and tutorials outlined in this documentation, users can efficiently convert, fine-tune, and deploy Hugging Face models using Neurenix, taking advantage of the strengths of both ecosystems to build high-performance machine learning applications.