---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
- llama-3.2
- unsloth
- lora
- tool
- json
language:
- en
license: llama3 # Llama 3 Community License
---

# Model Card for LLaMA-3.2-3B Tool Caller

This model (a LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling. It has been trained to decide, based on the user query, when to use one of two available tools, `search_documents` or `check_and_connect`, and to respond with a properly formatted JSON function call.

## Model Details

### Model Description

This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling.

- **Developed by:** [Uness.fr](https://uness.fr)
- **Model type:** Fine-tuned LLM (LoRA adapter)
- **Language(s) (NLP):** English
- **License:** Llama 3 Community License (same as the base model)
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)

### Model Sources

- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search)
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset)

## Uses

### Direct Use

This model is designed to be used as an AI assistant component that can intelligently determine when to call external tools. It specializes in two specific functions:

1. `search_documents`: triggered when users ask for medical information (queries prefixed with "Search information about")
2. `check_and_connect`: triggered when users ask about system status or connectivity

The model outputs properly formatted JSON function calls that downstream applications can parse to execute the appropriate tool.
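As an illustration of that output contract, the sketch below parses responses in the documented `{"name": ..., "parameters": ...}` format and routes them to placeholder handlers. The `dispatch` helper and the example output strings are hypothetical and assume only the format described above; they are not part of this repository.

```python
import json

# Hypothetical raw model outputs in the documented format:
# {"name": function name, "parameters": dictionary of argument name and its value}
raw_outputs = [
    '{"name": "search_documents", "parameters": {"query": "causes of climate change"}}',
    '{"name": "check_and_connect", "parameters": {}}',
]

def dispatch(raw: str) -> str:
    """Parse one model response and route it to a (stubbed) tool handler."""
    call = json.loads(raw)  # raises json.JSONDecodeError if the output is not valid JSON
    name = call["name"]
    params = call.get("parameters", {})
    if name == "search_documents":
        return f"search_documents(query={params['query']!r})"
    if name == "check_and_connect":
        return "check_and_connect()"
    raise ValueError(f"Unrecognized tool: {name}")

for raw in raw_outputs:
    print(dispatch(raw))
```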
### Downstream Use

This model can be integrated into:

- AI assistants that need to understand when to delegate tasks to external tools

### Out-of-Scope Use

This model should not be used for:

- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Critical systems where reliability is essential without human oversight
- Applications requiring factual accuracy guarantees

## Bias, Risks, and Limitations

- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There is a strong dependency on the specific prefixing pattern used in training ("Search information about")

### Recommendations

Users (both direct and downstream) should:

- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the `search_documents` tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the adapter and tokenizer (with `peft` installed, transformers loads the
# base model and applies the LoRA adapter from this repository automatically)
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""

USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{
    "type": "function",
    "function": {
        "name": "check_and_connect",
        "description": "check_and_connect",
        "parameters": {
            "properties": {},
            "type": "object"
        }
    }
}
{
    "type": "function",
    "function": {
        "name": "search_documents",
        "description": "\n    Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n    ",
        "parameters": {
            "properties": {
                "query": {
                    "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.",
                    "type": "string"
                }
            },
            "required": [
                "query"
            ],
            "type": "object"
        }
    }
}

"""

# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # prefix routes the query to search_documents

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)

# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]

# Generate response...
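# (Completing the example as a sketch: the generation and decoding calls below
# simply mirror Example 1; only the messages differ.)
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)  # expected to select the check_and_connect tool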
```

## Training Details

### Training Data

The model was trained on a custom dataset of 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset):

- 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with the "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool

The dataset was created in JSONL format, with each entry containing a complete conversation structure (system, user, and assistant messages).

### Training Procedure

The model was fine-tuned using Unsloth's optimized implementation of LoRA over a 4-bit quantized version of LLaMA-3.2-3B-Instruct.

#### Training Hyperparameters

- **Training regime:** 4-bit quantization with LoRA
- **LoRA rank:** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- **Learning rate:** 2e-4
- **Batch size:** 2 per device
- **Gradient accumulation steps:** 4
- **Warmup steps:** 5
- **Number of epochs:** 3
- **Optimizer:** adamw_8bit
- **Weight decay:** 0.01
- **LR scheduler:** linear
- **Max sequence length:** 2048
- **Packing:** False
- **Random seed:** 3407

#### Speeds, Sizes, Times

- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.]
- **Training time:** [Approximately X minutes based on training code output]
- **Model size:** Base model is 3B parameters; the LoRA adapter is significantly smaller

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on sample inference examples from both categories:

- Information queries with the "Search information about" prefix
- System status queries

#### Metrics

- **Accuracy:** Whether the model correctly selects the appropriate tool for the query type
- **Format correctness:** Whether the JSON output is properly formatted and parsable

### Results

Qualitative evaluation showed the model successfully distinguishes between:

- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [GPU model]
- **Hours used:** [Estimated from training time]
- **Cloud Provider:** [If applicable]
- **Compute Region:** [If applicable]
- **Carbon Emitted:** [Estimate if available]

## Technical Specifications

### Model Architecture and Objective

- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: train the model to output properly formatted JSON function calls based on the input query type

### Compute Infrastructure

#### Hardware

- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script

#### Software

- Unsloth: fast implementation of LLaMA models
- PyTorch: deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library

## Framework versions

- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]

## Model Card Contact

[Your contact information]