---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
  - llama-3.2
  - unsloth
  - lora
  - tool
  - json
language:
  - en
license: llama3.2 # Llama 3.2 Community License
---

# Model Card for LLaMA-3.2-3B Tool Caller

This model (a LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B-Instruct that specializes in tool calling.
It has been trained to decide when to use one of two available tools, `search_documents` or `check_and_connect`, based on the user query, and to respond with a properly formatted JSON function call.

## Model Details

### Model Description

This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities.

- **Developed by:** [Uness.fr](https://uness.fr) 
- **Model type:** Fine-tuned LLM (LoRA)
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License (same as the base model)
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)

### Model Sources

- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search)
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset)

## Uses

### Direct Use

This model is designed to be used as an AI assistant that can intelligently determine when to call external tools. It specializes in two specific functions:

1. `search_documents`: Triggered when users ask for medical information (queries prefixed with "Search information about")
2. `check_and_connect`: Triggered when users ask about system status or connectivity

The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools.
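For example, a downstream application might parse and dispatch these calls along the following lines (a minimal sketch; `run_search` and `run_connectivity_check` are hypothetical stand-ins for real tool implementations):

```python
import json

def run_search(query: str) -> str:
    # Hypothetical stand-in for a real document search backend
    return f"search results for: {query}"

def run_connectivity_check() -> str:
    # Hypothetical stand-in for a real connectivity probe
    return "connected"

# Map tool names (as emitted by the model) to executable functions
TOOLS = {
    "search_documents": lambda params: run_search(params["query"]),
    "check_and_connect": lambda params: run_connectivity_check(),
}

def dispatch(model_output: str) -> str:
    # The model emits e.g. {"name": "search_documents", "parameters": {"query": "..."}}
    call = json.loads(model_output)
    return TOOLS[call["name"]](call["parameters"])

print(dispatch('{"name": "search_documents", "parameters": {"query": "capital of France"}}'))
```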

### Downstream Use

This model can be integrated into:
- AI assistants that need to understand when to delegate tasks to external tools

### Out-of-Scope Use

This model should not be used for:
- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Safety-critical systems that require reliable, unsupervised operation
- Applications requiring factual accuracy guarantees

## Bias, Risks, and Limitations

- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There's a strong dependency on the specific prefixing pattern used in training ("Search information about")

### Recommendations

Users (both direct and downstream) should:
- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the search_documents tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types, as sketched below
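
One possible shape for such a fallback, sketched under the assumption that malformed or unrecognized calls should fall back to a plain-text reply instead of a tool execution:

```python
import json
from typing import Optional

KNOWN_TOOLS = {"search_documents", "check_and_connect"}

def parse_tool_call(response: str) -> Optional[dict]:
    """Return the parsed tool call, or None to signal a plain-text fallback."""
    try:
        call = json.loads(response)
    except json.JSONDecodeError:
        return None  # not valid JSON: fall back
    if not isinstance(call, dict) or call.get("name") not in KNOWN_TOOLS:
        return None  # unknown tool or wrong shape: fall back
    return call

call = parse_tool_call("not a tool call")
print("fallback to a plain-text answer" if call is None else f"execute {call['name']}")
```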

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the adapter repo directly; with `peft` installed, transformers resolves
# the base model from adapter_config.json and applies the LoRA weights
model_path = "asanchez75/Llama-3.2-3B-tool-search"  # see Model Sources above
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""

USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt. 
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""

# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # Prefix routes the query to search_documents

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)

# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
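
Each run should print a single JSON object in the trained format, e.g. `{"name": "search_documents", "parameters": {"query": "What is the capital of France?"}}` for the first example and `{"name": "check_and_connect", "parameters": {}}` for the second (illustrative outputs; exact generations may vary).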

## Training Details

### Training Data

The model was trained on a custom dataset with 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset):
- 1,000 examples derived from the `maximedb/natural_questions` dataset, modified by prepending the "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool

The dataset is in JSONL format, with each entry containing a complete conversation (system, user, and assistant messages).
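
A quick way to inspect the data is the `datasets` library; the split name and the raw-row print below are assumptions, since the card does not document the entry schema:

```python
from datasets import load_dataset

# Load the fine-tuning dataset from the Hub (split name assumed to be "train")
dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")

print(len(dataset))  # expected: 1050 (1,000 search + 50 status examples)
print(dataset[0])    # one full conversation: system, user, and assistant messages
```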

### Training Procedure

The model was fine-tuned using Unsloth's optimized implementation of LoRA over a 4-bit quantized version of LLaMA-3.2-3B-Instruct.

#### Training Hyperparameters

- **Training regime:** 4-bit quantization with LoRA
- **LoRA rank:** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- **Learning rate:** 2e-4
- **Batch size:** 2 per device
- **Gradient accumulation steps:** 4
- **Warmup steps:** 5
- **Number of epochs:** 3
- **Optimizer:** adamw_8bit
- **Weight decay:** 0.01
- **LR scheduler:** linear
- **Max sequence length:** 2048
- **Packing:** False
- **Random seed:** 3407
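
For reference, a sketch of how these hyperparameters map onto the Unsloth and TRL APIs. The dataset preprocessing (the `messages` field name and chat-template rendering) and `output_dir` are assumptions, not taken from the original training script, and recent `trl` versions move some `SFTTrainer` arguments into `SFTConfig`:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Base model: 4-bit quantized Llama-3.2-3B-Instruct
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapter with the settings listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# Assumption: each JSONL entry holds a "messages" list, rendered here
# to a single "text" field via the chat template
raw = load_dataset("asanchez75/tool_finetuning_dataset", split="train")
train_dataset = raw.map(lambda ex: {
    "text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",  # assumed
    ),
)
trainer.train()
```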

#### Speeds, Sizes, Times

- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.]
- **Training time:** [Approximately X minutes based on training code output]
- **Model size:** the base model has 3B parameters; the LoRA adapter is significantly smaller

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on sample inference examples from both categories:
- Information queries with "Search information about" prefix
- System status queries

#### Metrics

- **Accuracy:** Measured by whether the model correctly selects the appropriate tool for the query type
- **Format correctness:** Whether the JSON output is properly formatted and parsable
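
These two metrics can be computed with a check along these lines (a sketch; the example pairs are illustrative):

```python
import json

def evaluate(examples):
    """examples: list of (model_output, expected_tool) pairs."""
    parsable = correct = 0
    for output, expected_tool in examples:
        try:
            call = json.loads(output)
        except json.JSONDecodeError:
            continue  # not parsable: counts against both metrics
        parsable += 1
        if call.get("name") == expected_tool:
            correct += 1
    n = len(examples)
    return {"format_correctness": parsable / n, "tool_accuracy": correct / n}

print(evaluate([
    ('{"name": "search_documents", "parameters": {"query": "causes of asthma"}}',
     "search_documents"),
    ('{"name": "check_and_connect", "parameters": {}}', "check_and_connect"),
]))
```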

### Results

Qualitative evaluation showed the model successfully distinguishes between:
- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [GPU model]
- **Hours used:** [Estimated from training time]
- **Cloud Provider:** [If applicable]
- **Compute Region:** [If applicable]
- **Carbon Emitted:** [Estimate if available]

## Technical Specifications

### Model Architecture and Objective

- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: Train the model to output properly formatted JSON function calls based on input query type

### Compute Infrastructure

#### Hardware

- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script

#### Software

- Unsloth: optimized fine-tuning implementation for LLaMA-family models
- PyTorch: Deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library

## Framework versions

- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]

## Model Card Contact

[Your contact information]