|
---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
- llama-3.2
- unsloth
- lora
- tool
- json
language:
- en
license: llama3.2
---
|
|
|
# Model Card for LLaMA-3.2-3B Tool Caller |
|
|
|
This model (LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling capabilities. |
|
It has been trained to decide, based on the user's query, which of two available tools to call, `search_documents` or `check_and_connect`, and to respond with a properly formatted JSON function call.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities. |
|
|
|
- **Developed by:** [Uness.fr](https://uness.fr) |
|
- **Model type:** Fine-tuned LLM (LoRA) |
|
- **Language(s) (NLP):** English |
|
- **License:** Llama 3.2 Community License (same as the base model)
|
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version) |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search) |
|
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct) |
|
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset) |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model is designed to be used as an AI assistant that can intelligently determine when to call external tools. It specializes in two specific functions: |
|
|
|
1. `search_documents`: Triggered when users ask for medical information (queries prefixed with "Search information about")
|
2. `check_and_connect`: Triggered when users ask about system status or connectivity |
|
|
|
The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools. |
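For illustration, the two call formats the model is expected to emit look like this (the `query` value is just an example taken from the tool description used at inference time):

```json
{"name": "search_documents", "parameters": {"query": "What are the causes of climate change?"}}
{"name": "check_and_connect", "parameters": {}}
```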
|
|
|
### Downstream Use |
|
|
|
This model can be integrated into: |
|
- AI assistants that need to understand when to delegate tasks to external tools |
|
|
|
### Out-of-Scope Use |
|
|
|
This model should not be used for: |
|
- General text generation without tool calling |
|
- Tasks requiring more than the two trained tools |
|
- Critical systems that require reliable behavior without human oversight
|
- Applications requiring factual accuracy guarantees |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- The model inherits biases from the base LLaMA-3.2-3B model |
|
- Performance depends on how similar user queries are to the training data format |
|
- There's a strong dependency on the specific prefixing pattern used in training ("Search information about") |
|
|
|
### Recommendations |
|
|
|
Users (both direct and downstream) should: |
|
- Follow the same prompting patterns used in training for optimal results |
|
- Include the "Search information about" prefix for queries intended for the search_documents tool |
|
- Be aware that the model expects a specific system prompt format |
|
- Test thoroughly before deployment in production environments |
|
- Consider implementing fallback mechanisms for unrecognized query types |
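A minimal sketch of such a fallback dispatcher is shown below; the function and the returned status fields are illustrative, not part of this repository:

```python
import json

ALLOWED_TOOLS = {"search_documents", "check_and_connect"}

def dispatch(model_output: str) -> dict:
    """Parse the model's JSON function call and route it to a known tool,
    falling back gracefully when the output is malformed or unrecognized."""
    try:
        call = json.loads(model_output.strip())
        name = call.get("name")
        params = call.get("parameters", {})
    except (json.JSONDecodeError, AttributeError):
        return {"status": "fallback", "reason": "output is not a JSON object"}

    if name not in ALLOWED_TOOLS:
        return {"status": "fallback", "reason": f"unknown tool: {name!r}"}

    # Hand off to the real tool implementations here.
    return {"status": "ok", "tool": name, "parameters": params}
```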
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model: |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the adapter and tokenizer. Loading a LoRA adapter repository directly
# requires `peft` to be installed; transformers resolves the base model from
# the adapter config automatically.
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # adjust for your hardware
    device_map="auto",          # requires `accelerate`; remove to load on CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""

USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""

# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # prefix routes the query to search_documents

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)

# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
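In both examples, the decoded `response` should contain a single JSON function call in the format shown under Direct Use, which downstream code can parse with `json.loads`.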
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was trained on a custom dataset with 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset): |
|
- 1,000 examples derived from the "maximedb/natural_questions" dataset, with queries prefixed with "Search information about"
|
- 50 examples of system status queries for the "check_and_connect" tool |
|
|
|
The dataset was created in JSONL format with each entry having a complete conversation structure including system, user, and assistant messages. |
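For illustration only, a single entry might look like the following (shown pretty-printed; actual JSONL stores one object per line, and the field names and query are hypothetical, see the dataset repository for the exact schema):

```json
{
  "messages": [
    {"role": "system", "content": "Environment: ipython\nCutting Knowledge Date: December 2023\nToday Date: 18 May 2025"},
    {"role": "user", "content": "Given the following functions, please respond with a JSON for a function call ... Search information about who wrote the declaration of independence"},
    {"role": "assistant", "content": "{\"name\": \"search_documents\", \"parameters\": {\"query\": \"who wrote the declaration of independence\"}}"}
  ]
}
```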
|
|
|
### Training Procedure |
|
|
|
The model was fine-tuned using Unsloth's optimized implementation of LoRA over a 4-bit quantized version of LLaMA-3.2-3B-Instruct. |
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** 4-bit quantization with LoRA |
|
- **LoRA rank:** 16 |
|
- **LoRA alpha:** 16 |
|
- **LoRA dropout:** 0 |
|
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj" |
|
- **Learning rate:** 2e-4 |
|
- **Batch size:** 2 per device |
|
- **Gradient accumulation steps:** 4 |
|
- **Warmup steps:** 5 |
|
- **Number of epochs:** 3 |
|
- **Optimizer:** adamw_8bit |
|
- **Weight decay:** 0.01 |
|
- **LR scheduler:** linear |
|
- **Max sequence length:** 2048 |
|
- **Packing:** False |
|
- **Random seed:** 3407 |
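These settings correspond to a fairly standard Unsloth + TRL recipe. The sketch below shows how they would typically be wired together; it is a reconstruction under stated assumptions (dataset-to-text formatting with the chat template is omitted, and argument names follow the TRL versions used in Unsloth's notebooks), not the exact training script:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model and attach LoRA adapters with the ranks listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    random_state=3407,
)

dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")
# The conversations must be rendered into a single "text" field with the
# Llama 3.2 chat template before training (omitted here).

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```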
|
|
|
#### Speeds, Sizes, Times |
|
|
|
- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.] |
|
- **Training time:** [Approximately X minutes based on training code output] |
|
- **Model size:** Base model is 3B parameters; LoRA adapter is significantly smaller |
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
The model was evaluated on sample inference examples from both categories: |
|
- Information queries with "Search information about" prefix |
|
- System status queries |
|
|
|
#### Metrics |
|
|
|
- **Accuracy:** Measured by whether the model correctly selects the appropriate tool for the query type |
|
- **Format correctness:** Whether the JSON output is properly formatted and parsable |
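Both metrics can be computed directly from the generated text; a minimal scoring helper (illustrative, not shipped with the model) might look like this:

```python
import json

def score_response(response_text: str, expected_tool: str) -> dict:
    """Return format correctness and tool-selection accuracy for one generation."""
    try:
        call = json.loads(response_text.strip())
    except json.JSONDecodeError:
        return {"format_ok": False, "tool_ok": False}
    well_formed = isinstance(call, dict) and "name" in call and "parameters" in call
    return {"format_ok": well_formed,
            "tool_ok": well_formed and call["name"] == expected_tool}
```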
|
|
|
### Results |
|
|
|
Qualitative evaluation showed the model successfully distinguishes between: |
|
- Queries that should trigger the `search_documents` tool (when prefixed appropriately) |
|
- Queries that should trigger the `check_and_connect` tool |
|
|
|
## Environmental Impact |
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** [GPU model] |
|
- **Hours used:** [Estimated from training time] |
|
- **Cloud Provider:** [If applicable] |
|
- **Compute Region:** [If applicable] |
|
- **Carbon Emitted:** [Estimate if available] |
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture and Objective |
|
|
|
- Base architecture: LLaMA-3.2-3B |
|
- Adaptation method: LoRA fine-tuning |
|
- Objective: Train the model to output properly formatted JSON function calls based on input query type |
|
|
|
### Compute Infrastructure |
|
|
|
#### Hardware |
|
|
|
- The model was trained using CUDA-compatible GPU(s) |
|
- Memory usage metrics are reported in the training script |
|
|
|
#### Software |
|
|
|
- Unsloth: optimized 4-bit loading and LoRA fine-tuning for LLaMA models
|
- PyTorch: Deep learning framework |
|
- Transformers: Hugging Face's transformers library |
|
- PEFT: Parameter-Efficient Fine-Tuning library |
|
- TRL: Transformer Reinforcement Learning library |
|
|
|
## Framework versions |
|
|
|
- PEFT 0.15.2 |
|
- Transformers [version] |
|
- PyTorch [version] |
|
- Unsloth [version] |
|
|
|
## Model Card Contact |
|
|
|
[Your contact information] |