---
license: apache-2.0
language: c++
tags:
- code-generation
- codellama
- peft
- unit-tests
- causal-lm
- text-generation
base_model: codellama/CodeLlama-7b-hf
model_type: llama
pipeline_tag: text-generation
---

# 🧪 CodeLLaMA Unit Test Generator — Full Merged Model (v2)

This is a **merged model** that combines [`codellama/CodeLlama-7b-hf`](https://huggingface.co/codellama/CodeLlama-7b-hf) with a LoRA adapter fine-tuned on embedded C/C++ code paired with high-quality unit tests written with GoogleTest and CppUTest. This version adds improved output formatting, stop tokens, and test-cleanup mechanisms.

> ✅ Trained to generate only test cases (no headers, no `main()`) and to emit the `// END_OF_TESTS` token to mark completion.

---
## 🎯 Use Cases

- 🧪 Generate comprehensive unit tests for embedded C/C++ functions
- ✅ Focus on edge cases, boundary conditions, and error handling
- ⚠️ Ensure MISRA-C compliance (if trained accordingly)
- 📏 Automatically remove boilerplate and focus on `TEST(...)` blocks (a post-processing sketch follows the usage example below)

---
## 🧠 Training Summary

- Base model: `codellama/CodeLlama-7b-hf`
- LoRA fine-tuned with:
  - Special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `// END_OF_TESTS`
  - Instruction-style prompts
  - Explicit test output formatting
  - Test labels cleaned via regex to strip headers and `main()`
- Dataset: [`athrv/Embedded_Unittest2`](https://huggingface.co/datasets/athrv/Embedded_Unittest2)

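
This repository already contains the merged weights, so no extra steps are needed to use the model. For readers curious how a base-plus-LoRA merge of this kind is typically produced, the snippet below is a minimal sketch using `peft`; the adapter path and output directory are placeholders, not the exact artifacts behind this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "path/to/lora-adapter"  # placeholder; only the merged model is published

# The fine-tune added special tokens, so the tokenizer (and hence the
# embedding size) comes from the adapter run, not the stock base checkpoint.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", torch_dtype=torch.float16
)
base.resize_token_embeddings(len(tokenizer))

# Attach the adapter, fold the LoRA deltas into the base weights, and save
# a standalone checkpoint that no longer needs peft at load time.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()
merged.save_pretrained("merged-codellama-unit-tests")
tokenizer.save_pretrained("merged-codellama-unit-tests")
```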
---
## 📌 Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = """<|system|>
Generate comprehensive unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
1. ONLY include test code (no explanations, headers, or main functions)
2. Start directly with TEST(...)
3. End after last test case
4. Never include framework boilerplate
<|user|>
Create tests for:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

# Move inputs to the same device as the model instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop generation at the "// END_OF_TESTS" marker added during fine-tuning.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
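
The decoded text echoes the prompt and may still contain the completion marker. For the "remove boilerplate, keep only `TEST(...)` blocks" use case above, a small cleanup step helps; the helper below is a minimal sketch reusing `tokenizer` and `outputs` from the example (the name `extract_tests` and the exact regex are illustrative, not part of the model):

```python
import re

def extract_tests(decoded: str) -> str:
    """Keep only the generated TEST(...) blocks from a decoded completion."""
    # Drop anything after the completion marker, if it survived decoding.
    decoded = decoded.split("// END_OF_TESTS")[0]
    # Drop the echoed prompt, in case the chat tags were not stripped as special tokens.
    if "<|assistant|>" in decoded:
        decoded = decoded.split("<|assistant|>", 1)[1]
    # Keep text from the first TEST( onwards, discarding any stray preamble.
    match = re.search(r"TEST\s*\(", decoded)
    return decoded[match.start():].strip() if match else decoded.strip()

print(extract_tests(tokenizer.decode(outputs[0], skip_special_tokens=True)))
```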