Utkarsh524 committed
Commit 975dad4 · verified · 1 Parent(s): 83045e4

Create README.md

Files changed (1)
  1. README.md +71 -0
README.md ADDED
@@ -0,0 +1,71 @@
---
license: apache-2.0
language: c++
tags:
- code-generation
- codellama
- peft
- unit-tests
- causal-lm
- text-generation
base_model: codellama/CodeLlama-7b-hf
model_type: llama
pipeline_tag: text-generation
---

# 🧪 CodeLLaMA Unit Test Generator — Full Merged Model (v2)

This is a **merged model** that combines [`codellama/CodeLlama-7b-hf`](https://huggingface.co/codellama/CodeLlama-7b-hf) with a LoRA adapter fine-tuned on embedded C/C++ code paired with high-quality unit tests written with GoogleTest and CppUTest. This version adds improved output formatting, stop tokens, and test-cleanup mechanisms.

> ✅ Trained to generate only test cases (no headers, no `main()`) and to emit the `// END_OF_TESTS` token to mark completion.

---

## 🎯 Use Cases

- 🧪 Generate comprehensive unit tests for embedded C/C++ functions
- ✅ Focus on edge cases, boundary conditions, and error handling
- ⚠️ Ensure MISRA-C compliance (if trained accordingly)
- 📏 Automatically remove boilerplate and keep only `TEST(...)` blocks (see the post-processing sketch below)

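The last use case can also be handled outside the model. Below is a minimal post-processing sketch (not shipped with this repository; the helper name and regex are illustrative assumptions) that truncates a completion at the `// END_OF_TESTS` marker and keeps only the `TEST(...)` blocks:

```python
import re

def extract_tests(completion: str) -> str:
    """Illustrative cleanup of a model completion: cut at the stop marker
    and drop anything before the first TEST(...) / TEST_F(...) macro."""
    completion = completion.split("// END_OF_TESTS")[0]
    match = re.search(r"\bTEST(_F|_P)?\s*\(", completion)
    return completion[match.start():].strip() if match else completion.strip()
```
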
---

## 🧠 Training Summary

- Base model: `codellama/CodeLlama-7b-hf`
- LoRA fine-tuned with:
  - Special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `// END_OF_TESTS`
  - Instruction-style prompts (an illustrative template is sketched below)
  - Explicit test output formatting
  - Test labels cleaned via regex to strip headers and `main()`
- Dataset: [`athrv/Embedded_Unittest2`](https://huggingface.co/datasets/athrv/Embedded_Unittest2)

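The exact preprocessing code is not published with this card, but a plausible sketch of how a training example could be assembled from the elements above (the template, token placement, and regexes are assumptions, not the confirmed pipeline) looks like this:

```python
import re

def build_training_example(code: str, reference_tests: str) -> str:
    """Illustrative only: assemble an instruction-style training string
    from the special tokens listed above and a cleaned test label."""
    # Strip #include lines from the reference tests.
    tests = re.sub(r"^\s*#include.*\n?", "", reference_tests, flags=re.MULTILINE)
    # Drop main() and everything after it (simplifying assumption: main() comes last).
    tests = re.split(r"\bint\s+main\s*\(", tests)[0]
    return (
        "<|system|>\nGenerate comprehensive unit tests for C/C++ code.\n"
        f"<|user|>\nCreate tests for:\n{code.strip()}\n"
        f"<|assistant|>\n{tests.strip()}\n// END_OF_TESTS"
    )
```
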
---

## 📌 Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = """<|system|>
Generate comprehensive unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
1. ONLY include test code (no explanations, headers, or main functions)
2. Start directly with TEST(...)
3. End after last test case
4. Never include framework boilerplate
<|user|>
Create tests for:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

# Move the inputs to the device the model was loaded on.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation stops once the custom "// END_OF_TESTS" token is produced.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
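
Because `generate` returns the prompt tokens followed by the completion, the decoded string above includes the prompt itself. A small follow-up (assumed usage, not from the original card) keeps only the newly generated tests and drops the stop marker if it survives decoding:

```python
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]  # completion only
tests = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(tests.split("// END_OF_TESTS")[0].strip())
```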