---
license: apache-2.0
language: c++
tags:
- code-generation
- codellama
- peft
- unit-tests
- causal-lm
- text-generation
base_model: codellama/CodeLlama-7b-hf
model_type: llama
pipeline_tag: text-generation
---

# 🧪 CodeLLaMA Unit Test Generator — Full Merged Model (v2)

This is a **merged model** that combines [`codellama/CodeLlama-7b-hf`](https://huggingface.co/codellama/CodeLlama-7b-hf) with a LoRA adapter fine-tuned on embedded C/C++ code paired with high-quality unit tests written with GoogleTest and CppUTest. This version adds improved output formatting, stop tokens, and test-cleanup mechanisms.

> ✅ Trained to generate only test cases (no headers, no `main()`) and to emit the `// END_OF_TESTS` token to mark completion.

---
## 🎯 Use Cases

- 🧪 Generate comprehensive unit tests for embedded C/C++ functions
- ✅ Focus on edge cases, boundary conditions, and error handling
- ⚠️ Ensure MISRA-C compliance (if trained accordingly)
- 📏 Automatically remove boilerplate and focus on `TEST(...)` blocks (a post-processing sketch follows the usage example below)

---
## 🧠 Training Summary

- Base model: `codellama/CodeLlama-7b-hf`
- LoRA fine-tuned with:
  - Special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `// END_OF_TESTS`
  - Instruction-style prompts
  - Explicit test output formatting
  - Test labels cleaned via regex to strip headers and `main()`
- Dataset: [`athrv/Embedded_Unittest2`](https://huggingface.co/datasets/athrv/Embedded_Unittest2)

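
This repository already contains the merged weights, so no extra steps are needed to use the model. For readers curious how a base-plus-LoRA merge of this kind is typically produced, the snippet below is a minimal sketch using `peft`; the adapter path and output directory are placeholders, not the exact artifacts behind this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "path/to/lora-adapter"  # placeholder; only the merged model is published

# The fine-tune added special tokens, so the tokenizer (and hence the
# embedding size) comes from the adapter run, not the stock base checkpoint.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", torch_dtype=torch.float16
)
base.resize_token_embeddings(len(tokenizer))

# Attach the adapter, fold the LoRA deltas into the base weights, and save
# a standalone checkpoint that no longer needs peft at load time.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()
merged.save_pretrained("merged-codellama-unit-tests")
tokenizer.save_pretrained("merged-codellama-unit-tests")
```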
---
## 📌 Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = """<|system|>
Generate comprehensive unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
1. ONLY include test code (no explanations, headers, or main functions)
2. Start directly with TEST(...)
3. End after last test case
4. Never include framework boilerplate
<|user|>
Create tests for:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

# Move inputs to the same device as the model instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop generation at the "// END_OF_TESTS" marker added during fine-tuning.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("// END_OF_TESTS"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
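
The decoded text echoes the prompt and may still contain the completion marker. For the "remove boilerplate, keep only `TEST(...)` blocks" use case above, a small cleanup step helps; the helper below is a minimal sketch reusing `tokenizer` and `outputs` from the example (the name `extract_tests` and the exact regex are illustrative, not part of the model):

```python
import re

def extract_tests(decoded: str) -> str:
    """Keep only the generated TEST(...) blocks from a decoded completion."""
    # Drop anything after the completion marker, if it survived decoding.
    decoded = decoded.split("// END_OF_TESTS")[0]
    # Drop the echoed prompt, in case the chat tags were not stripped as special tokens.
    if "<|assistant|>" in decoded:
        decoded = decoded.split("<|assistant|>", 1)[1]
    # Keep text from the first TEST( onwards, discarding any stray preamble.
    match = re.search(r"TEST\s*\(", decoded)
    return decoded[match.start():].strip() if match else decoded.strip()

print(extract_tests(tokenizer.decode(outputs[0], skip_special_tokens=True)))
```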