clement-cvll committed · verified
Commit d5b1252 · Parent: 9b5f8af

Update README.md

Files changed (1): README.md (+91 / -3)
README.md CHANGED
@@ -1,10 +1,98 @@
---
license: mit
datasets:
- - clement-cvll/QWQ-LongCOT-AIMO
base_model:
- - clement-cvll/DeepSeek-R1-Distill-Qwen-7B-Floppanacci
pipeline_tag: text-generation
tags:
- math
- ---
The updated README.md:

---
license: mit
datasets:
- Floppanacci/QWQ-LongCOT-AIMO
base_model:
- Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci
pipeline_tag: text-generation
tags:
- math
- qwen2.5
- aimo
language:
- en
---
# DeepSeek-R1-Distill-Qwen-7B-Floppanacci (4-bit AWQ Quantized)

This repository contains the 4-bit AWQ (Activation-aware Weight Quantization) version of the [`Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci`](https://huggingface.co/Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci) model.

## Model Description

This quantized model offers faster inference and a lower memory footprint than the original bf16/fp16 fine-tuned model. It is designed for mathematical reasoning tasks, especially Chain-of-Thought style problem solving relevant to the [AIMO competition](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2).

The original model was fine-tuned on the [`Floppanacci/QWQ-LongCOT-AIMO`](https://huggingface.co/datasets/Floppanacci/QWQ-LongCOT-AIMO) dataset.
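A conversion like this can be reproduced with the `autoawq` library. The snippet below is a minimal sketch that assumes autoawq's default calibration data and a standard 4-bit, group-size-128 GEMM configuration; the exact settings used to produce this checkpoint are not documented here.

```python
# Minimal AWQ quantization sketch (assumed configuration, not the exact recipe
# used for this repository).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_id = "Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci"
quant_path = "DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ"

# Standard 4-bit AWQ configuration (group size 128, GEMM kernels)
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Calibrate and quantize (uses autoawq's default calibration dataset)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```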
## How to Use

### With `transformers` (and `autoawq`)

Install the `autoawq` library alongside `transformers` and `torch`:
```bash
pip install autoawq transformers torch
```

Then load the model with `transformers`:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the AWQ-quantized model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run with fp16 activations
    device_map="auto",          # Automatically places layers on available GPU(s)
)

# Example prompt (adjust based on how the model expects input)
prompt = r"Question: Let $ABCD$ be a unit square. Let $P$ be a point inside the square such that $PA = \sqrt{5}/3$, $PB = \sqrt{2}/3$, and $PC = \sqrt{5}/3$. Find the distance $PD$. Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate (greedy decoding; set do_sample=True and a temperature to sample instead)
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=False)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
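DeepSeek-R1 distilled checkpoints usually ship a chat template, and responses are typically better when the question is formatted through it. The snippet below is a minimal sketch that reuses the `model` and `tokenizer` loaded above and assumes this quantized repository inherits the base model's template:

```python
# Optional: format the question with the tokenizer's chat template
# (assumes the quantized repo keeps the base model's template).
messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"},
]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```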

### With `vLLM` (Optimized Inference)

For higher throughput, you can serve the model with vLLM.

First, install vLLM:
```bash
pip install vllm
```

Then run the following Python code:
```python
from vllm import LLM, SamplingParams

# Define prompts
prompts = [
    r"Question: Let $ABCD$ be a unit square. Let $P$ be a point inside the square such that $PA = \sqrt{5}/3$, $PB = \sqrt{2}/3$, and $PC = \sqrt{5}/3$. Find the distance $PD$. Answer:",
    "Question: What is the sum of the first 100 positive integers? Answer:",
]

# Define sampling parameters
sampling_params = SamplingParams(temperature=0.1, top_p=0.95, max_tokens=300)

# Initialize the LLM engine with the AWQ model
llm = LLM(
    model="Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ",
    quantization="awq",
    dtype="auto",  # vLLM typically uses half-precision activations (bfloat16 on compatible hardware such as L4, A100, H100)
    trust_remote_code=True,
)

# Generate responses
outputs = llm.generate(prompts, sampling_params)

# Print the outputs
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
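vLLM can also expose the model through its OpenAI-compatible server. The command below is a minimal sketch; the flag names assume a recent vLLM release, and the server listens on `http://localhost:8000/v1` by default:

```bash
# Serve the AWQ checkpoint behind an OpenAI-compatible API
vllm serve Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ --quantization awq
```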