loghugging25 committed · verified · Commit f70e885 · Parent: e577553

Update README.md

Files changed (1): README.md (+82 −52)
README.md CHANGED
@@ -7,52 +7,79 @@ base_model:
  pipeline_tag: question-answering
  tags:
  - Connect-Transport
- - ConnectTransport
  - Connect
  - chatbot
  library_name: transformers
  ---

  # Model Card for logicsct-mistral-nemo-instruct
 
- logicsct-mistral-nemo-instruct is a QLoRA 4-bit finetuning of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407).
-
- ## Model usage
- We are currently evaluating and training models to be a support chatbot for [**Connect-Transport**](https://www.logics-connect.de), a transport management system from Logics Software GmbH.
-
- ## Finding a good base model - speaking German and following instructions well enough
- We have evaluated over 70 models on basic tech instruction tasks in German. The evaluation was done manually by checking the answers to the following questions:
- 1. Wie kann ich in Chrome machen dass meine Downloads immer am gleichen Ort gespeichert werden?
- 2. Wie kann ich in Outlook meine Mail Signatur anpassen und einen Link und Bild dort einfügen?
-
- The best models according to our subjective scale from 1 (bad) to 5 (very good):
- - 5 star rating:
-   - Big proprietary models like OpenAI o1, OpenAI GPT-4o, OpenAI o1-mini
-   - Huge models: [deepseek-ai/DeepSeek-R1 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-R1), [deepseek-ai/DeepSeek-V3 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-V3), [mistralai/Mistral-Large-Instruct-2411 (123B)](https://huggingface.co/mistralai/Mistral-Large-Instruct-2411)
-   - Large models: [Nexusflow/Athene-V2-Chat (72.7B)](https://huggingface.co/Nexusflow/Athene-V2-Chat), [nvidia/Llama-3.1-Nemotron-70B-Instruct (70.6B)](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct)
- - 4 star rating:
-   - Huge models: [mistralai/Mixtral-8x22B-Instruct-v0.1 (141B)](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), [alpindale/WizardLM-2-8x22B (141B)](https://huggingface.co/alpindale/WizardLM-2-8x22B) and [CohereForAI/c4ai-command-r-plus-08-2024 (104B)](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
-   - Large models: [meta-llama/Llama-3.3-70B-Instruct (70.6B)](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) and [NousResearch/Hermes-3-Llama-3.1-70B (70.6B)](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-70B)
-   - Big models: [mistralai/Mixtral-8x7B-Instruct-v0.1 (46.7B)](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
-   - Medium-big models: [google/gemma-2-27b (27.2B)](https://huggingface.co/google/gemma-2-27b) and [mistralai/Mistral-Small-Instruct-2409 (22.2B)](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409)
- - **Small-sized models (main focus currently)**:
-   - [mistralai/Mistral-Nemo-Instruct-2407 (12.2B)](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
-   - [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
- - 3 stars and lower: not listed here. We have tested dozens of <20B and <10B models, but most do not understand or speak German well enough, or do not perform well enough when answering support-chatbot tech questions.
- - Some models also have smaller versions that aren't listed above; those smaller versions did not perform well enough for a 4+ rating.
- - Furthermore, some models like Hermes 3 have bigger versions available that aren't listed; we were not impressed by their performance-per-model-size ratio and thus not particularly interested in testing their huge 405B versions.
- - We mainly focus on <20B models and compare their performance with some of the bigger models, too.
-
- ## How we fine-tune our base model
- - Because of our small training dataset and GPU VRAM constraints, we use QLoRA fine-tuning only.
- - After trying out our own scripts, we finally settled on https://github.com/hiyouga/LLaMA-Factory, which fits our needs in terms of easy training, inference, and export functionality for a big set of models.
-
- ### Training data
- - Our training data currently consists of about **220 prompt-response pairs**.
- - We have built a webapp for our employees to enter training data, with gamification in the form of a daily and weekly high-score system. The webapp is furthermore connected to a selection of current evaluation models to see how the models answer both prompts within their training data and prompts outside of it.
-
- ### QLoRA settings
- Full settings of `logicsct_train_Mistral_Nemo_qlora_sft_otfq.yaml`:
  ```
  ### model
  model_name_or_path: mistralai/Mistral-Nemo-Instruct-2407
@@ -99,8 +126,8 @@ eval_strategy: steps # or "epoch" if you prefer evaluating at the end of each e
  eval_steps: 500 # adjust this if needed (e.g., if you use "steps", it determines evaluation frequency)
  ```

- ### Training, inference, and export
- Following https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#quickstart:

  ```
  llamafactory-cli train logicsct_train_Mistral_Nemo_qlora_sft_otfq.yaml # VRAM used: 10099MiB for 4 bit QLoRA training
@@ -110,14 +137,17 @@ llamafactory-cli export logicsct_export_Mistral_Nemo_qlora_sft_Q4.yaml # V
  llamafactory-cli chat logicsct_inference_Mistral_Nemo_qlora_sft_otfq_Q4.yaml # VRAM used: 8541MiB-9569MiB VRAM for inference of the 4bit quant merged model (increasing with increasing context length)
  ```

- ### Comparison of open source training/models with OpenAI proprietary finetuning
- - We have finetuned both OpenAI GPT-4o and GPT-4o-mini and compared their performance to our best small-sized models.
- - After some initial runs with very unsatisfying results, we needed to adjust the hyperparameters a lot, and mainly continued experimenting with GPT-4o-mini.
- - With our current training data, it seems like both GPT-4o and GPT-4o-mini need 5 epochs with the default learning rate, and the training loss ends pretty close to 0; with fewer epochs the models seem not to learn enough, maybe because of our small training dataset.
- - Unusable overfitting occurs at about 7 epochs for both models.
- - Best settings so far: 5 epochs, batch size of 3, automatic learning rate.
- - But currently our small-sized open-source models perform about equal to or even better than such a finetuning of GPT-4o-mini.
- - We will continue further testing with OpenAI finetuning once we have a larger training dataset.
-
- ## Next steps
- Number one priority is currently collecting more training data.
 
 
 
 
  pipeline_tag: question-answering
  tags:
  - Connect-Transport
+ - Connect Transport
  - Connect
+ - Logics Software
+ - KI-Chatbot Kundenservice
+ - KI Chatbot
+ - Deutscher Chatbot
+ - Deutscher KI Chatbot
+ - KI-Chatbot Deutsch
+ - KI-Chatbots für Unternehmen
+ - German chat bot
+ - German support chatbot
+ - German AI chatbot
  - chatbot
  library_name: transformers
  ---

  # Model Card for logicsct-mistral-nemo-instruct
+ **logicsct-mistral-nemo-instruct** is a QLoRA 4-bit fine-tuned version of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407). This model has been adapted with domain-specific knowledge to serve as a support chatbot for [**Connect-Transport**](https://www.logics-connect.de), our transport management system developed at Logics Software GmbH.

+ While tailored for our internal use, the training principles and techniques we employed can also be applied by others interested in developing their own chatbot assistants.
+
+ We are continuously evaluating and refining our models to enhance the performance of our support chatbot for Connect-Transport.
+
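Once the QLoRA adapter has been merged and exported (see the export step further below), the result can be loaded like any Mistral-Nemo-style chat model. A minimal inference sketch with transformers; the repository id and dtype choice here are assumptions, not necessarily the exact published artifact:

```
# Minimal sketch (assumptions: hypothetical repo id, bf16 weights; adjust to the actual export).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "loghugging25/logicsct-mistral-nemo-instruct"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Wie kann ich eine Tour umbenennen?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
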
+ ## Finding a Good Base Model – Proficient in German and Following Instructions
+ We have evaluated over 70 models for basic technical instruction tasks in German. The evaluation was carried out manually by reviewing the responses to the following questions:
+
+ - Wie kann ich in Chrome machen dass meine Downloads immer am gleichen Ort gespeichert werden?
+ - Wie kann ich in Outlook meine Mail Signatur anpassen und einen Link und Bild dort einfügen?
+
+ The best models according to our subjective rating scale (1 = poor, 5 = excellent) are:
+
+ **5-Star Rating**:
+ - Big proprietary models such as OpenAI o1, OpenAI GPT-4o, and OpenAI o1-mini
+ - Huge models: [deepseek-ai/DeepSeek-R1 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-R1), [deepseek-ai/DeepSeek-V3 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-V3), [mistralai/Mistral-Large-Instruct-2411 (123B)](https://huggingface.co/mistralai/Mistral-Large-Instruct-2411)
+ - Large models: [Nexusflow/Athene-V2-Chat (72.7B)](https://huggingface.co/Nexusflow/Athene-V2-Chat), [nvidia/Llama-3.1-Nemotron-70B-Instruct (70.6B)](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct)
+
+ **4-Star Rating**:
+ - Huge models: [mistralai/Mixtral-8x22B-Instruct-v0.1 (141B)](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), [alpindale/WizardLM-2-8x22B (141B)](https://huggingface.co/alpindale/WizardLM-2-8x22B) and [CohereForAI/c4ai-command-r-plus-08-2024 (104B)](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
+ - Large models: [meta-llama/Llama-3.3-70B-Instruct (70.6B)](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) and [NousResearch/Hermes-3-Llama-3.1-70B (70.6B)](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-70B)
+ - Big models: [mistralai/Mixtral-8x7B-Instruct-v0.1 (46.7B)](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
+ - Medium-sized models: [google/gemma-2-27b (27.2B)](https://huggingface.co/google/gemma-2-27b) and [mistralai/Mistral-Small-Instruct-2409 (22.2B)](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409)
+
+ **Small-Sized Models (Current Main Focus)**:
+ - [mistralai/Mistral-Nemo-Instruct-2407 (12.2B)](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
+ - [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
+
+ Models rated 3 stars or lower are not listed here. We have tested dozens of models under 20B (and under 10B) parameters, but most do not understand or speak German well enough, or do not perform adequately when answering support-chatbot technical questions.
+
+ Some models also have smaller versions that are not listed above because they did not achieve a 4+ rating. Additionally, some models (e.g., Hermes 3) have larger versions available that are not included, as their performance relative to model size was not impressive, making their massive 405B versions less interesting for our purposes.
+
+ Given our goal of training, exporting, and running inference on our dedicated server hardware, we primarily focus on models with fewer than 20B parameters while comparing their performance with that of some larger models.
+
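For repeatability, the candidate answers to the two test questions can be collected with a small script and then rated by hand. A minimal sketch, assuming the transformers text-generation pipeline; the candidate list, generation settings, and output file are illustrative and not part of our actual workflow:

```
# Hedged sketch: collect answers from candidate models to the two German test
# questions for manual 1-5 rating. Model list and settings are illustrative only.
from transformers import pipeline

QUESTIONS = [
    "Wie kann ich in Chrome machen dass meine Downloads immer am gleichen Ort gespeichert werden?",
    "Wie kann ich in Outlook meine Mail Signatur anpassen und einen Link und Bild dort einfügen?",
]
CANDIDATES = ["mistralai/Mistral-Nemo-Instruct-2407", "microsoft/phi-4"]

with open("candidate_answers.md", "w", encoding="utf-8") as out:
    for model_id in CANDIDATES:
        generator = pipeline("text-generation", model=model_id, device_map="auto")
        for question in QUESTIONS:
            result = generator([{"role": "user", "content": question}], max_new_tokens=512)
            answer = result[0]["generated_text"][-1]["content"]  # last message = assistant reply
            out.write(f"## {model_id}\n\n**{question}**\n\n{answer}\n\n")
```
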
+ ## How We Fine-Tune Our Base Model
+ Due to our small training dataset and GPU VRAM constraints, we exclusively use QLoRA fine-tuning. After testing our own scripts, we ultimately settled on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), which meets our needs for straightforward training, inference, and export functionality across a wide range of models.
+
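Conceptually, QLoRA keeps the base model frozen in 4-bit (NF4) precision and trains only small LoRA adapter matrices on top of it. The following is an illustrative sketch of that idea with peft and bitsandbytes; the LoRA hyperparameters shown are assumptions, and our actual training is driven entirely by the LLaMA-Factory YAML config below:

```
# Illustrative QLoRA setup (not our actual training script; LLaMA-Factory handles this for us).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # keep the frozen base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed values, not taken from our YAML
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()           # only the small adapter matrices are trainable
```
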
+ ### Training Data
+ Our training data currently consists of approximately **220 prompt-response pairs** in a carefully curated dataset.
+
+ To collect this training data, we developed an internal web app for our employees. In this app:
+ - Employees enter both a prompt and a response; each pair forms a single training record.
+ - Gamification is implemented via daily and weekly high-score systems that highlight the employee who contributes the most training data.
+ - The web app is also connected to a selection of current evaluation models to facilitate internal testing of how the models respond both to prompts from the training data and to new prompts.
+
+ Here are three example records from our training data:
+ > Prompt: "Wie kann ich eine Tour umbenennen?"
+ > <br>Response: "1. Öffnen Sie die Stoppliste der Tour und klicken Sie auf das Tourmenü (graues Zahnradsymbol ⚙️) rechts oben in der Stoppliste.\n2. Wählen Sie im Tourmenü \"Tour umbenennen\".\n3. Folgen Sie der Anweisung im Dialog und klicken Sie auf \"OK\".\nBeachten Sie: Fahrer können die Tour nicht umbenennen, sondern nur ein Disponent (Admin Account)."
+
+ > Prompt: "Wie werden Daten in Connect-Transport importiert?"
+ > <br>Response: "Der Flex Importer liest Auftragsdaten aus Excel-Dateien ein. Alternativ können Daten per Schnittstelle direkt aus ERP- oder Warenwirtschaftssystemen importiert werden."
+
+ > Prompt: "Gibt es eine Möglichkeit zur Sendungsverfolgung für Endkunden?"
+ > <br>Response: "Ja, mit dem Avisierungs-Modul können Disponenten per SMS oder E-Mail voraussichtliche Lieferzeiten an Empfänger senden, die ihre Sendungen live verfolgen können."
+
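On disk, such records can be stored in LLaMA-Factory's alpaca-style dataset format and registered in `data/dataset_info.json`. A minimal sketch; the file name and dataset key are hypothetical, not our actual layout:

```
# Hypothetical dataset layout for LLaMA-Factory's alpaca format
# (field names follow the documented format; file/dataset names are assumptions).
import json

records = [
    {
        "instruction": "Wie kann ich eine Tour umbenennen?",
        "input": "",
        "output": "1. Öffnen Sie die Stoppliste der Tour und klicken Sie auf das Tourmenü ...",
    },
    {
        "instruction": "Wie werden Daten in Connect-Transport importiert?",
        "input": "",
        "output": "Der Flex Importer liest Auftragsdaten aus Excel-Dateien ein. ...",
    },
]

with open("data/logicsct_support.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

# Then register the file in data/dataset_info.json, for example:
# "logicsct_support": { "file_name": "logicsct_support.json" }
```
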
+ ### QLoRA Settings
+ Full settings for `logicsct_train_Mistral_Nemo_qlora_sft_otfq.yaml`:
  ```
  ### model
  model_name_or_path: mistralai/Mistral-Nemo-Instruct-2407
 
  eval_steps: 500 # adjust this if needed (e.g., if you use "steps", it determines evaluation frequency)
  ```

+ ### Training, Inference, and Export
+ We follow the instructions provided in the [LLaMA-Factory Quickstart Guide](https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#quickstart):

  ```
  llamafactory-cli train logicsct_train_Mistral_Nemo_qlora_sft_otfq.yaml # VRAM used: 10099MiB for 4 bit QLoRA training

  llamafactory-cli chat logicsct_inference_Mistral_Nemo_qlora_sft_otfq_Q4.yaml # VRAM used: 8541MiB-9569MiB VRAM for inference of the 4bit quant merged model (increasing with increasing context length)
  ```

+ ### Comparison of Open Source Training/Models with OpenAI Proprietary Fine-Tuning
+ We have fine-tuned both OpenAI GPT-4o and GPT-4o-mini and compared their performance to that of our best small-sized models. After some initial runs with unsatisfactory results, we significantly adjusted the hyperparameters and focused primarily on experimenting with GPT-4o-mini.
+
+ With our current training data, both GPT-4o and GPT-4o-mini appear to require 5 epochs using the default learning rate, with the training loss approaching zero. With fewer epochs, however, the models seem not to learn sufficiently, perhaps due to the small size of our training dataset. Significant overfitting occurs at approximately 7 epochs for both models.
+
+ Our best settings so far are:
+ - Epochs: 5
+ - Batch Size: 3
+ - Learning Rate: Automatically determined
+
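For reference, a fine-tuning job with these settings can be launched via the OpenAI Python SDK roughly as follows (a hedged sketch: the JSONL file name and the exact model snapshot are placeholders, not our actual run):

```
# Hedged sketch of launching an OpenAI fine-tuning job with the settings above.
from openai import OpenAI

client = OpenAI()

# 1. Upload the training data (chat-formatted JSONL).
training_file = client.files.create(
    file=open("logicsct_support_chat.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Create the fine-tuning job with 5 epochs and batch size 3.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 5, "batch_size": 3},
)
print(job.id, job.status)
```
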
+ Currently, our small-sized open-source models perform comparably to or even better than the fine-tuned GPT-4o-mini. We will continue testing with OpenAI fine-tuning once we have a larger training dataset.
+
+ ## Next Steps
+ Our top priority at the moment is to collect more training data.