kenyano committed on
Commit ccf0744 · verified · 1 Parent(s): 0544d94

Update README.md

Files changed (1): README.md (+288 −288)

---
license: llama3
language:
- en
- ja
- zh
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
library_name: transformers
---

# ELAINE-medLLM - Built with Llama3-8B

ELAINE (EngLish-jApanese-chINesE)-medLLM is a trilingual (English, Japanese, Chinese) large language model adapted to the bio-medical domain, based on Llama-3-8B.
The training data was carefully curated in terms of volume and diversity to adapt the model to the biomedical domain and endow it with trilingual capability while preserving the knowledge and abilities of the base model.
Training follows a two-stage path: continued pre-training followed by supervised fine-tuning (SFT).
ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model's general capability.

## Model Details

* **Model type**: Please refer to the [Llama 3 GitHub repository](https://github.com/meta-llama/llama3) for details on the model architecture.
* **Language(s)**: English, Japanese, Chinese
* **Library**: [DeepSpeed](https://github.com/microsoft/DeepSpeed)
* **Tokenizer**: Please refer to the [Llama 3 model card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the tokenizer.

## Model Performance

### Evaluation Benchmarks

The evaluation benchmark datasets and evaluation code are available from [this GitHub repository](https://github.com/aistairc/medLLM_QA_benchmark).
The benchmarks are listed below; a minimal scoring sketch follows the lists.

#### English evaluation benchmarks

- [MedQA](https://arxiv.org/abs/2009.13081)
- [MedQA-4op](https://arxiv.org/abs/2009.13081)
- [MMLU](https://arxiv.org/abs/2009.03300)
- [MedMCQA](https://proceedings.mlr.press/v174/pal22a.html)
- [PubMedQA](https://doi.org/10.18653/v1/D19-1259)

#### Japanese evaluation benchmarks

- [IgakuQA](https://github.com/jungokasai/IgakuQA)
  - We concatenated the original exam data from 2018 to 2022 into a single JSON file.
- [JJSIMQA](https://arxiv.org/abs/2310.10083)
- DenQA
  - Contains exam problems and their answers from the Japan National Dentistry Examination over the past two years (2023 and 2024), extracted from the official website of the Ministry of Health, Labour and Welfare in Japan (https://www.mhlw.go.jp/stf/english/index.html).

#### Chinese evaluation benchmarks

- [MedQA](https://arxiv.org/abs/2009.13081)
- [MedQA-4op](https://arxiv.org/abs/2009.13081)
- [CMExam](https://arxiv.org/abs/2306.03030)

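All of the above are exam-style, multiple-choice QA benchmarks scored by accuracy. The linked repository defines the actual data formats and evaluation scripts; the sketch below only illustrates accuracy scoring under an assumed `question`/`options`/`answer` JSON schema, which is hypothetical and not the repository's real format.

```python
# Minimal sketch of multiple-choice accuracy scoring.
# The JSON schema (question/options/answer keys) is hypothetical; see
# https://github.com/aistairc/medLLM_QA_benchmark for the real data
# format and the official evaluation scripts.
import json

def accuracy(benchmark_path, predict):
    """Score `predict` (callable: question, options -> chosen option key) by exact match."""
    with open(benchmark_path, encoding="utf-8") as f:
        items = json.load(f)
    correct = sum(
        predict(item["question"], item["options"]) == item["answer"]
        for item in items
    )
    return correct / len(items)

# Example: a trivial baseline that always answers "A".
# print(accuracy("medqa_test.json", lambda question, options: "A"))
```
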
## Training Datasets

### Continued pre-training

For continued pre-training, we collected English, Japanese, and Chinese text in the bio-medical domain.
The collected domain text falls into six categories: 1) scientific papers, 2) medical guidelines, 3) biomedical web text, 4) biomedical textbooks, 5) PubMed abstracts, and 6) PubMed Central (PMC) archives.
For Japanese PubMed abstracts, we used the original English PubMed abstracts translated into Japanese.
We used only openly licensed text, except for the Japanese biomedical papers from [J-STAGE](https://www.jstage.jst.go.jp/browse/-char/en).

### Supervised fine-tuning (SFT)

We collected various conversational QA datasets in the bio-medical domain from different sources.
For English, we used Medical Meadow from MedAlpaca, and the HealthCareMagic and iCliniq datasets used in ChatDoctor.
For Chinese and English, we adapted the augmented QA dataset from HuatuoGPT-2.
For Japanese, we used translations of the English datasets.

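The exact serialization used during fine-tuning is not specified in this card. Purely as an illustration, a single QA pair could be rendered into the same `##User##`/`##Assistant##` format that the inference code under Sample usage expects:

```python
# Illustrative only: one way a conversational QA pair might be serialized
# into the "##User##"/"##Assistant##" format used at inference time.
# The actual SFT preprocessing pipeline is not part of this model card.
def to_sft_example(question: str, answer: str) -> str:
    return (
        f"##User##\n{question}<|eot_id|>"
        f"##Assistant##\n{answer}<|eot_id|>"
    )

print(to_sft_example("What is hypertension?",
                     "Hypertension is persistently high arterial blood pressure."))
```
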
## Results

### English benchmark

| Model | MMLU | MedMCQA | MedQA | MedQA-4op | PubMedQA | Avg |
|------------------------------------------------|--------|---------|--------|-----------|----------|--------|
| google_gemma-7b-it | 50.55 | 41.07 | 33.12 | 39.67 | 67.07 | 46.30 |
| meta-llama_Llama-2-7b-chat-hf | 48.71 | 35.97 | 30.99 | 38.09 | 63.64 | 43.48 |
| meta-llama_Meta-Llama-3-8B-Instruct | 72.79 | 60.89 | 57.65 | 61.28 | 78.99 | 66.32 |
| tokyotech-llm_Llama-3-Swallow-8B-Instruct-v0.1 | 66.88 | 53.85 | 47.95 | 56.07 | 64.65 | 57.88 |
| medalpaca_medalpaca-7b | 51.48 | 36.02 | 31.15 | 39.35 | 55.15 | 42.63 |
| epfl-llm_meditron-7b | 47.32 | 34.35 | 29.18 | 32.26 | 39.19 | 36.46 |
| aaditya_Llama3-OpenBioLLM-8B | 73.43 | 55.03 | 50.00 | 56.78 | 65.86 | 60.22 |
| FreedomIntelligence_Apollo-7B | 68.17 | 53.85 | 45.98 | 53.86 | 75.35 | 59.44 |
| llm-jp-3-7.2b-instruct3 | 47.05 | 36.33 | 30.05 | 36.99 | 69.09 | 43.90 |
| Llama3-ELAINE-medLLM-instruct-8B | 72.69 | 55.07 | 55.76 | 61.36 | 75.35 | 64.05 |
| <ins>Llama3-ELAINE-medLLM-instruct-8B_v0.1</ins> | 73.43 | 52.25 | 54.57 | 60.49 | 70.30 | 62.20 |

### Japanese benchmark

| Model | DenQA | IgakuQA | JJSIMQA | Avg |
|------------------------------------------------|--------|---------|---------|--------|
| google_gemma-7b-it | 13.71 | 25.51 | 12.09 | 17.10 |
| meta-llama_Llama-2-7b-chat-hf | 12.03 | 20.80 | 10.55 | 14.46 |
| meta-llama_Meta-Llama-3-8B-Instruct | 19.72 | 40.45 | 25.93 | 28.70 |
| tokyotech-llm_Llama-3-Swallow-8B-Instruct-v0.1 | 23.78 | 44.01 | 26.81 | 31.53 |
| medalpaca_medalpaca-7b | 10.91 | 17.74 | 10.77 | 13.14 |
| epfl-llm_meditron-7b | 9.79 | 18.20 | 8.35 | 12.11 |
| aaditya_Llama3-OpenBioLLM-8B | 18.18 | 33.03 | 21.98 | 24.40 |
| FreedomIntelligence_Apollo-7B | 17.90 | 32.28 | 20.66 | 23.61 |
| llm-jp-3-7.2b-instruct3 | 18.18 | 30.78 | 19.78 | 22.91 |
| Llama3-ELAINE-medLLM-instruct-8B | 22.24 | 43.36 | 24.40 | 30.00 |
| <ins>Llama3-ELAINE-medLLM-instruct-8B_v0.1</ins> | 22.38 | 43.36 | 27.69 | 31.14 |

### Chinese benchmark

| Model | CMExam | MedQA | MedQA-4op | Avg |
|------------------------------------------------|--------|--------|-----------|--------|
| google_gemma-7b-it | 30.90 | 29.03 | 34.96 | 31.63 |
| meta-llama_Llama-2-7b-chat-hf | 25.43 | 25.37 | 32.30 | 27.70 |
| meta-llama_Meta-Llama-3-8B-Instruct | 52.01 | 62.99 | 68.40 | 61.13 |
| tokyotech-llm_Llama-3-Swallow-8B-Instruct-v0.1 | 41.11 | 45.05 | 51.27 | 45.81 |
| medalpaca_medalpaca-7b | 23.58 | 24.99 | 30.11 | 26.23 |
| epfl-llm_meditron-7b | 23.85 | 25.46 | 29.82 | 26.38 |
| aaditya_Llama3-OpenBioLLM-8B | 39.07 | 42.59 | 48.73 | 43.46 |
| FreedomIntelligence_Apollo-7B | 49.99 | 58.29 | 62.99 | 57.09 |
| llm-jp-3-7.2b-instruct3 | 27.28 | 29.17 | 34.14 | 30.20 |
| Llama3-ELAINE-medLLM-instruct-8B | 48.85 | 55.80 | 61.59 | 55.41 |
| <ins>Llama3-ELAINE-medLLM-instruct-8B_v0.1</ins> | 47.41 | 53.11 | 60.39 | 53.64 |

## Changes

Llama3-ELAINE-medLLM-instruct-8B_v0.1 is a bug-fixed version of Llama3-ELAINE-medLLM-instruct-8B.
It improves Japanese performance at the expense of some English and Chinese performance, and it generally generates longer answers than the original.
We recommend using Llama3-ELAINE-medLLM-instruct-8B_v0.1 instead of Llama3-ELAINE-medLLM-instruct-8B.

## Sample usage

```python
from vllm import LLM, SamplingParams

# Sample health-related questions in English, Japanese, and Chinese.
# The first entry of each list is a system message; the rest are user questions.
messages_en = [
    {"role": "System", "content": "You are an AI Health Assistant"},
    {"role": "User", "content": "How high is hypertension?"},
    {"role": "User", "content": "How can depression be cured?"},
    {"role": "User", "content": "What are the possible causes of autism?"},
    {"role": "User", "content": "I have allergic rhinitis, are there any good medications?"},
    {"role": "User", "content": "What is a stroke and is there a treatment for it?"},
    {"role": "User", "content": "What is sudden hearing loss? Is there a treatment?"},
    {"role": "User", "content": "Tell me the difference between glaucoma and cataract."},
    {"role": "User", "content": "What is the normal level of uric acid levels?"},
    {"role": "User", "content": "What are the symptoms and causes of osteoporosis and how is it treated?"},
    {"role": "User", "content": "What is the best way to prevent high blood pressure?"},
    {"role": "User", "content": "How can I use contraception?"},
    {"role": "User", "content": "What can I do to prevent stroke?"},
    {"role": "User", "content": "Can depression be treated with medication?"},
    {"role": "User", "content": "What is polycythemia vera?"},
    {"role": "User", "content": "What are the diseases caused by stress?"},
]

messages_ja = [
    {"role": "System", "content": "あなたはAIヘルスアシスタントです"},
    {"role": "User", "content": "高血圧とはどれくらいの血圧でしょうか?"},
    {"role": "User", "content": "うつ病はどのようにすれば治りますか?"},
    {"role": "User", "content": "自閉症はどんな原因が考えられますか?"},
    {"role": "User", "content": "アレルギー性鼻炎がありますが、いい薬はありますか?"},
    {"role": "User", "content": "脳梗塞とはどんな病気で、治療法はあるでしょうか?"},
    {"role": "User", "content": "突発性難聴とはどんな病気ですか?治療法はありますか?"},
    {"role": "User", "content": "緑内障と白内障の違いを教えて"},
    {"role": "User", "content": "尿酸値の値はどこまでが正常値ですか?"},
    {"role": "User", "content": "骨粗しょう症の症状と原因と治療法について教えてください。"},
    {"role": "User", "content": "高血圧を予防するにはどんな事がいいですか?"},
    {"role": "User", "content": "脳卒中を予防するにはどうしたらいいですか?"},
    {"role": "User", "content": "ストレスが原因となる病気はなんですか?"},
]

messages_zh = [
    {"role": "System", "content": "你是一名人工智能健康助理。"},
    {"role": "User", "content": "高血压有多高?"},
    {"role": "User", "content": "如何治愈抑郁症?"},
    {"role": "User", "content": "自闭症的可能病因是什么?"},
    {"role": "User", "content": "我有过敏性鼻炎,有什么好药吗?"},
    {"role": "User", "content": "什么是中风,有治疗方法吗?"},
    {"role": "User", "content": "什么是突发性听力损失? 有治疗方法吗?"},
    {"role": "User", "content": "青光眼和白内障有什么区别?"},
    {"role": "User", "content": "尿酸的正常水平是多少?"},
    {"role": "User", "content": "骨质疏松症有哪些症状和原因,如何治疗?"},
    {"role": "User", "content": "如何预防中风?"},
    {"role": "User", "content": "抑郁症可以通过药物治疗吗?"},
    {"role": "User", "content": "什么是红细胞增多症?"},
    {"role": "User", "content": "压力会导致哪些疾病?"},
]

# Tensor-parallelism degree (number of GPUs used by vLLM).
vllm_parallel = 1

model_path = "kenyano/Llama3-ELAINE-medLLM-instruct-8B_v0.1"

llm = LLM(model=model_path, tensor_parallel_size=vllm_parallel, dtype='half')

sampling_params = SamplingParams(
    n=1,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.2,
    max_tokens=1024,
    min_tokens=50,
    stop=['<|eot_id|>', '<|end_of_text|>'])


def gen_prompt(messages, cont):
    # Render messages in the model's "##User##"/"##Assistant##" format,
    # closing each turn with the Llama 3 <|eot_id|> token.
    prompt = ""
    for message in messages:
        prompt += f"##User##\n{message['content']}" + "<|eot_id|>"
    if cont:
        # Open an assistant turn so the model answers as the assistant.
        prompt += "##Assistant##\n"
    return prompt


for messages in [messages_en, messages_ja, messages_zh]:
    # Skip the system message at index 0 and ask one question at a time.
    for i in range(len(messages) - 1):
        inputs = [messages[i + 1]]
        prompt = gen_prompt(inputs, True)
        print(f"prompt:{prompt}")

        outputs = llm.generate([prompt], sampling_params)
        generated_text = outputs[0].outputs[0].text

        print("-" * 10)
        print(generated_text)
```
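
If vLLM is unavailable, the model can also be loaded with Hugging Face `transformers` (the library declared in the front matter). The following is a minimal sketch, not an official recipe: it assumes the same `##User##`/`##Assistant##` prompt format as above, and the generation settings simply mirror the vLLM example.

```python
# Minimal sketch using Hugging Face transformers instead of vLLM.
# Assumes the same "##User##"/"##Assistant##" prompt format as the vLLM
# example; the generation settings are illustrative, not tuned values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "kenyano/Llama3-ELAINE-medLLM-instruct-8B_v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto")

prompt = "##User##\nWhat is polycythemia vera?<|eot_id|>##Assistant##\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.2,
    max_new_tokens=1024,
    # Stop at <|eot_id|>, mirroring the stop list in the vLLM example.
    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
)

# Print only the newly generated tokens.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```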

## Risks and Limitations

The models released here are still at an early stage of our research and development and have not been tuned to ensure that their outputs align with human intent and safety considerations.

## Acknowledgements

We thank Meta Research for releasing Llama 3 under a generous open license.

## Contact

- Ken Yano [[email protected]]

## How to cite

If you find our work helpful, please feel free to cite this paper.

```bibtex
@inproceedings{yano-etal-2025-elaine,
    title = "{ELAINE}-med{LLM}: Lightweight {E}nglish {J}apanese {C}hinese Trilingual Large Language Model for Bio-medical Domain",
    author = "Yano, Ken and
      Luo, Zheheng and
      Huang, Jimin and
      Xie, Qianqian and
      Asada, Masaki and
      Yuan, Chenhan and
      Yang, Kailai and
      Miwa, Makoto and
      Ananiadou, Sophia and
      Tsujii, Jun{'}ichi",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Eugenio, Barbara Di and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.313/",
    pages = "4670--4688",
}
```