somosnlp-hackathon-2023/baizemocracy-lora-7B-cfqa

@@ -13,19 +13,39 @@ tags:
 - RAG
 - Retrieval Augmented Generation
 ---
 <h1>
-<a alt="Ask2Democracy project" href="https://github.com/jorge-henao/ask2democracy">Ask2Democracy project</a>
 </h1>
 <hr>
-## What's baizemocracy-lora-7B-cfqa model?
-This model is an open-source chat model fine-tuned with [LoRA](https://github.com/microsoft/LoRA) inspired by [Baize project](https://github.com/project-baize/baize-chatbot/tree/main/). It was trained with the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, wich contains almost 4k instructions to answers questions based on a context relevant to citizen concerns and public debate in spanish.
-- **Developed by:**
-- 🇨🇴 [Jorge Henao](https://huggingface.co/jorge-henao)
 - 🇨🇴 [David Torres ](https://github.com/datorresb)
 ## Training Parameters
 - Base Model: [LLaMA-7B](https://arxiv.org/pdf/2302.13971.pdf)
@@ -45,4 +65,23 @@ This model is an open-source chat model fine-tuned with [LoRA](https://github.co
 - [Alpaca chat Dialogs](https://github.com/project-baize/baize)
 - [Medical chat Dialogs](https://github.com/project-baize/baize)
 More details can be found in the Ask2Democracy [GitHub](https://github.com/jorge-henao/ask2democracy)

 - RAG
 - Retrieval Augmented Generation
 ---
+---
+license: apache-2.0
+---
 <h1>
+<a alt="About Ask2Democracy project" href="https://github.com/jorge-henao/ask2democracy">About Ask2Democracy project</a>
 </h1>
 <hr>
+## About Ask2Democracy project
+This model was developed as part of the Ask2Democracy project during the 2023 Somos NLP Hackathon. Our focus during the hackathon was on enhancing generative capabilities in Spanish by training an open-source model for this purpose, which is intended to be incorporated into the Space demo.
+However, we encountered performance limitations due to the model's large size, which caused issues when running it on limited hardware. Specifically, we observed an inference time of approximately 70 seconds when using a GPU.
+To address this, we are working on ways to optimize the model and integrate it into the Ask2Democracy Space demo; further work is still needed to improve its performance.
+[Further updates are expected to be integrated into the Ask2Democracy Space demo](https://huggingface.co/spaces/jorge-henao/ask2democracycol)
+**Developed by:**
+- 🇨🇴 [Jorge Henao](https://linktr.ee/jorgehenao)
 - 🇨🇴 [David Torres ](https://github.com/datorresb)
+## What is the baizemocracy-lora-7B-cfqa model?
+This model is an open-source chat model fine-tuned with [LoRA](https://github.com/microsoft/LoRA), inspired by the [Baize project](https://github.com/project-baize/baize-chatbot/tree/main/). It was trained on the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions based on a context relevant to citizen concerns and public debate in Spanish.
+Two model variations were trained during the Somos NLP 2023 Hackathon:
+- A generative, context-focused model
+- A conversational-style model
+
+This model variation is more focused on source-based, retrieval-augmented generation. See the About pre-processing section.
+The other variation, [Baizemocracy-conv](https://huggingface.co/hackathon-somos-nlp-2023/baizemocracy-lora-7B-cfqa-conv), is focused on a more conversational way of asking questions.
+Testing is a work in progress. We decided to share both model variations with the community to involve more people in experimenting with what works better and in finding other possible use cases.
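
LoRA, mentioned above, keeps the base model's weights frozen and learns a low-rank update for selected weight matrices. The plain-Python sketch that follows is a generic illustration of that idea, not the training code used for this model: `matmul`, `lora_forward`, `merged_weight`, `trainable_params`, the toy matrices, and the rank in the example are all hypothetical, and nothing here reflects this model's actual LoRA hyperparameters.

```python
# Generic LoRA illustration in plain Python (hypothetical helpers, not this
# project's training code). Matrices are lists of rows.


def matmul(a, b):
    """Return the matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen base weight.
    A: r x d_in and B: d_out x r, the trainable low-rank factors.
    x: d_in x 1 input column vector.
    """
    base = matmul(W, x)              # frozen path
    delta = matmul(B, matmul(A, x))  # low-rank path, never forms B @ A
    scale = alpha / r
    return [[base[i][0] + scale * delta[i][0]] for i in range(len(base))]


def merged_weight(W, A, B, alpha, r):
    """Fold the adapter into one matrix: W' = W + (alpha / r) * B A."""
    BA = matmul(B, A)
    scale = alpha / r
    return [
        [W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
        for i in range(len(W))
    ]


def trainable_params(d_out, d_in, r):
    """(full fine-tuning count, rank-r LoRA count) for one d_out x d_in matrix."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    # Toy example with d_in = d_out = 2 and rank r = 1.
    W = [[1, 0], [0, 1]]
    A = [[1, 0]]    # 1 x 2
    B = [[0], [1]]  # 2 x 1
    x = [[3], [4]]
    y = lora_forward(W, A, B, x, alpha=2, r=1)
    y_merged = matmul(merged_weight(W, A, B, alpha=2, r=1), x)
    print(y, y_merged)
    print(trainable_params(4096, 4096, 8))
```

With the toy values, the adapter path and the merged-weight path give the same output, which is why a trained adapter can be folded back into the base weights for inference. `trainable_params(4096, 4096, 8)` returns `(16777216, 65536)`: for a single 4096×4096 projection (4096 is LLaMA-7B's hidden size) at an assumed rank of 8, the adapter trains roughly 0.4% as many parameters as full fine-tuning of that matrix.
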
 ## Training Parameters
 - Base Model: [LLaMA-7B](https://arxiv.org/pdf/2302.13971.pdf)
 - [Alpaca chat Dialogs](https://github.com/project-baize/baize)
 - [Medical chat Dialogs](https://github.com/project-baize/baize)
+## About pre-processing
+The ask2democracy-cfqa-salud-pension dataset was formatted as follows:
+```python
+def format_ds(example):
+    """Build the single training string ("text") for one dataset row."""
+    # The "instruction" field is used as the Context and "input" as the Question.
+    example["text"] = (
+        "Given the Context answer the Question. Answers must be source based, "
+        "use topics to elaborate on the Response if they're provided."
+        # Alternative prompt (commented out):
+        # "Answer the question and use any available context or related topics if they are available"
+        + " Question: '{}'".format(example["input"].strip())
+        + " Context: {}".format(example["instruction"].strip())
+        + " Topics: {}".format(example["topics"])
+        + " Response: '{}'".format(example["output"].strip())
+    )
+    return example
+```
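
At inference time, a prompt presumably needs to follow the same template as training, stopping where the answer begins. The card does not document the inference prompt, so the sketch that follows is an assumption: `INSTRUCTION` and `build_prompt` are hypothetical names, and the helper simply reuses the `format_ds` template with the Response left open for the model to complete.

```python
# Hypothetical inference-time helper (not part of the original project). It
# reuses the format_ds training template and leaves the Response open.

INSTRUCTION = (
    "Given the Context answer the Question. Answers must be source based, "
    "use topics to elaborate on the Response if they're provided."
)


def build_prompt(question, context, topics):
    """Return the training template with everything up to the answer filled in.

    `topics` is formatted with str.format exactly as in format_ds, so pass it
    in the same form as the dataset's "topics" column.
    """
    return (
        INSTRUCTION
        + " Question: '{}'".format(question.strip())
        + " Context: {}".format(context.strip())
        + " Topics: {}".format(topics)
        # Stop right after the opening quote; the model is expected to write
        # the answer and (by assumption) close it with a single quote.
        + " Response: '"
    )


if __name__ == "__main__":
    print(build_prompt("Example question?", "Example context passage.", ["pensiones"]))
```

Ending the prompt at `Response: '` mirrors how `format_ds` wraps the answer in single quotes, so a closing `'` in the generated text is a natural place to cut the answer off; that cut-off rule is likewise an assumption rather than documented behavior.
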
 More details can be found in the Ask2Democracy [GitHub](https://github.com/jorge-henao/ask2democracy)