---
library_name: transformers
license: mit
model_name: MBart-Urdu-Text-Summarization
pipeline_tag: summarization
tags:
- text-generation
- mbart
- nlp
- transformers
- text-generation-inference
author: Wali Muhammad Ahmad
private: false
gated: false
inference: true
mask_token: <mask>
widget:
- text: Enter your Urdu paragraph here
transformers_info:
  auto_class: MBartForConditionalGeneration
  processor: AutoTokenizer
language:
- en
- ur
---

# Model Card

MBart-Urdu-Text-Summarization is a fine-tuned MBart model designed for summarizing Urdu text. It leverages the multilingual capabilities of MBart to generate concise and accurate summaries of Urdu paragraphs.

## Model Details

### Model Description

This model is based on the MBart architecture, a sequence-to-sequence model pre-trained on multilingual data, and has been fine-tuned specifically for Urdu text summarization. It can understand and generate text in both English and Urdu, making it suitable for multilingual applications.

### Model Sources

- **Repository:** [WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART](https://github.com/WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART)
- **Paper:** [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)

## Uses

### Direct Use

This model can be used directly for Urdu text summarization. It is suitable for applications such as news summarization, document summarization, and content generation; for a quick start, see the `pipeline` sketch below.
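
For quick experimentation, the high-level `pipeline` API wraps tokenization, generation, and decoding in a single call. A minimal sketch (the generation settings are illustrative; the checkpoint name is the one used in the How to Get Started section below):

```python
from transformers import pipeline

# Load the checkpoint through the high-level summarization pipeline.
summarizer = pipeline("summarization", model="ihatenlp/MBart-Urdu-Text-Summarization")

# Illustrative generation settings; tune max_length and num_beams as needed.
result = summarizer("Enter your Urdu paragraph here.", max_length=50, num_beams=4)
print(result[0]["summary_text"])
```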

### Downstream Use

The model can be fine-tuned for specific downstream tasks such as sentiment analysis, question answering, or machine translation for Urdu and English; a rough fine-tuning recipe is sketched below.
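
As a rough illustration of such fine-tuning, the standard `Seq2SeqTrainer` recipe below continues training on a hypothetical CSV file with `text` and `summary` columns; the file name, column names, and all hyperparameters are placeholders, not the settings used to build this model.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical dataset: a CSV with "text" and "summary" columns.
dataset = load_dataset("csv", data_files="urdu_summaries.csv")["train"]

def preprocess(batch):
    # Tokenize inputs and targets; text_target routes the summaries
    # through the tokenizer's target-side processing.
    model_inputs = tokenizer(batch["text"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="mbart-urdu-finetuned",  # placeholder output path
    per_device_train_batch_size=4,      # placeholder hyperparameters
    learning_rate=3e-5,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```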

### Out-of-Scope Use

This model is not intended for generating biased, harmful, or misleading content. It should not be used for tasks outside of text summarization without proper fine-tuning and evaluation.

## Bias, Risks, and Limitations

- The model may generate biased or inappropriate content if the input text contains biases.
- It is trained on a specific dataset and may not generalize well to other domains or languages.
- The model's performance may degrade on very long input texts; a naive chunking workaround is sketched below.
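
If inputs routinely exceed the encoder's maximum length, one naive workaround (illustrative only, assuming `tokenizer` and `model` are loaded as in the How to Get Started section below) is to split the text on sentence boundaries, summarize each chunk, and join the partial summaries:

```python
def summarize_long(text, max_chars=2000):
    # Split on the Urdu full stop (۔) into chunks of roughly max_chars characters.
    chunks, current = [], ""
    for sentence in filter(None, text.split("۔")):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence + "۔"
    if current:
        chunks.append(current)

    # Summarize each chunk independently and join the partial summaries.
    summaries = []
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors="pt", truncation=True, max_length=1024)
        ids = model.generate(**inputs, max_length=50, num_beams=4, early_stopping=True)
        summaries.append(tokenizer.decode(ids[0], skip_special_tokens=True))
    return " ".join(summaries)
```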

### Recommendations

Users should carefully evaluate the model's outputs for biases and appropriateness. Fine-tuning on domain-specific data is recommended for better performance in specialized applications.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the fine-tuned model and its tokenizer from the Hub
model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Example input text
input_text = "Enter your Urdu paragraph here."

# Tokenize, truncating inputs that exceed the encoder's maximum length
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=1024)

# Generate a summary with beam search (passing **inputs also supplies the attention mask)
summary_ids = model.generate(**inputs, max_length=50, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summary:", summary)
```

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

## Citation

**BibTeX:**

```bibtex
@misc{liu2020multilingualdenoisingpretrainingneural,
  title={Multilingual Denoising Pre-training for Neural Machine Translation},
  author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
  year={2020},
  eprint={2001.08210},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2001.08210},
}
```

## Model Card Authors

- **Wali Muhammad Ahmad**
- **Muhammad Labeeb Tariq**

## Model Card Contact

- **Email:** [email protected]
- **Hugging Face Profile:** [Wali Muhammad Ahmad](https://huggingface.co/ihatenlp)