---
language: en
license: mit
tags:
- pytorch
- causal-lm
- language-model
- flash-attention
datasets:
- Salesforce/wikitext
pipeline_tag: text-generation
---
# PurelyUnfunctionalAI/GibberishGPT
A lightweight decoder-only transformer language model trained with Flash Attention on the WikiText dataset. It was built as a learning exercise in training LLMs and ML pipelines. The model does not produce coherent text, but it serves as a good starting point for learning more about LLMs.
<a href="https://github.com/PUFAI/GibberishGPT"> <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github&style=flat-square"> </a>
## Model Details
- **Model Type:** Causal Language Model
- **Architecture:** Decoder-only Transformer
- **Embedding Size:** 512
- **Hidden Layers:** 8
- **Attention Heads:** 8
- **Context Length:** 512
- **Flash Attention:** Enabled
- **Training Data:** Salesforce/wikitext
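For orientation, these hyperparameters roughly correspond to a small GPT-2-style configuration. The mapping below is an illustrative sketch, not the repository's actual config; in particular, the use of `GPT2Config` and the vocabulary size are assumptions based on the GPT-2 tokenizer used in the usage example.

```python
from transformers import GPT2Config

# Hypothetical mapping of the hyperparameters above onto a GPT-2-style config.
# Shown for illustration only; the repo's real architecture may differ.
config = GPT2Config(
    vocab_size=50257,   # tiktoken "gpt2" BPE vocabulary (assumption)
    n_positions=512,    # context length
    n_embd=512,         # embedding size
    n_layer=8,          # hidden layers
    n_head=8,           # attention heads
)
print(config)
```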
## Usage
```python
import torch
import tiktoken
from transformers import AutoModelForCausalLM

# Load the tokenizer (GPT-2 BPE via tiktoken)
tokenizer = tiktoken.get_encoding("gpt2")

# Load the model
model = AutoModelForCausalLM.from_pretrained("PurelyUnfunctionalAI/GibberishGPT")
model.eval()  # disable dropout for inference

# Encode input
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text)
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate
output = model.generate(input_tensor, max_length=100)
generated_text = tokenizer.decode(output[0].tolist())
print(generated_text)
```
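The greedy decoding used above tends to repeat itself quickly with a model this small. If the checkpoint supports the standard `transformers` `generate` API (as the example above assumes), sampling can be switched on with the usual keyword arguments; the values below are generic defaults, not settings tuned for this model.

```python
# Sampling-based generation; these kwargs are standard transformers generate()
# options, not values tuned for GibberishGPT.
output = model.generate(
    input_tensor,
    max_length=100,
    do_sample=True,     # sample instead of greedy decoding
    temperature=0.8,    # soften the next-token distribution
    top_k=50,           # keep only the 50 most likely tokens at each step
)
print(tokenizer.decode(output[0].tolist()))
```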
## Limitations
- The model has a context length of 512 tokens, so longer prompts must be truncated before generation (see the sketch below)
- It was trained on WikiText data, which may not cover specialized domains
- As a lightweight model, it may not perform as well as larger LLMs on complex tasks
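Because of the 512-token context window, long prompts should be cut down before calling `generate`. Below is a minimal sketch using the tiktoken tokenizer from the usage example; the `encode_truncated` helper and the choice to keep the most recent tokens are illustrative assumptions, not part of the repository.

```python
import tiktoken
import torch

tokenizer = tiktoken.get_encoding("gpt2")
CONTEXT_LENGTH = 512  # model's maximum context length

def encode_truncated(text: str, max_new_tokens: int = 100) -> torch.Tensor:
    """Encode a prompt, keeping only the most recent tokens so that the prompt
    plus the generated continuation fits in the context window.
    (Hypothetical helper, not part of the repository.)"""
    ids = tokenizer.encode(text)
    budget = CONTEXT_LENGTH - max_new_tokens
    return torch.tensor([ids[-budget:]], dtype=torch.long)

input_tensor = encode_truncated("A very long prompt ... " * 100)
```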
## Citation
If you use this model in your research, please cite:
```
@misc{GibberishGPT,
  author = {Gathara, Michael and Menon, Vaishak and Liu, Jason},
  title = {GibberishGPT: A Lightweight Language Model with Flash Attention},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/PurelyUnfunctionalAI/GibberishGPT}}
}
```