---
language: en
license: mit
tags:
- pytorch
- causal-lm
- language-model
- flash-attention
datasets:
- Salesforce/wikitext
pipeline_tag: text-generation
---

# PurelyUnfunctionalAI/GibberishGPT

A lightweight decoder-only transformer language model trained with Flash Attention on the WikiText dataset. It was built as an exercise in learning about LLM training and ML pipelines: the model does not produce coherent text, but it serves as a good starting point for learning more about LLMs.

<a href="https://github.com/PUFAI/GibberishGPT"> <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github&style=flat-square"> </a>

## Model Details

- **Model Type:** Causal Language Model
- **Architecture:** Decoder-only Transformer
- **Embedding Size:** 512
- **Hidden Layers:** 8
- **Attention Heads:** 8
- **Context Length:** 512 tokens
- **Flash Attention:** Enabled
- **Training Data:** Salesforce/wikitext
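
For orientation, these hyperparameters correspond to a model configuration along the following lines. This is a hypothetical sketch; the field names are assumptions for illustration, not the repo's actual config class.

```python
from dataclasses import dataclass

# Hypothetical config mirroring the hyperparameters listed above;
# field names are illustrative, not taken from the GibberishGPT repo.
@dataclass
class GibberishGPTConfig:
    vocab_size: int = 50257          # tiktoken "gpt2" vocabulary size
    n_embd: int = 512                # embedding size
    n_layer: int = 8                 # hidden layers
    n_head: int = 8                  # attention heads
    block_size: int = 512            # context length in tokens
    use_flash_attention: bool = True
```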

## Usage

```python
import torch
import tiktoken
from transformers import AutoModelForCausalLM

# Load the GPT-2 BPE tokenizer the model was trained with
tokenizer = tiktoken.get_encoding("gpt2")

# Load the model weights from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("PurelyUnfunctionalAI/GibberishGPT")
model.eval()

# Encode the prompt into a batch of one token-ID sequence
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text)
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate up to 100 tokens, prompt included
with torch.no_grad():
    output = model.generate(input_tensor, max_length=100)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(output[0].tolist())
print(generated_text)
```
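
By default, `generate` uses greedy decoding, which often produces repetitive output. For more varied (if still incoherent) text you can enable sampling. A minimal sketch using standard `transformers` generation arguments; the values are illustrative, not tuned for this model:

```python
# Sample from the model's distribution instead of decoding greedily.
# These are standard transformers generate() arguments.
output = model.generate(
    input_tensor,
    max_length=100,
    do_sample=True,    # draw tokens from the softmax distribution
    temperature=0.8,   # <1 sharpens, >1 flattens the distribution
    top_k=50,          # consider only the 50 most likely tokens
)
print(tokenizer.decode(output[0].tolist()))
```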

## Limitations

- The model has a context length of 512 tokens, so longer prompts must be truncated (see the sketch after this list)
- It was trained on WikiText data, which may not cover specialized domains
- As a lightweight model, it may not perform as well as larger LLMs on complex tasks
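
One way to stay within the 512-token limit is to keep only the most recent tokens of a long prompt before generation. A minimal sketch, reusing the tiktoken tokenizer from the usage example:

```python
import torch
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
MAX_CONTEXT = 512  # context length from the model details above

# Truncate a long prompt to its most recent MAX_CONTEXT tokens
long_input_text = "Your very long prompt here. " * 200
input_ids = tokenizer.encode(long_input_text)[-MAX_CONTEXT:]
input_tensor = torch.tensor([input_ids], dtype=torch.long)
```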

## Citation

If you use this model in your research, please cite:

```
@misc{GibberishGPT,
  author = {Gathara, Michael and Menon, Vaishak and Liu, Jason},
  title = {GibberishGPT: A Lightweight Language Model with Flash Attention},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/PurelyUnfunctionalAI/GibberishGPT}}
}
```