---
language: en
license: mit
tags:
- pytorch
- causal-lm
- language-model
- flash-attention
datasets:
- Salesforce/wikitext
pipeline_tag: text-generation
---

# PurelyUnfunctionalAI/GibberishGPT

A lightweight decoder-only transformer language model trained with Flash Attention on the WikiText dataset. It was built as an exercise in learning about LLM training and ML pipelines: the model does not produce coherent text, but it serves as a good starting point for learning more about LLMs.

<a href="https://github.com/PUFAI/GibberishGPT"> <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github&style=flat-square"> </a>

## Model Details

- **Model Type:** Causal Language Model
- **Architecture:** Decoder-only Transformer
- **Embedding Size:** 512
- **Hidden Layers:** 8
- **Attention Heads:** 8
- **Context Length:** 512 tokens
- **Flash Attention:** Enabled
- **Training Data:** Salesforce/wikitext
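
For orientation, these hyperparameters correspond to a model configuration along the following lines. This is a hypothetical sketch; the field names are assumptions for illustration, not the repo's actual config class.

```python
from dataclasses import dataclass

# Hypothetical config mirroring the hyperparameters listed above;
# field names are illustrative, not taken from the GibberishGPT repo.
@dataclass
class GibberishGPTConfig:
    vocab_size: int = 50257          # tiktoken "gpt2" vocabulary size
    n_embd: int = 512                # embedding size
    n_layer: int = 8                 # hidden layers
    n_head: int = 8                  # attention heads
    block_size: int = 512            # context length in tokens
    use_flash_attention: bool = True
```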

## Usage

```python
import torch
import tiktoken
from transformers import AutoModelForCausalLM

# Load the GPT-2 BPE tokenizer the model was trained with
tokenizer = tiktoken.get_encoding("gpt2")

# Load the model weights from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("PurelyUnfunctionalAI/GibberishGPT")
model.eval()

# Encode the prompt into a batch of one token-ID sequence
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text)
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate up to 100 tokens, prompt included
with torch.no_grad():
    output = model.generate(input_tensor, max_length=100)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(output[0].tolist())
print(generated_text)
```
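
By default, `generate` uses greedy decoding, which often produces repetitive output. For more varied (if still incoherent) text you can enable sampling. A minimal sketch using standard `transformers` generation arguments; the values are illustrative, not tuned for this model:

```python
# Sample from the model's distribution instead of decoding greedily.
# These are standard transformers generate() arguments.
output = model.generate(
    input_tensor,
    max_length=100,
    do_sample=True,    # draw tokens from the softmax distribution
    temperature=0.8,   # <1 sharpens, >1 flattens the distribution
    top_k=50,          # consider only the 50 most likely tokens
)
print(tokenizer.decode(output[0].tolist()))
```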

## Limitations

- The model has a context length of 512 tokens, so longer prompts must be truncated (see the sketch after this list)
- It was trained on WikiText data, which may not cover specialized domains
- As a lightweight model, it may not perform as well as larger LLMs on complex tasks
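
One way to stay within the 512-token limit is to keep only the most recent tokens of a long prompt before generation. A minimal sketch, reusing the tiktoken tokenizer from the usage example:

```python
import torch
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
MAX_CONTEXT = 512  # context length from the model details above

# Truncate a long prompt to its most recent MAX_CONTEXT tokens
long_input_text = "Your very long prompt here. " * 200
input_ids = tokenizer.encode(long_input_text)[-MAX_CONTEXT:]
input_tensor = torch.tensor([input_ids], dtype=torch.long)
```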

## Citation

If you use this model in your research, please cite:

```
@misc{GibberishGPT,
  author = {Gathara, Michael and Menon, Vaishak and Liu, Jason},
  title = {GibberishGPT: A Lightweight Language Model with Flash Attention},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/PurelyUnfunctionalAI/GibberishGPT}}
}
```