---
language: en
tags:
  - tokenizer
  - pytorch
  - streaming
library_name: nano
---

# Nano Tokenizer

This tokenizer was trained with a pure-Python pipeline (no `transformers` or `tokenizers` dependencies) on a dataset streamed directly from the Hugging Face Hub.
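
As a rough illustration of the streaming setup, the sketch below shows how the source corpus could be streamed with the `datasets` library. The dataset id, split, and processing loop are assumptions for illustration, not the published training code.

```python
# Hypothetical sketch: stream the corpus from the Hub instead of downloading it.
# The dataset id, split, and loop body are assumptions, not the actual pipeline.
from datasets import load_dataset

stream = load_dataset("wikitext", "wikitext-2-raw-v1", split="train", streaming=True)
for example in stream:
    text = example["text"]
    # feed `text` into the tokenizer-training loop here
```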

## Usage

```python
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("goabonga/wikitext-2-raw-v1")
```
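
Once loaded, the tokenizer behaves like any `PreTrainedTokenizerFast`. A minimal round trip might look like this (the exact ids depend on the trained vocabulary):

```python
# Encode a sample string to token ids and decode it back.
ids = tokenizer.encode("Nano Tokenizer streams its training data.")
print(ids)
print(tokenizer.decode(ids))
```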