Upload folder using huggingface_hub

Files changed (3)

README.md +59 -0
embeddings.safetensors +3 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,59 @@

+# NILC Portuguese Word Embeddings — FastText Skip-Gram 300d
+Pretrained **static word embeddings** for **Portuguese** (Brazilian and European), trained by the [NILC group](http://nilc.icmc.usp.br/) on a large multi-genre corpus (~1.39B tokens from 17 sources).
+This repository contains the **FastText Skip-Gram 300d** model in safetensors format.
+---
+## 📂 Files
+- `embeddings.safetensors` → word-vector matrix of shape `[vocab_size, 300]`, stored under the key `embeddings`
+- `vocab.txt` → vocabulary, one token per line; line `i` corresponds to row `i` of the matrix
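Because the two files are aligned row for row, a quick sanity check after loading is to confirm that the token count matches the number of rows. The following is a minimal sketch that uses a toy array in place of the real file; `check_alignment` is a hypothetical helper, not part of `safetensors` or any other library.

```python
import numpy as np

def check_alignment(vocab, vectors, dim=300):
    """Raise ValueError if vocab and vectors do not line up row for row."""
    if vectors.ndim != 2 or vectors.shape[1] != dim:
        raise ValueError(f"expected shape [vocab_size, {dim}], got {vectors.shape}")
    if len(vocab) != vectors.shape[0]:
        raise ValueError(f"{len(vocab)} tokens but {vectors.shape[0]} rows")

# Toy stand-in for the real data (assumption: three tokens, zero vectors)
toy_vocab = ["rei", "rainha", "homem"]
toy_vectors = np.zeros((3, 300), dtype=np.float32)
check_alignment(toy_vocab, toy_vectors)  # passes silently when aligned
```

With the real files, the same call would take the `vocab` list and the `vectors` array loaded as shown in the Usage section.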
+---
+## 🚀 Usage
+```python
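+from safetensors.numpy import load_file
+
+# Load the embedding matrix (NumPy array of shape [vocab_size, 300])
+data = load_file("embeddings.safetensors")
+vectors = data["embeddings"]
+
+# Read the vocabulary as UTF-8; strip only the line ending so tokens stay intact
+with open("vocab.txt", encoding="utf-8") as f:
+    vocab = [line.rstrip("\r\n") for line in f]
+
+# Map each token to its row index
+word2idx = {w: i for i, w in enumerate(vocab)}
+print(vectors[word2idx["rei"]])  # vector for "rei"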
+```
+Or in PyTorch:
+```python
+from safetensors.torch import load_file
+tensors = load_file("embeddings.safetensors")
+vectors = tensors["embeddings"]  # torch.Tensor
+```
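Once the vectors are loaded, a common next step is to look up the nearest neighbors of a word by cosine similarity. The sketch below is illustrative only: `nearest_neighbors` is a hypothetical helper, and the toy 3-dimensional vectors stand in for the real 300-dimensional matrix. A PyTorch tensor would first need converting with `.numpy()`.

```python
import numpy as np

def nearest_neighbors(word, vectors, vocab, word2idx, k=5):
    """Return the k tokens most cosine-similar to `word`, excluding itself."""
    if word not in word2idx:
        raise KeyError(f"{word!r} is not in the vocabulary")
    # L2-normalize rows so that a dot product equals cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    sims = unit @ unit[word2idx[word]]
    sims[word2idx[word]] = -np.inf  # exclude the query word itself
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]

# Toy data (assumption: made-up vectors, not taken from the real model)
toy_vocab = ["rei", "rainha", "carro"]
toy_vectors = np.array([[1.0, 0.0, 0.0],
                        [0.9, 0.1, 0.0],
                        [0.0, 0.0, 1.0]], dtype=np.float32)
toy_word2idx = {w: i for i, w in enumerate(toy_vocab)}
print(nearest_neighbors("rei", toy_vectors, toy_vocab, toy_word2idx, k=1))
```

For the full matrix, normalizing once and reusing the result across queries avoids repeating the most expensive step on every call.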
+---
+## 📖 Reference
+```bibtex
+@inproceedings{hartmann-etal-2017-portuguese,
+  title        = {{P}ortuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks},
+  author       = {Hartmann, Nathan  and Fonseca, Erick  and Shulby, Christopher  and Treviso, Marcos  and Silva, J{\'e}ssica  and Alu{\'i}sio, Sandra},
+  year         = 2017,
+  month        = oct,
+  booktitle    = {Proceedings of the 11th {B}razilian Symposium in Information and Human Language Technology},
+  publisher    = {Sociedade Brasileira de Computa{\c{c}}{\~a}o},
+  address      = {Uberl{\^a}ndia, Brazil},
+  pages        = {122--131},
+  url          = {https://aclanthology.org/W17-6615/},
+  editor       = {Paetzold, Gustavo Henrique  and Pinheiro, Vl{\'a}dia}
+}
+```
+---
+## 📜 License
+Creative Commons Attribution 4.0 International (CC BY 4.0)

embeddings.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba786989f10489be655d8bd23d06e605542aaa970da2c621f8a28c8ae642a683
+size 1115526096

vocab.txt ADDED

The diff for this file is too large to render. See raw diff