Update README.md
README.md (changed)
@@ -107,6 +107,24 @@ class BertCRF(BertPreTrainedModel):

        return loss, tags
```
+
+```python
+with io.open('./multilingual-pos-tagger-language-detection-indian-context-muril/label_encoder.pkl', 'rb') as f:
+    le = cloudpickle.load(f, encoding="latin-1")  # fitted label encoder mapping tag ids <-> tag strings
+
+model = BertCRF.from_pretrained('./multilingual-pos-tagger-language-detection-indian-context-muril/', num_labels=210)  # BertCRF as defined above
+tokenizer = BertTokenizerFast.from_pretrained('./data/muril-base-cased/')
+
+corpus = 'maru naam swagat che'
+inputs = tokenizer(corpus, max_length=512, padding=True, truncation=True, return_tensors='pt',
+                   return_offsets_mapping=True)
+offset_mapping = inputs.pop("offset_mapping").cpu().numpy().tolist()  # used to map sub-word predictions back to words
+
+outputs = model(**inputs)  # returns (loss, tags) as in the class above; tags are the predicted label ids
+print(decode(outputs[1].numpy().tolist(), inputs['input_ids'].numpy().tolist(), offset_mapping, list(le.inverse_transform(list(range(209))))))
+
+## expected output: [{'words': ['maru', 'naam', 'swagat', 'che'], 'labels': ['gu_rom-PRP', 'gu_rom-NN', 'gu_rom-NNP', 'gu_rom-VAUX']}]
+```
Some sample output from the model

This model uses a different kind of labelling system: it can not only detect the language of each word but also the POS tag of the respective language.
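
The labels in the sample output combine a language code and a POS tag (for example, 'gu_rom-NN' appears to be romanised Gujarati plus a common-noun tag). Below is a minimal sketch, not part of the README diff, of how these combined tags could be split apart; the tag format is assumed from the sample above. Note that the usage snippet also relies on `io`, `cloudpickle`, and `transformers.BertTokenizerFast` being imported, and on a `decode` helper that is not shown in this hunk.

```python
# Minimal sketch (assumed "<language>-<POS>" tag format from the sample output).
prediction = [{'words': ['maru', 'naam', 'swagat', 'che'],
               'labels': ['gu_rom-PRP', 'gu_rom-NN', 'gu_rom-NNP', 'gu_rom-VAUX']}]

for sentence in prediction:
    for word, tag in zip(sentence['words'], sentence['labels']):
        language, pos = tag.rsplit('-', 1)   # e.g. 'gu_rom-NN' -> ('gu_rom', 'NN')
        print(f"{word}: language={language}, POS={pos}")
```

Splitting on the last hyphen keeps multi-part language codes such as 'gu_rom' intact while isolating the POS tag.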