---
license: mit
---
# ERNIE-Layout_Pytorch
[This repository](https://github.com/NormXU/ERNIE-Layout-Pytorch) contains an unofficial PyTorch implementation of ERNIE-Layout, which was originally released via [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP). The model weights are converted from [PaddlePaddle/ernie-layoutx-base-uncased](https://huggingface.co/PaddlePaddle/ernie-layoutx-base-uncased) to the PyTorch format with the [tools/convert2torch.py](https://github.com/NormXU/ERNIE-Layout-Pytorch/blob/main/tools/convert2torch.py) script. Feel free to edit it if necessary.
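If you re-run the conversion yourself, a quick sanity check on the converted checkpoint might look like the sketch below. The checkpoint filename `pytorch_model.bin` is an assumption here; use whatever path your conversion run actually produced.

```python
import torch

# hypothetical output path of tools/convert2torch.py -- adjust to your run
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# spot-check that the keys and shapes look like PyTorch-style parameters
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```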
**A Quick Example**
```python
import torch
from PIL import Image
import torch.nn.functional as F
from networks import ErnieLayoutConfig, ErnieLayoutForQuestionAnswering, \
ErnieLayoutProcessor, ErnieLayoutTokenizerFast
from transformers.models.layoutlmv3 import LayoutLMv3ImageProcessor
pretrain_torch_model_or_path = "Norm/ERNIE-Layout-Pytorch"
doc_image_path = "./dummy_input.jpeg"
context = ['This is an example sequence', 'All ocr boxes are inserted into this list']
layout = [[381, 91, 505, 115], [738, 96, 804, 122]]  # all OCR boxes must be normalized to the 0-1000 range
pil_image = Image.open(doc_image_path).convert("RGB")
# initialize tokenizer
tokenizer = ErnieLayoutTokenizerFast.from_pretrained(pretrained_model_name_or_path=pretrain_torch_model_or_path)
# initialize feature extractor
feature_extractor = LayoutLMv3ImageProcessor(apply_ocr=False)
processor = ErnieLayoutProcessor(image_processor=feature_extractor, tokenizer=tokenizer)
# tokenize the question and OCR context together
question = "what is it?"
encoding = processor(pil_image, question, context, boxes=layout, return_tensors="pt")
# dummy answer start and end indices for the QA loss
start_positions = torch.tensor([6])
end_positions = torch.tensor([12])
# initialize config
config = ErnieLayoutConfig.from_pretrained(pretrained_model_name_or_path=pretrain_torch_model_or_path)
config.num_classes = 2 # start and end
# initialize ERNIE-Layout for document VQA
model = ErnieLayoutForQuestionAnswering.from_pretrained(
pretrained_model_name_or_path=pretrain_torch_model_or_path,
config=config,
)
output = model(**encoding, start_positions=start_positions, end_positions=end_positions)
# decode output
start_max = torch.argmax(F.softmax(output.start_logits, dim=-1))
end_max = torch.argmax(F.softmax(output.end_logits, dim=-1)) + 1  # add one because Python slicing excludes the end index
answer = tokenizer.decode(encoding.input_ids[0][start_max: end_max])
print(answer)
``` |
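For inference without labels, you can drop `start_positions` / `end_positions` and decode the most likely span directly. This is a minimal sketch reusing the objects from the example above; it assumes the model returns the standard `transformers`-style `start_logits` / `end_logits`, as used there.

```python
# inference only: no start/end labels and no gradient tracking
model.eval()
with torch.no_grad():
    output = model(**encoding)

# softmax is monotonic, so taking argmax over the raw logits
# selects the same start/end tokens as the softmax version above
start_idx = int(torch.argmax(output.start_logits, dim=-1))
end_idx = int(torch.argmax(output.end_logits, dim=-1)) + 1  # exclusive end for slicing

print(tokenizer.decode(encoding.input_ids[0][start_idx:end_idx]))
```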