File size: 2,948 Bytes
			
			| 5eeee0c aa1f2ea 5eeee0c aa1f2ea | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | ---
license: other
pipeline_tag: visual-question-answering
---
<p align="center">
    <img src="logo_en.png" width="400"/>
<p>
<p align="center">
    <b><font size="6">InternLM-XComposer2</font></b> 
<p>
<div align="center">
[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)
[Paper](https://arxiv.org/abs/2401.16420)
</div>
**InternLM-XComposer2** is a vision-language large model (VLLM) based on [InternLM2](https://github.com/InternLM/InternLM) for advanced text-image comprehension and composition. 
We release InternLM-XComposer2 series in two versions:
- InternLM-XComposer2-VL: The pretrained VLLM model with InternLM2 as the initialization of the LLM, achieving strong performance on various multimodal benchmarks.
- InternLM-XComposer2: The finetuned VLLM for *Free-from Interleaved Text-Image Composition*.
### Import from Transformers
To load the InternLM-XComposer2-VL-1.8B model using Transformers, use the following code:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2-vl-1_8b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True).cuda()
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
```
## Quickstart
We provide a simple example to show how to use InternLM-XComposer with 🤗 Transformers.
```python
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-1_8b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-1_8b', trust_remote_code=True)
query = '<ImageHere>Please describe this image in detail.'
image = './image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
# The image is a captivating photograph of a sunset over a mountainous landscape. The sky, painted in hues of orange and pink,
# serves as a backdrop for two silhouetted figures standing on the mountain. The text on the image, written in white, is a quote 
# from Oscar Wilde, which reads, "Live life with no excuses, travel with no regret." This quote, combined with the serene setting,
# serves as a powerful reminder to embrace life's journey without hesitation or regret.
```
### Open Source License
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].
 | 
