.gitattributes CHANGED
@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- REPORT_Benchmarking[[:space:]]the[[:space:]]AI[[:space:]]advantage[[:space:]]in[[:space:]]finance.pdf filter=lfs diff=lfs merge=lfs -text
 
README.md DELETED
@@ -1,140 +0,0 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - ibm-granite/granite-vision-3.3-2b
- library_name: transformers
- ---
- # granite-vision-3.3-2b-embedding
- **Model Summary:**
- Granite-vision-3.3-2b-embedding is an efficient embedding model based on [granite-vision-3.3-2b](https://huggingface.co/ibm-granite/granite-vision-3.3-2b). The model is specifically designed for multimodal document retrieval, enabling queries on documents containing tables, charts, infographics, and complex layouts. It generates ColBERT-style multi-vector representations of pages.
- By removing the need for OCR-based text extraction, granite-vision-3.3-2b-embedding can help simplify and accelerate RAG pipelines.
-
- **Evaluations:**
- We evaluated granite-vision-3.3-2b-embedding alongside other leading ColBERT-style multimodal embedding models in the 1B-4B parameter range using two benchmarks: [ViDoRe V2](https://github.com/illuin-tech/vidore-benchmark/) and [Real-MM-RAG-Bench](https://huggingface.co/collections/ibm-research/real-mm-rag-bench-67d2dc0ddf2dfafe66f09d34), both of which specifically target complex multimodal document retrieval tasks.
-
- ## **NDCG@5 - ViDoRe V2**
- | Collection \ Model                    | ColPali-v1.3 | ColQwen2.5-v0.2 | ColNomic-3b | ColSmolvlm-v0.1 | granite-vision-3.3-2b-embedding |
- |---------------------------------------|--------------|-----------------|-------------|-----------------|---------------------------------|
- | ESG Restaurant Human                  | 51.1         | 68.4            | 65.8        | 62.4            | 65.3                            |
- | Economics Macro Multilingual          | 49.9         | 56.5            | 55.4        | 47.4            | 51.2                            |
- | MIT Biomedical                        | 59.7         | 63.6            | 63.5        | 58.1            | 61.5                            |
- | ESG Restaurant Synthetic              | 57.0         | 57.4            | 56.6        | 51.1            | 56.6                            |
- | ESG Restaurant Synthetic Multilingual | 55.7         | 57.4            | 57.2        | 47.6            | 55.7                            |
- | MIT Biomedical Multilingual           | 56.5         | 61.1            | 62.5        | 50.5            | 55.5                            |
- | Economics Macro                       | 51.6         | 59.8            | 60.2        | 60.9            | 58.3                            |
- | **Avg (ViDoRe V2)**                   | **54.5**     | **60.6**        | **60.2**    | **54.0**        | **57.7**                        |
-
- ## **NDCG@5 - REAL-MM-RAG**
- | Collection \ Model                    | ColPali-v1.3 | ColQwen2.5-v0.2 | ColNomic-3b | ColSmolvlm-v0.1 | granite-vision-3.3-2b-embedding |
- |---------------------------------------|--------------|-----------------|-------------|-----------------|---------------------------------|
- | FinReport                             | 55           | 66              | 78          | 65              | 73                              |
- | FinSlides                             | 68           | 79              | 81          | 55              | 79                              |
- | TechReport                            | 78           | 86              | 88          | 83              | 87                              |
- | TechSlides                            | 90           | 93              | 92          | 91              | 93                              |
- | **Avg (REAL-MM-RAG)**                 | **73**       | **81**          | **85**      | **74**          | **83**                          |
-
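NDCG@5 rewards rankings that place the relevant pages near the top of the first five results, normalized by the best possible ordering. A minimal sketch of the computation is shown below (a generic illustration with a hypothetical `ndcg_at_k` helper and toy relevance labels, not the benchmarks' own evaluation code):

```python
import math

def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    """Normalized discounted cumulative gain for a single query.

    relevances: relevance labels of the retrieved pages, in ranked order.
    """
    # Discounted cumulative gain over the top-k results.
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))
    # Ideal DCG: the same labels under the best possible ordering.
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the single relevant page is retrieved at rank 2.
print(ndcg_at_k([0, 1, 0, 0, 0], k=5))  # ~0.63
```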
- - **Release Date**: June 11th, 2025
- - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- - **Supported Input Format:** Currently, the model supports English instructions and images (PNG, JPEG) as input.
-
- **Intended Use:**
- The model is intended for enterprise applications that involve retrieval of visual and textual data. In particular, it is well suited for multimodal RAG systems where the knowledge base is composed of complex enterprise documents, such as reports, slides, images, scanned documents, manuals, and more. The model can be used as a standalone retriever or alongside a text-based retriever.
-
- ### Usage
- ```shell
- pip install -q torch torchvision torchaudio
- pip install transformers==4.50
- ```
- Then run the code:
- ```python
- from io import BytesIO
-
- import requests
- import torch
- from PIL import Image
- from transformers import AutoProcessor, AutoModel
- from transformers.utils.import_utils import is_flash_attn_2_available
-
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model_name = "ibm-granite/granite-vision-3.3-2b-embedding"
- model = AutoModel.from_pretrained(
-     model_name,
-     trust_remote_code=True,
-     torch_dtype=torch.float16,
-     device_map=device,
-     attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None
- ).eval()
- processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
-
- # ─────────────────────────────────────────────
- # Inputs: Image + Text
- # ─────────────────────────────────────────────
- image_url = "https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg"
- print("\nFetching image...")
- image = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")
-
- text = "A photo of a tiger"
- print("Image and text inputs ready.")
-
- # Process both inputs
- print("Processing inputs...")
- image_inputs = processor.process_images([image])
- text_inputs = processor.process_queries([text])
-
- # Move to correct device
- image_inputs = {k: v.to(device) for k, v in image_inputs.items()}
- text_inputs = {k: v.to(device) for k, v in text_inputs.items()}
-
- # ─────────────────────────────────────────────
- # Run Inference
- # ─────────────────────────────────────────────
- with torch.no_grad():
-     print("🔍 Getting image embedding...")
-     img_emb = model(**image_inputs)
-
-     print("✍️ Getting text embedding...")
-     txt_emb = model(**text_inputs)
-
- # ─────────────────────────────────────────────
- # Score the similarity
- # ─────────────────────────────────────────────
- print("Scoring similarity...")
- similarity = processor.score(txt_emb, img_emb, batch_size=1, device=device)
-
- print("\n" + "=" * 50)
- print(f"📊 Similarity between image and text: {similarity.item():.4f}")
- print("=" * 50)
- ```
- ### Use granite-vision-3.3-2b-embedding for MM RAG
- For an example of MM-RAG using granite-vision-3.3-2b-embedding, refer to [this notebook](https://github.com/ibm-granite/granite-vision-models/blob/main/cookbooks/GraniteVisionEmbedding_MM-RAG_Notebook.ipynb).
-
- **Model Architecture:**
- The architecture of granite-vision-3.3-2b-embedding follows the [ColPali](https://arxiv.org/abs/2407.01449) approach and consists of the following components:
-
- (1) Vision-language model: [granite-vision-3.3-2b](https://huggingface.co/ibm-granite/granite-vision-3.3-2b).
-
- (2) Projection layer: a linear layer that projects the hidden dimension of the vision-language model down to 128, yielding 729 embedding vectors per image.
-
- Scoring is computed using a MaxSim-based late-interaction mechanism.
-
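For intuition, the MaxSim late-interaction score between a query's multi-vector embedding and a page's multi-vector embedding can be sketched as follows (a minimal illustration assuming L2-normalized embeddings; the `maxsim_score` helper is hypothetical, and the `processor.score` utility shown in the usage example above is the supported way to compute these scores):

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Late-interaction (MaxSim) relevance score for one query-page pair.

    query_emb: (num_query_tokens, 128) query multi-vectors
    doc_emb:   (num_page_vectors, 128) page multi-vectors (e.g. 729 per image)
    Both are assumed L2-normalized, so dot products are cosine similarities.
    """
    # Pairwise similarity between every query token and every page vector.
    sim = query_emb @ doc_emb.T                # (num_query_tokens, num_page_vectors)
    # Each query token keeps only its best-matching page vector...
    per_token_max = sim.max(dim=-1).values     # (num_query_tokens,)
    # ...and the page's score is the sum of those maxima.
    return per_token_max.sum()

# Toy usage with random, normalized vectors of the documented dimensions.
q = torch.nn.functional.normalize(torch.randn(16, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(729, 128), dim=-1)
print(maxsim_score(q, d))
```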
- **Training Data:**
- Our training data comes entirely from DocFM, a large-scale, comprehensive dataset effort at IBM consisting of 85 million document pages extracted from unique PDF documents sourced from Common Crawl, Wikipedia, and ESG (Environmental, Social, and Governance) reports.
-
- **Infrastructure:**
- We train granite-vision-3.3-2b-embedding on IBM’s cognitive computing cluster, which is outfitted with NVIDIA A100 GPUs.
-
- **Ethical Considerations and Limitations:**
- The use of large vision and language models involves risks and ethical considerations that people must be aware of, including but not limited to bias and fairness, misinformation, and autonomous decision-making. Granite-vision-3.3-2b-embedding is no exception in this regard. Although our alignment processes include safety considerations, the model may in some cases produce inaccurate or biased responses.
- Regarding ethics, a latent risk associated with all large language models is their malicious use. We urge the community to use granite-vision-3.3-2b-embedding with ethical intentions and in a responsible way.
-
- **Resources**
- 📄 Granite Vision technical report [here](https://arxiv.org/abs/2502.09927)
- 📄 Real-MM-RAG-Bench paper (ACL 2025) [here](https://arxiv.org/abs/2502.12342)
- 📄 Vidore 2 paper [here](https://www.arxiv.org/pdf/2505.17166)
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 🚀 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
 
REPORT_Benchmarking the AI advantage in finance.pdf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:4e6da951c55eef3fd52aa41543f3b4377ab26e2758c579aec2d11068a66b3d20
- size 1746880
 
added_tokens.json DELETED
@@ -1,6 +0,0 @@
- {
-   "<image>": 49155,
-   "<|end_of_role|>": 49153,
-   "<|start_of_role|>": 49152,
-   "<|tool_call|>": 49154
- }
 
chat_template.json DELETED
@@ -1,3 +0,0 @@
- {
-   "chat_template": "{%- if tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>\n' }}\n {%- for tool in tools %}\n {{- tool | tojson(indent=4) }}\n {%- if not loop.last %}\n {{- '\n\n' }}\n {%- endif %}\n {%- endfor %}\n {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- for message in messages if message['role'] == 'system'%}{% else %}<|system|>\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n{% endfor %}{%- for message in messages %}\n {%- if message['role'] == 'system' %}\n {{- '<|system|>\n' + message['content'][0]['text'] + '\n' }}\n {%- elif message['role'] == 'user' %}<|user|>\n {# Render all images first #}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<image>\n' }}{% endfor %}{# Render all text next #}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] + '\n' }}{% endfor %}\n{%- elif message['role'] == 'assistant' %}\n {{- '<|assistant|>\n' + message['content'][0]['text'] + '<|end_of_text|>' }}\n {%- elif message['role'] == 'assistant_tool_call' %}\n {{- '<|start_of_role|>assistant<|end_of_role|><|tool_call|>' + message['content'][0]['text'] + '<|end_of_text|>\n' }}\n {%- elif message['role'] == 'tool_response' %}\n {{- '<|start_of_role|>tool_response<|end_of_role|>' + message['content'][0]['text'] + '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|assistant|>\n' }}\n {%- endif %}\n{%- endfor %}"
- }
 
config.json DELETED
@@ -1,179 +0,0 @@
- {
-   "_name_or_path": "ibm_granite/granite-vision-3.3-2b",
-   "adapter_path": null,
-   "auto_map": {
-     "AutoModel": "modeling_granite_vision_embedding.GraniteVisionEmb",
-     "AutoProcessor": "processing_granite_vision_embedding.GraniteVisionEmbProcessor",
-     "AutoConfig": "granite_vision_embedding_config.GraniteVisionEmbConfig"
-   },
-   "architectures": [
-     "GraniteVisionEmb"
-   ],
-   "base_image_feature_location": "last",
-   "base_model": null,
-   "emb_dim_doc": 128,
-   "emb_dim_query": 128,
-   "image_grid_pinpoints": [
-     [384, 768], [384, 1152], [384, 1536], [384, 1920], [384, 2304], [384, 2688], [384, 3072], [384, 3456], [384, 3840],
-     [768, 384], [768, 768], [768, 1152], [768, 1536], [768, 1920],
-     [1152, 384], [1152, 768], [1152, 1152],
-     [1536, 384], [1536, 768],
-     [1920, 384], [1920, 768],
-     [2304, 384], [2688, 384], [3072, 384], [3456, 384], [3840, 384]
-   ],
-   "image_seq_length": 576,
-   "image_token_index": 49155,
-   "model_type": "granitevisionemb",
-   "multimodal_projector_bias": true,
-   "pretrained_language_model": "",
-   "pretrained_vision_tower": "",
-   "projector_hidden_act": "gelu",
-   "text_config": {
-     "_attn_implementation_autoset": true,
-     "_name_or_path": "ibm-granite/granite-3.1-2b-instruct",
-     "architectures": [
-       "GraniteForCausalLM"
-     ],
-     "attention_dropout": 0.1,
-     "attention_multiplier": 0.015625,
-     "bos_token_id": 0,
-     "embedding_multiplier": 12.0,
-     "eos_token_id": 0,
-     "hidden_size": 2048,
-     "intermediate_size": 8192,
-     "logits_scaling": 8.0,
-     "max_position_embeddings": 131072,
-     "model_type": "granite",
-     "num_hidden_layers": 40,
-     "num_key_value_heads": 8,
-     "pad_token_id": 0,
-     "residual_multiplier": 0.22,
-     "rms_norm_eps": 1e-05,
-     "rope_theta": 300000,
-     "tie_word_embeddings": true,
-     "torch_dtype": "bfloat16",
-     "vocab_size": 49156
-   },
-   "tie_word_embeddings": true,
-   "torch_dtype": "float32",
-   "transformers_version": "4.49.0",
-   "use_image_newline_parameter": true,
-   "vision_config": {
-     "_attn_implementation_autoset": true,
-     "hidden_act": "gelu_pytorch_tanh",
-     "hidden_size": 1152,
-     "image_size": 384,
-     "intermediate_size": 4304,
-     "layer_norm_eps": 1e-06,
-     "model_type": "siglip_vision_model",
-     "num_attention_heads": 16,
-     "num_hidden_layers": 27,
-     "patch_size": 14,
-     "torch_dtype": "bfloat16"
-   },
-   "vision_feature_layer": [
-     -24,
-     -20,
-     -12,
-     -1
-   ],
-   "vision_feature_select_strategy": "full"
- }
 
granite_vision_embedding_config.py DELETED
@@ -1,15 +0,0 @@
- from transformers import LlavaNextConfig
-
-
- class GraniteVisionEmbConfig(LlavaNextConfig):
-     model_type = "granitevisionemb"
-
-     def __init__(self, **kwargs):
-         self.base_model = kwargs.get("base_model", None)
-         self.emb_dim_query = kwargs.get("emb_dim_query", 128)
-         self.emb_dim_doc = kwargs.get("emb_dim_doc", 128)
-         self.base_image_feature_location = kwargs.get("base_image_feature_location", "last")
-         self.adapter_path = kwargs.get("adapter_path", None)
-         super().__init__(**kwargs)
-
-
 
merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00003.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:4e838b6d98f48fbf45ae6c0d9c74cba649fd06b27ed78ced3971efbab7e16a69
- size 4955415688
 
model-00002-of-00003.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d6bf1675fc15977b4d8f37ea1d4960ca2750e6793a80da9771e4693ae8cb13d6
- size 4999979448
 
model-00003-of-00003.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:15978cba0606360676faad5c3cf486a58e6d78a1352dbfcd1db51a7410a574d5
- size 1947355456
 
model.safetensors.index.json DELETED
@@ -1,824 +0,0 @@
1
- {
2
- "metadata": {
3
- "total_size": 11902636800
4
- },
5
- "weight_map": {
6
- "custom_text_proj.bias": "model-00003-of-00003.safetensors",
7
- "custom_text_proj.weight": "model-00003-of-00003.safetensors",
8
- "model.image_newline": "model-00001-of-00003.safetensors",
9
- "model.language_model.model.embed_tokens.weight": "model-00001-of-00003.safetensors",
10
- "model.language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
11
- "model.language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
12
- "model.language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
13
- "model.language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
14
- "model.language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
15
- "model.language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
16
- "model.language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
17
- "model.language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
18
- "model.language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
19
- "model.language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
20
- "model.language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
21
- "model.language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
22
- "model.language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
23
- "model.language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
24
- "model.language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
25
- "model.language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
26
- "model.language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
27
- "model.language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
28
- "model.language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
29
- "model.language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
30
- "model.language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
31
- "model.language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
32
- "model.language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
33
- "model.language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
34
- "model.language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
35
- "model.language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
36
- "model.language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
37
- "model.language_model.model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
38
- "model.language_model.model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
39
- "model.language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
40
- "model.language_model.model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
41
- "model.language_model.model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
42
- "model.language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
43
- "model.language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
44
- "model.language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
45
- "model.language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
46
- "model.language_model.model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
47
- "model.language_model.model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
48
- "model.language_model.model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
49
- "model.language_model.model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
50
- "model.language_model.model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
51
- "model.language_model.model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
52
- "model.language_model.model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
53
- "model.language_model.model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
54
- "model.language_model.model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
55
- "model.language_model.model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
56
- "model.language_model.model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
57
- "model.language_model.model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
58
- "model.language_model.model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
59
- "model.language_model.model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
60
- "model.language_model.model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
61
- "model.language_model.model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
62
- "model.language_model.model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
63
- "model.language_model.model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
64
- "model.language_model.model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
65
- "model.language_model.model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
66
- "model.language_model.model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
67
- "model.language_model.model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
68
- "model.language_model.model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
69
- "model.language_model.model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
70
- "model.language_model.model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
71
- "model.language_model.model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
72
- "model.language_model.model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
73
- "model.language_model.model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
74
- "model.language_model.model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
75
- "model.language_model.model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
76
- "model.language_model.model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
77
- "model.language_model.model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
78
- "model.language_model.model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
79
- "model.language_model.model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
80
- "model.language_model.model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
81
- "model.language_model.model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
82
- "model.language_model.model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
83
- "model.language_model.model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
84
- "model.language_model.model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
85
- "model.language_model.model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
86
- "model.language_model.model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
87
- "model.language_model.model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
88
- "model.language_model.model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
89
- "model.language_model.model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
90
- "model.language_model.model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
91
- "model.language_model.model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
92
- "model.language_model.model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
93
- "model.language_model.model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
94
- "model.language_model.model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
95
- "model.language_model.model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
96
- "model.language_model.model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
97
- "model.language_model.model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
98
- "model.language_model.model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
99
- "model.language_model.model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
100
- "model.language_model.model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
101
- "model.language_model.model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
102
- "model.language_model.model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
103
- "model.language_model.model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
104
- "model.language_model.model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
105
- "model.language_model.model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
106
- "model.language_model.model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
107
- "model.language_model.model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
108
- "model.language_model.model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
109
- "model.language_model.model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
110
- "model.language_model.model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
111
- "model.language_model.model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
112
- "model.language_model.model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
113
- "model.language_model.model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
114
- "model.language_model.model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
115
- "model.language_model.model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
116
- "model.language_model.model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
117
- "model.language_model.model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
118
- "model.language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
119
- "model.language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
120
- "model.language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
121
- "model.language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
122
- "model.language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
123
- "model.language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
124
- "model.language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
125
- "model.language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
126
- "model.language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
127
- "model.language_model.model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
128
- "model.language_model.model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
129
- "model.language_model.model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
130
- "model.language_model.model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
131
- "model.language_model.model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
132
- "model.language_model.model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
133
- "model.language_model.model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
134
- "model.language_model.model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
135
- "model.language_model.model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
136
- "model.language_model.model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
137
- "model.language_model.model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
138
- "model.language_model.model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
139
- "model.language_model.model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
140
- "model.language_model.model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
141
- "model.language_model.model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
142
- "model.language_model.model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
143
- "model.language_model.model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
144
- "model.language_model.model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
145
- "model.language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
146
- "model.language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
147
- "model.language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
148
- "model.language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
149
- "model.language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
150
- "model.language_model.model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
151
- "model.language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
152
- "model.language_model.model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
153
- "model.language_model.model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
154
- "model.language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
155
- "model.language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
156
- "model.language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
157
- "model.language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
158
- "model.language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
159
- "model.language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
160
- "model.language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
161
- "model.language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
162
- "model.language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
163
- "model.language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00003.safetensors",
164
- "model.language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
165
- "model.language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
166
- "model.language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
167
- "model.language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
168
- "model.language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
169
- "model.language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
170
- "model.language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
171
- "model.language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
172
- "model.language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00003.safetensors",
173
- "model.language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
174
- "model.language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
175
- "model.language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
176
- "model.language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
177
- "model.language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
178
- "model.language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
179
- "model.language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
180
- "model.language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
181
- "model.language_model.model.layers.26.input_layernorm.weight": "model-00002-of-00003.safetensors",
182
- "model.language_model.model.layers.26.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
183
- "model.language_model.model.layers.26.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
184
- "model.language_model.model.layers.26.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
185
- "model.language_model.model.layers.26.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
186
- "model.language_model.model.layers.26.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
187
- "model.language_model.model.layers.26.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
188
- "model.language_model.model.layers.26.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
189
- "model.language_model.model.layers.26.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
190
- "model.language_model.model.layers.27.input_layernorm.weight": "model-00002-of-00003.safetensors",
191
- "model.language_model.model.layers.27.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
192
- "model.language_model.model.layers.27.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
193
- "model.language_model.model.layers.27.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
194
- "model.language_model.model.layers.27.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
195
- "model.language_model.model.layers.27.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
196
- "model.language_model.model.layers.27.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
197
- "model.language_model.model.layers.27.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
198
- "model.language_model.model.layers.27.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
199
- "model.language_model.model.layers.28.input_layernorm.weight": "model-00002-of-00003.safetensors",
200
- "model.language_model.model.layers.28.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
201
- "model.language_model.model.layers.28.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
202
- "model.language_model.model.layers.28.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
203
- "model.language_model.model.layers.28.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
204
- "model.language_model.model.layers.28.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
205
- "model.language_model.model.layers.28.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
206
- "model.language_model.model.layers.28.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
207
- "model.language_model.model.layers.28.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
208
- "model.language_model.model.layers.29.input_layernorm.weight": "model-00002-of-00003.safetensors",
209
- "model.language_model.model.layers.29.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
210
- "model.language_model.model.layers.29.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
211
- "model.language_model.model.layers.29.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
212
- "model.language_model.model.layers.29.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
213
- "model.language_model.model.layers.29.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
214
- "model.language_model.model.layers.29.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
215
- "model.language_model.model.layers.29.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
216
- "model.language_model.model.layers.29.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
217
- "model.language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
218
- "model.language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
219
- "model.language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
220
- "model.language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
221
- "model.language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
222
- "model.language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
223
- "model.language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
224
- "model.language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
225
- "model.language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
226
- "model.language_model.model.layers.30.input_layernorm.weight": "model-00002-of-00003.safetensors",
227
- "model.language_model.model.layers.30.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
228
- "model.language_model.model.layers.30.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
229
- "model.language_model.model.layers.30.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
230
- "model.language_model.model.layers.30.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
231
- "model.language_model.model.layers.30.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
232
- "model.language_model.model.layers.30.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
233
- "model.language_model.model.layers.30.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
234
- "model.language_model.model.layers.30.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
235
- "model.language_model.model.layers.31.input_layernorm.weight": "model-00002-of-00003.safetensors",
236
- "model.language_model.model.layers.31.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
237
- "model.language_model.model.layers.31.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
238
- "model.language_model.model.layers.31.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
239
- "model.language_model.model.layers.31.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
240
- "model.language_model.model.layers.31.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
241
- "model.language_model.model.layers.31.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
242
- "model.language_model.model.layers.31.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
243
- "model.language_model.model.layers.31.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
244
- "model.language_model.model.layers.32.input_layernorm.weight": "model-00003-of-00003.safetensors",
245
- "model.language_model.model.layers.32.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
246
- "model.language_model.model.layers.32.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
247
- "model.language_model.model.layers.32.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
248
- "model.language_model.model.layers.32.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
249
- "model.language_model.model.layers.32.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
250
- "model.language_model.model.layers.32.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
251
- "model.language_model.model.layers.32.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
252
- "model.language_model.model.layers.32.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
253
- "model.language_model.model.layers.33.input_layernorm.weight": "model-00003-of-00003.safetensors",
254
- "model.language_model.model.layers.33.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
255
- "model.language_model.model.layers.33.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
256
- "model.language_model.model.layers.33.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
257
- "model.language_model.model.layers.33.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
258
- "model.language_model.model.layers.33.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
259
- "model.language_model.model.layers.33.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
260
- "model.language_model.model.layers.33.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
261
- "model.language_model.model.layers.33.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
262
- "model.language_model.model.layers.34.input_layernorm.weight": "model-00003-of-00003.safetensors",
263
- "model.language_model.model.layers.34.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
264
- "model.language_model.model.layers.34.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
265
- "model.language_model.model.layers.34.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
266
- "model.language_model.model.layers.34.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
267
- "model.language_model.model.layers.34.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
268
- "model.language_model.model.layers.34.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
269
- "model.language_model.model.layers.34.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
270
- "model.language_model.model.layers.34.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
271
- "model.language_model.model.layers.35.input_layernorm.weight": "model-00003-of-00003.safetensors",
272
- "model.language_model.model.layers.35.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
273
- "model.language_model.model.layers.35.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
274
- "model.language_model.model.layers.35.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
275
- "model.language_model.model.layers.35.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
276
- "model.language_model.model.layers.35.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
277
- "model.language_model.model.layers.35.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
278
- "model.language_model.model.layers.35.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
279
- "model.language_model.model.layers.35.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
280
- "model.language_model.model.layers.36.input_layernorm.weight": "model-00003-of-00003.safetensors",
281
- "model.language_model.model.layers.36.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
282
- "model.language_model.model.layers.36.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
283
- "model.language_model.model.layers.36.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
284
- "model.language_model.model.layers.36.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
285
- "model.language_model.model.layers.36.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
286
- "model.language_model.model.layers.36.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
287
- "model.language_model.model.layers.36.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
288
- "model.language_model.model.layers.36.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
289
- "model.language_model.model.layers.37.input_layernorm.weight": "model-00003-of-00003.safetensors",
290
- "model.language_model.model.layers.37.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
291
- "model.language_model.model.layers.37.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
292
- "model.language_model.model.layers.37.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
293
- "model.language_model.model.layers.37.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
294
- "model.language_model.model.layers.37.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
295
- "model.language_model.model.layers.37.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
296
- "model.language_model.model.layers.37.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
297
- "model.language_model.model.layers.37.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
298
- "model.language_model.model.layers.38.input_layernorm.weight": "model-00003-of-00003.safetensors",
299
- "model.language_model.model.layers.38.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
300
- "model.language_model.model.layers.38.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
301
- "model.language_model.model.layers.38.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
302
- "model.language_model.model.layers.38.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
303
- "model.language_model.model.layers.38.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
304
- "model.language_model.model.layers.38.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
305
- "model.language_model.model.layers.38.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
306
- "model.language_model.model.layers.38.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
307
- "model.language_model.model.layers.39.input_layernorm.weight": "model-00003-of-00003.safetensors",
308
- "model.language_model.model.layers.39.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
309
- "model.language_model.model.layers.39.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
310
- "model.language_model.model.layers.39.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
311
- "model.language_model.model.layers.39.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
312
- "model.language_model.model.layers.39.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
313
- "model.language_model.model.layers.39.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
314
- "model.language_model.model.layers.39.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
315
- "model.language_model.model.layers.39.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
316
- "model.language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
317
- "model.language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
318
- "model.language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
319
- "model.language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
320
- "model.language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
321
- "model.language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
322
- "model.language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
323
- "model.language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
324
- "model.language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
325
- "model.language_model.model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
326
- "model.language_model.model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
327
- "model.language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
328
- "model.language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
329
- "model.language_model.model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
330
- "model.language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
331
- "model.language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
332
- "model.language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
333
- "model.language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
334
- "model.language_model.model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
335
- "model.language_model.model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
336
- "model.language_model.model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
337
- "model.language_model.model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
338
- "model.language_model.model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
339
- "model.language_model.model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
340
- "model.language_model.model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
341
- "model.language_model.model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
342
- "model.language_model.model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
343
- "model.language_model.model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
344
- "model.language_model.model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
345
- "model.language_model.model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
346
- "model.language_model.model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
347
- "model.language_model.model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
348
- "model.language_model.model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
349
- "model.language_model.model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
350
- "model.language_model.model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
351
- "model.language_model.model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
352
- "model.language_model.model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
353
- "model.language_model.model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
354
- "model.language_model.model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
355
- "model.language_model.model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
356
- "model.language_model.model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
357
- "model.language_model.model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
358
- "model.language_model.model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
359
- "model.language_model.model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
360
- "model.language_model.model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
361
- "model.language_model.model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
362
- "model.language_model.model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
363
- "model.language_model.model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
364
- "model.language_model.model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
365
- "model.language_model.model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
366
- "model.language_model.model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
367
- "model.language_model.model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
368
- "model.language_model.model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
369
- "model.language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
370
- "model.language_model.model.norm.weight": "model-00003-of-00003.safetensors",
371
- "model.multi_modal_projector.linear_1.bias": "model-00001-of-00003.safetensors",
372
- "model.multi_modal_projector.linear_1.weight": "model-00001-of-00003.safetensors",
373
- "model.multi_modal_projector.linear_2.bias": "model-00001-of-00003.safetensors",
374
- "model.multi_modal_projector.linear_2.weight": "model-00001-of-00003.safetensors",
375
- "model.vision_tower.vision_model.embeddings.patch_embedding.bias": "model-00001-of-00003.safetensors",
376
- "model.vision_tower.vision_model.embeddings.patch_embedding.weight": "model-00001-of-00003.safetensors",
377
- "model.vision_tower.vision_model.embeddings.position_embedding.weight": "model-00001-of-00003.safetensors",
378
- "model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias": "model-00001-of-00003.safetensors",
379
- "model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight": "model-00001-of-00003.safetensors",
380
- "model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias": "model-00001-of-00003.safetensors",
381
- "model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight": "model-00001-of-00003.safetensors",
382
- "model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias": "model-00001-of-00003.safetensors",
383
- "model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight": "model-00001-of-00003.safetensors",
384
- "model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias": "model-00001-of-00003.safetensors",
385
- "model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight": "model-00001-of-00003.safetensors",
386
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
387
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
388
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
389
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
390
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
391
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
392
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
393
- "model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
394
- "model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias": "model-00001-of-00003.safetensors",
395
- "model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight": "model-00001-of-00003.safetensors",
396
- "model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias": "model-00001-of-00003.safetensors",
397
- "model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight": "model-00001-of-00003.safetensors",
398
- "model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias": "model-00001-of-00003.safetensors",
399
- "model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight": "model-00001-of-00003.safetensors",
400
- "model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias": "model-00001-of-00003.safetensors",
401
- "model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight": "model-00001-of-00003.safetensors",
402
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
403
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
404
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
405
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
406
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
407
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
408
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
409
- "model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
410
- "model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias": "model-00001-of-00003.safetensors",
411
- "model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight": "model-00001-of-00003.safetensors",
412
- "model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias": "model-00001-of-00003.safetensors",
413
- "model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight": "model-00001-of-00003.safetensors",
414
- "model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias": "model-00001-of-00003.safetensors",
415
- "model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight": "model-00001-of-00003.safetensors",
416
- "model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias": "model-00001-of-00003.safetensors",
417
- "model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight": "model-00001-of-00003.safetensors",
418
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
419
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
420
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
421
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
422
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
423
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
424
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
425
- "model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
426
- "model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias": "model-00001-of-00003.safetensors",
427
- "model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight": "model-00001-of-00003.safetensors",
428
- "model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias": "model-00001-of-00003.safetensors",
429
- "model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight": "model-00001-of-00003.safetensors",
430
- "model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias": "model-00001-of-00003.safetensors",
431
- "model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight": "model-00001-of-00003.safetensors",
432
- "model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias": "model-00001-of-00003.safetensors",
433
- "model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight": "model-00001-of-00003.safetensors",
434
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
435
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
436
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
437
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
438
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
439
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
440
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
441
- "model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
442
- "model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias": "model-00001-of-00003.safetensors",
443
- "model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight": "model-00001-of-00003.safetensors",
444
- "model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias": "model-00001-of-00003.safetensors",
445
- "model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight": "model-00001-of-00003.safetensors",
446
- "model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias": "model-00001-of-00003.safetensors",
447
- "model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight": "model-00001-of-00003.safetensors",
448
- "model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias": "model-00001-of-00003.safetensors",
449
- "model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight": "model-00001-of-00003.safetensors",
450
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
451
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
452
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
453
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
454
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
455
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
456
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
457
- "model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
458
- "model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias": "model-00001-of-00003.safetensors",
459
- "model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight": "model-00001-of-00003.safetensors",
460
- "model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias": "model-00001-of-00003.safetensors",
461
- "model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight": "model-00001-of-00003.safetensors",
462
- "model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias": "model-00001-of-00003.safetensors",
463
- "model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight": "model-00001-of-00003.safetensors",
464
- "model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias": "model-00001-of-00003.safetensors",
465
- "model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight": "model-00001-of-00003.safetensors",
466
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
467
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
468
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
469
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
470
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
471
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
472
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
473
- "model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
474
- "model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias": "model-00001-of-00003.safetensors",
475
- "model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight": "model-00001-of-00003.safetensors",
476
- "model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias": "model-00001-of-00003.safetensors",
477
- "model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight": "model-00001-of-00003.safetensors",
478
- "model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias": "model-00001-of-00003.safetensors",
479
- "model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight": "model-00001-of-00003.safetensors",
480
- "model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias": "model-00001-of-00003.safetensors",
481
- "model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight": "model-00001-of-00003.safetensors",
482
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
483
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
484
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
485
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
486
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
487
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
488
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
489
- "model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
490
- "model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias": "model-00001-of-00003.safetensors",
491
- "model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight": "model-00001-of-00003.safetensors",
492
- "model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias": "model-00001-of-00003.safetensors",
493
- "model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight": "model-00001-of-00003.safetensors",
494
- "model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias": "model-00001-of-00003.safetensors",
495
- "model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight": "model-00001-of-00003.safetensors",
496
- "model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias": "model-00001-of-00003.safetensors",
497
- "model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight": "model-00001-of-00003.safetensors",
498
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
499
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
500
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
501
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
502
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
503
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
504
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
505
- "model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
506
- "model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias": "model-00001-of-00003.safetensors",
507
- "model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight": "model-00001-of-00003.safetensors",
508
- "model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias": "model-00001-of-00003.safetensors",
509
- "model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight": "model-00001-of-00003.safetensors",
510
- "model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias": "model-00001-of-00003.safetensors",
511
- "model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight": "model-00001-of-00003.safetensors",
512
- "model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias": "model-00001-of-00003.safetensors",
513
- "model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight": "model-00001-of-00003.safetensors",
514
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
515
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
516
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
517
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
518
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
519
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
520
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
521
- "model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
522
- "model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias": "model-00001-of-00003.safetensors",
523
- "model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight": "model-00001-of-00003.safetensors",
524
- "model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias": "model-00001-of-00003.safetensors",
525
- "model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight": "model-00001-of-00003.safetensors",
526
- "model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias": "model-00001-of-00003.safetensors",
527
- "model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight": "model-00001-of-00003.safetensors",
528
- "model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias": "model-00001-of-00003.safetensors",
529
- "model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight": "model-00001-of-00003.safetensors",
530
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
531
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
532
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
533
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
534
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
535
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
536
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
537
- "model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
538
- "model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias": "model-00001-of-00003.safetensors",
539
- "model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight": "model-00001-of-00003.safetensors",
540
- "model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias": "model-00001-of-00003.safetensors",
541
- "model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight": "model-00001-of-00003.safetensors",
542
- "model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias": "model-00001-of-00003.safetensors",
543
- "model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight": "model-00001-of-00003.safetensors",
544
- "model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias": "model-00001-of-00003.safetensors",
545
- "model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight": "model-00001-of-00003.safetensors",
546
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
547
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
548
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
549
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
550
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
551
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
552
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
553
- "model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
554
- "model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias": "model-00001-of-00003.safetensors",
555
- "model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight": "model-00001-of-00003.safetensors",
556
- "model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias": "model-00001-of-00003.safetensors",
557
- "model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight": "model-00001-of-00003.safetensors",
558
- "model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias": "model-00001-of-00003.safetensors",
559
- "model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight": "model-00001-of-00003.safetensors",
560
- "model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias": "model-00001-of-00003.safetensors",
561
- "model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight": "model-00001-of-00003.safetensors",
562
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
563
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
564
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
565
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
566
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
567
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
568
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
569
- "model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
570
- "model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias": "model-00001-of-00003.safetensors",
571
- "model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight": "model-00001-of-00003.safetensors",
572
- "model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias": "model-00001-of-00003.safetensors",
573
- "model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight": "model-00001-of-00003.safetensors",
574
- "model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias": "model-00001-of-00003.safetensors",
575
- "model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight": "model-00001-of-00003.safetensors",
576
- "model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias": "model-00001-of-00003.safetensors",
577
- "model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight": "model-00001-of-00003.safetensors",
578
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
579
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
580
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
581
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
582
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
583
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
584
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
585
- "model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
586
- "model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias": "model-00001-of-00003.safetensors",
587
- "model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight": "model-00001-of-00003.safetensors",
588
- "model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias": "model-00001-of-00003.safetensors",
589
- "model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight": "model-00001-of-00003.safetensors",
590
- "model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias": "model-00001-of-00003.safetensors",
591
- "model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight": "model-00001-of-00003.safetensors",
592
- "model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias": "model-00001-of-00003.safetensors",
593
- "model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight": "model-00001-of-00003.safetensors",
594
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
595
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
596
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
597
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
598
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
599
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
600
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
601
- "model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
602
- "model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias": "model-00001-of-00003.safetensors",
603
- "model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight": "model-00001-of-00003.safetensors",
604
- "model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias": "model-00001-of-00003.safetensors",
605
- "model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight": "model-00001-of-00003.safetensors",
606
- "model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias": "model-00001-of-00003.safetensors",
607
- "model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight": "model-00001-of-00003.safetensors",
608
- "model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias": "model-00001-of-00003.safetensors",
609
- "model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight": "model-00001-of-00003.safetensors",
610
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
611
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
612
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
613
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
614
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
615
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
616
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
617
- "model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
618
- "model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias": "model-00001-of-00003.safetensors",
619
- "model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight": "model-00001-of-00003.safetensors",
620
- "model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias": "model-00001-of-00003.safetensors",
621
- "model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight": "model-00001-of-00003.safetensors",
622
- "model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias": "model-00001-of-00003.safetensors",
623
- "model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight": "model-00001-of-00003.safetensors",
624
- "model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias": "model-00001-of-00003.safetensors",
625
- "model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight": "model-00001-of-00003.safetensors",
626
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
627
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
628
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
629
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
630
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
631
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
632
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
633
- "model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
634
- "model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias": "model-00001-of-00003.safetensors",
635
- "model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight": "model-00001-of-00003.safetensors",
636
- "model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias": "model-00001-of-00003.safetensors",
637
- "model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight": "model-00001-of-00003.safetensors",
638
- "model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias": "model-00001-of-00003.safetensors",
639
- "model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight": "model-00001-of-00003.safetensors",
640
- "model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias": "model-00001-of-00003.safetensors",
641
- "model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight": "model-00001-of-00003.safetensors",
642
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
643
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
644
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
645
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
646
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
647
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
648
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
649
- "model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
650
- "model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias": "model-00001-of-00003.safetensors",
651
- "model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight": "model-00001-of-00003.safetensors",
652
- "model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias": "model-00001-of-00003.safetensors",
653
- "model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight": "model-00001-of-00003.safetensors",
654
- "model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias": "model-00001-of-00003.safetensors",
655
- "model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight": "model-00001-of-00003.safetensors",
656
- "model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias": "model-00001-of-00003.safetensors",
657
- "model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight": "model-00001-of-00003.safetensors",
658
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
659
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
660
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
661
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
662
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
663
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
664
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
665
- "model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
666
- "model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias": "model-00001-of-00003.safetensors",
667
- "model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight": "model-00001-of-00003.safetensors",
668
- "model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias": "model-00001-of-00003.safetensors",
669
- "model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight": "model-00001-of-00003.safetensors",
670
- "model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias": "model-00001-of-00003.safetensors",
671
- "model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight": "model-00001-of-00003.safetensors",
672
- "model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias": "model-00001-of-00003.safetensors",
673
- "model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight": "model-00001-of-00003.safetensors",
674
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
675
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
676
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
677
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
678
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
679
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
680
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
681
- "model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
682
- "model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias": "model-00001-of-00003.safetensors",
683
- "model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight": "model-00001-of-00003.safetensors",
684
- "model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias": "model-00001-of-00003.safetensors",
685
- "model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight": "model-00001-of-00003.safetensors",
686
- "model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias": "model-00001-of-00003.safetensors",
687
- "model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight": "model-00001-of-00003.safetensors",
688
- "model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias": "model-00001-of-00003.safetensors",
689
- "model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight": "model-00001-of-00003.safetensors",
690
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
691
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
692
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
693
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
694
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
695
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
696
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
697
- "model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
698
- "model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias": "model-00001-of-00003.safetensors",
699
- "model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight": "model-00001-of-00003.safetensors",
700
- "model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias": "model-00001-of-00003.safetensors",
701
- "model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight": "model-00001-of-00003.safetensors",
702
- "model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias": "model-00001-of-00003.safetensors",
703
- "model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight": "model-00001-of-00003.safetensors",
704
- "model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias": "model-00001-of-00003.safetensors",
705
- "model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight": "model-00001-of-00003.safetensors",
706
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
707
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
708
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
709
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
710
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
711
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
712
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
713
- "model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
714
- "model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias": "model-00001-of-00003.safetensors",
715
- "model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight": "model-00001-of-00003.safetensors",
716
- "model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias": "model-00001-of-00003.safetensors",
717
- "model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight": "model-00001-of-00003.safetensors",
718
- "model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias": "model-00001-of-00003.safetensors",
719
- "model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight": "model-00001-of-00003.safetensors",
720
- "model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias": "model-00001-of-00003.safetensors",
721
- "model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight": "model-00001-of-00003.safetensors",
722
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
723
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
724
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
725
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
726
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
727
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
728
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
729
- "model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
730
- "model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias": "model-00001-of-00003.safetensors",
731
- "model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight": "model-00001-of-00003.safetensors",
732
- "model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias": "model-00001-of-00003.safetensors",
733
- "model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight": "model-00001-of-00003.safetensors",
734
- "model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias": "model-00001-of-00003.safetensors",
735
- "model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight": "model-00001-of-00003.safetensors",
736
- "model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias": "model-00001-of-00003.safetensors",
737
- "model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight": "model-00001-of-00003.safetensors",
738
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
739
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
740
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
741
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
742
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
743
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
744
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
745
- "model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
746
- "model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias": "model-00001-of-00003.safetensors",
747
- "model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight": "model-00001-of-00003.safetensors",
748
- "model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias": "model-00001-of-00003.safetensors",
749
- "model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight": "model-00001-of-00003.safetensors",
750
- "model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias": "model-00001-of-00003.safetensors",
751
- "model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight": "model-00001-of-00003.safetensors",
752
- "model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias": "model-00001-of-00003.safetensors",
753
- "model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight": "model-00001-of-00003.safetensors",
754
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
755
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
756
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
757
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
758
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
759
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
760
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
761
- "model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
762
- "model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias": "model-00001-of-00003.safetensors",
763
- "model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight": "model-00001-of-00003.safetensors",
764
- "model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias": "model-00001-of-00003.safetensors",
765
- "model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight": "model-00001-of-00003.safetensors",
766
- "model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias": "model-00001-of-00003.safetensors",
767
- "model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight": "model-00001-of-00003.safetensors",
768
- "model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias": "model-00001-of-00003.safetensors",
769
- "model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight": "model-00001-of-00003.safetensors",
770
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
771
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
772
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
773
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
774
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
775
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
776
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
777
- "model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
778
- "model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias": "model-00001-of-00003.safetensors",
779
- "model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight": "model-00001-of-00003.safetensors",
780
- "model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias": "model-00001-of-00003.safetensors",
781
- "model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight": "model-00001-of-00003.safetensors",
782
- "model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias": "model-00001-of-00003.safetensors",
783
- "model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight": "model-00001-of-00003.safetensors",
784
- "model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias": "model-00001-of-00003.safetensors",
785
- "model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight": "model-00001-of-00003.safetensors",
786
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
787
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
788
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
789
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
790
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
791
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
792
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
793
- "model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
794
- "model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias": "model-00001-of-00003.safetensors",
795
- "model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight": "model-00001-of-00003.safetensors",
796
- "model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias": "model-00001-of-00003.safetensors",
797
- "model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight": "model-00001-of-00003.safetensors",
798
- "model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias": "model-00001-of-00003.safetensors",
799
- "model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight": "model-00001-of-00003.safetensors",
800
- "model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias": "model-00001-of-00003.safetensors",
801
- "model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight": "model-00001-of-00003.safetensors",
802
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias": "model-00001-of-00003.safetensors",
803
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
804
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias": "model-00001-of-00003.safetensors",
805
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight": "model-00001-of-00003.safetensors",
806
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias": "model-00001-of-00003.safetensors",
807
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
808
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias": "model-00001-of-00003.safetensors",
809
- "model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
810
- "model.vision_tower.vision_model.head.attention.in_proj_bias": "model-00001-of-00003.safetensors",
811
- "model.vision_tower.vision_model.head.attention.in_proj_weight": "model-00001-of-00003.safetensors",
812
- "model.vision_tower.vision_model.head.attention.out_proj.bias": "model-00001-of-00003.safetensors",
813
- "model.vision_tower.vision_model.head.attention.out_proj.weight": "model-00001-of-00003.safetensors",
814
- "model.vision_tower.vision_model.head.layernorm.bias": "model-00001-of-00003.safetensors",
815
- "model.vision_tower.vision_model.head.layernorm.weight": "model-00001-of-00003.safetensors",
816
- "model.vision_tower.vision_model.head.mlp.fc1.bias": "model-00001-of-00003.safetensors",
817
- "model.vision_tower.vision_model.head.mlp.fc1.weight": "model-00001-of-00003.safetensors",
818
- "model.vision_tower.vision_model.head.mlp.fc2.bias": "model-00001-of-00003.safetensors",
819
- "model.vision_tower.vision_model.head.mlp.fc2.weight": "model-00001-of-00003.safetensors",
820
- "model.vision_tower.vision_model.head.probe": "model-00001-of-00003.safetensors",
821
- "model.vision_tower.vision_model.post_layernorm.bias": "model-00001-of-00003.safetensors",
822
- "model.vision_tower.vision_model.post_layernorm.weight": "model-00001-of-00003.safetensors"
823
- }
824
- }
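
The file deleted above is a standard sharded-checkpoint index: each entry maps a tensor name (language model, multi-modal projector, or vision tower weight) to the shard file that stores it. Below is a minimal sketch of inspecting such an index; it assumes the usual Hugging Face layout with a top-level "weight_map" field and is illustrative only, not code from this repository — transformers resolves this mapping internally when loading the checkpoint.

```python
import json
from collections import defaultdict

# Minimal sketch (assumption: standard HF sharded-checkpoint index layout,
# i.e. {"metadata": {...}, "weight_map": {tensor_name: shard_file}}).
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]

# Which shard stores a given tensor?
print(weight_map["model.vision_tower.vision_model.post_layernorm.weight"])
# e.g. "model-00001-of-00003.safetensors"

# How many tensors does each shard hold?
tensors_per_shard = defaultdict(int)
for shard in weight_map.values():
    tensors_per_shard[shard] += 1
print(dict(tensors_per_shard))
```
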
modeling_granite_vision_embedding.py DELETED
@@ -1,190 +0,0 @@
1
- from typing import ClassVar, Optional
2
-
3
- import numpy as np
4
- import torch
5
- from torch import nn
6
- from transformers import LlavaNextPreTrainedModel
7
- from transformers.models.llava_next.modeling_llava_next import LlavaNextForConditionalGeneration
8
- from transformers.models.llava_next.modeling_llava_next import unpad_image, get_anyres_image_grid_shape
9
-
10
- from .granite_vision_embedding_config import GraniteVisionEmbConfig
11
-
12
- class LlavaNextWithCustomPacking(LlavaNextForConditionalGeneration):
13
-
14
- def pack_image_features(
15
- self,
16
- image_features,
17
- image_sizes,
18
- vision_feature_select_strategy,
19
- image_newline=None
20
- ):
21
- """
22
- Reshape, unpad and then pack each image_feature into a single image_features tensor containing all visual vectors.
23
-
24
- Args:
25
- image_features (`List[torch.Tensor]` of length num_images, each of shape `(num_patches, image_length, embed_dim)`)
26
- List of image feature tensor, each contains all the visual feature of all patches.
27
- image_sizes (`torch.Tensor` of shape `(num_images, 2)`)
28
- Actual image size of each images (H, W).
29
- vision_feature_select_strategy (`str`)
30
- The feature selection strategy used to select the vision feature from the vision backbone.
31
- image_newline (`torch.Tensor` of shape `(embed_dim)`)
32
- New line embedding vector.
33
- Returns:
34
- image_features (`torch.Tensor` of shape `(all_feat_len, embed_dim)`)
35
- feature_lens (`List[int]`)
36
- token length of each image in image_features
37
- """
38
-
39
- base_image_feature_location = self.config.base_image_feature_location
40
- new_image_features = []
41
- feature_lens = []
42
- for image_idx, image_feature in enumerate(image_features):
43
- if image_feature.shape[0] > 1:
44
- base_image_feature = image_feature[0]
45
- image_feature = image_feature[1:]
46
- height = width = self.config.vision_config.image_size // self.config.vision_config.patch_size
47
-
48
- num_patch_height, num_patch_width = get_anyres_image_grid_shape(
49
- image_sizes[image_idx],
50
- self.config.image_grid_pinpoints,
51
- self.config.vision_config.image_size,
52
- )
53
-
54
- if (
55
- np.prod(image_feature.shape) % (num_patch_height * num_patch_width * height * width) != 0
56
- and vision_feature_select_strategy == "default"
57
- ):
58
- print(
59
- "Image feature shape does not line up with the provided patch size. "
60
- "You may be using the `default` vision_feature_select_strategy with a"
61
- " visual encoder that does not have CLS."
62
- )
63
-
64
- image_feature = image_feature.view(num_patch_height, num_patch_width, height, width, -1)
65
- image_feature = image_feature.permute(4, 0, 2, 1, 3).contiguous()
66
- image_feature = image_feature.flatten(1, 2).flatten(2, 3)
67
- image_feature = unpad_image(image_feature, image_sizes[image_idx])
68
- if image_newline is not None:
69
- image_feature = torch.cat(
70
- (
71
- image_feature,
72
- image_newline[:, None, None]
73
- .expand(*image_feature.shape[:-1], 1)
74
- .to(image_feature.device, image_feature.dtype),
75
- ),
76
- dim=-1,
77
- )
78
- image_feature = image_feature.flatten(1, 2).transpose(0, 1)
79
- if base_image_feature_location == "last":
80
- image_feature = torch.cat((image_feature, base_image_feature), dim=0)
81
- else:
82
- image_feature = torch.cat((base_image_feature, image_feature), dim=0)
83
-
84
- else:
85
- image_feature = image_feature[0]
86
- if image_newline is not None:
87
- image_feature = torch.cat((image_feature, image_newline[None].to(image_feature)), dim=0)
88
- new_image_features.append(image_feature)
89
- feature_lens.append(image_feature.size(0))
90
- image_features = torch.cat(new_image_features, dim=0)
91
- feature_lens = torch.tensor(feature_lens, dtype=torch.long, device=image_features.device)
92
- return image_features, feature_lens
93
-
94
-
95
- class GraniteVisionEmb(LlavaNextPreTrainedModel):
96
- """
97
- GraniteVisionEmb model implementation.
98
- """
99
-
100
- main_input_name: ClassVar[str] = "doc_input_ids" # transformers-related
101
- config_class = GraniteVisionEmbConfig
102
-
103
- def __init__(self, config: GraniteVisionEmbConfig):
104
- super().__init__(config=config)
105
-
106
- model = LlavaNextWithCustomPacking(config=config)
107
- if model.language_model._tied_weights_keys is not None:
108
- self._tied_weights_keys = [f"model.language_model.{k}" for k in model.language_model._tied_weights_keys]
109
- self.model = model
110
-
111
- self.dim = 128
112
- self.custom_text_proj = nn.Linear(self.model.config.text_config.hidden_size, self.dim)
113
-
114
- self.post_init()
115
-
116
- def forward(self, *args, **kwargs) -> torch.Tensor:
117
- # Delete output_hidden_states from kwargs
118
- kwargs.pop("output_hidden_states", None)
119
- if "pixel_values" in kwargs:
120
- kwargs["pixel_values"] = kwargs["pixel_values"].to(dtype=self.dtype)
121
-
122
- outputs = self.model(*args, output_hidden_states=True, **kwargs) # (batch_size, sequence_length, hidden_size)
123
- last_hidden_states = outputs.hidden_states[-1] # (batch_size, sequence_length, hidden_size)
124
-
125
- attention_mask = kwargs["attention_mask"]
126
- if "pixel_values" in kwargs:
127
- input_ids = kwargs['input_ids']
128
- image_mask = (input_ids == self.config.image_token_index)
129
- # inputs_embeds = last_hidden_states.masked_scatter(image_mask)
130
- N, M = image_mask.shape
131
- # Create an index matrix: each row is 0, 1, ..., M-1
132
- idx = torch.arange(M, device=image_mask.device).expand(N, M)
133
- # Replace False positions with -1 so they are ignored by topk (since all valid indices are >=0)
134
- masked_idx = torch.where(image_mask, idx, torch.tensor(-1, device=image_mask.device))
135
- topk_values, _ = torch.topk(masked_idx, k=729, dim=1)
136
- last_k_indices, _ = torch.sort(topk_values, dim=1)
137
- last_k_indices_exp = last_k_indices.unsqueeze(-1).expand(-1, -1, last_hidden_states.size(-1))
138
- last_hidden_states = torch.gather(last_hidden_states, 1, last_k_indices_exp)
139
- attention_mask = torch.gather(attention_mask, 1, last_k_indices)
140
-
141
- attention_mask = attention_mask.unsqueeze(-1)
142
-
143
- proj = self.custom_text_proj(last_hidden_states) # (batch_size, sequence_length, dim)
144
-
145
- # L2 normalization
146
- proj = proj / (proj.norm(dim=-1, keepdim=True) + 1e-8)
147
-
148
- # proj = proj * kwargs["attention_mask"].unsqueeze(-1) # (batch_size, sequence_length, dim)
149
- proj = proj * attention_mask # (batch_size, sequence_length, dim)
150
-
151
- return proj
152
-
153
- def get_input_embeddings(self):
154
- return self.model.language_model.get_input_embeddings()
155
-
156
- def set_input_embeddings(self, value):
157
- self.model.language_model.set_input_embeddings(value)
158
-
159
- def get_output_embeddings(self):
160
- return self.model.language_model.get_output_embeddings()
161
-
162
- def set_output_embeddings(self, new_embeddings):
163
- self.model.language_model.set_output_embeddings(new_embeddings)
164
-
165
- def set_decoder(self, decoder):
166
- self.model.language_model.set_decoder(decoder)
167
-
168
- def get_decoder(self):
169
- return self.model.language_model.get_decoder()
170
-
171
- def tie_weights(self):
172
- return self.model.language_model.tie_weights()
173
-
174
- def resize_token_embeddings(
175
- self,
176
- new_num_tokens: Optional[int] = None,
177
- pad_to_multiple_of=None,
178
- ) -> nn.Embedding:
179
- model_embeds = self.model.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
180
-
181
- # Update vocab size
182
- self.config.text_config.vocab_size = model_embeds.num_embeddings
183
- self.config.vocab_size = model_embeds.num_embeddings
184
- self.model.vocab_size = model_embeds.num_embeddings
185
-
186
- return model_embeds
187
-
188
- @property
189
- def patch_size(self) -> int:
190
- return self.model.vision_tower.config.patch_size
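The `forward` method above keeps only the last 729 image-token positions per sequence (the visual tokens) before projecting them to 128-dimensional ColBERT-style vectors. A self-contained toy sketch of that topk/gather selection is shown below; the shapes, the token id, and `K` are made up for illustration and are not part of the deleted file.

```python
# Toy sketch of the index-selection trick in GraniteVisionEmb.forward (not from the repo).
# Assumes every sequence contains at least K image tokens, as the real model arranges
# with its fixed budget of 729 visual tokens.
import torch

batch, seq_len, hidden, K = 2, 12, 8, 4
image_token_index = 7  # hypothetical id standing in for config.image_token_index

input_ids = torch.randint(0, 7, (batch, seq_len))
input_ids[:, 5:11] = image_token_index                          # pretend positions 5..10 are image tokens
last_hidden_states = torch.randn(batch, seq_len, hidden)

image_mask = input_ids == image_token_index                     # (batch, seq_len)
idx = torch.arange(seq_len).expand(batch, seq_len)              # each row: 0..seq_len-1
masked_idx = torch.where(image_mask, idx, torch.full_like(idx, -1))

# topk over positions returns the K right-most image-token indices;
# sorting restores left-to-right order before gathering.
topk_values, _ = torch.topk(masked_idx, k=K, dim=1)
last_k_indices, _ = torch.sort(topk_values, dim=1)
gather_idx = last_k_indices.unsqueeze(-1).expand(-1, -1, hidden)
selected = torch.gather(last_hidden_states, 1, gather_idx)      # (batch, K, hidden)
print(selected.shape)                                           # torch.Size([2, 4, 8])
```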
 
preprocessor_config.json DELETED
@@ -1,137 +0,0 @@
1
- {
2
- "crop_size": {
3
- "height": 384,
4
- "width": 384
5
- },
6
- "default_to_square": false,
7
- "do_center_crop": true,
8
- "do_convert_rgb": null,
9
- "do_normalize": true,
10
- "do_pad": true,
11
- "do_rescale": true,
12
- "do_resize": true,
13
- "image_grid_pinpoints": [
14
- [
15
- 384,
16
- 768
17
- ],
18
- [
19
- 384,
20
- 1152
21
- ],
22
- [
23
- 384,
24
- 1536
25
- ],
26
- [
27
- 384,
28
- 1920
29
- ],
30
- [
31
- 384,
32
- 2304
33
- ],
34
- [
35
- 384,
36
- 2688
37
- ],
38
- [
39
- 384,
40
- 3072
41
- ],
42
- [
43
- 384,
44
- 3456
45
- ],
46
- [
47
- 384,
48
- 3840
49
- ],
50
- [
51
- 768,
52
- 384
53
- ],
54
- [
55
- 768,
56
- 768
57
- ],
58
- [
59
- 768,
60
- 1152
61
- ],
62
- [
63
- 768,
64
- 1536
65
- ],
66
- [
67
- 768,
68
- 1920
69
- ],
70
- [
71
- 1152,
72
- 384
73
- ],
74
- [
75
- 1152,
76
- 768
77
- ],
78
- [
79
- 1152,
80
- 1152
81
- ],
82
- [
83
- 1536,
84
- 384
85
- ],
86
- [
87
- 1536,
88
- 768
89
- ],
90
- [
91
- 1920,
92
- 384
93
- ],
94
- [
95
- 1920,
96
- 768
97
- ],
98
- [
99
- 2304,
100
- 384
101
- ],
102
- [
103
- 2688,
104
- 384
105
- ],
106
- [
107
- 3072,
108
- 384
109
- ],
110
- [
111
- 3456,
112
- 384
113
- ],
114
- [
115
- 3840,
116
- 384
117
- ]
118
- ],
119
- "image_mean": [
120
- 0.5,
121
- 0.5,
122
- 0.5
123
- ],
124
- "image_processor_type": "LlavaNextImageProcessor",
125
- "image_std": [
126
- 0.5,
127
- 0.5,
128
- 0.5
129
- ],
130
- "processor_class": "GraniteVisionEmbProcessor",
131
- "resample": 3,
132
- "rescale_factor": 0.00392156862745098,
133
- "size": {
134
- "height": 384,
135
- "width": 384
136
- }
137
- }
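The `image_grid_pinpoints` list above enumerates the multi-resolution grids a LlavaNext-style processor can tile a page into. As a rough illustration only (a simplified approximation under stated assumptions, not the exact `transformers` selection rule), the snippet below picks the grid that preserves the most image area for a given input size.

```python
# Simplified sketch (assumption, not the library's exact rule) of how a LlavaNext-style
# processor might pick one of the (height, width) pinpoints above for an input image.
from typing import List, Tuple

def select_best_grid(image_hw: Tuple[int, int],
                     pinpoints: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the pinpoint that keeps the most scaled image pixels with the least padding."""
    orig_h, orig_w = image_hw
    best, best_eff, best_waste = None, -1, float("inf")
    for h, w in pinpoints:
        scale = min(w / orig_w, h / orig_h)                  # fit the image inside the grid
        eff = int(orig_w * scale) * int(orig_h * scale)      # scaled image area that is kept
        waste = h * w - eff                                  # area that would become padding
        if eff > best_eff or (eff == best_eff and waste < best_waste):
            best, best_eff, best_waste = (h, w), eff, waste
    return best

# A 700x1000 (h, w) page maps to the square 768x768 grid among these three candidates:
print(select_best_grid((700, 1000), [(384, 768), (768, 768), (384, 1152)]))  # (768, 768)
```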
 
processing_granite_vision_embedding.py DELETED
@@ -1,439 +0,0 @@
1
- import math
2
- from typing import ClassVar, List, Optional, Tuple, Union
3
-
4
- import torch
5
- from PIL import Image, ImageOps
6
- from transformers import BatchFeature, LlavaNextProcessor
7
-
8
-
9
- def round_by_factor(number: float, factor: int) -> int:
10
- """Returns the closest integer to 'number' that is divisible by 'factor'."""
11
- return round(number / factor) * factor
12
-
13
-
14
- def ceil_by_factor(number: float, factor: int) -> int:
15
- """Returns the smallest integer greater than or equal to 'number' that is divisible by 'factor'."""
16
- return math.ceil(number / factor) * factor
17
-
18
-
19
- def floor_by_factor(number: float, factor: int) -> int:
20
- """Returns the largest integer less than or equal to 'number' that is divisible by 'factor'."""
21
- return math.floor(number / factor) * factor
22
-
23
-
24
- class GraniteVisionEmbProcessor(LlavaNextProcessor):
25
- """
26
- Processor for GraniteVisionEmb.
27
- """
28
-
29
- visual_prompt_prefix: ClassVar[str] = "<|user|>\n<image>\nDescribe the image.\n"
30
- system_message: ClassVar[
31
- str] = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."
32
- query_prefix: ClassVar[str] = "Query: "
33
- query_start: ClassVar[str] = "<|user|>\n"
34
-
35
- def __init__(self, *args, **kwargs):
36
- super().__init__(*args, **kwargs)
37
- self.factor = 14
38
- self.min_size = 384
39
- self.max_size = 384 * 2
40
- self.suffix_len = 10
41
- self.patch_size = 14
42
-
43
- @property
44
- def query_augmentation_token(self) -> str:
45
- """
46
- Return the query augmentation token.
47
- Query augmentation buffers are used as reasoning buffers during inference.
48
- """
49
- return self.tokenizer.pad_token
50
-
51
- @staticmethod
52
- def smart_resize_helper(
53
- width: int,
54
- height: int,
55
- factor: int,
56
- min_size: int,
57
- max_size: int
58
- ) -> Tuple[int, int]:
59
- """
60
- Returns the resized image dimensions such that:
61
- 1. The smaller dimension is set to 'min_size'.
62
- 2. The larger dimension is scaled proportionally to maintain aspect ratio.
63
- 3. If the larger dimension exceeds 'max_size', it is clipped to 'max_size',
64
- and the smaller dimension is adjusted accordingly to maintain aspect ratio.
65
- 4. Both dimensions are divisible by 'factor'.
66
- """
67
-
68
- # Determine scale factor based on min_size
69
- if height < width:
70
- scale_factor = min_size / height
71
- else:
72
- scale_factor = min_size / width
73
-
74
- new_width = round(width * scale_factor)
75
- new_height = round(height * scale_factor)
76
-
77
- # If the longer dimension exceeds max_size, adjust accordingly
78
- if max(new_width, new_height) > max_size:
79
- clip_factor = max_size / max(new_width, new_height)
80
- new_width = round(new_width * clip_factor)
81
- new_height = round(new_height * clip_factor)
82
-
83
- # Ensure dimensions are divisible by factor
84
- # new_width = round_by_factor(new_width, factor)
85
- # new_height = round_by_factor(new_height, factor)
86
-
87
- return new_width, new_height
88
-
89
- @staticmethod
90
- def pad_image_center(image: Image.Image,
91
- target_width: int,
92
- target_height: int,
93
- fill_color=(0, 0, 0)) -> Image.Image:
94
- """
95
- Pads the given image to be centered within the target dimensions.
96
-
97
- :param image: PIL Image to be padded.
98
- :param target_width: The desired width after padding.
99
- :param target_height: The desired height after padding.
100
- :param fill_color: Background color (default is black).
101
- :return: Padded image with centered content.
102
- """
103
-
104
- # Get original image size
105
- img_width, img_height = image.size
106
-
107
- # Compute padding values
108
- pad_left = (target_width - img_width) // 2
109
- pad_top = (target_height - img_height) // 2
110
- pad_right = target_width - img_width - pad_left
111
- pad_bottom = target_height - img_height - pad_top
112
-
113
- # Apply padding
114
- padded_image = ImageOps.expand(image, (pad_left, pad_top, pad_right, pad_bottom), fill_color).convert("RGB")
115
-
116
- return padded_image
117
-
118
- def smart_resize(self, image: Image.Image) -> Image.Image:
119
- """
120
- Resize and convert the image to the required format.
121
- """
122
- image_size = image.size
123
- resized_height, resized_width = self.smart_resize_helper(
124
- width=image_size[0],
125
- height=image_size[1],
126
- factor=self.factor,
127
- min_size=self.min_size,
128
- max_size=self.max_size
129
- )
130
- return image.convert("RGB").resize((resized_width, resized_height))
131
-
132
- def smart_resize_and_pad(self, image: Image.Image) -> Image.Image:
133
- """
134
- Resize and pad the image to the required format.
135
- """
136
- return self.resize_and_pad_centered_to_long_side(
137
- image=image,
138
- factor=self.factor,
139
- min_size=self.min_size,
140
- max_size=self.max_size,
141
- fill_color=0
142
- )
143
-
144
- def resize_and_pad_centered_to_long_side(
145
- self,
146
- image: Image.Image,
147
- factor: int,
148
- min_size: int,
149
- max_size: int,
150
- fill_color=0
151
- ) -> Image.Image:
152
- """
153
- Resizes and pads an image such that:
154
- - The long side is set to `max_size`.
155
- - The short side is scaled proportionally but not below `min_size`.
156
- - The image is centered within the final padded area.
157
-
158
- :param image: PIL Image
159
- :param factor: Factor to make dimensions divisible by
160
- :param min_size: Minimum allowed size for the short side
161
- :param max_size: Target size for the long side
162
- :param fill_color: Background padding color (default black)
163
- :return: Resized and padded image
164
- """
165
-
166
- # Get original size
167
- width, height = image.size
168
-
169
- if min_size == -1 or max_size == -1:
170
- return image.convert("RGB")
171
-
172
- # Step 1: scale long side to max_size, keep aspect ratio
173
- if width > height:
174
- scale_factor = max_size / width
175
- target_width = max_size
176
- max_scale_factor = max(min_size / height, scale_factor)
177
- target_height = round(height * max_scale_factor)
178
- else:
179
- scale_factor = max_size / height
180
- target_height = max_size
181
- max_scale_factor = max(min_size / width, scale_factor)
182
- target_width = round(width * max_scale_factor)
183
-
184
- # Resize the image
185
- resized_image = image.resize((target_width, target_height), Image.LANCZOS)
186
- final_image = resized_image.convert("RGB")
187
-
188
- return final_image
189
-
190
- def resize_and_pad_centered(self,
191
- image: Image.Image,
192
- factor: int,
193
- min_size: int,
194
- max_size: int,
195
- fill_color=0
196
- ) -> Image.Image:
197
- """
198
- Resizes and pads an image such that:
199
- - The short side is set to `min_size`.
200
- - The long side is scaled proportionally but clipped to `max_size`.
201
- - The image is centered within the final padded area.
202
-
203
- :param image: PIL Image
204
- :param factor: Factor to make dimensions divisible by
205
- :param min_size: Minimum size for the short side
206
- :param max_size: Maximum allowed size for the long side
207
- :param fill_color: Background padding color (default black)
208
- :return: Resized and padded image
209
- """
210
-
211
- # Get original size
212
- width, height = image.size
213
-
214
- if min_size == -1 or max_size == -1:
215
- return image.convert("RGB")
216
-
217
- # Determine scale factor based on the short side (min_size)
218
- if width < height:
219
- scale_factor = min_size / width
220
- target_width = min_size
221
- max_scale_factor = min(max_size / height, scale_factor)
222
- target_height = round(height * max_scale_factor)
223
- else:
224
- scale_factor = min_size / height
225
- target_height = min_size
226
- max_scale_factor = min(max_size / width, scale_factor)
227
- target_width = round(width * max_scale_factor)
228
-
229
- # Ensure the longer side does not exceed max_size
230
- # if max(target_width, target_height) > max_size:
231
- # clip_factor = max_size / max(target_width, target_height)
232
- # target_width = round(target_width * clip_factor)
233
- # target_height = round(target_height * clip_factor)
234
-
235
- # Ensure dimensions are divisible by factor
236
- # target_width = round_by_factor(target_width, factor)
237
- # target_height = round_by_factor(target_height, factor)
238
-
239
- # Resize the image
240
- resized_image = image.resize((target_width, target_height), Image.LANCZOS)
241
-
242
- # Determine final padded dimensions (aligned to short side)
243
- if width < height:
244
- final_width, final_height = min_size, max_size
245
- else:
246
- final_width, final_height = max_size, min_size
247
-
248
- # Compute padding to center the image
249
- pad_left = (final_width - target_width) // 2
250
- pad_top = (final_height - target_height) // 2
251
- pad_right = final_width - target_width - pad_left
252
- pad_bottom = final_height - target_height - pad_top
253
-
254
- # Apply centered padding
255
- # final_image = ImageOps.expand(resized_image, (pad_left, pad_top, pad_right, pad_bottom), fill_color).convert("RGB")
256
- final_image = resized_image.convert("RGB")
257
-
258
- return final_image
259
-
260
- def format_data(self, question, image):
261
- return [
262
- {
263
- "role": "system",
264
- "content": [{"type": "text", "text": self.system_message}],
265
- },
266
- {
267
- "role": "user",
268
- "content": [
269
- {
270
- "type": "image",
271
- "image": image,
272
- },
273
- {
274
- "type": "text",
275
- "text": question,
276
- },
277
- ],
278
- }
279
- ]
280
-
281
- def format_data_wo_role(self, question, image=None):
282
- return [
283
- {
284
- "role": "user",
285
- "content": [
286
- {
287
- "type": "image",
288
- "image": image,
289
- },
290
- {
291
- "type": "text",
292
- "text": question,
293
- },
294
- ],
295
- }
296
- ]
297
-
298
- def process_images(
299
- self,
300
- images: List[Image.Image],
301
- ) -> BatchFeature:
302
- """
303
- Process images.
304
- """
305
- # texts_doc = [self.apply_chat_template(self.format_data_wo_role(self.visual_prompt_prefix, img),tokenize=False ) for img in images]
306
- texts_doc = [self.visual_prompt_prefix for _ in images]
307
- images = [self.smart_resize_and_pad(image) for image in images]
308
-
309
- batch_doc = self(
310
- text=texts_doc,
311
- images=images,
312
- return_tensors="pt",
313
- padding="longest",
314
- )
315
- return batch_doc
316
-
317
- def process_queries(self, queries, max_length=2048, suffix=None):
318
- if suffix is None:
319
- suffix = self.query_augmentation_token * self.suffix_len
320
-
321
- processed = []
322
- for q in queries:
323
- q = self.query_start + self.query_prefix + q + ' ' + q
324
- q += suffix + "\n"
325
- processed.append(q)
326
-
327
- return self(
328
- text=processed,
329
- images=None,
330
- return_tensors="pt",
331
- padding="longest",
332
- truncation=True,
333
- max_length=max_length,
334
- )
335
-
336
- def score(
337
- self,
338
- qs: List[torch.Tensor],
339
- ps: List[torch.Tensor],
340
- device: Optional[Union[str, torch.device]] = None,
341
- **kwargs,
342
- ) -> torch.Tensor:
343
- """
344
- Compute the MaxSim score (ColBERT-like) for the given multi-vector query and passage embeddings.
345
- """
346
- return self.score_multi_vector(qs, ps, device=device, **kwargs)
347
-
348
- def get_n_patches(
349
- self,
350
- image_size: Tuple[int, int],
351
- patch_size: int,
352
- ) -> Tuple[int, int]:
353
- n_patches_x = self.image_processor.size["width"] // patch_size
354
- n_patches_y = self.image_processor.size["height"] // patch_size
355
-
356
- return n_patches_x, n_patches_y
357
-
358
- def get_image_mask(self, batch_images: BatchFeature) -> torch.Tensor:
359
- return batch_images.input_ids == self.image_token_id
360
-
361
- @staticmethod
362
- def score_single_vector(
363
- qs: List[torch.Tensor],
364
- ps: List[torch.Tensor],
365
- device: Optional[Union[str, torch.device]] = None,
366
- ) -> torch.Tensor:
367
- """
368
- Compute the dot product score for the given single-vector query and passage embeddings.
369
- """
370
-
371
- if len(qs) == 0:
372
- raise ValueError("No queries provided")
373
- if len(ps) == 0:
374
- raise ValueError("No passages provided")
375
-
376
- qs_stacked = torch.stack(qs).to(device)
377
- ps_stacked = torch.stack(ps).to(device)
378
-
379
- scores = torch.einsum("bd,cd->bc", qs_stacked, ps_stacked)
380
- assert scores.shape[0] == len(qs), f"Expected {len(qs)} scores, got {scores.shape[0]}"
381
-
382
- scores = scores.to(torch.float32)
383
- return scores
384
-
385
- @staticmethod
386
- def score_multi_vector(
387
- qs: Union[torch.Tensor, List[torch.Tensor]],
388
- ps: Union[torch.Tensor, List[torch.Tensor]],
389
- batch_size: int = 128,
390
- device: Optional[Union[str, torch.device]] = None,
391
- ) -> torch.Tensor:
392
- """
393
- Compute the late-interaction/MaxSim score (ColBERT-like) for the given multi-vector
394
- query embeddings (`qs`) and passage embeddings (`ps`). For us, a passage is the
395
- image of a document page.
396
-
397
- Because the embedding tensors are multi-vector and can thus have different shapes, they
398
- should be fed as:
399
- (1) a list of tensors, where the i-th tensor is of shape (sequence_length_i, embedding_dim)
400
- (2) a single tensor of shape (n_passages, max_sequence_length, embedding_dim) -> usually
401
- obtained by padding the list of tensors.
402
-
403
- Args:
404
- qs (`Union[torch.Tensor, List[torch.Tensor]`): Query embeddings.
405
- ps (`Union[torch.Tensor, List[torch.Tensor]`): Passage embeddings.
406
- batch_size (`int`, *optional*, defaults to 128): Batch size for computing scores.
407
- device (`Union[str, torch.device]`, *optional*): Device to use for computation. If not
408
- provided, uses `get_torch_device("auto")`.
409
-
410
- Returns:
411
- `torch.Tensor`: A tensor of shape `(n_queries, n_passages)` containing the scores. The score
412
- tensor is saved on the "cpu" device.
413
- """
414
-
415
- if len(qs) == 0:
416
- raise ValueError("No queries provided")
417
- if len(ps) == 0:
418
- raise ValueError("No passages provided")
419
-
420
- scores_list: List[torch.Tensor] = []
421
-
422
- for i in range(0, len(qs), batch_size):
423
- scores_batch = []
424
- qs_batch = torch.nn.utils.rnn.pad_sequence(qs[i: i + batch_size], batch_first=True, padding_value=0).to(
425
- device
426
- )
427
- for j in range(0, len(ps), batch_size):
428
- ps_batch = torch.nn.utils.rnn.pad_sequence(
429
- ps[j: j + batch_size], batch_first=True, padding_value=0
430
- ).to(device)
431
- scores_batch.append(torch.einsum("bnd,csd->bcns", qs_batch, ps_batch).max(dim=3)[0].sum(dim=2))
432
- scores_batch = torch.cat(scores_batch, dim=1).cpu()
433
- scores_list.append(scores_batch)
434
-
435
- scores = torch.cat(scores_list, dim=0)
436
- assert scores.shape[0] == len(qs), f"Expected {len(qs)} scores, got {scores.shape[0]}"
437
-
438
- scores = scores.to(torch.float32)
439
- return scores
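`score_multi_vector` above batches the late-interaction (MaxSim) score described in its docstring. The toy example below, with made-up tensor sizes, isolates the core einsum, max-over-passage-tokens, sum-over-query-tokens reduction on its own.

```python
# Toy MaxSim example (made-up tensors), mirroring the reduction inside score_multi_vector.
import torch

torch.manual_seed(0)
qs = [torch.randn(5, 128) for _ in range(2)]   # 2 queries, 5 token embeddings each
ps = [torch.randn(9, 128) for _ in range(3)]   # 3 pages, 9 token embeddings each

q = torch.nn.utils.rnn.pad_sequence(qs, batch_first=True)   # (2, 5, 128)
p = torch.nn.utils.rnn.pad_sequence(ps, batch_first=True)   # (3, 9, 128)

sim = torch.einsum("bnd,csd->bcns", q, p)      # all query-token x page-token similarities
scores = sim.max(dim=3).values.sum(dim=2)      # max over page tokens, sum over query tokens
print(scores.shape)                            # torch.Size([2, 3]) -> (n_queries, n_pages)
```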
 
processor_config.json DELETED
@@ -1,6 +0,0 @@
1
- {
2
- "processor_class": "GraniteVisionEmbProcessor",
3
- "auto_map": {
4
- "AutoProcessor": "processing_granite_vision_embedding.GraniteVisionEmbProcessor"
5
- }
6
- }
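For revisions that still contain the files deleted in this commit, the `auto_map` above is what lets `AutoProcessor` resolve the custom processor class. A minimal loading sketch follows; the path is hypothetical.

```python
# Minimal loading sketch; "path/to/granite-vision-embedding" is a hypothetical
# directory or repo revision that still contains the deleted files.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "path/to/granite-vision-embedding",
    trust_remote_code=True,  # required so the custom class named in "auto_map" can be imported
)
```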
 
special_tokens_map.json DELETED
@@ -1,35 +0,0 @@
1
- {
2
- "additional_special_tokens": [
3
- "<|start_of_role|>",
4
- "<|end_of_role|>",
5
- "<|tool_call|>"
6
- ],
7
- "bos_token": {
8
- "content": "<|end_of_text|>",
9
- "lstrip": false,
10
- "normalized": false,
11
- "rstrip": false,
12
- "single_word": false
13
- },
14
- "eos_token": {
15
- "content": "<|end_of_text|>",
16
- "lstrip": false,
17
- "normalized": false,
18
- "rstrip": false,
19
- "single_word": false
20
- },
21
- "pad_token": {
22
- "content": "<|end_of_text|>",
23
- "lstrip": false,
24
- "normalized": false,
25
- "rstrip": false,
26
- "single_word": false
27
- },
28
- "unk_token": {
29
- "content": "<|end_of_text|>",
30
- "lstrip": false,
31
- "normalized": false,
32
- "rstrip": false,
33
- "single_word": false
34
- }
35
- }
 
tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json DELETED
@@ -1,208 +0,0 @@
1
- {
2
- "add_bos_token": false,
3
- "add_prefix_space": false,
4
- "added_tokens_decoder": {
5
- "0": {
6
- "content": "<|end_of_text|>",
7
- "lstrip": false,
8
- "normalized": false,
9
- "rstrip": false,
10
- "single_word": false,
11
- "special": true
12
- },
13
- "1": {
14
- "content": "<fim_prefix>",
15
- "lstrip": false,
16
- "normalized": false,
17
- "rstrip": false,
18
- "single_word": false,
19
- "special": true
20
- },
21
- "2": {
22
- "content": "<fim_middle>",
23
- "lstrip": false,
24
- "normalized": false,
25
- "rstrip": false,
26
- "single_word": false,
27
- "special": true
28
- },
29
- "3": {
30
- "content": "<fim_suffix>",
31
- "lstrip": false,
32
- "normalized": false,
33
- "rstrip": false,
34
- "single_word": false,
35
- "special": true
36
- },
37
- "4": {
38
- "content": "<fim_pad>",
39
- "lstrip": false,
40
- "normalized": false,
41
- "rstrip": false,
42
- "single_word": false,
43
- "special": true
44
- },
45
- "5": {
46
- "content": "<filename>",
47
- "lstrip": false,
48
- "normalized": false,
49
- "rstrip": false,
50
- "single_word": false,
51
- "special": true
52
- },
53
- "6": {
54
- "content": "<gh_stars>",
55
- "lstrip": false,
56
- "normalized": false,
57
- "rstrip": false,
58
- "single_word": false,
59
- "special": true
60
- },
61
- "7": {
62
- "content": "<issue_start>",
63
- "lstrip": false,
64
- "normalized": false,
65
- "rstrip": false,
66
- "single_word": false,
67
- "special": true
68
- },
69
- "8": {
70
- "content": "<issue_comment>",
71
- "lstrip": false,
72
- "normalized": false,
73
- "rstrip": false,
74
- "single_word": false,
75
- "special": true
76
- },
77
- "9": {
78
- "content": "<issue_closed>",
79
- "lstrip": false,
80
- "normalized": false,
81
- "rstrip": false,
82
- "single_word": false,
83
- "special": true
84
- },
85
- "10": {
86
- "content": "<jupyter_start>",
87
- "lstrip": false,
88
- "normalized": false,
89
- "rstrip": false,
90
- "single_word": false,
91
- "special": true
92
- },
93
- "11": {
94
- "content": "<jupyter_text>",
95
- "lstrip": false,
96
- "normalized": false,
97
- "rstrip": false,
98
- "single_word": false,
99
- "special": true
100
- },
101
- "12": {
102
- "content": "<jupyter_code>",
103
- "lstrip": false,
104
- "normalized": false,
105
- "rstrip": false,
106
- "single_word": false,
107
- "special": true
108
- },
109
- "13": {
110
- "content": "<jupyter_output>",
111
- "lstrip": false,
112
- "normalized": false,
113
- "rstrip": false,
114
- "single_word": false,
115
- "special": true
116
- },
117
- "14": {
118
- "content": "<empty_output>",
119
- "lstrip": false,
120
- "normalized": false,
121
- "rstrip": false,
122
- "single_word": false,
123
- "special": true
124
- },
125
- "15": {
126
- "content": "<commit_before>",
127
- "lstrip": false,
128
- "normalized": false,
129
- "rstrip": false,
130
- "single_word": false,
131
- "special": true
132
- },
133
- "16": {
134
- "content": "<commit_msg>",
135
- "lstrip": false,
136
- "normalized": false,
137
- "rstrip": false,
138
- "single_word": false,
139
- "special": true
140
- },
141
- "17": {
142
- "content": "<commit_after>",
143
- "lstrip": false,
144
- "normalized": false,
145
- "rstrip": false,
146
- "single_word": false,
147
- "special": true
148
- },
149
- "18": {
150
- "content": "<reponame>",
151
- "lstrip": false,
152
- "normalized": false,
153
- "rstrip": false,
154
- "single_word": false,
155
- "special": true
156
- },
157
- "49152": {
158
- "content": "<|start_of_role|>",
159
- "lstrip": false,
160
- "normalized": false,
161
- "rstrip": false,
162
- "single_word": false,
163
- "special": true
164
- },
165
- "49153": {
166
- "content": "<|end_of_role|>",
167
- "lstrip": false,
168
- "normalized": false,
169
- "rstrip": false,
170
- "single_word": false,
171
- "special": true
172
- },
173
- "49154": {
174
- "content": "<|tool_call|>",
175
- "lstrip": false,
176
- "normalized": false,
177
- "rstrip": false,
178
- "single_word": false,
179
- "special": true
180
- },
181
- "49155": {
182
- "content": "<image>",
183
- "lstrip": false,
184
- "normalized": false,
185
- "rstrip": false,
186
- "single_word": false,
187
- "special": true
188
- }
189
- },
190
- "additional_special_tokens": [
191
- "<|start_of_role|>",
192
- "<|end_of_role|>",
193
- "<|tool_call|>"
194
- ],
195
- "bos_token": "<|end_of_text|>",
196
- "chat_template": "{%- if tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>\n' }}\n {%- for tool in tools %}\n {{- tool | tojson(indent=4) }}\n {%- if not loop.last %}\n {{- '\n\n' }}\n {%- endif %}\n {%- endfor %}\n {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- for message in messages if message['role'] == 'system'%}{% else %}<|system|>\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n{% endfor %}{%- for message in messages %}\n {%- if message['role'] == 'system' %}\n {{- '<|system|>\n' + message['content'] + '\n' }}\n {%- elif message['role'] == 'user' %}\n {{- '<|user|>\n' + message['content'] + '\n' }}\n {%- elif message['role'] == 'assistant' %}\n {{- '<|assistant|>\n' + message['content'] + '<|end_of_text|>' }}\n {%- elif message['role'] == 'assistant_tool_call' %}\n {{- '<|start_of_role|>assistant<|end_of_role|><|tool_call|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- elif message['role'] == 'tool_response' %}\n {{- '<|start_of_role|>tool_response<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|assistant|>\n' }}\n {%- endif %}\n{%- endfor %}",
197
- "clean_up_tokenization_spaces": true,
198
- "do_image_splitting": false,
199
- "eos_token": "<|end_of_text|>",
200
- "errors": "replace",
201
- "extra_special_tokens": {},
202
- "model_max_length": 131072,
203
- "pad_token": "<|end_of_text|>",
204
- "padding_side": "right",
205
- "tokenizer_class": "GPT2Tokenizer",
206
- "unk_token": "<|end_of_text|>",
207
- "vocab_size": 49152
208
- }
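The `chat_template` above injects a default system preamble when no system message is supplied and wraps turns in `<|system|>`, `<|user|>`, and `<|assistant|>` markers. A small sketch of rendering it with the standard tokenizer API is below; the model path is hypothetical.

```python
# Sketch of rendering the chat_template above; the model path is hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/granite-vision-embedding")
messages = [{"role": "user", "content": "Describe the image."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Per the template: a "<|system|>" preamble, then "<|user|>\nDescribe the image.\n<|assistant|>\n"
```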
 
vocab.json DELETED
The diff for this file is too large to render. See raw diff