Update readme
README.md CHANGED
@@ -46,6 +46,7 @@ widget:
 - role: user
   content: Can you provide ways to eat combinations of bananas and dragonfruits?
 library_name: transformers
+paper: arxiv.org/abs/2503.01743
 ---
 
 ## Model Summary
@@ -407,8 +408,9 @@ model = AutoModelForCausalLM.from_pretrained(
     model_path,
     device_map="cuda",
     torch_dtype="auto",
-    trust_remote_code=True,
-
+    trust_remote_code=True,
+    # if you do not have an Ampere or later GPU, change attention to "eager"
+    _attn_implementation='flash_attention_2',
 ).cuda()
 
 # Load generation config
@@ -466,6 +468,8 @@ response = processor.batch_decode(
 print(f'>>> Response\n{response}')
 ```
 
+**Notes**:
+
 ## Responsible AI Considerations
 
 Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
@@ -561,7 +565,7 @@ Note that by default, the Phi-4-multimodal-instruct model uses flash attention,
 * NVIDIA H100
 
 If you want to run the model on:
-* NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with
+* NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with _attn_implementation="eager"
 
 ## License
 The model is licensed under the [MIT license](./LICENSE).