Update README.md
README.md CHANGED
@@ -29,6 +29,11 @@ On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 Mb of memory.
 
 
 
+⚠️ **Caveat:** Our mobile demo apps have **regressed support for the Phi-4 tokenizer**, so this model will not currently run in our official demo apps.
+If you are using your own runner, you can still load and run the `.pte` file successfully.
+(See https://github.com/pytorch/executorch/issues/14077 for details and tracking.)
+
+
 # Quantization Recipe
 
 First need to install the required packages:
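The hunk above ends at "First need to install the required packages:"; the concrete install commands sit outside the diff context. As a hedged sketch of what that step typically looks like for a torchao quantization recipe (the package list and versions are assumptions, not taken from this diff; the full README is authoritative):

```Shell
# Sketch only: typical dependencies for a torchao quantization recipe.
# The exact packages and versions are assumptions; see the full README
# for the canonical install commands.
pip install torch torchao transformers accelerate
```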
@@ -213,7 +218,7 @@ We can run the quantized model on a mobile phone using [ExecuTorch](https://gith
 Once ExecuTorch is [set-up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.
 
 ExecuTorch's LLM export scripts require the checkpoint keys and parameters have certain names, which differ from those used in Hugging Face.
-So we first use a
+So we first use a script that converts the Hugging Face checkpoint key names to ones that ExecuTorch expects:
 ```Shell
 python -m executorch.examples.models.phi_4_mini.convert_weights $(hf download pytorch/Phi-4-mini-instruct-INT8-INT4) pytorch_model_converted.bin
 ```
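After the conversion step above, the README's flow continues with exporting the converted checkpoint to a `.pte` program. A minimal sketch, assuming ExecuTorch's `export_llama` entry point (the flag names are assumptions based on the ExecuTorch Llama examples, not on this diff, and may differ for Phi-4-mini):

```Shell
# Sketch only: export the converted checkpoint to a .pte program.
# export_llama is ExecuTorch's LLM export entry point; these flags are
# assumptions drawn from the Llama examples, not from this README.
python -m executorch.examples.models.llama.export_llama \
  --model phi_4_mini \
  --checkpoint pytorch_model_converted.bin \
  --output_name phi_4_mini.pte
```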
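On the caveat added in the first hunk: "your own runner" can be as simple as ExecuTorch's example CLI runner, which loads a `.pte` file directly and sidesteps the demo apps. A hedged sketch (the binary path and flags follow the ExecuTorch Llama example; the tokenizer path is a placeholder for the Phi-4 tokenizer file):

```Shell
# Sketch only: run the exported program with ExecuTorch's example CLI runner.
# llama_main is built from the ExecuTorch repo via CMake; the flags follow
# the Llama example, and the tokenizer path here is a placeholder.
cmake-out/examples/models/llama/llama_main \
  --model_path=phi_4_mini.pte \
  --tokenizer_path=tokenizer.model \
  --prompt="What is quantization?"
```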