Update README.md

[Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) is quantized by the PyTorch team using [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) with 8-bit embeddings and 8-bit dynamic activations with 4-bit weight linears (INT8-INT4).
The model is suitable for mobile deployment with [ExecuTorch](https://github.com/pytorch/executorch).

We provide the [quantized pte](https://huggingface.co/pytorch/Phi-4-mini-instruct-INT8-INT4/blob/main/model.pte) for direct use in ExecuTorch.
(The provided pte file is exported with a max_seq_length/max_context_length of 1024; if you wish to change this, re-export the quantized model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
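If you only need the ready-made file, you can fetch it directly. Below is a minimal sketch using the `hf` CLI from `huggingface_hub` (the same tool used in the export instructions later on); the positional filename argument is an assumption worth verifying against your CLI version:

```Shell
# Download just the pre-exported pte file from the model repo
hf download pytorch/Phi-4-mini-instruct-INT8-INT4 model.pte
```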

# Running in a mobile app
The [pte file](https://huggingface.co/pytorch/Phi-4-mini-instruct-INT8-INT4/blob/main/phi4-mini-INT8-INT4.pte) can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
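Before wiring the pte into an app, it can help to smoke-test it on a desktop machine. The sketch below is an assumption, not part of the model card: it presumes you have built ExecuTorch's example `llama_main` runner from source and that your build can load the model's `tokenizer.json` (downloaded from the model repo alongside the pte):

```Shell
# A minimal sketch: run the pte on the host with ExecuTorch's example runner.
# The binary location and tokenizer support depend on your ExecuTorch build.
cmake-out/examples/models/llama/llama_main \
  --model_path=model.pte \
  --tokenizer_path=tokenizer.json \
  --prompt="Tell me a short story."
```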

# Exporting to ExecuTorch

We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.

We first convert the [quantized checkpoint](https://huggingface.co/pytorch/Phi-4-mini-instruct-INT8-INT4/blob/main/pytorch_model.bin) to the format ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
The following script does this for you.
```Shell
# Download the quantized checkpoint from the Hugging Face Hub
HF_MODEL_DIR=$(hf download pytorch/Phi-4-mini-instruct-INT8-INT4)
# Rename the checkpoint keys into the layout export_llama expects
python -m executorch.examples.models.phi_4_mini.convert_weights $HF_MODEL_DIR pytorch_model_converted.bin
```

Once the checkpoint is converted, we can export to ExecuTorch's pte format with the XNNPACK delegate.
The command below exports with a max_seq_length/max_context_length of 1024, but these values can be changed as desired.

```Shell
python -m executorch.examples.models.llama.export_llama \
  --model "phi_4_mini" \
  --checkpoint pytorch_model_converted.bin \
  --params examples/models/phi_4_mini/config/config.json \
  --output_name model.pte \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  --xnnpack-extended-ops \
  --max_context_length 1024 \
  --max_seq_length 1024 \
  --dtype fp32 \
  --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'
```
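A note on the flags, based on ExecuTorch's `export_llama` options (worth double-checking against the version you have installed): `-kv` enables the KV cache, `-X` delegates supported operators to XNNPACK, and `--xnnpack-extended-ops` extends that coverage to the quantized linear operators. The `--metadata` JSON records Phi-4-mini's BOS/EOS token ids in the pte so the on-device runner knows where generation starts and stops.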

After that you can run the model in a mobile app (see [Running in a mobile app](#running-in-a-mobile-app)).