# Llama3.2-3B
Run Llama3.2-3B optimized for Qualcomm NPUs with nexaSDK.
## Quickstart

- Install nexaSDK and create a free account at sdk.nexa.ai
- Activate your device with your access token:

  ```bash
  nexa config set license '<access_token>'
  ```

- Run the model on the Qualcomm NPU in one line:

  ```bash
  nexa infer NexaAI/Llama3.2-3B-NPU-Turbo
  ```
## Model Description
Llama3.2-3B is a 3-billion-parameter language model from Meta’s Llama 3.2 series.
It is designed to provide a balance of efficiency and capability, making it suitable for deployment on a wide range of devices while maintaining strong performance on core language understanding and generation tasks.
Trained on diverse, high-quality datasets, Llama3.2-3B supports multiple languages and is optimized for scalability, fine-tuning, and real-world applications.
## Features
- Lightweight yet capable: delivers strong performance with a smaller memory footprint.
- Conversational AI: context-aware dialogue for assistants and agents.
- Content generation: text completion, summarization, code comments, and more.
- Reasoning & analysis: step-by-step problem solving and explanation.
- Multilingual: supports understanding and generation in multiple languages.
- Customizable: can be fine-tuned for domain-specific or enterprise use.
## Use Cases
- Personal and enterprise chatbots
- On-device AI applications
- Document and report summarization
- Education and tutoring tools
- Fine-tuned, domain-specific models for verticals such as healthcare, finance, and legal
## Inputs and Outputs

**Input:**
- Text prompts or conversation history (tokenized input sequences).

**Output:**
- Generated text: responses, explanations, or creative content.
- Optionally, raw logits/probabilities for advanced downstream tasks.
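For the advanced path above, downstream code typically turns raw logits into a probability distribution over the vocabulary with a softmax. A minimal NumPy sketch (the logits array is a made-up stand-in for real model output, and the 4-token vocabulary is purely illustrative):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits to probabilities, shifting by the max for numerical stability."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logits for a tiny 4-token vocabulary (illustrative values only).
logits = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(logits)

print(round(float(probs.sum()), 6))  # probabilities sum to 1
print(int(np.argmax(probs)))         # greedy decoding picks the highest-probability token
```

The same distribution can feed sampling strategies (temperature, top-k, top-p) instead of greedy argmax decoding.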
## License

## References