---
license: apache-2.0
tags:
  - onnx
  - mistral
  - instruction-tuned
  - alex-ai
  - fp32
library_name: onnxruntime
model_creator: techAInewb
model_name: mistral-nemo-2407-fp32
---

# Mistral-Nemo Instruct 2407 – ONNX FP32 Export

This repository contains the ONNX-formatted FP32 export of the **Mistral-Nemo Instruct 2407** model, compatible with ONNX Runtime.

## 🧠 Model Summary

This is the **flagship release** of the Alex AI project and, to our knowledge, the **first-ever open ONNX-format export of Mistral-Nemo Instruct 2407** for full-stack experimentation and deployment.

- **Architecture**: Mistral-Transformer hybrid, instruction-tuned for reasoning and alignment
- **Format**: ONNX (graph + external weights)
- **Precision**: FP32 (float32)
- **Exported Using**: PyTorch → ONNX via `torch.onnx.export` (a hedged sketch of this step appears near the end of this card)

This model forms the foundation for future research in quantization, NPU acceleration, memory routing, and lightweight agent design. It is positioned as a clean, transparent baseline for community optimization, with future support planned for AMD Vitis AI, Olive, and quantized variants.

## 📁 Files Included

| File               | Description                      |
|--------------------|----------------------------------|
| `model.onnx`       | The model graph                  |
| `model.onnx.data`  | External tensor weights (~27GB)  |
| `config.json`      | Model configuration metadata     |
| `requirements.txt` | Runtime dependencies             |
| `LICENSE`          | Apache 2.0 License               |

## ✅ Requirements

Install the required packages:

```bash
pip install -r requirements.txt
```

## 🚀 Usage Example

```python
import onnxruntime as ort
import numpy as np

# External weights in model.onnx.data are picked up automatically
# as long as the file sits next to model.onnx.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

outputs = session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask
})

print(outputs[0].shape)  # (1, 5, vocab_size); 131072 for this model, per the validation below
```

A fuller tokenizer-driven decoding sketch appears near the end of this card.

## 💡 Project Vision

The Alex AI project was created to explore what's possible when we combine precision reasoning, self-evolving memory, and strict efficiency, all under real-world constraints.

This model is a public cornerstone for research in ONNX deployment, quantization, agent routing, and modular NPU workflows. It is open, transparent, and designed for practical extension. We believe high-quality tools shouldn't be locked behind paywalls.

## 🤝 Get Involved

- GitHub: [github.com/techAInewb](https://github.com/techAInewb)
- Hugging Face: [huggingface.co/techAInewb](https://huggingface.co/techAInewb)
- Project Chat: Coming soon

Contributions, forks, and optimization experiments are welcome!

## 💸 Support the Project

If you'd like to support open-source AI development, please consider donating:

**[🫶 Donate via PayPal](https://www.paypal.com/paypalme/AlexAwakens)**

_Message: "Thank you for your donation to the Alex AI project!"_

## 📜 License

This model is released under the [Apache 2.0 License](./LICENSE).
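## 🧰 Export Sketch

The summary above notes that the graph was produced via `torch.onnx.export`. The exact export script is not included in this repository; what follows is a minimal, hypothetical sketch of how such an export can be reproduced, assuming the upstream `mistralai/Mistral-Nemo-Instruct-2407` checkpoint and the `transformers` library. The input and output names mirror the usage example above; the opset version is an assumption.

```python
# Hypothetical re-creation of the export step, not the exact script used here.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",  # assumed upstream checkpoint
    torch_dtype=torch.float32,
)
model.config.use_cache = False    # keep past_key_values out of the graph
model.config.return_dict = False  # export tuple outputs, not a ModelOutput
model.eval()

# Dummy inputs matching the two graph inputs documented above.
input_ids = torch.zeros((1, 8), dtype=torch.int64)
attention_mask = torch.ones_like(input_ids)

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
    opset_version=17,  # assumed; newer opsets should also work
)
```

Because the FP32 weights far exceed the 2GB protobuf limit, the exporter writes them out as external data, which is why the weights ship as the separate `model.onnx.data` file listed above.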
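## 🔁 Usage Sketch: Greedy Decoding

The usage example above feeds raw token IDs. For end-to-end text generation you also need a tokenizer, which is not bundled with this export; the sketch below assumes the tokenizer from the upstream `mistralai/Mistral-Nemo-Instruct-2407` repository. Because the graph exposes no KV-cache inputs, the full sequence is re-run at each step, which is simple but slow.

```python
# Illustrative greedy decoding loop; the tokenizer source is an assumption.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

enc = tokenizer("Explain ONNX in one sentence.", return_tensors="np")
input_ids = enc["input_ids"].astype(np.int64)

for _ in range(32):  # generate at most 32 new tokens
    attention_mask = np.ones_like(input_ids)
    logits = session.run(None, {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
    })[0]
    next_token = int(logits[0, -1].argmax())  # greedy pick at the last position
    if next_token == tokenizer.eos_token_id:
        break
    # No KV cache in this graph, so the whole prefix is re-encoded every step.
    input_ids = np.concatenate(
        [input_ids, np.array([[next_token]], dtype=np.int64)], axis=1
    )

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```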
## 🧪 Inference Validation

This model has been validated using ONNX Runtime in a local Windows 11 environment:

- **System**: AMD Ryzen 5 7640HS, 16GB RAM, RTX 3050 (6GB), Windows 11 Home
- **Runtime**: `onnxruntime==1.17.0`, Python 3.10, Conda environment `alex-dev`

Test inference was run with:

```python
input_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)
```

**Result:**

- ✅ Model loaded and executed without error
- ✅ Output logits shape: `(1, 5, 131072)`
- ⚠️ Memory usage can exceed 20GB at larger batch sizes; make sure the Windows pagefile is sized appropriately (we used 350GB)
- 🚫 No GPU or CUDA acceleration was used for this test; CPU-only validation

This confirms that the full ONNX FP32 export loads and runs reliably, even under real-world hardware constraints. A sketch of session settings that may lower peak memory follows below.
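## 📉 Reducing Peak Memory (Sketch)

Given the 20GB+ peak observed above, two ONNX Runtime session options are worth experimenting with on RAM-constrained machines: disabling the CPU memory arena and the memory-pattern planner. Both trade some throughput for a smaller, more predictable footprint; whether they help enough for this model is untested here.

```python
# Untested sketch: session settings that may reduce peak memory on CPU.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_cpu_mem_arena = False  # allocate per request instead of one large arena
opts.enable_mem_pattern = False    # skip pre-planned allocation patterns

session = ort.InferenceSession(
    "model.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
```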