feat: Add CPU support
#18, opened by gabegoodhart
Description
This PR adds support to `modeling_nemotron.py` for running inference on CPU. This is a cleaned-up version of the edits I made while working on support in llama.cpp.
Changes
- Handle failed imports of `rmsnorm_fn`
- Add un-optimized implementation of `MambaRMSNormGated.forward`
- Fix `NemotronHMamba2Mixer.torch_forward` to use `repeat_interleave` for `B` and `C` (see discussion here)
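
For reviewers, the three changes can be illustrated with a small, dependency-free sketch. This is not the code in the PR: it uses plain Python lists instead of tensors, the helper names `tile`, `interleave`, and `gated_rms_norm` are hypothetical, and the import path for `rmsnorm_fn` is an assumption about where the fused kernel lives. The norm also assumes the gate is applied before normalization and that there is a single group.

```python
import math

# 1) Guarded import. The module path below is an assumption; the point is
#    that a missing fused kernel (typical on CPU-only machines) should not
#    break importing the model file.
try:
    from mamba_ssm.ops.triton.layernorm_gated import rmsnorm_fn
except ImportError:
    rmsnorm_fn = None  # callers check for None and take the slow path


# 2) Un-optimized gated RMSNorm on plain lists (hypothetical helper).
#    Assumes gate-then-normalize ordering and one normalization group.
def silu(v):
    return v / (1.0 + math.exp(-v))


def gated_rms_norm(x, gate, weight, eps=1e-5):
    gated = [xi * silu(gi) for xi, gi in zip(x, gate)]
    variance = sum(g * g for g in gated) / len(gated)
    scale = 1.0 / math.sqrt(variance + eps)
    return [wi * gi * scale for wi, gi in zip(weight, gated)]


# 3) Expanding per-group values (such as B and C) to per-head values.
def tile(groups, reps):
    # Mirrors torch.Tensor.repeat along the head axis: g0, g1, g0, g1
    return list(groups) * reps


def interleave(groups, reps):
    # Mirrors torch.repeat_interleave: g0, g0, g1, g1
    return [g for g in groups for _ in range(reps)]


if __name__ == "__main__":
    print("fused kernel available:", rmsnorm_fn is not None)
    print("tile:      ", tile(["g0", "g1"], 2))
    print("interleave:", interleave(["g0", "g1"], 2))
    print("norm:      ", gated_rms_norm([1.0, 2.0, 3.0], [1.0] * 3, [1.0] * 3))
```

With two groups and four heads, tiling assigns groups to heads as g0, g1, g0, g1, while interleaving assigns g0, g0, g1, g1. If heads are grouped contiguously (head `h` belongs to group `h // (num_heads // n_groups)`), only the interleaved layout is correct. The two layouts coincide when there is a single group, so a mismatch of this kind would only surface with more than one group.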