6 Commits

Author SHA1 Message Date
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce (#4635)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
tomeras91
f121f13ddf
[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness (#4954)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-10 11:09:37 +03:00
tomeras91
8d31e16877
[TRTLLM-4923][feat] Paged mamba cache (#4822)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-04 09:27:08 +03:00
tomeras91
bf9cd11fd4
[TRTLLM-4783][feat] Mamba2 kernel updates for Nemotron-H (#4494)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-01 13:56:44 +03:00
hlu1
cd2bcdc1a9
Fix create_weights in attention (#3692)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-24 07:30:00 +08:00
Luis Vega
0bda1f9780
feat: Nemotron-H model support (#3430)
* added files for nemotron-h

Signed-off-by: Luis Vega <lvega@nvidia.com>

* use try/except to import RMSNorm

Signed-off-by: Luis Vega <lvega@nvidia.com>

---------

Signed-off-by: Luis Vega <lvega@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-16 14:05:56 -07:00