HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce ( #4635 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
tomeras91
f121f13ddf
[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness ( #4954 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-10 11:09:37 +03:00
tomeras91
8d31e16877
[TRTLLM-4923][feat] Paged mamba cache ( #4822 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-04 09:27:08 +03:00
tomeras91
bf9cd11fd4
[TRTLLM-4783][feat] Mamba2 kernel updates for Nemotron-H ( #4494 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-01 13:56:44 +03:00
hlu1
cd2bcdc1a9
Fix create_weights in attention ( #3692 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-24 07:30:00 +08:00
Luis Vega
0bda1f9780
feat: Nemotron-H model support ( #3430 )
...
* added files for nemotron-h
Signed-off-by: Luis Vega <lvega@nvidia.com>
* use try/except to import RMSNorm
Signed-off-by: Luis Vega <lvega@nvidia.com>
---------
Signed-off-by: Luis Vega <lvega@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-16 14:05:56 -07:00