TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 11:11:36 +08:00

William Zhang a4049fc557 [#9413][fix] Minor fixes to nemotron H and custom models in AD (#9416)	2025-11-24 20:17:33 -08:00

* Why?

There were a couple of issues with the recently merged custom model
injection for AutoDeploy + the reference implementation of nemotron
H:
- `d_mlp` was left in despite being mathematically always null (could
  lead to runtime issues during sharding).
- the custom model mapping was inherited by children factories.

* What?

This commit fixes these issues, and refactors the key of the custom
implementation to be based on the name of the configuration class as
well.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
_utils_test	[#9271][perf] Enable multi-stream MOE optimization in AutoDeploy (#9322)	2025-11-24 19:50:10 -08:00
unit	[#9413][fix] Minor fixes to nemotron H and custom models in AD (#9416)	2025-11-24 20:17:33 -08:00
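
The second issue named in the #9416 commit message (a custom model mapping inherited by child factories) is a common pitfall with class-level registries in Python. The sketch below is only an illustration of that general pattern and of keying a registry by configuration class name. It is not the AutoDeploy code: `BaseFactory`, `register_custom_model`, `lookup`, `DemoConfig`, and `DemoCustomModel` are all hypothetical names chosen for this example.

```python
from typing import Dict, Optional


class BaseFactory:
    """Hypothetical factory base class (not the real AutoDeploy API)."""

    # Registry keyed by the *name* of the configuration class.
    _custom_model_mapping: Dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Give every subclass its own empty mapping. Without this, a child
        # factory would resolve the attribute on its parent and silently
        # share (inherit) the parent's registrations.
        cls._custom_model_mapping = {}

    @classmethod
    def register_custom_model(cls, config_cls_name: str, model_cls: type) -> None:
        # Writes only into the mapping owned by this exact class.
        cls._custom_model_mapping[config_cls_name] = model_cls

    @classmethod
    def lookup(cls, config: object) -> Optional[type]:
        # Resolve the custom implementation from the config class name.
        return cls._custom_model_mapping.get(type(config).__name__)


class DemoConfig:  # hypothetical configuration class
    pass


class DemoCustomModel:  # hypothetical custom implementation
    pass


class ParentFactory(BaseFactory):
    pass


class ChildFactory(ParentFactory):
    pass


ParentFactory.register_custom_model("DemoConfig", DemoCustomModel)
```

With this layout, `ParentFactory.lookup(DemoConfig())` finds `DemoCustomModel`, while `ChildFactory.lookup(DemoConfig())` finds nothing, because the child owns a separate mapping. Whether the actual fix uses `__init_subclass__` or another mechanism is not stated in the commit message.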
