TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 19:21:52 +08:00


William Zhang 478b6b20a1	2025-12-18 19:36:27 -08:00

[#9230][refactor] Replace nemotron patches with custom model implementation (#9751)

* Why?

Patching for nemotron H models was growing out of hand, and made certain
optimizations more complex than they needed to be.

* What?

This commit finally gets rid of them, and replaces them with the custom
model implementation in `modeling_nemotron_h.py`.

Closes #9230
Closes NvBug 5747867

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

multigpu	[https://nvbugs/5731717][fix] fixed flashinfer build race condition during test (#9983)	2025-12-15 20:30:24 -08:00
singlegpu	[#9230][refactor] Replace nemotron patches with custom model implementation (#9751)	2025-12-18 19:36:27 -08:00
