TensorRT-LLM/tensorrt_llm/_torch/auto_deploy
William Zhang 11a0b276fb
[#9230][feat] Slimmed down implementation of nemotron H (#9235)
* Why?

The reference nemotron H code on HuggingFace is out of date and
therefore buggy, and it has several untested code paths.
This makes an already hairy patching system even hairier.

The proposal is to do away with those patches, and replace the
original implementation with one that is heavily slimmed down.

* What?

This PR sets the basis for an alternative path with such a
slimmed-down implementation that:
- fixes bugs in the current HF implementation
- adds no new dependencies to TensorRT-LLM
- does away with features that are unnecessary for TensorRT-LLM/
  AutoDeploy:
  - no training-related code (dropout, gradient checkpointing, etc.)
  - no caching logic (we want to replace it with our own anyway)
  - no attention masking where possible
- reuses existing AD custom ops for the mamba SSM update /
  causal conv1d / attention (see the sketch after this list)
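
As a hypothetical illustration of the last bullet (not the actual AD op
names or signatures), a slimmed-down block can keep the model code thin by
dispatching to a pre-registered custom op instead of carrying its own
kernel implementation. The op name `example_ad::causal_conv1d` and the
reference body below are invented stand-ins for this sketch only.

```python
# Hypothetical sketch: "example_ad::causal_conv1d" is an invented stand-in
# for the existing AD custom ops (mamba SSM update / causal conv1d /
# attention); only the dispatch pattern is the point here.
import torch
import torch.nn.functional as F


@torch.library.custom_op("example_ad::causal_conv1d", mutates_args=())
def causal_conv1d(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Depthwise causal conv1d reference: left-pad so position t only sees
    # positions <= t. x: [batch, channels, seq], weight: [channels, width].
    width = weight.shape[-1]
    x = F.pad(x, (width - 1, 0))
    return F.conv1d(x, weight.unsqueeze(1), groups=weight.shape[0])


class SlimConvBlock(torch.nn.Module):
    """Model code stays thin: it only dispatches through torch.ops."""

    def __init__(self, channels: int, width: int = 4):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(channels, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.ops.example_ad.causal_conv1d(x, self.weight)


if __name__ == "__main__":
    with torch.no_grad():
        block = SlimConvBlock(channels=8)
        out = block(torch.randn(2, 8, 16))  # [batch, channels, seq]
    print(out.shape)  # torch.Size([2, 8, 16])
```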

In order for the above to be usable in the AD apparatus,
`AutoModelForCausalLMFactory` is extended to allow registrations
of custom model implementations.
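
The registration API itself is not shown in this message, so the following
is only a hypothetical sketch of the mechanism it describes: the factory
keeps a registry mapping a model type to a custom implementation, which is
preferred over the stock HF class when present. Names such as
`register_custom_model` and `SlimNemotronHForCausalLM` are invented for
this example and are not the actual TensorRT-LLM API.

```python
# Hypothetical sketch of a custom-model registry; not the actual
# AutoModelForCausalLMFactory API, only the registration mechanism implied.
from typing import Callable, Dict, Type

import torch.nn as nn


class CausalLMFactorySketch:
    """Toy factory that prefers a registered custom implementation."""

    _custom_models: Dict[str, Type[nn.Module]] = {}

    @classmethod
    def register_custom_model(
        cls, model_type: str
    ) -> Callable[[Type[nn.Module]], Type[nn.Module]]:
        def decorator(model_cls: Type[nn.Module]) -> Type[nn.Module]:
            cls._custom_models[model_type] = model_cls
            return model_cls

        return decorator

    @classmethod
    def resolve(cls, model_type: str) -> Type[nn.Module]:
        # A real factory would fall back to the stock HF class here.
        if model_type not in cls._custom_models:
            raise KeyError(f"no custom implementation for {model_type!r}")
        return cls._custom_models[model_type]


@CausalLMFactorySketch.register_custom_model("nemotron_h")
class SlimNemotronHForCausalLM(nn.Module):
    """Stand-in for the slimmed-down Nemotron H implementation."""

    def forward(self, input_ids):
        raise NotImplementedError


print(CausalLMFactorySketch.resolve("nemotron_h").__name__)
# SlimNemotronHForCausalLM
```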

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-23 03:13:32 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| `compile` | [None][feat] Autodeploy add triton configs and optimize mamba prefill (#9083) | 2025-11-13 19:15:43 -08:00 |
| `config` | [#9096][feature] Auto Deploy: configurable fused MoE backend (#9194) | 2025-11-19 21:50:22 -08:00 |
| `custom_ops` | [TRTLLM-9082][feat] AutoDeploy: Move the moe Align kernel to AOT (#9106) | 2025-11-21 16:05:48 -08:00 |
| `distributed` | [#9152][fix] AutoDeploy fused_allreduce_residual_rmsnorm to support demollm mode (#9197) | 2025-11-18 22:15:29 +02:00 |
| `export` | [#9230][feat] Slimmed down implementation of nemotron H (#9235) | 2025-11-23 03:13:32 -08:00 |
| `models` | [#9230][feat] Slimmed down implementation of nemotron H (#9235) | 2025-11-23 03:13:32 -08:00 |
| `shim` | [#9237][feat] enable iter stats in autodeploy (#9278) | 2025-11-19 19:29:29 +01:00 |
| `transform` | [#9388][fix] AutoDeploy: Fix cutlass BF16 MoE kernel invocation (#9339) | 2025-11-21 17:05:03 -08:00 |
| `utils` | [None][autodeploy] fix weight extraction for graph based quantized checkpoints (#9109) | 2025-11-13 13:14:24 -08:00 |
| `__init__.py` | [AutoDeploy] merge feat/ad-2025-07-07 (#6196) | 2025-07-23 05:11:04 +08:00 |
| `llm_args.py` | [#9237][feat] enable iter stats in autodeploy (#9278) | 2025-11-19 19:29:29 +01:00 |
| `llm.py` | [TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) | 2025-11-06 22:37:03 -08:00 |