TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 19:21:52 +08:00

Ziyi Xiong f2aee0db03 [TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>		2025-12-15 13:28:54 +08:00
attention	[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524)	2025-12-15 12:42:25 +08:00
auto_deploy	[None][feat] AutoDeploy: prepare_metadata revisited (#9764)	2025-12-12 20:14:14 +08:00
compilation	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)	2025-05-09 11:04:01 +08:00
debugger	Fix: fix nvbug 5356427 (#5464)	2025-06-25 22:24:26 +08:00
executor	[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757)	2025-12-12 22:29:05 +08:00
misc	[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835)	2025-12-10 20:41:04 +08:00
modeling	[TRTLLM-7967][chore] Add more tests (#9415)	2025-12-08 11:57:32 -08:00
models/checkpoints/hf	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583)	2025-12-05 16:07:20 +01:00
modules	[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858)	2025-12-15 10:47:15 +08:00
multi_gpu	[https://nvbugs/5597647][ci] Unwaive fixed tests. (#9812)	2025-12-12 02:29:30 +08:00
multi_gpu_modeling	[https://nvbugs/5515753][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. (#8440)	2025-11-20 12:43:13 -05:00
multimodal	[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604)	2025-12-15 08:42:30 +08:00
ray_orchestrator	[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793)	2025-12-13 01:02:11 -08:00
sampler	[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935)	2025-12-15 13:28:54 +08:00
speculative	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810)	2025-12-09 11:06:31 -05:00
thop	[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852)	2025-12-14 10:47:24 +08:00
helpers.py	[#8733][feat] Add Llama4 MoE handling to AutoDeploy (#9556)	2025-12-04 08:03:33 +02:00
pattern_watcher.py	[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)	2025-05-09 11:04:01 +08:00
test_connector.py	[None][feat] KV Cache Connector API (#7228)	2025-08-28 23:09:27 -04:00