xinhe-nv
286ace22ed
[None][chore] Add failed cases into waives.txt (#9242)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 19:27:55 -08:00
Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error (#9172)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resources are exhausted (#9155)
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt (#9193)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Lizhi Zhou
c789000a62
[https://nvbugs/5649010][fix] increase status-checking interval to avoid instability (#9203)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00
Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247)
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example (#8434)
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Ziyi Xiong
7c4344b92e
[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210)
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00
Kaiyu Xie
d076aa44d3
[None][tests] Unwaive wide ep related tests (#9204)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194)
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check (#9088)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00
Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case (#9165)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM (#8256)
...
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases (#9209)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587)
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Kaiyu Xie
04be5a704e
[None][fix] Fix missing ActivationType issue (#9171)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025)
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 (#9192)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975)
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 (#9187)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge (#9189)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
Erin
fe69243157
[None][chore] Add placement test for ray executor (#9122)
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-14 23:10:59 -08:00
Chang Liu
bed4e95e9f
[https://nvbugs/5629887][fix] Add missing device count guard for DSv32 multiGPU tests (#9159)
2025-11-14 07:52:23 -08:00
xinhe-nv
49b7e6301a
[None][chore] Add failed cases into waives.txt (#9156)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-14 06:28:22 -08:00
mpikulski
80bf840e69
[TRTLLM-9295][fix] unflake test_overlap_scheduler.py::test_overlap_scheduler_consis… (#9146)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-14 11:36:22 +01:00
yuanjingx87
d72321a32e
[None][ci] Waive unittest/_torch/sampler/test_torch_sampler.py::TestBatchedSampling (#9161)
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 01:49:26 -08:00
Suyog Gupta
d12cb9436d
[None][feat] Autodeploy add triton configs and optimize mamba prefill (#9083)
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-13 19:15:43 -08:00
QI JUN
3c950910a0
[None][ci] waive test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0] (#9162)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-13 18:56:37 -08:00
heyuhhh
f07e9977c6
[None][feat] Use triton kernels for RocketKV prediction module (#8682)
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065)
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format (#8934)
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server (#8765)
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Neta Zmora
34dc6869f3
[#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011)
...
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.
Nemotron-6 requires the ReLU2 (i.e., squared ReLU) MoE activation function.
This PR adds it, along with a general API for setting the activation function.
The ReLU2 changes are based on FlashInfer PR https://github.com/flashinfer-ai/flashinfer/pull/1954.
The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from
Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
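For context on the commit above: ReLU2 (squared ReLU) simply clamps negative inputs to zero and squares the rest, i.e. relu(x)². A minimal scalar sketch for illustration only, not the Cutlass kernel implementation:

```python
def relu2(x: float) -> float:
    # Squared ReLU: clamp negatives to zero, then square.
    r = max(x, 0.0)
    return r * r

# relu2(2.0) -> 4.0, relu2(-3.0) -> 0.0
print(relu2(2.0), relu2(-3.0))
```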
dongxuy04
a370643b26
[None][fix] support topk autotuner input for expert slots per group larger than 32 (#9087)
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-14 08:37:20 +08:00
Frida Hou
e96a3d294d
[None][autodeploy] minor refactor to rmsnorm transforms (#8657)
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:13:58 -08:00
Ziyi Xiong
a7aaf50541
[TRTLLM-8084][feat] Enhance the overlap scheduler for two-model spec decoding (#8706)
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-13 10:20:16 -05:00
William Zhang
121140cfec
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser (#8817)
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-13 04:34:38 -08:00
Lizhi Zhou
48a27c7bef
[https://nvbugs/5633340][chore] unwaive test_auto_scaling.py::test_disagg_server_restart (#9131)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-13 01:45:36 -08:00
Emma Qiao
d0ea417ec8
[None][infra] Waive failed tests for main 11/13 (#9132)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-13 01:00:40 -08:00
xinhe-nv
548f5ce4bc
[None][fix] waive failed tests (#9090)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 23:40:00 -08:00
xinhe-nv
8fa3c55c76
[None][chore] Remove closed bugs (#9114)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 22:49:37 -08:00
ruodil
c86e36fe38
[None][test] add deepseek and qwen cases for rtx series (#8839)
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-12 22:28:02 -08:00
HuiGao-NV
cde18c12da
[https://nvbugs/5640873][fix] Move thop tests to pre-merge (#9094)
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-13 13:08:13 +08:00
Zhang Ge
49df731b96
[#6507][fix] Fix precision issue due to KV layout mismatch for split/concat kernels (#6917)
...
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Yan Chunwei
4fd93bdc2c
[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming (#9118)
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-12 19:55:09 -08:00
Yan Chunwei
8a8883bc73
[None][chore] Waive test_llm_rpc_streaming (#9113)
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-13 11:06:26 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003)
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
QI JUN
3416efbc29
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill (#9111)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:06:32 +08:00