TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-12 05:53:33 +08:00

Author	SHA1	Message	Date
Zheyu Fu	9b33ea751b	Merge branch 'main' into fix_spec_gate	2025-12-22 12:22:51 -08:00
tcherckez-nvidia	12e1cb8d7e	[#9717 ][chore] Refactor MoE code to use enums (#9910 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-22 15:14:56 -05:00
JunyiXu-nv	aaa87abf41	[TRTLLM-7906][feat] Support multiple post process for Responses API (#9908 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-22 11:33:34 -05:00
Emma Qiao	ba14a9308e	[None][infra] Waive failed cases on 12/22 (#10200 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-23 00:05:45 +08:00
Pengyun Lin	0f308e95f9	[None][chore] Remove logprobs constraint on trtllm-serve pytorch backend (#9911 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-22 21:37:22 +08:00
William Zhang	a6a88985cf	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758 ) * Why? Certain VLMs like the Qwen family need more than just the multimodal embeddings in the language model, and need MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker. * What? This commit extends the `DisaggregatedParams` to include the MRoPE information. It also adjusts several pieces of code required to communicate that between E, P and D workers. Closes TRTLLM-9409. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-22 06:32:49 -05:00
Bo Li	472fe497dc	[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-22 05:57:12 -05:00
Yan Chunwei	ea6cd76c55	[None][refactor] simplify get_stats and get_kvcache_events with rpc (#9980 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-22 18:23:43 +08:00
Zheyu Fu	0d1cd5f4d2	Merge branch 'main' into fix_spec_gate	2025-12-22 01:50:15 -08:00
Perkz Zheng	c87f1a6b39	[https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs (#10089 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-12-22 04:45:33 -05:00
shuyixiong	9e9523c3cc	[https://nvbugs/5762016 ][chore] Skip a ray test (#10194 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-22 17:06:19 +08:00
Zheyu Fu	6ea3cb59fc	Merge branch 'main' into fix_spec_gate	2025-12-21 23:52:35 -08:00
JadoTu	7421224d69	[None][fix] NVFP4 linear method's weight and weight_scale padding (#10148 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-22 15:00:31 +08:00
xinhe-nv	d30ee8101e	[None][chore] Remove closed bugs (#10182 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-22 01:58:17 -05:00
Zheyu Fu	402c7fb6fd	Merge branch 'main' into fix_spec_gate	2025-12-21 19:43:42 -08:00
Yuxian Qiu	237fd0eae4	[https://nvbugs/5666821 ][chore] unwaive tests. (#9958 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-22 11:39:45 +08:00
Zheyu Fu	b51ee2bb0d	Merge branch 'main' into fix_spec_gate Signed-off-by: Zheyu Fu <zheyuf@nvidia.com>	2025-12-21 19:38:26 -08:00
TensorRT LLM	f8501f3cc8	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-22 03:08:12 +00:00
Fanrong Li	f0bd60a395	[https://nvbugs/5684820 ][fix] fix the detokenizer issue for DeepSeek-v3.2 (#10106 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-22 10:56:33 +08:00
Jin Li	066b653940	[TRTLLM-9880][feat] Include torch compile tests in QA test list (#10149 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-22 10:37:09 +08:00
Yuxian Qiu	2f139ee07e	[https://nvbugs/5701445 ][chore] unwaive test. (#9949 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-22 10:12:54 +08:00
Chuang Zhu	914dd39127	[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test (#9735 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-22 09:29:24 +08:00
dominicshanshan	d274a4c5d3	[https://nvbugs/5701457 ][fix] Unwaive ray test. (#10175 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-22 09:25:58 +08:00
Enwei Zhu	5549067966	[None][ci] Waive GPTOSS test case (#10155 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-22 08:50:44 +08:00
Balaram Buddharaju	5266475014	[None][feat] Cudagraph updates for helix parallelism (#10141 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-21 15:21:52 -05:00
shuyixiong	4fc6036276	[https://nvbugs/5702793 ][fix] Fix view operation on uncontiguous tensor (#10147 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-21 11:47:20 -05:00
bhsueh_NV	cd4b4f43fa	[None][feat] Support Eagle3 on Mistral Large3 (#9971 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-21 10:25:45 -05:00
Kaiyu Xie	5a611cb8f5	[None] [feat] Enhancements to slurm scripts (#10112 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-21 10:24:56 -05:00
Emma Qiao	aa5dbb7ca5	[None][infra] Waive failed tests for main branch on 12/21 (#10184 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-21 22:23:46 +08:00
xxi	5ae154022a	[TRTLLM-9872][fix] clear the failed test at CI when enalbe_configurab… (#10067 ) Signed-off-by: xxi <xxi@nvidia.com>	2025-12-21 08:14:50 -05:00
Eran Geva	b15f987972	[None][chore] removed duplicated test from l0_b200.yml (#10090 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-21 11:34:01 +02:00
Bo Li	a66eeab537	[TRTLLM-9805][feat] Skip Softmax Attention. (#9821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-12-21 02:52:42 -05:00
Zheyu Fu	9dcea00d91	Merge branch 'main' into fix_spec_gate Signed-off-by: Zheyu Fu <zheyuf@nvidia.com>	2025-12-20 23:16:16 -08:00
Balaram Buddharaju	dcd3f7b5ea	[https://nvbugs/5744427 ][fix] Fix accuracy test OOM (#10173 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-21 02:03:38 -05:00
TensorRT LLM	6c76148b56	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-21 03:08:20 +00:00
Zheyu Fu	9ce84b8f3d	Merge branch 'main' into fix_spec_gate	2025-12-20 15:39:49 -08:00
Bo Li	77e37d9dd0	[https://nvbugs/5753250 ][infra] Further waive all tests in _test_openai_responses.py (#10176 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-20 10:25:14 -05:00
Enwei Zhu	2ce785f39a	[https://nvbugs/5643631 ][fix] Fix hostfunc seg fault (#10028 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-20 07:58:43 -05:00
Enwei Zhu	21a93fbf9d	[TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset (#10043 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-20 03:12:41 -05:00
Zheyu Fu	4e13ea6a10	Merge branch 'main' into fix_spec_gate	2025-12-19 23:24:45 -08:00
TensorRT LLM	3f25db9d3e	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-20 03:07:30 +00:00
Yuxian Qiu	3b3069b390	[https://nvbugs/5747930 ][fix] Use offline tokenizer for whisper models. (#10121 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-20 09:42:07 +08:00
Zheyu Fu	e561c467c3	Fix conflict Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-12-20 01:26:14 +00:00
Zheyu Fu	ec7d5ef574	Merge branch 'main' into fix_spec_gate Signed-off-by: Zheyu Fu <zheyuf@nvidia.com>	2025-12-19 17:24:31 -08:00
Zheyu Fu	972572d621	Fix conflict Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-12-20 01:23:13 +00:00
Zheyu Fu	ab45d6a7c7	Waive dynamic spec decode unit test Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-12-20 01:20:06 +00:00
Yuxian Qiu	e75331480f	[None][fix] fix draft_lengths for CUDA graph capture. (#10004 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-20 09:04:48 +08:00
Anish Shanbhag	7c82605327	[None][fix] enable KV cache reuse for config database (#10094 )	2025-12-19 15:16:56 -08:00
Balaram Buddharaju	bee9051484	[None][chore] Waive timing out pre-merge test (#10167 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-19 17:56:33 -05:00
Gal Hubara-Agam	20b69a982a	[#10056 ][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 (#10131 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-19 13:28:42 -08:00

4325 Commits