TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
Jhao-Ting Chen	92d90fa29a	[None][feat] Expose enable_trt_overlap in Triton_backend brings 1.05x OTPS (#10018) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski	0027a01ad5	[https://nvbugs/5680312][fix] Updated test waiving (#9630) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-23 09:38:12 -08:00
Grzegorz Kwasniewski	06900a7f19	[TRTLLM-9565][fix] Fix deepseek sharding (#9984) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-23 10:28:14 -05:00
Emma Qiao	984c20e0b2	[None][infra] Waive failed cases on 12/23 (#10236) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-23 08:48:54 -05:00
dongfengy	e284d0bf80	[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests (#10209) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-23 07:43:13 -05:00
tcherckez-nvidia	64bb1a5155	[None][chore] Update AD coverage to use torch-cudagraph (#10233) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-23 07:20:32 -05:00
Roey Azran	8408c40d8b	[https://nvbugs/5702786][fix] Fix race conditions in KV cache communication during unexpected termination (#10076) Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>	2025-12-23 14:09:51 +02:00
Xianjie Qiao	871c6b435c	[None] [feat] skip batch_tokenize_prompts in CustomDataset (#10214) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-12-23 17:40:57 +08:00
Yukun He	522f1d2bc3	[https://nvbugs/5764627][chore] waive the time-out test (#10222) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-23 16:36:06 +08:00
Balaram Buddharaju	f2e00a75de	[None][chore] Remove helix test from rtx test list (#10224) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-23 03:07:37 -05:00
Shiyu Li	3ddc9d2b48	[https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062) Signed-off-by: Shiyu Li <shili@nvidia.com>	2025-12-23 16:07:07 +08:00
chenfeiz0326	48c875f8ea	[None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test (#9990) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-23 16:02:38 +08:00
Bo Li	cc1323be24	[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. (#10197) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-23 02:13:37 -05:00
Yiqing Yan	59b05dc0a8	[None][chore] Bump version to 1.2.0rc7 (#10216) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-12-23 15:07:47 +08:00
Chuang Zhu	53db3b2612	[https://nvbugs/5741884][fix] unwaive disagg sampler (#10189) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-23 14:38:07 +08:00
xinhe-nv	77b591f73b	[None][chore] Add failed cases into waives.txt (#10177) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <lijie@nvidia.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-12-23 13:43:50 +08:00
Harshini Komali	d691371eaf	[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf (#9310) Signed-off-by: lkomali <lkomali@nvidia.com> Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-23 13:25:55 +08:00
Pamela Peng	5bc7ffe379	[None][test] Add qa tests for RTX 6K (#10210) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-12-22 22:47:09 -05:00
TensorRT LLM	18f8b22956	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-23 03:10:39 +00:00
fredricz-20070104	621156ad44	[None][chore] Fix GB300 support issues (#10196) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-23 10:42:41 +08:00
Li Min	1e82ff7a0c	[TRTLLM-9989][fix] Fix tvm_ffi aaarch64 issue. (#10199) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-12-23 10:20:40 +08:00
Yuxian Qiu	696f754ef4	[None][fix] avoid implicit cudaStreamSynchronize in sample_async. (#10120) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-23 10:15:40 +08:00
Tailing Yuan	648196f8ae	[TRTLLM-9432][feat] Reduce synchronization and recompilation for qwen3-next (#9691) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-12-23 10:14:29 +08:00
Faraz	f05af48bca	[https://nvbugs/5747674][fix] Add contiguous() before view() in load_expert_w3_w1_weight and load (#10136) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-12-22 21:03:34 -05:00
Fanrong Li	0d2500c631	[TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser (#10126) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-23 08:46:47 +08:00
Grzegorz Kwasniewski	ccc64da287	[TRTLLM-9847][fix] WAR fix hanging fused allreduce. (#10087) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-12-23 00:03:32 +01:00
tcherckez-nvidia	12e1cb8d7e	[#9717][chore] Refactor MoE code to use enums (#9910) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-22 15:14:56 -05:00
JunyiXu-nv	aaa87abf41	[TRTLLM-7906][feat] Support multiple post process for Responses API (#9908) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-22 11:33:34 -05:00
Emma Qiao	ba14a9308e	[None][infra] Waive failed cases on 12/22 (#10200) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-23 00:05:45 +08:00
Pengyun Lin	0f308e95f9	[None][chore] Remove logprobs constraint on trtllm-serve pytorch backend (#9911) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-22 21:37:22 +08:00
William Zhang	a6a88985cf	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758) * Why? Certain VLMs like the Qwen family need more than just the multimodal embeddings in the language model, and need MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker. * What? This commit extends the `DisaggregatedParams` to include the MRoPE information. It also adjusts several pieces of code required to communicate that between E, P and D workers. Closes TRTLLM-9409. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-22 06:32:49 -05:00
Bo Li	472fe497dc	[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-22 05:57:12 -05:00
Yan Chunwei	ea6cd76c55	[None][refactor] simplify get_stats and get_kvcache_events with rpc (#9980) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-22 18:23:43 +08:00
Perkz Zheng	c87f1a6b39	[https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-12-22 04:45:33 -05:00
shuyixiong	9e9523c3cc	[https://nvbugs/5762016][chore] Skip a ray test (#10194) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-22 17:06:19 +08:00
JadoTu	7421224d69	[None][fix] NVFP4 linear method's weight and weight_scale padding (#10148) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-22 15:00:31 +08:00
xinhe-nv	d30ee8101e	[None][chore] Remove closed bugs (#10182) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-22 01:58:17 -05:00
Yuxian Qiu	237fd0eae4	[https://nvbugs/5666821][chore] unwaive tests. (#9958) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-22 11:39:45 +08:00
TensorRT LLM	f8501f3cc8	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-22 03:08:12 +00:00
Fanrong Li	f0bd60a395	[https://nvbugs/5684820][fix] fix the detokenizer issue for DeepSeek-v3.2 (#10106) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-22 10:56:33 +08:00
Jin Li	066b653940	[TRTLLM-9880][feat] Include torch compile tests in QA test list (#10149) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-22 10:37:09 +08:00
Yuxian Qiu	2f139ee07e	[https://nvbugs/5701445][chore] unwaive test. (#9949) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-22 10:12:54 +08:00
Chuang Zhu	914dd39127	[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test (#9735) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-12-22 09:29:24 +08:00
dominicshanshan	d274a4c5d3	[https://nvbugs/5701457][fix] Unwaive ray test. (#10175) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-22 09:25:58 +08:00
Enwei Zhu	5549067966	[None][ci] Waive GPTOSS test case (#10155) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-22 08:50:44 +08:00
Balaram Buddharaju	5266475014	[None][feat] Cudagraph updates for helix parallelism (#10141) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-21 15:21:52 -05:00
shuyixiong	4fc6036276	[https://nvbugs/5702793][fix] Fix view operation on uncontiguous tensor (#10147) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-21 11:47:20 -05:00
bhsueh_NV	cd4b4f43fa	[None][feat] Support Eagle3 on Mistral Large3 (#9971) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-21 10:25:45 -05:00
Kaiyu Xie	5a611cb8f5	[None] [feat] Enhancements to slurm scripts (#10112) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-21 10:24:56 -05:00
Emma Qiao	aa5dbb7ca5	[None][infra] Waive failed tests for main branch on 12/21 (#10184) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-21 22:23:46 +08:00

4336 Commits