Commit Graph

4318 Commits

Author SHA1 Message Date
TensorRT LLM
18f8b22956 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-23 03:10:39 +00:00
fredricz-20070104
621156ad44
[None][chore] Fix GB300 support issues (#10196)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-23 10:42:41 +08:00
Li Min
1e82ff7a0c
[TRTLLM-9989][fix] Fix tvm_ffi aaarch64 issue. (#10199)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-23 10:20:40 +08:00
Yuxian Qiu
696f754ef4
[None][fix] avoid implicit cudaStreamSynchronize in sample_async. (#10120)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-23 10:15:40 +08:00
Tailing Yuan
648196f8ae
[TRTLLM-9432][feat] Reduce synchronization and recompilation for qwen3-next (#9691)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-23 10:14:29 +08:00
Faraz
f05af48bca
[https://nvbugs/5747674][fix] Add contiguous() before view() in load_expert_w3_w1_weight and load (#10136)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-12-22 21:03:34 -05:00
Fanrong Li
0d2500c631
[TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser (#10126)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-23 08:46:47 +08:00
Grzegorz Kwasniewski
ccc64da287
[TRTLLM-9847][fix] WAR fix hanging fused allreduce. (#10087)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 00:03:32 +01:00
tcherckez-nvidia
12e1cb8d7e
[#9717][chore] Refactor MoE code to use enums (#9910)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-22 15:14:56 -05:00
JunyiXu-nv
aaa87abf41
[TRTLLM-7906][feat] Support multiple post process for Responses API (#9908)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-22 11:33:34 -05:00
Emma Qiao
ba14a9308e
[None][infra] Waive failed cases on 12/22 (#10200)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 00:05:45 +08:00
Pengyun Lin
0f308e95f9
[None][chore] Remove logprobs constraint on trtllm-serve pytorch backend (#9911)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-22 21:37:22 +08:00
William Zhang
a6a88985cf
[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)
* Why?

Certain VLMs like the Qwen family need more than just the multimodal
embeddings in the language model, and need MRoPE position IDs and
deltas. Prior to this commit, only the embeddings could be communicated
from the encoder worker to the prefill worker.

* What?

This commit extends the `DisaggregatedParams` to include the MRoPE
information. It also adjusts several pieces of code required to
communicate that between E, P and D workers.

Closes TRTLLM-9409.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-22 06:32:49 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Yan Chunwei
ea6cd76c55
[None][refactor] simplify get_stats and get_kvcache_events with rpc (#9980)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-22 18:23:43 +08:00
Perkz Zheng
c87f1a6b39
[https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
shuyixiong
9e9523c3cc
[https://nvbugs/5762016][chore] Skip a ray test (#10194)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-22 17:06:19 +08:00
JadoTu
7421224d69
[None][fix] NVFP4 linear method's weight and weight_scale padding (#10148)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-22 15:00:31 +08:00
xinhe-nv
d30ee8101e
[None][chore] Remove closed bugs (#10182)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-22 01:58:17 -05:00
Yuxian Qiu
237fd0eae4
[https://nvbugs/5666821][chore] unwaive tests. (#9958)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-22 11:39:45 +08:00
TensorRT LLM
f8501f3cc8 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-22 03:08:12 +00:00
Fanrong Li
f0bd60a395
[https://nvbugs/5684820][fix] fix the detokenizer issue for DeepSeek-v3.2 (#10106)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-22 10:56:33 +08:00
Jin Li
066b653940
[TRTLLM-9880][feat] Include torch compile tests in QA test list (#10149)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-22 10:37:09 +08:00
Yuxian Qiu
2f139ee07e
[https://nvbugs/5701445][chore] unwaive test. (#9949)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-22 10:12:54 +08:00
Chuang Zhu
914dd39127
[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test (#9735)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-22 09:29:24 +08:00
dominicshanshan
d274a4c5d3
[https://nvbugs/5701457][fix] Unwaive ray test. (#10175)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-22 09:25:58 +08:00
Enwei Zhu
5549067966
[None][ci] Waive GPTOSS test case (#10155)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-22 08:50:44 +08:00
Balaram Buddharaju
5266475014
[None][feat] Cudagraph updates for helix parallelism (#10141)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 15:21:52 -05:00
shuyixiong
4fc6036276
[https://nvbugs/5702793][fix] Fix view operation on uncontiguous tensor (#10147)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-21 11:47:20 -05:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 (#9971)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00
Kaiyu Xie
5a611cb8f5
[None] [feat] Enhancements to slurm scripts (#10112)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-21 10:24:56 -05:00
Emma Qiao
aa5dbb7ca5
[None][infra] Waive failed tests for main branch on 12/21 (#10184)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-21 22:23:46 +08:00
xxi
5ae154022a
[TRTLLM-9872][fix] clear the failed test at CI when enalbe_configurab… (#10067)
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-21 08:14:50 -05:00
Eran Geva
b15f987972
[None][chore] removed duplicated test from l0_b200.yml (#10090)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-21 11:34:01 +02:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Balaram Buddharaju
dcd3f7b5ea
[https://nvbugs/5744427][fix] Fix accuracy test OOM (#10173)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 02:03:38 -05:00
TensorRT LLM
6c76148b56 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-21 03:08:20 +00:00
Bo Li
77e37d9dd0
[https://nvbugs/5753250][infra] Further waive all tests in _test_openai_responses.py (#10176)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-20 10:25:14 -05:00
Enwei Zhu
2ce785f39a
[https://nvbugs/5643631][fix] Fix hostfunc seg fault (#10028)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 07:58:43 -05:00
Enwei Zhu
21a93fbf9d
[TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset (#10043)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 03:12:41 -05:00
TensorRT LLM
3f25db9d3e [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-20 03:07:30 +00:00
Yuxian Qiu
3b3069b390
[https://nvbugs/5747930][fix] Use offline tokenizer for whisper models. (#10121)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:42:07 +08:00
Yuxian Qiu
e75331480f
[None][fix] fix draft_lengths for CUDA graph capture. (#10004)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:04:48 +08:00
Anish Shanbhag
7c82605327
[None][fix] enable KV cache reuse for config database (#10094) 2025-12-19 15:16:56 -08:00
Balaram Buddharaju
bee9051484
[None][chore] Waive timing out pre-merge test (#10167)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-19 17:56:33 -05:00
Gal Hubara-Agam
20b69a982a
[#10056][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 (#10131)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-19 13:28:42 -08:00
Chang Liu
5489d188a4
[None][fix] Revert the change and remove device count guard for DSv32 (#9631)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 15:00:55 -05:00
longcheng-nv
b882393d69
[https://nvbugs/5720357][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027)
Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 14:58:01 -05:00
Venky
dfa11d810e
[TRTC-102][docs] --extra_llm_api_options->--config in docs/examples/tests (#10005) 2025-12-19 13:48:43 -05:00
JunyiXu-nv
7b71ff6b8a
[https://nvbugs/5722653][fix] Unwaive fixed test (#10157)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-19 11:19:20 -05:00