Zheyu Fu
9b33ea751b
Merge branch 'main' into fix_spec_gate
2025-12-22 12:22:51 -08:00
tcherckez-nvidia
12e1cb8d7e
[ #9717 ][chore] Refactor MoE code to use enums ( #9910 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-22 15:14:56 -05:00
JunyiXu-nv
aaa87abf41
[TRTLLM-7906][feat] Support multiple post process for Responses API ( #9908 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-22 11:33:34 -05:00
Emma Qiao
ba14a9308e
[None][infra] Waive failed cases on 12/22 ( #10200 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 00:05:45 +08:00
Pengyun Lin
0f308e95f9
[None][chore] Remove logprobs constraint on trtllm-serve pytorch backend ( #9911 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-22 21:37:22 +08:00
William Zhang
a6a88985cf
[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg ( #9758 )
...
* Why?
Certain VLMs like the Qwen family need more than just the multimodal
embeddings in the language model, and need MRoPE position IDs and
deltas. Prior to this commit, only the embeddings could be communicated
from the encoder worker to the prefill worker.
* What?
This commit extends the `DisaggregatedParams` to include the MRoPE
information. It also adjusts several pieces of code required to
communicate that between E, P and D workers.
Closes TRTLLM-9409.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-22 06:32:49 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. ( #9822 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Yan Chunwei
ea6cd76c55
[None][refactor] simplify get_stats and get_kvcache_events with rpc ( #9980 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-22 18:23:43 +08:00
Zheyu Fu
0d1cd5f4d2
Merge branch 'main' into fix_spec_gate
2025-12-22 01:50:15 -08:00
Perkz Zheng
c87f1a6b39
[ https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs ( #10089 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
shuyixiong
9e9523c3cc
[ https://nvbugs/5762016 ][chore] Skip a ray test ( #10194 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-22 17:06:19 +08:00
Zheyu Fu
6ea3cb59fc
Merge branch 'main' into fix_spec_gate
2025-12-21 23:52:35 -08:00
JadoTu
7421224d69
[None][fix] NVFP4 linear method's weight and weight_scale padding ( #10148 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-22 15:00:31 +08:00
xinhe-nv
d30ee8101e
[None][chore] Remove closed bugs ( #10182 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-22 01:58:17 -05:00
Zheyu Fu
402c7fb6fd
Merge branch 'main' into fix_spec_gate
2025-12-21 19:43:42 -08:00
Yuxian Qiu
237fd0eae4
[ https://nvbugs/5666821 ][chore] unwaive tests. ( #9958 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-22 11:39:45 +08:00
Zheyu Fu
b51ee2bb0d
Merge branch 'main' into fix_spec_gate
...
Signed-off-by: Zheyu Fu <zheyuf@nvidia.com>
2025-12-21 19:38:26 -08:00
TensorRT LLM
f8501f3cc8
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-22 03:08:12 +00:00
Fanrong Li
f0bd60a395
[ https://nvbugs/5684820 ][fix] fix the detokenizer issue for DeepSeek-v3.2 ( #10106 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-22 10:56:33 +08:00
Jin Li
066b653940
[TRTLLM-9880][feat] Include torch compile tests in QA test list ( #10149 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-22 10:37:09 +08:00
Yuxian Qiu
2f139ee07e
[ https://nvbugs/5701445 ][chore] unwaive test. ( #9949 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-22 10:12:54 +08:00
Chuang Zhu
914dd39127
[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test ( #9735 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-22 09:29:24 +08:00
dominicshanshan
d274a4c5d3
[ https://nvbugs/5701457 ][fix] Unwaive ray test. ( #10175 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-22 09:25:58 +08:00
Enwei Zhu
5549067966
[None][ci] Waive GPTOSS test case ( #10155 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-22 08:50:44 +08:00
Balaram Buddharaju
5266475014
[None][feat] Cudagraph updates for helix parallelism ( #10141 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 15:21:52 -05:00
shuyixiong
4fc6036276
[ https://nvbugs/5702793 ][fix] Fix view operation on uncontiguous tensor ( #10147 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-21 11:47:20 -05:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 ( #9971 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00
Kaiyu Xie
5a611cb8f5
[None] [feat] Enhancements to slurm scripts ( #10112 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-21 10:24:56 -05:00
Emma Qiao
aa5dbb7ca5
[None][infra] Waive failed tests for main branch on 12/21 ( #10184 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-21 22:23:46 +08:00
xxi
5ae154022a
[TRTLLM-9872][fix] clear the failed test at CI when enalbe_configurab… ( #10067 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-21 08:14:50 -05:00
Eran Geva
b15f987972
[None][chore] removed duplicated test from l0_b200.yml ( #10090 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-21 11:34:01 +02:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Zheyu Fu
9dcea00d91
Merge branch 'main' into fix_spec_gate
...
Signed-off-by: Zheyu Fu <zheyuf@nvidia.com>
2025-12-20 23:16:16 -08:00
Balaram Buddharaju
dcd3f7b5ea
[ https://nvbugs/5744427 ][fix] Fix accuracy test OOM ( #10173 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 02:03:38 -05:00
TensorRT LLM
6c76148b56
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-21 03:08:20 +00:00
Zheyu Fu
9ce84b8f3d
Merge branch 'main' into fix_spec_gate
2025-12-20 15:39:49 -08:00
Bo Li
77e37d9dd0
[ https://nvbugs/5753250 ][infra] Further waive all tests in _test_openai_responses.py ( #10176 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-20 10:25:14 -05:00
Enwei Zhu
2ce785f39a
[ https://nvbugs/5643631 ][fix] Fix hostfunc seg fault ( #10028 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 07:58:43 -05:00
Enwei Zhu
21a93fbf9d
[TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset ( #10043 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 03:12:41 -05:00
Zheyu Fu
4e13ea6a10
Merge branch 'main' into fix_spec_gate
2025-12-19 23:24:45 -08:00
TensorRT LLM
3f25db9d3e
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-20 03:07:30 +00:00
Yuxian Qiu
3b3069b390
[ https://nvbugs/5747930 ][fix] Use offline tokenizer for whisper models. ( #10121 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:42:07 +08:00
Zheyu Fu
e561c467c3
Fix conflict
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-12-20 01:26:14 +00:00
Zheyu Fu
ec7d5ef574
Merge branch 'main' into fix_spec_gate
...
Signed-off-by: Zheyu Fu <zheyuf@nvidia.com>
2025-12-19 17:24:31 -08:00
Zheyu Fu
972572d621
Fix conflict
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-12-20 01:23:13 +00:00
Zheyu Fu
ab45d6a7c7
Waive dynamic spec decode unit test
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-12-20 01:20:06 +00:00
Yuxian Qiu
e75331480f
[None][fix] fix draft_lengths for CUDA graph capture. ( #10004 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:04:48 +08:00
Anish Shanbhag
7c82605327
[None][fix] enable KV cache reuse for config database ( #10094 )
2025-12-19 15:16:56 -08:00
Balaram Buddharaju
bee9051484
[None][chore] Waive timing out pre-merge test ( #10167 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-19 17:56:33 -05:00
Gal Hubara-Agam
20b69a982a
[ #10056 ][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 ( #10131 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-19 13:28:42 -08:00