Commit Graph

1538 Commits

Author SHA1 Message Date
yunruis
126cd707e3
[None][opt] Add batch waiting when scheduling (#7416)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-23 10:27:37 +08:00
Chang Liu
998857bcde
[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-22 19:07:18 -07:00
Enwei Zhu
8330d5363a
[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 09:10:09 +08:00
xxi
d471655242
[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick (#7712) 2025-09-23 08:41:38 +08:00
Enwei Zhu
59f57598a7
[https://nvbugs/5504086][fix] Fix MTP vanilla (#7904)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 08:38:28 +08:00
ChristinaZ
be576a3152
[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-23 08:24:21 +08:00
Jin Li
b5391b4ac6
[https://nvbugs/5516665][fix] Fix CUTLASS moe fake impl errors (#7714)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-22 11:08:39 -07:00
Linda
b1738c3f18
[https://nvbugs/5477359][fix] Removing test waivers (#7877)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-22 08:59:13 -07:00
Wanli Jiang
2a30f11d63
[None][chore] Upgrade transformers to 4.56.0 (#7523)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-22 22:20:16 +08:00
Emma Qiao
324301ccba
[None][infra] Skip failed test for nvbugs 5532023 (#7905)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 03:49:44 -07:00
Yechan Kim
f77aca9f2c
[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-22 03:40:02 -07:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
Emma Qiao
d330d0005c
[None][infra] Waive a failed case on main (#7901)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 00:37:01 -07:00
xinhe-nv
9c1b75e978
[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-22 00:12:43 -07:00
Wanli Jiang
f5bfd68a50 [https://nvbugs/5509024][fix] Print full parsed outputs and update keywords for multimodal model (#7670)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yi Zhang
f9c9c3f50a [https://nvbugs/5355219][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Ivy Zhang
022bc96fb6 [https://nvbugs/5512734][fix] Update kv cache config for maverick (#7710)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
bhsueh_NV
ef557f880b [https://nvbugs/5437405][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yanchao Lu
5c8b022d1e [None][ci] Test waives for the release/1.0 branch 09/15 (#7700)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Simeng Liu
99995846b3 [https://nvbugs/5470782][chore] Remove the skip statement in 1.0 rele… (#7573)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
peaceh-nv
541b7fda89 [https://nvbugs/5503423][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yan Chunwei
afca2fcbe0 [https://nvbugs/5351244][fix] test_mpi_session (#7501)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yuxian Qiu
2d46dda6a7 [https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Lizhi Zhou
293d9fb612 [https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Stefan Niebler
8aead224fb
[https://nvbugs/5513423][fix] Correctly respect min_tokens in PyTorch Workflow (#7808)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
2025-09-21 22:15:18 -07:00
peaceh-nv
9dc7316b7f
[https://nvbugs/5512556][unwaive] Unwaive DeepSeek PP tests (#7828)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-22 10:26:30 +08:00
dongxuy04
9eb8084ca9
[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-21 11:01:51 -07:00
Ziyi Xiong
897c4dd23b
[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic spec dec (#7728)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-21 08:20:48 +08:00
Yan Chunwei
4509d97780
[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing (#7840)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-20 06:24:22 -07:00
Chang Liu
2e317a7db6
[https://nvbugs/5520490][fix] Fix intermittent test failures by avoiding external web data pulls (#7879)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-19 17:24:13 -07:00
Mike Iovine
8030b540ac
[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-19 10:30:37 -04:00
pcastonguay
fbe325ce57
[https://nvbugs/5471108][chore] Unwaiving disagg acc test (#7686)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-19 08:56:09 -04:00
Yuxian Qiu
7d28acdbf0
[https://nvbugs/5522332][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783) (#7797)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 18:50:40 +08:00
Liao Lanyu
18095a7cb8
[https://nvbugs/5503440][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-19 18:13:33 +08:00
xinhe-nv
efb763402f
[None][chore] Add failed cases into waives.txt (#7841)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-19 17:59:47 +08:00
Ivy Zhang
0ac51487f4
[None][chore] remove cli cases for rtx6k (#7833)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:33:59 +08:00
Ivy Zhang
6b33bcced2
[None][test] Add accuracy benchmark in stress test (#7561)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:09:46 +08:00
dominicshanshan
451475e0dc
[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956. (#7853)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-19 14:54:59 +08:00
Emma Qiao
ea079fa530
[None][infra] Waive failed tests in post-merge (#7859)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-19 14:16:12 +08:00
ruodil
c5453103d6
[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-19 11:12:53 +08:00
fredricz-20070104
fc4e6d3702
[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-19 10:12:55 +08:00
Yuxian Qiu
d6ebcf7c4a
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 09:40:49 +08:00
Ziyi Xiong
420f0fbcf5
[https://nvbugs/5522851][fix] Correct the logic to update kv_lens_cuda (#7790)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-19 08:11:29 +08:00
QI JUN
7646da2d85
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-19 07:19:50 +08:00
sunnyqgg
80dd8fe197
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001)
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-18 12:05:36 -04:00
Li Min
d921fc3352
[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-18 21:20:04 +08:00
xinhe-nv
d3a907131a
[https://nvbugs/5519462][fix] Add failed cases into waives.txt (#7817)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 20:01:06 +08:00
Wanli Jiang
fe104dc20d
[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 17:37:16 +08:00
xinhe-nv
d909f80379
[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 17:13:07 +08:00
Wanli Jiang
a7ca0fff54
[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 16:26:20 +08:00