xinhe-nv
|
fe3d607c4b
|
[TRTQA-2920][fix] Add failed cases into waives.txt (#6581)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-05 12:41:23 +10:00 |
|
Enwei Zhu
|
899b74c357
|
[None][doc] Fix blog4 typo (#6612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-05 10:20:37 +08:00 |
|
kris1025
|
6a3a921284
|
[TRTLLM-6685][feat] Add speculative metrics for trt llm bench (#6476)
Signed-off-by: linquanh <linquanh@nvidia.com>
|
2025-08-04 15:22:57 -07:00 |
|
brb-nv
|
6135f75f87
|
[None][chore] Update Gemma3 closeness check to mitigate flakiness (#6591)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-04 10:10:58 -04:00 |
|
Olya Kozlova
|
13cc1c4878
|
[TRTLLM-5271][feat] best_of/n for pytorch workflow (#5997)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
|
2025-08-04 14:08:06 +02:00 |
|
Ivy Zhang
|
f3651adea8
|
[None][test] update invalid test name (#6596)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-04 08:01:05 -04:00 |
|
Emma Qiao
|
5d8a5a0cb8
|
[None][Infra] Waive failed case in post-merge on main (#6602)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-08-04 19:39:44 +08:00 |
|
Yiteng Niu
|
a4e518de51
|
[TRTLLM-6364] [fix] Update PR title regex to allow optional spaces between ticket and type (#6598)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
|
2025-08-04 18:34:25 +08:00 |
|
brb-nv
|
87e4e9f468
|
[None][chore] Add unit test for Gemma3 lora (#6560)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-04 04:56:57 -04:00 |
|
Yiqing Yan
|
3916dbd98b
|
[None][chore] Bump version to 1.0.0rc6 (#6597)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-04 04:39:15 -04:00 |
|
Pengyun Lin
|
a15e33351d
|
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens (#6259)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-08-04 15:09:51 +08:00 |
|
Bruce-Lee-LY
|
8c82ee2803
|
[fix] xqa precision for fp16/bf16 kv cache (#6573)
Signed-off-by: Bruce-Lee-LY <yong-li14@tsinghua.org.cn>
Co-authored-by: Bruce-Lee-LY <yong-li14@tsinghua.org.cn>
|
2025-08-04 14:34:20 +08:00 |
|
xinhe-nv
|
a54972e463
|
[None][fix] remove closed bugs (#6576)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-04 15:52:11 +10:00 |
|
Yuan Tong
|
a2f271c8e0
|
[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-08-04 13:51:01 +08:00 |
|
Leslie Fang
|
b9fe0fa7ec
|
[None][infra] Enable test of chunked prefill with logit post processor (#6483)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-04 01:46:07 -04:00 |
|
Leslie Fang
|
a60190836c
|
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-04 01:45:24 -04:00 |
|
Yiqing Yan
|
4763e94156
|
[TRTLLM-5563][infra] Move test_rerun.py to script folder (#6571)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-04 13:26:04 +08:00 |
|
ruodil
|
6459725bf9
|
test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) (#6533)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-04 15:22:39 +10:00 |
|
Zhenhua Wang
|
59d91b8b94
|
[None][chore] add online help to build_wheel.py and fix a doc link (#6391)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
|
2025-08-04 13:14:55 +08:00 |
|
Yiteng Niu
|
2279cec4ce
|
[https://nvbugs/5430932][infra] update namelist (#6585)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
|
2025-08-04 11:51:08 +08:00 |
|
Yiteng Niu
|
7bf0a48899
|
[None][infra] update namelist (#6465)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
|
2025-08-04 11:32:33 +08:00 |
|
Zac Patel
|
18d1941083
|
[doc] Update perf_overview.md for release 0.21 (#6270)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
Perkz Zheng
|
03430ed379
|
[https://nvbugspro.nvidia.com/bug/5415268] fix illegal smem access with chunked attention (#6401)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
QI JUN
|
5913282e17
|
doc: update release notes (#6438)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
Ivy Zhang
|
5eefdf2c75
|
tests: Add llama4 functional cases (#6392)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
QI JUN
|
e1eca33dfc
|
doc: update release notes (#6324)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
QI JUN
|
3f47117870
|
doc: update known issues (#6247)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
ruodil
|
8d82ccca63
|
test: modify max_lora_rank of phi4_multimodal to 320 (#6474)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-04 12:20:22 +10:00 |
|
Yechan Kim
|
ee6ab5be96
|
chore: add EXAONE4 accuracy test (#6397)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-04 10:14:16 +08:00 |
|
Jinyang Yuan
|
df90202b51
|
[fix] Fix DeepSeek w4a8 weight loading (#6498)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-08-04 10:12:06 +08:00 |
|
Ivy Zhang
|
7547a7d0a2
|
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-03 22:11:26 -04:00 |
|
Jhao-Ting Chen
|
6edaa23c1c
|
[None][feat] Multi-block mode for Hopper spec dec XQA kernel (#4416)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-08-03 14:31:33 -07:00 |
|
Chuang Zhu
|
542f552d0b
|
use cudaSetDevice to create context, fix nvbug 5394497 (#6403)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-08-03 13:32:55 -04:00 |
|
Yiqing Yan
|
3f7abf87bc
|
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-03 11:18:59 +08:00 |
|
Jhao-Ting Chen
|
4da5cfc511
|
[None][infra] add eagle3 one model accuracy tests (#6264)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-08-02 16:07:46 -07:00 |
|
Robin Kobus
|
918fedf952
|
[None][refactor] Simplify finish reasons handling in DecoderState (#6524)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-08-02 07:17:43 +02:00 |
|
Shunkangz
|
67a3fd858b
|
[None][feat] Add support of scheduling attention dp request (#6246)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-08-01 20:38:01 -04:00 |
|
Richard Huo
|
31802de0b0
|
[None][fix] Serialize the window_size in the kv event (#6526)
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
|
2025-08-01 15:25:18 -07:00 |
|
Lizhi Zhou
|
6f34f3489b
|
[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-08-01 13:33:34 -04:00 |
|
xinhe-nv
|
263c6c0ad0
|
test: skip post blackwell (#6357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-01 13:10:14 -04:00 |
|
Lucas Liebenwein
|
5247df6ae2
|
[AutoDeploy] merge feat/ad-2025-07-22 (#6520)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Agam <ghubaraagam@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: haoguo <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Gal Agam <ghubaraagam@cw-dfw-h100-004-328-012.cm.cluster>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
|
2025-08-01 08:51:08 -07:00 |
|
Emma Qiao
|
16febefee0
|
[None][Infra] - Skip failed tests in post-merge (#6558)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-08-01 22:21:23 +08:00 |
|
yunruis
|
a20ab5cbdb
|
[https://nvbugs/5381276][fix] fix warning for fused_a_gemm (#6402)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
|
2025-08-01 09:37:21 -04:00 |
|
brb-nv
|
7447d6ed85
|
[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-01 09:19:54 -04:00 |
|
liji-nv
|
1daa8c3232
|
[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-08-01 07:38:06 -04:00 |
|
Yanchao Lu
|
f39d621c3b
|
[None][infra] Pin the version for triton to 3.3.1 (#6508) (#6519) (#6549)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-01 07:33:24 -04:00 |
|
xinhe-nv
|
fca0d37798
|
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 (#6552)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-01 20:27:22 +10:00 |
|
juney-nvidia
|
137413fbf4
|
[None][doc] Exposing the latest tech blogs in README.md (#6553)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
|
2025-08-01 17:41:52 +08:00 |
|
chenfeiz0326
|
ba5bdbb138
|
[None][chore] Disable add special tokens for Llama3.3 70B (#6482)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-08-01 17:03:27 +08:00 |
|
Kaiyu Xie
|
147ad69368
|
[None][doc] blog: Scaling Expert Parallelism in TensorRT-LLM (Part 2: Performance Status and Optimization) (#6547)
Signed-off-by: Kaiyu XIe <26294424+kaiyux@users.noreply.github.com>
|
2025-08-01 16:46:15 +08:00 |
|