Mike Iovine
|
5e6f1bcd24
|
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-11-03 10:12:10 -08:00 |
|
Yechan Kim
|
f48968b6cc
|
[TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-11-03 06:01:07 -08:00 |
|
Fanrong Li
|
e9f78c687a
|
[https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests (#8853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-11-03 00:34:52 -08:00 |
|
dongfengy
|
6d6797c792
|
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
|
2025-11-02 16:44:02 -08:00 |
|
Fanrong Li
|
f0dc746738
|
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-31 14:38:31 -07:00 |
|
Yuxian Qiu
|
025d2926df
|
[https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-10-31 10:13:56 +08:00 |
|
Mike Iovine
|
b87448b009
|
[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-10-30 15:47:04 -04:00 |
|
HuiGao-NV
|
ae57738bae
|
[https://nvbugs/5547414][fix] Use cached models (#8755)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-10-29 19:10:10 -07:00 |
|
Iman Tabrizian
|
ae6875fe10
|
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-10-29 08:04:26 -07:00 |
|
Chang Liu
|
81eb861df0
|
[None][chore] Enable GPQA in CI for DeepSeek V3.2 (#8712)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-10-29 04:22:22 -07:00 |
|
dongfengy
|
083f3637f1
|
[https://nvbugs/5596343][test] Update test waive to get back some coverage (#8702)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
|
2025-10-28 14:05:48 -07:00 |
|
Anish Shanbhag
|
a09b38a862
|
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-28 09:17:26 -07:00 |
|
dongfengy
|
5a01f382c1
|
[https://nvbugs/5575913][fix] Use separate thresholds for 120b/20b gptoss (#8664)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
|
2025-10-28 10:35:07 -04:00 |
|
Bo Li
|
9c4432f8a4
|
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-10-27 13:23:06 -04:00 |
|
xinhe-nv
|
0ac5cbcac4
|
[None][chore] Add failed cases into waives.txt (#8669)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-27 02:36:28 -04:00 |
|
Chenghao Zhang
|
a6d20f6f9b
|
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
|
2025-10-25 15:26:45 -04:00 |
|
Simeng Liu
|
2b27810198
|
[https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-10-24 19:09:07 -07:00 |
|
Chang Liu
|
e47c787dd7
|
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-24 13:40:41 -04:00 |
|
Chuang Zhu
|
2420918e5b
|
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-10-24 08:58:16 -04:00 |
|
xinhe-nv
|
2aaedd08cd
|
[TRTLLM-8638][fix] fix test issues (#8557)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-24 02:16:55 -04:00 |
|
xinhe-nv
|
59375e8bed
|
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8590)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-24 00:02:42 -04:00 |
|
Anthony Chang
|
8a3b870e09
|
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-10-23 09:14:18 +08:00 |
|
sunnyqgg
|
90080e0e09
|
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-10-22 09:58:22 +08:00 |
|
Chenghao Zhang
|
bac9e8c2ad
|
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469)
|
2025-10-21 15:32:01 -07:00 |
|
Suyog Gupta
|
7050b1ea49
|
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-10-20 15:31:52 -07:00 |
|
Pamela Peng
|
b818a912d7
|
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
|
2025-10-20 06:36:09 -04:00 |
|
Lucas Liebenwein
|
41169fb20c
|
[None][feat] AutoDeploy: chunked prefill support (#8158)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-18 00:47:35 -07:00 |
|
h-guo18
|
55fed1873c
|
[None][chore] AutoDeploy: cleanup old inference optimizer configs (#8039)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-17 15:55:57 -04:00 |
|
xinhe-nv
|
bc833d3de3
|
[TRTLLM-8638][fix] add waives tests (#8445)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-17 03:37:53 -07:00 |
|
zhhuang-nv
|
7a2bab93f0
|
[None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
|
2025-10-17 02:30:33 -07:00 |
|
bhsueh_NV
|
69325e1aa3
|
[https://nvbugs/5574556][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI (#8351)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-10-16 22:46:19 +08:00 |
|
Enwei Zhu
|
526cad37d7
|
[https://nvbugs/5568951][fix] Fix guided decoding disagg tests (#8311)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-10-16 22:46:19 +08:00 |
|
Ivy Zhang
|
be2ab98233
|
[None][chore] Update constraint for release (#8211)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-10-16 22:46:19 +08:00 |
|
Yukun He
|
179c7dc501
|
[https://nvbugs/5536131][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-10-16 22:46:19 +08:00 |
|
xinhe-nv
|
f70eff30b3
|
[TRTLLM-8638][fix] waive llama4 tests on H20 (#8416)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-16 03:14:56 -07:00 |
|
dongfengy
|
7a0aa64973
|
[None][fix] Refactor triton paddings (#6980)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
|
2025-10-15 12:59:01 -07:00 |
|
Yuxian Qiu
|
3450fe9944
|
[None][fix] Fix dummy load format for key models. (#7993)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-10-14 11:18:39 +08:00 |
|
xinhe-nv
|
72fcff1044
|
[None][fix] add timeout for llama4 (#8254)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-12 21:04:20 -07:00 |
|
Guoming Zhang
|
989c25fcba
|
[None][doc] Add qwen3-next doc into deployment guide and test case into L0. (#8288)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Faradawn Yang <faradawny@gmail.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-10-13 10:25:45 +08:00 |
|
xinhe-nv
|
b555f1ff98
|
[None][chore] Add failed cases into waives.txt (#8229)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-09 23:45:28 -07:00 |
|
Lucas Liebenwein
|
3492391feb
|
[None][chore] AutoDeploy: clean up accuracy test configs (#8134)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-06 12:51:01 -07:00 |
|
Jonas Yang CN
|
88ea2c4ee9
|
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-10-04 08:12:24 +08:00 |
|
Lucas Liebenwein
|
2c454e8003
|
[None][feat] AutoDeploy: Nemotron-H accuracy test (#8133)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-03 15:39:03 -07:00 |
|
Lucas Liebenwein
|
5faa5e9dd8
|
[None][feat] AutoDeploy: dive deeper into token generation bugs + enable_block_reuse (#8108)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-03 04:57:26 -07:00 |
|
Erin
|
293637e0a1
|
[https://nvbugs/5556020][chore] waive test_eagle3 (#8119)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-10-02 05:33:21 -04:00 |
|
mpikulski
|
fc7f78c400
|
[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling (#8110)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-10-02 10:20:32 +02:00 |
|
Cheng Hang
|
cdce68c3e0
|
[TRTLLM-6741][fix] Add heuristics for lm head tp size when enable_lm_head_tp_in_adp=True (#7891)
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-30 09:24:35 +08:00 |
|
Ivy Zhang
|
1e2e851db8
|
[None][chore] update test case constraint (#8020)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-09-29 13:25:09 +08:00 |
|
Ivy Zhang
|
0ecafd84da
|
[None][chore] Update chunked prefill test case configs (#7868)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-09-29 10:37:34 +08:00 |
|
Iman Tabrizian
|
33282351a2
|
[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-09-27 19:29:30 -04:00 |
|