DylanChen-NV
74dca0aa7b
[NVBUG-5304516/5319741]Qwen2.5VL FP8 support ( #5029 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-09 23:16:42 +08:00
Omer Ullman Argov
a32f7083b4
[ci] parallelize torch unittests ( #5714 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-09 11:05:57 +03:00
Dom Brown
3e3b1769ad
[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner ( #5764 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-07-09 08:21:58 +01:00
Erin
e277766f0d
chores: merge examples for v1.0 doc ( #5736 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
Lucas Liebenwein
d14dd2f597
[AutoDeploy] re-enable waive for flaky AD test ( #5867 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-07-09 11:47:48 +09:00
Bo Li
9d894bc0cb
fix: [ https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. ( #5842 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-09 10:17:05 +08:00
brb-nv
2bd09ed2d4
fix: Skip rope scaling for local layers in Gemma3 VLM ( #5857 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-09 10:10:33 +08:00
Venky
e27215ca03
test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) ( #5654 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-08 18:16:21 -07:00
xavier-nvidia
b6013da198
Fix GEMM+AR fusion on blackwell ( #5563 )
...
Signed-off-by: xsimmons <xsimmons@nvidia.com>
2025-07-09 08:48:47 +08:00
Fridah-nv
a79b73f577
fix: [5376140] [AutoDeploy] Update unit tests: skip all_close assert for dropout in attention, increase tolerance for rope op test ( #5855 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-07-09 09:13:31 +09:00
Yan Chunwei
e50d95c40d
chore [TRTLLM-6161]: add LLM speculative decoding example ( #5706 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-09 07:33:11 +08:00
Pamela Peng
da8c7372d4
[TRTLLM-5366][feat]Add support for sm121 ( #5524 )
...
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.
2025-07-08 14:27:00 -07:00
Chang Liu
08a3dfeb2b
[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava ( #5814 )
2025-07-08 09:53:11 -07:00
Dom Brown
e3ccca06e1
test: reduce redundant test cases for TRTLLM Gen FP8 MoE ( #5845 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-07-09 00:40:33 +09:00
Kaiyu Xie
bb5b16fcb9
feat: Return context response immediately when stream_interval > 1 ( #5836 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-09 00:19:57 +09:00
Raayan Dhar
e3268a4221
[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg ( #5732 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-07-08 09:39:58 -04:00
Yegor
b01d1c28f7
[feat] Detokenize option in /v1/completions request ( #5382 )
...
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
2025-07-08 19:36:04 +08:00
Yiqing Yan
ec0d7e64b9
[Infra] - Waive L0 test ( #5837 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-08 17:54:06 +08:00
xinhe-nv
89bbb230cc
tests: waive failed cases on main ( #5781 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-08 19:44:12 +10:00
nv-guomingz
c8fa08da5c
doc: update cuda_graph_config usage part in DS R1 docs ( #5796 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-08 16:54:46 +09:00
Enwei Zhu
55f86ce7ab
[NvBug 5362426] fix: Fix prompt adapter TP2 case ( #5782 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-08 16:01:36 +09:00
Venky
9258187e98
Waive some test_llama_eagle3 unittests ( #5811 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-08 15:35:27 +09:00
liji-nv
95978e3044
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage ( #5700 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-08 12:42:15 +08:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" ( #5818 )
2025-07-08 13:15:30 +09:00
Yechan Kim
5bc3a15f10
feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL ( #5522 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-07 18:03:12 -07:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #5795 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
Omer Ullman Argov
1191555cce
[ci] speedup fused moe tests ( #5726 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-07 18:03:15 +03:00
Robin Kobus
30a19fcf7c
[TRTLLM-6291] feat: Add user-provided speculative decoding support ( #5204 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-07 16:30:43 +02:00
DylanChen-NV
5ca2b9bb15
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow ( #5615 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-07 18:04:57 +08:00
Yi Zhang
ed1b3c884a
fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1 tests ( #5774 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-07 18:38:54 +09:00
Yan Chunwei
dfce61f4b9
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler ( #5751 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-07 17:05:14 +08:00
xinhe-nv
ded38ebdbd
test: [CI] remove closed bugs ( #5770 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-07 18:06:07 +10:00
Bo Li
9db2e9ee47
fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. ( #5772 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-07 14:58:32 +08:00
Yanchao Lu
2013034948
[Test] - Waive or fix few known test failures ( #5769 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 21:14:16 +08:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow ( #5333 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
Chuang Zhu
ffc0b8f5da
Cache transceiver support VSWA ( #5505 )
...
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-05 01:18:42 +09:00
Yiqing Yan
7f3ea058f0
[Infra] - Waive L0 flaky test ( #5759 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 19:25:12 +09:00
Shunkangz
32339d1b20
Raise shut down error for each request ( #4936 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-07-04 18:58:24 +09:00
xinhe-nv
3869b969a6
test: [CI] Add failed cases into waives.txt ( #5718 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 17:24:48 +09:00
Faraz
81c0764012
Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 ( #5724 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-07-04 16:53:20 +09:00
Yiqing Yan
b8fef809ae
[Infra] - Waive L0 test ( #5748 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 15:04:49 +08:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation ( #5476 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
Emma Qiao
a0135c0f6f
[Infra] - Waive failed cases on release/0.21 ( #5674 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-04 13:14:13 +08:00
brb-nv
cdaa6abce7
fix: Investigate Gemma3 1B decoder output discrepancy ( #5564 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yi Zhang
73d30a23c7
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests ( #5397 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Zheng Duan
cb9f596dbe
[nvbug 5300551] test: increase block count in eviction test ( #5465 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
nv-guomingz
d0b3d2ac65
fix: https://nvbugs/5362398 ( #5609 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yan Chunwei
77288d3671
fix [nvbug5351244]: test_mpi_session submit sync/async ( #5608 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
xinhe-nv
7f837b6e8b
tests: waive failures on main ( #5704 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 12:39:12 +09:00
Venky
4762e0b244
Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example ( #5740 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-07-04 11:01:08 +09:00