Commit Graph

891 Commits

Author SHA1 Message Date
liji-nv
95978e3044
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-08 12:42:15 +08:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)
2025-07-08 13:15:30 +09:00
Yechan Kim
5bc3a15f10
feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-07 18:03:12 -07:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
Omer Ullman Argov
1191555cce
[ci] speedup fused moe tests (#5726)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-07 18:03:15 +03:00
Robin Kobus
30a19fcf7c
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-07 16:30:43 +02:00
DylanChen-NV
5ca2b9bb15
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-07 18:04:57 +08:00
Yi Zhang
ed1b3c884a
fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1 tests (#5774)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-07 18:38:54 +09:00
Yan Chunwei
dfce61f4b9
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-07 17:05:14 +08:00
xinhe-nv
ded38ebdbd
test: [CI] remove closed bugs (#5770)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-07 18:06:07 +10:00
Bo Li
9db2e9ee47
fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-07 14:58:32 +08:00
Yanchao Lu
2013034948
[Test] - Waive or fix few known test failures (#5769)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 21:14:16 +08:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
Chuang Zhu
ffc0b8f5da
Cache transceiver support VSWA (#5505)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-05 01:18:42 +09:00
Yiqing Yan
7f3ea058f0
[Infra] - Waive L0 flaky test (#5759)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 19:25:12 +09:00
Shunkangz
32339d1b20
Raise shut down error for each request (#4936)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-07-04 18:58:24 +09:00
xinhe-nv
3869b969a6
test: [CI] Add failed cases into waives.txt (#5718)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 17:24:48 +09:00
Faraz
81c0764012
Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5724)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-07-04 16:53:20 +09:00
Yiqing Yan
b8fef809ae
[Infra] - Waive L0 test (#5748)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 15:04:49 +08:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation (#5476)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
Emma Qiao
a0135c0f6f
[Infra] - Waive failed cases on release/0.21 (#5674)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-04 13:14:13 +08:00
brb-nv
cdaa6abce7
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yi Zhang
73d30a23c7
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Zheng Duan
cb9f596dbe
[nvbug 5300551] test: increase block count in eviction test (#5465)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
nv-guomingz
d0b3d2ac65
fix:https://nvbugs/5362398 (#5609)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yan Chunwei
77288d3671
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
xinhe-nv
7f837b6e8b
tests: waive failures on main (#5704)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 12:39:12 +09:00
Venky
4762e0b244
Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example (#5740)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-07-04 11:01:08 +09:00
Lucas Liebenwein
24ac9b5f69
[AutoDeploy] merge feat/ad-2025-06-29 (#5737)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-07-04 10:21:18 +09:00
Netanel Haber
f91379b7e8
delete duplicate eagle3 and ngram tests (#5711)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-07-03 15:47:26 +03:00
Omer Ullman Argov
c72856188c
[ci] small multigpu speedups (#5643)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-03 08:06:10 -04:00
Emma Qiao
530897388c
[Infra] - Waive a failed case on main (#5702)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-03 06:09:27 -04:00
tomeras91
7dbecf7272
[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-03 11:07:51 +03:00
Emma Qiao
2a5fdebf10
[Infra] - Waive failed tests for main 0702 (#5671)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 22:05:07 -04:00
Fridah-nv
afef5127f0
feat:[AutoDeploy] E2E build example for llama4 VLM (#3922)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-07-02 19:29:34 -04:00
Emma Qiao
31699cbeb1
[Infra] - Set default timeout to 1hr and remove some specific settings (#5667)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 08:37:54 -04:00
Jhao-Ting Chen
77082cde38
[https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-07-02 04:54:43 -04:00
qixiang-99
ca7b6ec8d8
Feat/pytorch vswa kvcachemanager (#5151)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-02 15:58:00 +08:00
Yan Chunwei
2d69b55fe8
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-02 14:21:37 +08:00
Xiaowei Wang
32dfdfba30
feat: fuse w4a8 moe pre-quant scale on Hopper (#5613)
Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-07-01 23:02:41 -04:00
HuiGao-NV
10c50515c2
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-07-02 09:49:20 +08:00
Aurelien Chartier
fa95e402a5
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-01 12:16:09 -07:00
liji-nv
c345f5876c
[feat] Support torch compile for attention dp (#5086)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-01 13:48:52 -04:00
Kaiyu Xie
f9a455651b
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-01 09:35:25 -04:00
Yan Chunwei
3bc703d450
ci: unwaive llmapi launch test (#5281)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Emma Qiao
178fc3f655
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-01 20:12:55 +08:00
Yan Chunwei
ee7fcbf20e
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
ruodil
ded203d8aa
test: set enable_attention_dp=True in default deepseek settings (#5461)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
brb-nv
4ef60d5fbb
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Ivy Zhang
61213e3562
tests: fix typos in qa test (#5421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-01 20:12:55 +08:00