Commit Graph

1972 Commits

Author SHA1 Message Date
Emma Qiao
d0ea417ec8
[None][infra] Waive failed tests for main 11/13 (#9132)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-13 01:00:40 -08:00
xinhe-nv
548f5ce4bc
[None][fix] waive failed tests (#9090)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 23:40:00 -08:00
xinhe-nv
8fa3c55c76
[None][chore] Remove closed bugs (#9114)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 22:49:37 -08:00
ruodil
c86e36fe38
[None][test] add deepseek and qwen cases for rtx series (#8839)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-12 22:28:02 -08:00
HuiGao-NV
cde18c12da
[https://nvbugs/5640873][fix] Move thop tests to pre-merge (#9094)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-13 13:08:13 +08:00
Zhang Ge
49df731b96
[#6507][fix] Fix precision issue due to KV layout mismatch for split/concat kernels (#6917)
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Yan Chunwei
4fd93bdc2c
[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming (#9118)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-12 19:55:09 -08:00
Yan Chunwei
8a8883bc73
[None][chore] Waive test_llm_rpc_streaming (#9113)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-13 11:06:26 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
QI JUN
3416efbc29
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill (#9111)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:06:32 +08:00
dongxuy04
9241ccaf27
[None][feat] Enable EPLB for trtllm-gen and cutlass backend (#8886)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-12 12:30:27 -08:00
Chenghao Zhang
5f26c31954
[https://nvbugs/5636912][fix] AutoDeploy: Unwaive the test (#9018)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-12 12:26:38 -08:00
Patrice Castonguay
8a751a0e56
[None][chore] Remove is_disaggregated param in executor request queue (#9049)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-12 13:37:15 -05:00
Fanrong Li
780d4f9dc5
[None][feat] Add MTP>1 support for DS-v3.2 (#9045)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-12 09:56:12 -08:00
Iman Tabrizian
cdde15b275
[TRTLLM-8540][feat] Add support for disagg in DSv3.2 (#8735)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-11-12 08:21:11 -08:00
mpikulski
264d38e6c5
[TRTLLM-9175][test] ensure sampling is async (#9076)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-12 15:27:52 +01:00
yufeiwu-nv
b7a2574c60
[https://nvbugs/5568991][test] Remove Phi-3 models (#9066)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-12 03:16:36 -08:00
QI JUN
4003dc7574
[None][ci] waive some test cases of disaggregated serving (#9085)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-12 15:06:21 +08:00
Emma Qiao
bb6eb9510d
[None][infra] Waive a failed case of disaggregated/test_disaggregated.py (#9074)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 19:38:32 -08:00
QI JUN
fd703fbb7b
[None][ci] run speculative unit tests serially (#9080)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 19:06:44 -08:00
Lucas Liebenwein
aca56097cb
[None][fix] AutoDeploy: update nano3 accuracy test (#9061)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-11 12:26:31 -08:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
Chenghao Zhang
ec9cf715a2
[None][feat] AutoDeploy: Perf improvement for mamba layers (#8991)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-11 08:27:07 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling (#9012)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
HuiGao-NV
23c388c58b
[https://nvbugs/5616189][fix] Make more cases use local cached models (#8935)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-11 03:14:05 -08:00
QI JUN
0ce22ce928
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] (#9069)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 02:11:15 -08:00
Yiqing Yan
b7d51c5549
[None][chore] Remove duplicated waive test (#9067)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-11 16:49:49 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 (#9058)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt (#8998)
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py (#8938)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test (#9019)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Yechan Kim
0938a3ad2a
[https://nvbugs/5644187][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[https://nvbugs/5625972][fix] Add context manager to fix FakeTensorProp (#9047)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00
xiweny
50c486367a
[https://nvbugs/5619396][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-10 08:12:14 -08:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA (#8882)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
xinhe-nv
f848d844d9
[None][chore] Add failed cases into waives.txt (#9030)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-09 23:36:05 -08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Bo Li
67af7c15a5
[https://nvbugs/5637037][fix] Update unwaive list. (#9001)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-10 08:53:07 +08:00
Emma Qiao
183778d58a
[None][infra] Waive failed tests for main 11/07 (#9008)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-08 08:51:35 -08:00
Emma Qiao
2af6a537ad
[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-08 06:34:24 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op (#8960)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Yuxian Qiu
7b82ba90da
[https://nvbugs/5629790][chore] unwaive test. (#8967)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-07 18:41:32 +08:00
Stefan Niebler
326a201473
[https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Lizhi Zhou
b26e1617f2
[https://nvbugs/5633340][fix] kill processes properly after test (#8970)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-06 21:45:38 -08:00