Commit Graph

989 Commits

Author SHA1 Message Date
Stanley Sun
9518e14f69
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-17 20:55:04 +10:00
Yi Zhang
a718486900
fix: Fix DeepSeek R1 CI (#6129)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-17 18:24:49 +08:00
nv-guomingz
9b45499caa
test: update max_beam_width to 1 due to torchsampler changes. (#6101)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-17 18:05:45 +08:00
Erin
de60ae47e3
chores: unwaive a few tests for v1.0 (#6107)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-17 17:59:51 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Iman Tabrizian
d4d21a106e
[fix] Release slots with spec decode + disagg (#5975) (#6032)
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-17 12:58:18 +08:00
chenfeiz0326
fe070a0168
test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-07-17 09:41:18 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
qixiang-99
e09e409dfb
Fix: Enhance ModelConfig for kv cache size calculations (#5868)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-16 14:41:31 -07:00
shaharmor98
e0836f9ca9
[TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-17 00:50:30 +08:00
Wanli Jiang
9354114f68
fix: Update trtllm args issues with extra nested config (#5996)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 12:41:45 -04:00
Emma Qiao
e30d7bec38
[Infra] - Waive failed cases in post-merge on main (#6096)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-16 22:41:18 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Ivy Zhang
dda91b5117
tests: add QA test cases (#5959)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:14:25 +08:00
Yan Chunwei
7568deb2f1
[nvbug/5387226] chore: add propogation for trust_remote_code to AutoConfig (#6001)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:05:38 +08:00
Ivy Zhang
763012a88a
[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill (#6051)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:04:08 +08:00
peaceh-nv
f5f31beee1
feat: Add deepseek-lite tests for RTX pro 6000 (#5903)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-16 15:51:45 +08:00
Zheng Duan
385af53a4d
[nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test (#6041)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-16 13:52:13 +08:00
Wanli Jiang
8679a058a3
fix: Unable to load phi4-model with tp_size>1 (#5962)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 11:39:41 +08:00
Aurelien Chartier
6a47cac981
feat: Add support for Triton request cancellation (#5898)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-15 20:52:43 -04:00
danielafrimi
edab7532dd
feat/add latency support for trtllm bench (#3730)
Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Daniel Afrimi <dafrimi@nvidia.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
2025-07-15 13:13:49 -07:00
brb-nv
9214ac662a
test: Add regression tests for Gemma3 VLM (#6033)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-15 11:37:56 -07:00
Fanrong Li
7a1af1c738
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 (#5989)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-07-16 01:33:12 +09:00
MinaHuai
9ebc3ab9c4
[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision (#5998)
Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>
2025-07-15 10:01:35 -04:00
Jaedeok Kim
ab1c54709d
fix: adjust window sizes of VSWA at torch backend (#5880)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2025-07-15 17:41:54 +08:00
ruodil
2a147c4d01
test: add llama_v3.3_70b_cases in perf test (#6035)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:59 +10:00
ruodil
2504aa552e
test: add recursive updating pytorch config and change MOE backend format in perf test (#6046)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:15 +10:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Yiqing Yan
6b35afaf1b
[Infra][TRTLLM-6013] - Fix stage name in single stage test rerun report (#5672)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-15 12:27:21 +09:00
ixlmar
f225f5cd2e
[nvbugs-5318143] fix: restrict PyTorch memory usage to avoid OOMs (#5964)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-15 06:49:42 +08:00
Iman Tabrizian
c4ee535afb
[fix] fix eagle3 two model disaggregated serving test (#6014)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-15 04:26:04 +09:00
brb-nv
f5f5be9e94
enh: Bidirectional mask with multiple images for Gemma3 (#5976)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 22:39:18 +08:00
brb-nv
1a2d96919c
feat: Update Gemma3 Vision Encoder (#5973)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 22:38:10 +08:00
Clay
dbf29184dc
fix #4974: A thread leak issue in scaffolding unittest (#5020)
Signed-off-by: Clay <ccs96307@gmail.com>
2025-07-14 20:22:03 +09:00
Kaiyu Xie
aa97fbb2ad
[Nvbug/5383670] fix: switch test case to non-fp4 ckpt for more GPU coverage (#5882)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-14 20:21:46 +09:00
Yiqing Yan
c720d7f779
Waive L0 test (#6002)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-14 19:55:34 +09:00
Zhanrui Sun
3a0ef73414
infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check (#5709)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-14 18:52:13 +09:00
Zhenhuan Chen
30608a5e6d
[https://nvbugs/5355316] fix: update torch.compile option to fix triton store_cubin error (#5865)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-14 17:17:30 +08:00
Robin Kobus
5a61d64b5b
[nvbugs/5345391] fix: chunked prefill + overlap scheduling (#5761)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
3fcaa8a310
[nvbug 5327706][fix] fix mgmn postprocess error (#5835)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
ruodil
347520494b
test: remove duplicate cases in perf sanity test (#5870)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Bo Li
6d79559f3e
fix: [https://nvbugs/5351130][https://nvbugs/5333654] Unwaive for bug 5351130 and 5333654. (#5821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Bo Li
2991cf4b80
fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 5345215. (#5606)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yan Chunwei
3e1fd983c3
[nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
388b4919b8
[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
6992616c1f
[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
ruodil
278a1a7df3
test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Iman Tabrizian
c8874a7f94
[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yi Zhang
9cc4e5d50e
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00