Yan Chunwei
5eae3184fa
[None][chore] add missing tests to test list ( #6590 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-06 22:12:27 +08:00
Yechan Kim
1aed7511fe
[ https://nvbugs/5430124 ][fix] Mistral mixture_text_image test case fix ( #6648 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-06 06:58:58 -07:00
Iman Tabrizian
13ecb4aced
[ https://nvbugs/5328160 ][fix] Unwaive disaggregated serving tests ( #6644 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-06 09:08:29 -04:00
Pengyun Lin
79fc2f48c0
[None][chore] Enhance trtllm-serve example test ( #6604 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-06 20:30:35 +08:00
Zongfei Jing
0ff8df95b7
[ https://nvbugs/5433581 ][fix] DeepGEMM installation on SBSA ( #6588 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-06 16:44:21 +08:00
ruodil
907c180eb2
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 ( #6632 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 02:25:57 -04:00
Iman Tabrizian
43bd861ce1
Update allreduce benchmark for torch ( #6271 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-05 23:25:23 -07:00
ruodil
0bd99b5d6d
[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test ( #6650 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 01:45:13 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization ( #6061 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Yechan Kim
c17f4984e2
[None][feat] Refactor Llava-Next ( #6478 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-05 17:53:53 -07:00
Aurelien Chartier
6da95f29a9
[None][feat] Add support for fused gate_up_proj scales for FP8 blockwise ( #6496 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-05 11:22:32 -07:00
ixlmar
1ebceb790d
[TRTLLM-5508][feat] check input tokens + improve error handling ( #5170 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-08-05 18:27:43 +01:00
liji-nv
dcbfa7e509
[ https://nvbugs/5252313 ][fix] Fix torch compile + MTP ( #6554 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-05 10:31:29 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system ( #6464 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
Emma Qiao
78a75c2990
[None][Infra] - Split gb200 stages for each test ( #6594 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-05 07:10:00 -04:00
xinhe-nv
c32584125e
[TRTQA-2920][fix] Add failed cases into waives.txt ( #6600 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 ( #6589 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Ivy Zhang
08ed9d7305
[None][doc] add introduction doc on qa test ( #6535 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 17:02:17 +08:00
Ivy Zhang
d101a6cebc
[ https://nvbugs/5410279 ][test] resubmit timeout refactor ( #6337 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 16:39:25 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec ( #6379 )
...
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Leslie Fang
164acfa31e
[None][infra] Skip test_eagle3 test with device memory check ( #6617 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-05 02:36:03 -04:00
ruodil
7625845365
test: add README_release_test.md for perf test ( #6443 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-05 02:07:42 -04:00
xinhe-nv
a178cea324
[TRTLLM-6856][feat] add disaggregated serving tests to QA list ( #6536 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 12:47:53 +10:00
xinhe-nv
fe3d607c4b
[TRTQA-2920][fix] Add failed cases into waives.txt ( #6581 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-05 12:41:23 +10:00
brb-nv
6135f75f87
[None][chore] Update Gemma3 closeness check to mitigate flakiness ( #6591 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-04 10:10:58 -04:00
Olya Kozlova
13cc1c4878
[TRTLLM-5271][feat] best_of/n for pytorch workflow ( #5997 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-08-04 14:08:06 +02:00
Ivy Zhang
f3651adea8
[None][test] update invalid test name ( #6596 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 08:01:05 -04:00
Emma Qiao
5d8a5a0cb8
[None][Infra]Waive failed case in post-merge on main ( #6602 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-04 19:39:44 +08:00
brb-nv
87e4e9f468
[None][chore] Add unit test for Gemma3 lora ( #6560 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-04 04:56:57 -04:00
Pengyun Lin
a15e33351d
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens ( #6259 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-04 15:09:51 +08:00
xinhe-nv
a54972e463
[None][fix] remove closed bugs ( #6576 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 15:52:11 +10:00
Yuan Tong
a2f271c8e0
[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory ( #5034 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-04 13:51:01 +08:00
Leslie Fang
b9fe0fa7ec
[None][infra] Enable test of chunked prefill with logit post processor ( #6483 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:46:07 -04:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill ( #6386 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
ruodil
6459725bf9
test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) ( #6533 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 15:22:39 +10:00
Ivy Zhang
5eefdf2c75
tests: Add llama4 functional cases ( #6392 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
ruodil
8d82ccca63
test: modify max_lora_rank of phi4_multimodal to 320 ( #6474 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 12:20:22 +10:00
Yechan Kim
ee6ab5be96
chore: add EXAONE4 accuracy test ( #6397 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-04 10:14:16 +08:00
Ivy Zhang
7547a7d0a2
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list ( #6436 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-03 22:11:26 -04:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 ( #5678 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Jhao-Ting Chen
4da5cfc511
[None][infra] add eagle3 one model accuracy tests ( #6264 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-02 16:07:46 -07:00
Shunkangz
67a3fd858b
[None][feat] Add support of scheduling attention dp request ( #6246 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-01 20:38:01 -04:00
Richard Huo
31802de0b0
[None][fix] Serialize the window_size in the kv event ( #6526 )
...
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
2025-08-01 15:25:18 -07:00
Lizhi Zhou
6f34f3489b
[TRTLLM-6357][test] Add accuracy tests for Qwen3 ( #6177 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-01 13:33:34 -04:00
xinhe-nv
263c6c0ad0
test: skip post blackwell ( #6357 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 13:10:14 -04:00
Lucas Liebenwein
5247df6ae2
[AutoDeploy] merge feat/ad-2025-07-22 ( #6520 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Agam <ghubaraagam@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: haoguo <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Gal Agam <ghubaraagam@cw-dfw-h100-004-328-012.cm.cluster>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-08-01 08:51:08 -07:00
Emma Qiao
16febefee0
[None][Infra] - Skip failed tests in post-merge ( #6558 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-01 22:21:23 +08:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 ( #6371 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
liji-nv
1daa8c3232
[ https://nvbugs/5340941 ][ https://nvbugs/5375785 ] - fix: Wrap attentio… ( #6355 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-01 07:38:06 -04:00
xinhe-nv
fca0d37798
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 ( #6552 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 20:27:22 +10:00
chenfeiz0326
ba5bdbb138
[None][chore] Disable add special tokens for Llama3.3 70B ( #6482 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-01 17:03:27 +08:00
Yukun He
90856bf97d
[ https://nvbugs/5419069 ][fix] Fix the mismatched layer name components. ( #6417 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-01 16:32:39 +08:00
Yang Li
ac23f4a80d
[TRTLLM-4279] fix: Add a protection test for checking trtllm custom ops ( #6515 )
...
Signed-off-by: Yang Li <56944310+yali-arch@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-01 15:59:09 +08:00
Ivy Zhang
71524a1a48
[ https://nvbugs/5419066 ][fix] Use trt flow LLM ( #6467 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-01 03:33:07 -04:00
Venky
ad5742b105
[fix] Update get_trtllm_bench_build_command to handle batch size and tokens ( #6313 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-01 00:08:09 -04:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell ( #6486 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
brb-nv
2eca0d5925
fix: Fix poor generation with FP8 Gemma3 1B checkpoint ( #6499 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-31 17:18:23 -07:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
Ziyi Xiong
8062e0fe7c
[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically ( #6363 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-31 15:31:39 -04:00
tomeras91
6d5da9f7c2
[ https://nvbugs/5404046 ][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test ( #6485 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-31 21:35:10 +03:00
shaharmor98
0c42f54a39
Bugfix/fix nemotron nas lora support ( #6380 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-31 13:39:35 -04:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control ( #6220 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
Faraz
8e84df74b5
Fix e2e test failure for RTX6000 Pro ( #6420 )
...
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>
2025-07-30 23:32:44 -04:00
xinhe-nv
ca534e4798
test: add accuracy reference ( #6479 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-31 12:27:29 +10:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI ( #6477 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
brb-nv
0e16d1f070
test: Add time logging for lora tests ( #6466 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 14:02:43 -07:00
Anurag Mukkara
fac186e3b5
[nvbug/5409417] Unwaive llava test case ( #6460 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-07-30 14:38:47 -04:00
brb-nv
f6287e4498
Unwaive Gemma2 LoRA test on H100 ( #6461 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 12:56:12 -04:00
Bo Deng
24e7f4eece
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests ( #6439 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-31 00:41:37 +08:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm ( #6353 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
pcastonguay
0f083b9daf
fix: Unwaive triton cpp test [nvbug 5401088] ( #6412 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-30 11:25:18 -04:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it uses torch as backend. ( #6419 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
Chang Liu
b4065d8ca6
[TRTLLM-6654][feat] Add support for external multimodal embeddings ( #6263 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-07-30 10:00:15 -04:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend ( #6369 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
tomeras91
a2514d93fc
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 ( #6447 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-30 07:22:32 -04:00
Yechan Kim
22b29df38c
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 ( #6427 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 17:29:55 +08:00
xinhe-nv
d9ab3fd35e
tests: add TestNemotronH cuda graph tests ( #6390 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 18:45:58 +10:00
nv-guomingz
a5540acfce
chore: add trtllm-serve json schema example into doc. ( #6418 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 04:33:08 -04:00
2ez4bz
d6eed1b624
[fix] Switch placement of image placeholder for mistral 3.1 ( #6435 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-30 14:10:36 +08:00
xinhe-nv
c00d6763b2
test: [CI] Add failed cases into waives.txt ( #6457 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-30 12:36:58 +10:00
Venky
ab40369053
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests ( #6463 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 10:53:43 +10:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts ( #6345 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
Yan Chunwei
ad662ddcdd
chore: disallow arbitrary in llm_args.Configs ( #6367 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-29 16:16:52 -04:00
Yan Chunwei
1a6930986a
chore: remove unused kv_cache_dtype in api reference ( #6444 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-29 14:57:20 -04:00
Michal Guzek
7efe3cb0cd
[fix] Add detokenization-based stop word logic to LLM API ( #5948 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-07-29 10:16:59 -07:00
xinhe-nv
f1086e7d4f
test: [CI] remove closed bugs ( #6381 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-29 19:01:23 +10:00
xinhe-nv
4fbb344caf
test: [CI] Add failed cases into waives.txt ( #6423 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-29 19:00:30 +10:00
Yukun He
0eee2e2850
[5385981] fix: Update the usage of VisionAttention init API. ( #6413 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-29 16:41:48 +08:00
ruodil
e11255e9d0
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases ( #6430 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-29 15:52:45 +10:00
Michal Guzek
2573bb729d
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests ( #6303 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-28 14:02:14 -07:00
Aurelien Chartier
738ab61593
[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs ( #6339 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-28 12:36:44 -07:00
2ez4bz
cdca541148
[test] Unwaive mistral3.1 small E2E test ( #6352 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 14:37:42 -04:00
2ez4bz
60e4d3a9d4
[test] Add accuracy regression test for Mistral3.1 ( #6322 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 09:41:44 -07:00
ruodil
03632a679f
test: organize perf cases and add missing perflab cases in qa test list ( #6283 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-28 20:33:32 +10:00
xinhe-nv
971be1fe86
test: waive failed cases ( #6394 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-28 20:31:43 +10:00
Yan Chunwei
45d441e60c
[TRTLLM-5061] chore: add status tags to LLM API reference ( #5707 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 15:57:07 +08:00
Ivy Zhang
2945817cae
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name ( #6292 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-28 15:33:30 +08:00
Emma Qiao
b3ca159787
[Infra] - waive failed cases and fix a typo ( #6384 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-28 02:06:57 -04:00
Chang Liu
dc757799e1
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common ( #6266 )
2025-07-27 23:29:21 -04:00
Yan Chunwei
908f49a4ad
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch ( #6359 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 09:01:10 +08:00
Michal Guzek
08d57123f9
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache ( #5974 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-25 18:10:40 -04:00
Iman Tabrizian
c35c78ff58
[fix][nvbugs/5390810] Improve the check for disaggregated serving test ( #6301 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-25 12:47:01 -07:00
nv-guomingz
b8d4cb8beb
feat: Support JSON Schema in OpenAI-Compatible API ( #6321 )
...
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
2025-07-25 12:55:56 -04:00
pcastonguay
3805976e90
fix: Fixing kv_cache_events unit tests [nvbug 5362412] ( #6265 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-25 08:55:44 -04:00
xiaoqi
a0aecf0476
[feat]: support logit_bias ( #5354 )
...
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-25 09:37:41 +00:00
xinhe-nv
470544cf17
test: [CI] Add failed cases into waives.txt ( #6333 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-25 17:18:06 +10:00
xinhe-nv
6268a60ab3
tests: add test_chunked_prefill for llama4 ( #5549 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 23:02:00 -04:00
xinhe-nv
2dcfa90e99
test: skip llama3.3 70b test on cg4 ( #6293 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 19:29:56 -07:00
Mike Iovine
0f2f11f90b
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model ( #6104 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-24 21:50:11 -04:00
Shiyu Li
375f74ecb2
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. ( #6237 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-07-25 08:01:40 +08:00
Stefan Niebler
0df758ec9f
[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration ( #6217 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-24 18:04:41 +02:00
bhsueh_NV
7b6aadc800
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend ( #6235 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-24 21:47:37 +08:00
Emma Qiao
0cc1f8c03d
[Infra] - Waive failed tests in post-merge ( #6331 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-24 21:18:06 +08:00
Ivy Zhang
f290108cd8
tests: only get timeout value from pytest marker ( #6287 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-24 20:51:02 +08:00
liji-nv
14d94a3856
feat: Add non UB AR + Residual + Norm + Quant fusion ( #6320 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-24 05:51:43 -04:00
Iman Tabrizian
5fceaa6153
Revert "tests: add timeout_manager to tensorrt flow test cases ( #5942 )" ( #6309 )
2025-07-23 23:58:10 -04:00
Emma Qiao
82d03ca979
[Infra] - Increase unittest execution time since some test exceeds 1600 ( #6277 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-24 10:02:28 +08:00
Iman Tabrizian
7740bfa31d
Waive tests ( #6312 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-23 18:15:07 -07:00
Lucas Liebenwein
cf4f4e8d73
[AutoDeploy] disable flaky MoE nvfp4 test ( #6302 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-07-23 13:13:01 -04:00
Emma Qiao
cb737a5fcd
[Infra] - Skip failed cases ( #6299 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-23 21:26:31 +08:00
Stefan Niebler
2486eb778e
[TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLLM Sampler ( #6223 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-23 12:30:50 +02:00
xinhe-nv
2b0fa24175
test: [CI] Add failed cases into waives.txt ( #6289 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-23 19:04:21 +10:00
YueWeng
ed62a06eef
[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue ( #6136 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-07-23 14:53:37 +08:00
Yechan Kim
83c3ed128b
chore: set default device to cpu on Multimodal models ( #5994 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-22 21:45:31 -07:00
Venky
9538c8d0e5
Add basic Nemo Ckpt Lora Loading in pytorch flow ( #6019 )
2025-07-22 19:42:45 -07:00
wili
8ecdeee300
[refactor] Simplification of Speculative decoding configs - Part 2 ( #5936 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-23 09:20:27 +08:00
Iman Tabrizian
bc2fb29c5e
[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support ( #6224 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-23 05:27:16 +08:00
Lucas Liebenwein
41fb8aa8b1
[AutoDeploy] merge feat/ad-2025-07-07 ( #6196 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-07-23 05:11:04 +08:00
2ez4bz
ab7434ac62
[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM ( #6152 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-22 11:06:41 -07:00
John Calderon
b7c8a672da
[Issue 6193] Fix gemma3vl weight loader ( #6233 )
...
Signed-off-by: John Calderon <johncalesp@gmail.com>
2025-07-22 10:32:18 -07:00
Linda
60073731ca
fix: bindings unit tests for nanobind ( #6221 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-22 14:51:43 +01:00
Stanley Sun
04f2d4b2eb
test: update test list for RTX6KD ( #6213 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-22 18:55:24 +08:00
Pengyun Lin
48ddc3d4b9
[fix]: Revert commit 388b491 ( #6143 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
pcastonguay
310bdd9830
fix: Fix triton backend build [nvbug 5396469] ( #6098 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Yi Zhang
eb7d0f84b5
[nvbugs/5368410][fix] Disable moe allreduce for multi node ( #5918 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Nikita Korobov
9d26b7891a
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm ( #5849 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Yan Chunwei
f194b65f3e
fix [nvbug/5351244]: address remote mpi session submit ( #5664 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Bo Li
537757e669
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. ( #5896 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Bo Li
db77d83a2a
bug: [ https://nvbugs/5368507 ] Fix test_generate_with_seed. ( #6206 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-22 12:28:38 +08:00
2ez4bz
37d0b68442
[fix] Fix flaky mistral E2E test ( #6230 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-22 11:55:28 +08:00
WeiHaocheng
fddb7f1141
feat: moe prepare support topk % 4 != 0 ( #5742 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-07-22 10:42:46 +08:00
Ivy Zhang
eb5cb5b642
tests: add timeout_manager to tensorrt flow test cases ( #5942 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-22 10:23:41 +08:00
Shunkangz
ee45e0c63f
feat: Refactor the fetching request logic ( #5786 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-07-22 09:16:28 +08:00
Chang Liu
7381f1dba7
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models ( #5444 )
...
Only supports qwen in this PR
2025-07-21 16:11:58 -07:00
Simeng Liu
4a0951f85c
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] ( #5859 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-21 15:46:37 -07:00
Mike Iovine
9645814bdf
[chore] Clean up quickstart_advanced.py ( #6021 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-21 15:00:59 -04:00
Yi Zhang
f9b0a911fb
test: Enable GB200 torch compile multi gpu tests ( #6145 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-21 22:17:13 +08:00
Pengyun Lin
9832bef07d
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve ( #5717 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-21 21:09:43 +08:00
Emma Qiao
e41507a253
[Infra] - Waive failed cases on recent post-merge ( #6212 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-21 21:00:18 +08:00
liji-nv
3e0fb60e50
[TRTLLM-4279] feat: Multistream initial support for torch compile flow ( #5847 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-21 19:10:22 +08:00
Linda
3efad2e58c
feat: nanobind bindings ( #6185 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
xinhe-nv
b46fd41026
test: [CI] remove closed bugs ( #6201 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-21 15:40:30 +08:00
Yuening Li
e8c068b4b1
[TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow ( #5850 )
...
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
Co-authored-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-07-21 15:17:35 +08:00
brb-nv
ca9bc5727e
fix: Flush stale PlanParams with custom attention mask ( #6163 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-21 09:55:09 +08:00
ruodil
6a3c9f8061
test: add phi-4 multimodal and bielik-11b-v2.2 models for perf test ( #5826 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-21 11:29:19 +10:00
danielafrimi
5300a99bd8
W4A8 GEMM ( #6005 )
...
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-07-20 17:34:57 +03:00
amitz-nv
98428f330e
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction ( #5616 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-20 08:00:14 +03:00
bhsueh_NV
2e14c8f443
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 ( #6065 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-20 10:25:25 +08:00
Ziyi Xiong
66030ef815
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support ( #6133 )
...
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-19 13:17:15 +08:00
wili
82d3587bb8
[refactor] Unify name of NGram speculative decoding ( #5937 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-19 12:59:57 +08:00
xiaoqi
28858c8711
feat(eagle3):support qwen3 dense model ( #5879 )
...
Signed-off-by: xq25478 <xq25478@qq.com>
2025-07-19 01:24:32 +08:00
Venky
22d4a8c48a
enh: Add script to map tests <-> jenkins stages & vice-versa ( #5177 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-19 00:50:40 +08:00
Bo Deng
2c6fa145ee
[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests ( #6095 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-19 00:48:44 +08:00
Stefan Niebler
fd6ce7f20e
[ci] Speedup beam search unit tests with fixtures for LLM ( #5843 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-18 22:54:49 +08:00
Erin
9522cde464
fix: NVBug 5385576 py_batch_idx issue ( #6153 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-18 22:36:43 +08:00
Emma Qiao
77acb4f753
[Infra] - Waive failed tests in post-merge ( #6176 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-18 17:34:34 +08:00
Chuang Zhu
c0e416535e
fix single_disagg_test ( #6166 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-18 13:18:37 +08:00
Zhenhuan Chen
992b273045
[ https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e ( #6140 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-18 10:34:37 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
2ez4bz
8480c120b1
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge ( #6105 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-17 11:04:17 -07:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Stanley Sun
9518e14f69
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout ( #6115 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-17 20:55:04 +10:00
Yi Zhang
a718486900
fix: Fix DeepSeek R1 CI ( #6129 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-17 18:24:49 +08:00
nv-guomingz
9b45499caa
test: update max_beam_width to 1 due to torchsampler changes. ( #6101 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-17 18:05:45 +08:00
Erin
de60ae47e3
chores: unwaive a few tests for v1.0 ( #6107 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-17 17:59:51 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler ( #6000 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Iman Tabrizian
d4d21a106e
[fix] Release slots with spec decode + disagg ( #5975 ) ( #6032 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-17 12:58:18 +08:00
chenfeiz0326
fe070a0168
test: Update Llama4 Scout FP4 & FP8 accuracy tests ( #5901 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-07-17 09:41:18 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support ( #5644 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
qixiang-99
e09e409dfb
Fix: Enhance ModelConfig for kv cache size calculations ( #5868 )
...
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-16 14:41:31 -07:00
shaharmor98
e0836f9ca9
[TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats ( #5372 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-17 00:50:30 +08:00
Wanli Jiang
9354114f68
fix: Update trtllm args issues with extra nested config ( #5996 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 12:41:45 -04:00
Emma Qiao
e30d7bec38
[Infra] - Waive failed cases in post-merge on main ( #6096 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-16 22:41:18 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Ivy Zhang
dda91b5117
tests: add QA test cases ( #5959 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:14:25 +08:00
Yan Chunwei
7568deb2f1
[nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig ( #6001 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:05:38 +08:00
Ivy Zhang
763012a88a
[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill ( #6051 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:04:08 +08:00
peaceh-nv
f5f31beee1
feat: Add deepseek-lite tests for RTX pro 6000 ( #5903 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-16 15:51:45 +08:00
Zheng Duan
385af53a4d
[nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test ( #6041 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-16 13:52:13 +08:00
Wanli Jiang
8679a058a3
fix: Unable to load phi4-model with tp_size>1 ( #5962 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 11:39:41 +08:00
Aurelien Chartier
6a47cac981
feat: Add support for Triton request cancellation ( #5898 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-15 20:52:43 -04:00
danielafrimi
edab7532dd
feat/add latency support for trtllm bench ( #3730 )
...
Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Daniel Afrimi <dafrimi@nvidia.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
2025-07-15 13:13:49 -07:00
brb-nv
9214ac662a
test: Add regression tests for Gemma3 VLM ( #6033 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-15 11:37:56 -07:00
Fanrong Li
7a1af1c738
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 ( #5989 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-07-16 01:33:12 +09:00
MinaHuai
9ebc3ab9c4
[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision ( #5998 )
...
Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>
2025-07-15 10:01:35 -04:00
Jaedeok Kim
ab1c54709d
fix: adjust window sizes of VSWA at torch backend ( #5880 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2025-07-15 17:41:54 +08:00
ruodil
2a147c4d01
test: add llama_v3.3_70b_cases in perf test ( #6035 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:59 +10:00
ruodil
2504aa552e
test: add recursive updating pytorch config and change MOE backend format in perf test ( #6046 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:15 +10:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Yiqing Yan
6b35afaf1b
[Infra][TRTLLM-6013] - Fix stage name in single stage test rerun report ( #5672 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-15 12:27:21 +09:00
ixlmar
f225f5cd2e
[nvbugs-5318143] fix: restrict PyTorch memory usage to avoid OOMs ( #5964 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-15 06:49:42 +08:00
Iman Tabrizian
c4ee535afb
[fix] fix eagle3 two model disaggregated serving test ( #6014 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-15 04:26:04 +09:00
brb-nv
f5f5be9e94
enh: Bidirectional mask with multiple images for Gemma3 ( #5976 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 22:39:18 +08:00
brb-nv
1a2d96919c
feat: Update Gemma3 Vision Encoder ( #5973 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 22:38:10 +08:00
Clay
dbf29184dc
fix #4974 : A thread leak issue in scaffolding unittest ( #5020 )
...
Signed-off-by: Clay <ccs96307@gmail.com>
2025-07-14 20:22:03 +09:00
Kaiyu Xie
aa97fbb2ad
[Nvbug/5383670] fix: switch test case to non-fp4 ckpt for more GPU coverage ( #5882 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-14 20:21:46 +09:00
Yiqing Yan
c720d7f779
Waive L0 test ( #6002 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-14 19:55:34 +09:00
Zhanrui Sun
3a0ef73414
infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check ( #5709 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-14 18:52:13 +09:00
Zhenhuan Chen
30608a5e6d
[ https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error ( #5865 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-14 17:17:30 +08:00
Robin Kobus
5a61d64b5b
[nvbugs/5345391] fix: chunked prefill + overlap scheduling ( #5761 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
3fcaa8a310
[nvbug 5327706][fix] fix mgmn postprocess error ( #5835 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
ruodil
347520494b
test: remove duplicate cases in perf sanity test ( #5870 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Bo Li
6d79559f3e
fix: [ https://nvbugs/5351130 ][ https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. ( #5821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Bo Li
2991cf4b80
fix: [ https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. ( #5606 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yan Chunwei
3e1fd983c3
[nvbug5266240] chore: unwaive test_llm_with_dummy_weights ( #5744 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
388b4919b8
[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend ( #5541 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
6992616c1f
[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens ( #5201 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
ruodil
278a1a7df3
test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases ( #5693 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Iman Tabrizian
c8874a7f94
[nvbug/5337601][fix] Fix disagg + speculative decoding ( #5558 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yi Zhang
9cc4e5d50e
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation ( #5463 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yi Zhang
e5e87ecf34
test: Move some of the test from post merge to pre-merge, update dgx b200 test case ( #5640 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
brb-nv
869e88304a
[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test ( #5735 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
dominicshanshan
c9e7f831dc
Breaking change: perf: [TRTLLM-4662] Enable cuda graph by default ( #5480 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-07-14 16:42:23 +08:00
Yan Chunwei
9c673e9707
[TRTLLM-6160] chore: add sampling examples for pytorch ( #5951 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 15:28:32 +09:00
Yan Chunwei
c30eead09f
[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch ( #5956 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 14:09:39 +08:00
QI JUN
ce39409530
fix cancel request logic ( #5800 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-14 10:23:20 +08:00
wili
3dfc819849
[BUG5374319][fix] WAR for draft-target-model unit tests error ( #5958 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-12 23:48:57 +09:00
Mike Iovine
8950223f6f
[fix] Remove SpecConfig and fix thread leak issues ( #5931 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-12 21:03:24 +09:00
Enwei Zhu
bc1d4fb5da
[NvBug 5378370] fix: Fix alltoall for llama4 (apply_router_weight_on_input=True) ( #5902 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-12 15:50:31 +09:00
Chang Liu
308776442a
[nvbug/5308432] fix: extend triton exit time for test_llava ( #5971 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-12 12:56:37 +09:00
Thor Johnsen
041f1fa513
[TRTLLM-6264] Fix flaky test_e2e.py::test_openai_lora ( #5885 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-07-11 16:20:41 -07:00
xinhe-nv
509363d858
tests: update sanity tests & fix tests ( #5906 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-11 19:48:19 +10:00
brb-nv
0385f89abc
test: Fix Gemma3 unit tests due to transformers upgrade ( #5921 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-10 17:24:10 -07:00
2ez4bz
c19840235d
[fix] Fix mistral unit tests due to transformers upgrade ( #5904 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-10 10:45:27 -07:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs ( #5639 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yiqing Yan
3aa53ec36c
[None] - Waive L0 tests ( #5915 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-10 18:33:17 +08:00
Enwei Zhu
055c4a9fe6
[NvBug 5370718, 5371538] fix: Fix incremental detokenization ( #5825 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-10 16:30:00 +08:00
CarstyYou
dc32f9ae73
[fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm ( #5531 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-07-10 15:16:18 +08:00
Anthony Chang
7d21b55b5a
[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE ( #5723 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-07-10 14:06:50 +08:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner ( #5876 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
Venky
f57b3d6829
Waive unittest failures introduced by PR#5345 (removal of ScaffoldingOutput class) ( #5886 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-10 09:53:31 +08:00
peaceh-nv
76c3a12bcb
[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 ( #5636 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-10 09:20:30 +08:00
brb-nv
3209b31665
feat: Custom masking utils for Gemma3 VLM ( #5853 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-10 06:18:04 +09:00
2ez4bz
87fe44fd29
feat(models): Mistral3.1 VLM pytorch backend support ( #5529 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-09 13:17:40 -07:00
Chang Liu
b61a717275
[1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes ( #5396 )
2025-07-10 05:12:53 +09:00
Wanli Jiang
3f7cedec7c
Update transformers to 4.53.0 ( #5747 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-09 09:32:24 -07:00
DylanChen-NV
74dca0aa7b
[NVBUG-5304516/5319741]Qwen2.5VL FP8 support ( #5029 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-09 23:16:42 +08:00
Omer Ullman Argov
a32f7083b4
[ci] parallelize torch unittests ( #5714 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-09 11:05:57 +03:00
Dom Brown
3e3b1769ad
[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner ( #5764 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-07-09 08:21:58 +01:00