Ziyi Xiong
|
b4fcd5f592
|
[https://nvbugs/5441438][fix] Set correct draft length for the cuda graph dummy request (#6701)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-08-12 09:28:47 +08:00 |
|
Tracin
|
49bcaa4e95
|
Add gpt-oss GSM8K test. (#6732)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-08-10 22:45:43 -04:00 |
|
Leslie Fang
|
294e0d3dab
|
[https://nvbugs/5436461][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI (#6631)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-08 15:30:47 +08:00 |
|
Li Min
|
d913955952
|
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell (#6616)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
|
2025-08-08 15:03:48 +08:00 |
|
Enwei Zhu
|
aee828d98a
|
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-08 12:10:36 +08:00 |
|
Daniel Cámpora
|
efca359b66
|
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-08-07 22:19:37 -04:00 |
|
Enwei Zhu
|
1b9781e8e7
|
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-07 05:53:48 -04:00 |
|
xinhe-nv
|
0a467b00cc
|
[https://nvbugs/5409414][fix] fix Not registered specs (#6660)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-07 17:55:53 +10:00 |
|
hlu1
|
8207d5fd39
|
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-08-07 03:04:18 -04:00 |
|
liji-nv
|
dcbfa7e509
|
[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-08-05 10:31:29 -04:00 |
|
Pengbo Wang @ NVIDIA
|
c289880afb
|
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-08-05 18:05:33 +08:00 |
|
Ivy Zhang
|
d101a6cebc
|
[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-05 16:39:25 +08:00 |
|
Haohang Huang
|
c9eebcb454
|
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
|
2025-08-05 07:47:41 +00:00 |
|
Leslie Fang
|
164acfa31e
|
[None][infra] Skip test_eagle3 test with device memory check (#6617)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-05 02:36:03 -04:00 |
|
xinhe-nv
|
a178cea324
|
[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-05 12:47:53 +10:00 |
|
Leslie Fang
|
a60190836c
|
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-04 01:45:24 -04:00 |
|
Ivy Zhang
|
5eefdf2c75
|
tests: Add llama4 functional cases (#6392)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-04 11:19:58 +08:00 |
|
Yechan Kim
|
ee6ab5be96
|
chore: add EXAONE4 accuracy test (#6397)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-04 10:14:16 +08:00 |
|
Ivy Zhang
|
7547a7d0a2
|
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-03 22:11:26 -04:00 |
|
Jhao-Ting Chen
|
4da5cfc511
|
[None][infra] add eagle3 one model accuracy tests (#6264)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-08-02 16:07:46 -07:00 |
|
Lizhi Zhou
|
6f34f3489b
|
[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-08-01 13:33:34 -04:00 |
|
xinhe-nv
|
263c6c0ad0
|
test: skip post blackwell (#6357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-01 13:10:14 -04:00 |
|
brb-nv
|
7447d6ed85
|
[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-01 09:19:54 -04:00 |
|
liji-nv
|
1daa8c3232
|
[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-08-01 07:38:06 -04:00 |
|
xinhe-nv
|
fca0d37798
|
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 (#6552)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-01 20:27:22 +10:00 |
|
chenfeiz0326
|
ba5bdbb138
|
[None][chore] Disable add special tokens for Llama3.3 70B (#6482)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-08-01 17:03:27 +08:00 |
|
brb-nv
|
2eca0d5925
|
fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-31 17:18:23 -07:00 |
|
xinhe-nv
|
ca534e4798
|
test: add accuracy reference (#6479)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-31 12:27:29 +10:00 |
|
bhsueh_NV
|
ae3a5fc918
|
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-31 09:37:23 +08:00 |
|
Bo Deng
|
24e7f4eece
|
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-31 00:41:37 +08:00 |
|
Wanli Jiang
|
9632dba02e
|
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-30 09:20:16 -07:00 |
|
pcastonguay
|
e7ae5e2824
|
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-07-30 09:42:13 -04:00 |
|
xinhe-nv
|
d9ab3fd35e
|
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 18:45:58 +10:00 |
|
Michal Guzek
|
2573bb729d
|
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-28 14:02:14 -07:00 |
|
2ez4bz
|
60e4d3a9d4
|
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-28 09:41:44 -07:00 |
|
xinhe-nv
|
971be1fe86
|
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-28 20:31:43 +10:00 |
|
Ivy Zhang
|
2945817cae
|
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-28 15:33:30 +08:00 |
|
xinhe-nv
|
470544cf17
|
test: [CI] Add failed cases into waives.txt (#6333)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-25 17:18:06 +10:00 |
|
xinhe-nv
|
6268a60ab3
|
tests: add test_chunked_prefill for llama4 (#5549)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-24 23:02:00 -04:00 |
|
xinhe-nv
|
2dcfa90e99
|
test: skip llama3.3 70b test on cg4 (#6293)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-24 19:29:56 -07:00 |
|
bhsueh_NV
|
7b6aadc800
|
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-24 21:47:37 +08:00 |
|
Iman Tabrizian
|
5fceaa6153
|
Revert "tests: add timeout_manager to tensorrt flow test cases (#5942)" (#6309)
|
2025-07-23 23:58:10 -04:00 |
|
Bo Li
|
537757e669
|
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. (#5896)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Ivy Zhang
|
eb5cb5b642
|
tests: add timeout_manager to tensorrt flow test cases (#5942)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-22 10:23:41 +08:00 |
|
Yi Zhang
|
f9b0a911fb
|
test: Enable GB200 torch compile multi gpu tests (#6145)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-21 22:17:13 +08:00 |
|
liji-nv
|
3e0fb60e50
|
[TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-21 19:10:22 +08:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|
xiaoqi
|
28858c8711
|
feat(eagle3):support qwen3 dense model (#5879)
Signed-off-by: xq25478 <xq25478@qq.com>
|
2025-07-19 01:24:32 +08:00 |
|
Yi Zhang
|
a718486900
|
fix: Fix DeepSeek R1 CI (#6129)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-17 18:24:49 +08:00 |
|
Enwei Zhu
|
21efb50068
|
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-17 17:46:10 +08:00 |
|