217 Commits

Author SHA1 Message Date
Ivy Zhang
1f7a1645d6
[None][fix] update skip case (#7193)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 12:31:48 +08:00
Shi Xiaowei
d010b2043a
[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#7191)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-25 20:21:43 +08:00
Michal Guzek
7ea53ff516
[https://nvbugs/5433545][fix] TestPhi4MiniInstruct::test_auto_dtype - Use max_seq_len=4096 to fallback to the short RoPE factor (#6895)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-08-22 10:28:09 -07:00
Pamela Peng
1e5a6be55d
[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-08-21 13:34:07 -04:00
chenfeiz0326
5acf213a15
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#7075)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-20 18:11:00 -04:00
Jin Li
69846c6586
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-20 15:06:56 +08:00
Bo Deng
df00c81aea
[https://nvbugs/5448437][fix] fix some nixl tests (#6940)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-20 14:19:48 +08:00
brb-nv
da91256503
[None][chore] Waive E2E GB200 tests for Gemma3 27B (#6916)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-19 05:19:34 -04:00
Yechan Kim
d6c2a6a81f
[https://nvbugs/5448579][fix] EXAONE-4.0 accuracy test bugfix (#6888)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-19 09:29:32 +02:00
Nave Assaf
d4dd5b4f4d
[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters… (#6987)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-19 09:19:50 +02:00
Ivy Zhang
055fdd9e31
[None][fix] update skip config (#6891)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-18 13:50:46 +08:00
William Zhang
c16aff5e3f
[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909)
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 11:17:37 +08:00
2ez4bz
ccb62ef97e
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds some level of FP8 support to Mistral Small 3.1 by:

* disabling quantization for the vision sub-model since `modelopt` does
  not support quantizing it (yet).
* extending existing accuracy tests to use a modelopt produced FP8
  checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-13 21:25:55 -04:00
Ivy Zhang
fd8f417bf2
[None][fix] fix Llama3 eagle3 test case OOM (#6832)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-13 02:21:05 -04:00
xinhe-nv
0958efdcff
[None][chore] waive GB300 known issues (#6812)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-13 13:13:36 +08:00
Ivy Zhang
15bcf80596
[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-13 13:10:13 +08:00
Yuxian Qiu
cf00003f3d
[None][fix] fix CUDA graph config for test_llm_api_pytorch.py. (#6826)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-13 10:24:15 +08:00
Raayan Dhar
ddf8e8d1a0
[None][feat] adding support for disaggregated multi-instance tests (#6674)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-11 13:00:57 -07:00
Bo Deng
d289d85bff
[TRTLLM-6675][infra] Nixl test completion (#6623)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-08 10:15:54 +08:00
brb-nv
4adde41632
[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-07 13:14:04 -04:00
liji-nv
dcbfa7e509
[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-05 10:31:29 -04:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Ivy Zhang
d101a6cebc
[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 16:39:25 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Leslie Fang
164acfa31e
[None][infra] Skip test_eagle3 test with device memory check (#6617)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-05 02:36:03 -04:00
xinhe-nv
a178cea324
[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 12:47:53 +10:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
Ivy Zhang
5eefdf2c75
tests: Add llama4 functional cases (#6392)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
Yechan Kim
ee6ab5be96
chore: add EXAONE4 accuracy test (#6397)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-04 10:14:16 +08:00
Ivy Zhang
7547a7d0a2
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-03 22:11:26 -04:00
Jhao-Ting Chen
4da5cfc511
[None][infra] add eagle3 one model accuracy tests (#6264)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-02 16:07:46 -07:00
Lizhi Zhou
6f34f3489b
[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-01 13:33:34 -04:00
xinhe-nv
263c6c0ad0
test: skip post blackwell (#6357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 13:10:14 -04:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
liji-nv
1daa8c3232
[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-01 07:38:06 -04:00
xinhe-nv
fca0d37798
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 (#6552)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 20:27:22 +10:00
chenfeiz0326
ba5bdbb138
[None][chore] Disable add special tokens for Llama3.3 70B (#6482)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-01 17:03:27 +08:00
brb-nv
2eca0d5925
fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-31 17:18:23 -07:00
xinhe-nv
ca534e4798
test: add accuracy reference (#6479)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-31 12:27:29 +10:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
Bo Deng
24e7f4eece
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-31 00:41:37 +08:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
xinhe-nv
d9ab3fd35e
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 18:45:58 +10:00
Michal Guzek
2573bb729d
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-28 14:02:14 -07:00
2ez4bz
60e4d3a9d4
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 09:41:44 -07:00
xinhe-nv
971be1fe86
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-28 20:31:43 +10:00
Ivy Zhang
2945817cae
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-28 15:33:30 +08:00
xinhe-nv
470544cf17
test: [CI] Add failed cases into waives.txt (#6333)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-25 17:18:06 +10:00
xinhe-nv
6268a60ab3
tests: add test_chunked_prefill for llama4 (#5549)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 23:02:00 -04:00