Commit Graph

498 Commits

Author SHA1 Message Date
xinhe-nv
e35fca4272
[TRTQA-2920][chore] improve hang tests (#6781)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-12 18:26:51 +08:00
Enwei Zhu
7c686ba8de
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill (#6774)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-12 09:30:06 +08:00
Ziyi Xiong
b4fcd5f592
[https://nvbugs/5441438][fix] Set correct draft length for the cuda graph dummy request (#6701)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-12 09:28:47 +08:00
Aurelien Chartier
56bfc3a6d2
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically (#6763)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-11 15:18:19 -07:00
Tracin
49bcaa4e95
Add gpt-oss GSM8K test. (#6732)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-08-10 22:45:43 -04:00
Chuang Zhu
c566a8d2a2
[None][fix] fix same pp disagg (#6730)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-10 22:45:15 -04:00
Bo Deng
767879ef85
[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6736)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 10:05:10 +08:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation (#5785)
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Leslie Fang
294e0d3dab
[https://nvbugs/5436461][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI (#6631)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-08 15:30:47 +08:00
Li Min
d913955952
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell (#6616)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-08-08 15:03:48 +08:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
ruodil
22f45a0e19
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 22:57:04 -04:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Yuan Tong
db8dc97b7b
[None][fix] Migrate to new cuda binding package name (#6700)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-07 16:29:55 -04:00
Raayan Dhar
4055b764db
[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
2025-08-07 11:18:02 -04:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
xinhe-nv
0a467b00cc
[https://nvbugs/5409414][fix] fix Not registered specs (#6660)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 17:55:53 +10:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
ruodil
f30398470d
[None][chore] update readme for perf release test (#6664)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:00:45 +10:00
Yechan Kim
1aed7511fe
[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-06 06:58:58 -07:00
ruodil
907c180eb2
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 02:25:57 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization (#6061)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Yechan Kim
c17f4984e2
[None][feat] Refactor Llava-Next (#6478)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-05 17:53:53 -07:00
ixlmar
1ebceb790d
[TRTLLM-5508][feat] check input tokens + improve error handling (#5170)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-08-05 18:27:43 +01:00
liji-nv
dcbfa7e509
[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-05 10:31:29 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Ivy Zhang
d101a6cebc
[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 16:39:25 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Leslie Fang
164acfa31e
[None][infra] Skip test_eagle3 test with device memory check (#6617)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-05 02:36:03 -04:00
ruodil
7625845365
test: add README_release_test.md for perf test (#6443)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-05 02:07:42 -04:00
xinhe-nv
a178cea324
[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 12:47:53 +10:00
Pengyun Lin
a15e33351d
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens (#6259)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-04 15:09:51 +08:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
Ivy Zhang
5eefdf2c75
tests: Add llama4 functional cases (#6392)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
ruodil
8d82ccca63
test: modify max_lora_rank of phi4_multimodal to 320 (#6474)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 12:20:22 +10:00
Yechan Kim
ee6ab5be96
chore: add EXAONE4 accuracy test (#6397)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-04 10:14:16 +08:00
Ivy Zhang
7547a7d0a2
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-03 22:11:26 -04:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Jhao-Ting Chen
4da5cfc511
[None][infra] add eagle3 one model accuracy tests (#6264)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-02 16:07:46 -07:00
Lizhi Zhou
6f34f3489b
[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-01 13:33:34 -04:00
xinhe-nv
263c6c0ad0
test: skip post blackwell (#6357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 13:10:14 -04:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
liji-nv
1daa8c3232
[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-01 07:38:06 -04:00
xinhe-nv
fca0d37798
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 (#6552)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 20:27:22 +10:00
chenfeiz0326
ba5bdbb138
[None][chore] Disable add special tokens for Llama3.3 70B (#6482)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-01 17:03:27 +08:00
Ivy Zhang
71524a1a48
[https://nvbugs/5419066][fix] Use trt flow LLM (#6467)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-01 03:33:07 -04:00
Venky
ad5742b105
[fix] Update get_trtllm_bench_build_command to handle batch size and tokens (#6313)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-01 00:08:09 -04:00
brb-nv
2eca0d5925
fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-31 17:18:23 -07:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. (#6232)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00