1091 Commits

Author SHA1 Message Date
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
pcastonguay
0f083b9daf
fix: Unwaive triton cpp test [nvbug 5401088] (#6412)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-30 11:25:18 -04:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
Chang Liu
b4065d8ca6
[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-07-30 10:00:15 -04:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
tomeras91
a2514d93fc
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-30 07:22:32 -04:00
Yechan Kim
22b29df38c
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 17:29:55 +08:00
xinhe-nv
d9ab3fd35e
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 18:45:58 +10:00
nv-guomingz
a5540acfce
chore: add trtllm-serve json schema example into doc. (#6418)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 04:33:08 -04:00
2ez4bz
d6eed1b624
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-30 14:10:36 +08:00
xinhe-nv
c00d6763b2
test: [CI] Add failed cases into waives.txt (#6457)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-30 12:36:58 +10:00
Venky
ab40369053
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 10:53:43 +10:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
Yan Chunwei
ad662ddcdd
chore: disallow arbitrary in llm_args.Configs (#6367)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-29 16:16:52 -04:00
Yan Chunwei
1a6930986a
chore: remove unused kv_cache_dtype in api reference (#6444)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-29 14:57:20 -04:00
Michal Guzek
7efe3cb0cd
[fix] Add detokenization-based stop word logic to LLM API (#5948)
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-07-29 10:16:59 -07:00
xinhe-nv
f1086e7d4f
test: [CI] remove closed bugs (#6381)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-29 19:01:23 +10:00
xinhe-nv
4fbb344caf
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-29 19:00:30 +10:00
Yukun He
0eee2e2850
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-29 16:41:48 +08:00
ruodil
e11255e9d0
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-29 15:52:45 +10:00
Michal Guzek
2573bb729d
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-28 14:02:14 -07:00
Aurelien Chartier
738ab61593
[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs (#6339)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-28 12:36:44 -07:00
2ez4bz
cdca541148
[test] Unwaive mistral3.1 small E2E test (#6352)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 14:37:42 -04:00
2ez4bz
60e4d3a9d4
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 09:41:44 -07:00
ruodil
03632a679f
test: organize perf cases and add missing perflab cases in qa test list (#6283)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-28 20:33:32 +10:00
xinhe-nv
971be1fe86
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-28 20:31:43 +10:00
Yan Chunwei
45d441e60c
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 15:57:07 +08:00
Ivy Zhang
2945817cae
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-28 15:33:30 +08:00
Emma Qiao
b3ca159787
[Infa] - waive failed cases and fix a typo (#6384)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-28 02:06:57 -04:00
Chang Liu
dc757799e1
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266)
2025-07-27 23:29:21 -04:00
Yan Chunwei
908f49a4ad
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 09:01:10 +08:00
Michal Guzek
08d57123f9
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-25 18:10:40 -04:00
Iman Tabrizian
c35c78ff58
[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-25 12:47:01 -07:00
nv-guomingz
b8d4cb8beb
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
2025-07-25 12:55:56 -04:00
pcastonguay
3805976e90
fix: Fixing kv_cache_events unit tests [nvbug 5362412] (#6265)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-25 08:55:44 -04:00
xiaoqi
a0aecf0476
[feat]: support logit_bias (#5354)
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-25 09:37:41 +00:00
xinhe-nv
470544cf17
test: [CI] Add failed cases into waives.txt (#6333)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-25 17:18:06 +10:00
xinhe-nv
6268a60ab3
tests: add test_chunked_prefill for llama4 (#5549)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 23:02:00 -04:00
xinhe-nv
2dcfa90e99
test: skip llama3.3 70b test on cg4 (#6293)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 19:29:56 -07:00
Mike Iovine
0f2f11f90b
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-24 21:50:11 -04:00
Shiyu Li
375f74ecb2
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-07-25 08:01:40 +08:00
Stefan Niebler
0df758ec9f
[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-24 18:04:41 +02:00
bhsueh_NV
7b6aadc800
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-24 21:47:37 +08:00
Emma Qiao
0cc1f8c03d
[Infra] - Wiave failed tests in post-merge (#6331)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-24 21:18:06 +08:00
Ivy Zhang
f290108cd8
tests: only get timeout value from pytest marker (#6287)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-24 20:51:02 +08:00
liji-nv
14d94a3856
feat: Add non UB AR + Residual + Norm + Quant fusion (#6320)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-24 05:51:43 -04:00
Iman Tabrizian
5fceaa6153
Revert "tests: add timeout_manager to tensorrt flow test cases (#5942)" (#6309)
2025-07-23 23:58:10 -04:00
Emma Qiao
82d03ca979
[Infra] - Increase unittest execution time since some test exceeds 1600 (#6277)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-24 10:02:28 +08:00
Iman Tabrizian
7740bfa31d
Waive tests (#6312)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-23 18:15:07 -07:00
Lucas Liebenwein
cf4f4e8d73
[AutoDeploy] disable flaky MoE nvfp4 test (#6302)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-07-23 13:13:01 -04:00