Wanli Jiang
|
9632dba02e
|
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-30 09:20:16 -07:00 |
|
pcastonguay
|
0f083b9daf
|
fix: Unwaive triton cpp test [nvbug 5401088] (#6412)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-07-30 11:25:18 -04:00 |
|
nv-guomingz
|
03e38c9087
|
chore: update trtllm-serve usage doc by removing backend parameter when it uses torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 11:11:06 -04:00 |
|
Chang Liu
|
b4065d8ca6
|
[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-07-30 10:00:15 -04:00 |
|
pcastonguay
|
e7ae5e2824
|
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-07-30 09:42:13 -04:00 |
|
tomeras91
|
a2514d93fc
|
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-07-30 07:22:32 -04:00 |
|
Yechan Kim
|
22b29df38c
|
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 17:29:55 +08:00 |
|
xinhe-nv
|
d9ab3fd35e
|
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 18:45:58 +10:00 |
|
nv-guomingz
|
a5540acfce
|
chore: add trtllm-serve json schema example into doc. (#6418)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 04:33:08 -04:00 |
|
2ez4bz
|
d6eed1b624
|
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-30 14:10:36 +08:00 |
|
xinhe-nv
|
c00d6763b2
|
test: [CI] Add failed cases into waives.txt (#6457)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-30 12:36:58 +10:00 |
|
Venky
|
ab40369053
|
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 10:53:43 +10:00 |
|
Yechan Kim
|
d6eb8e2366
|
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 08:52:31 +08:00 |
|
Yan Chunwei
|
ad662ddcdd
|
chore: disallow arbitrary in llm_args.Configs (#6367)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-29 16:16:52 -04:00 |
|
Yan Chunwei
|
1a6930986a
|
chore: remove unused kv_cache_dtype in api reference (#6444)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-29 14:57:20 -04:00 |
|
Michal Guzek
|
7efe3cb0cd
|
[fix] Add detokenization-based stop word logic to LLM API (#5948)
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
|
2025-07-29 10:16:59 -07:00 |
|
xinhe-nv
|
f1086e7d4f
|
test: [CI] remove closed bugs (#6381)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-29 19:01:23 +10:00 |
|
xinhe-nv
|
4fbb344caf
|
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-29 19:00:30 +10:00 |
|
Yukun He
|
0eee2e2850
|
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-07-29 16:41:48 +08:00 |
|
ruodil
|
e11255e9d0
|
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-07-29 15:52:45 +10:00 |
|
Michal Guzek
|
2573bb729d
|
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-28 14:02:14 -07:00 |
|
Aurelien Chartier
|
738ab61593
|
[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs (#6339)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-28 12:36:44 -07:00 |
|
2ez4bz
|
cdca541148
|
[test] Unwaive mistral3.1 small E2E test (#6352)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-28 14:37:42 -04:00 |
|
2ez4bz
|
60e4d3a9d4
|
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-28 09:41:44 -07:00 |
|
ruodil
|
03632a679f
|
test: organize perf cases and add missing perflab cases in qa test list (#6283)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-07-28 20:33:32 +10:00 |
|
xinhe-nv
|
971be1fe86
|
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-28 20:31:43 +10:00 |
|
Yan Chunwei
|
45d441e60c
|
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-28 15:57:07 +08:00 |
|
Ivy Zhang
|
2945817cae
|
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-28 15:33:30 +08:00 |
|
Emma Qiao
|
b3ca159787
|
[Infra] - waive failed cases and fix a typo (#6384)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-28 02:06:57 -04:00 |
|
Chang Liu
|
dc757799e1
|
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266)
|
2025-07-27 23:29:21 -04:00 |
|
Yan Chunwei
|
908f49a4ad
|
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-28 09:01:10 +08:00 |
|
Michal Guzek
|
08d57123f9
|
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-25 18:10:40 -04:00 |
|
Iman Tabrizian
|
c35c78ff58
|
[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-25 12:47:01 -07:00 |
|
nv-guomingz
|
b8d4cb8beb
|
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
|
2025-07-25 12:55:56 -04:00 |
|
pcastonguay
|
3805976e90
|
fix: Fixing kv_cache_events unit tests [nvbug 5362412] (#6265)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-07-25 08:55:44 -04:00 |
|
xiaoqi
|
a0aecf0476
|
[feat]: support logit_bias (#5354)
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-07-25 09:37:41 +00:00 |
|
xinhe-nv
|
470544cf17
|
test: [CI] Add failed cases into waives.txt (#6333)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-25 17:18:06 +10:00 |
|
xinhe-nv
|
6268a60ab3
|
tests: add test_chunked_prefill for llama4 (#5549)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-24 23:02:00 -04:00 |
|
xinhe-nv
|
2dcfa90e99
|
test: skip llama3.3 70b test on cg4 (#6293)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-24 19:29:56 -07:00 |
|
Mike Iovine
|
0f2f11f90b
|
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-07-24 21:50:11 -04:00 |
|
Shiyu Li
|
375f74ecb2
|
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237)
Signed-off-by: Shiyu Li <shili@nvidia.com>
|
2025-07-25 08:01:40 +08:00 |
|
Stefan Niebler
|
0df758ec9f
|
[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-24 18:04:41 +02:00 |
|
bhsueh_NV
|
7b6aadc800
|
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-24 21:47:37 +08:00 |
|
Emma Qiao
|
0cc1f8c03d
|
[Infra] - Waive failed tests in post-merge (#6331)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-24 21:18:06 +08:00 |
|
Ivy Zhang
|
f290108cd8
|
tests: only get timeout value from pytest marker (#6287)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-24 20:51:02 +08:00 |
|
liji-nv
|
14d94a3856
|
feat: Add non UB AR + Residual + Norm + Quant fusion (#6320)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-24 05:51:43 -04:00 |
|
Iman Tabrizian
|
5fceaa6153
|
Revert "tests: add timeout_manager to tensorrt flow test cases (#5942)" (#6309)
|
2025-07-23 23:58:10 -04:00 |
|
Emma Qiao
|
82d03ca979
|
[Infra] - Increase unittest execution time since some test exceeds 1600 (#6277)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-24 10:02:28 +08:00 |
|
Iman Tabrizian
|
7740bfa31d
|
Waive tests (#6312)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-23 18:15:07 -07:00 |
|
Lucas Liebenwein
|
cf4f4e8d73
|
[AutoDeploy] disable flaky MoE nvfp4 test (#6302)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-07-23 13:13:01 -04:00 |
|