Commit Graph

2088 Commits

Author SHA1 Message Date
Michal Guzek
b8719fe96d
[nvbug/5374773] chore: Update nanobind with fail_fast_on_attention_window_too_large changes (#6491)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-07-31 20:25:29 +01:00
tomeras91
6d5da9f7c2
[https://nvbugs/5404046][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6485)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-31 21:35:10 +03:00
shaharmor98
0c42f54a39
Bugfix/fix nemotron nas lora support (#6380)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-31 13:39:35 -04:00
Emma Qiao
baece56758
[None][infra] Pin the version for triton to 3.3.1 (#6508)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-31 19:25:15 +08:00
Yiqing Yan
d38c26bb78
[Infra][TRTLLM-5633] - Fix merge waive list (#6504)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-31 14:57:51 +08:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control (#6220)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
Venky
83e97659aa
[infra] Remove auto_assign_reviewers option from .coderabbit.yaml (#6490)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-07-30 23:07:21 -07:00
Wanli Jiang
fcd5706615
doc: add bielik model to support-matrix (#6480)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-31 00:48:53 -04:00
Faraz
8e84df74b5
Fix e2e test failure for RTX6000 Pro (#6420)
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>
2025-07-30 23:32:44 -04:00
xinhe-nv
ca534e4798
test: add accuracy reference (#6479)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-31 12:27:29 +10:00
dongjiyingdjy
17e0d0fb1a
fix: fix illeagel memory access (#6437)
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
2025-07-31 10:01:34 +08:00
Enwei Zhu
4b299cb77e
feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 (#6408)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-31 09:53:52 +08:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
Yechan Kim
83621e4b80
doc: update multimodal models on support-matrix.md (#6431)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-31 08:50:18 +08:00
Vadim Gimpelson
25cd4f215e
[PERF] Move calculation Qwen2-VL's rotary_cos_sin to LLM worker process (#6004)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
2025-07-31 09:35:24 +09:00
brb-nv
0e16d1f070
test: Add time logging for lora tests (#6466)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 14:02:43 -07:00
shaharmor98
f9cf683e39
add propagation of trust_remote_code to OpenAIServer (#6446)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-30 15:25:41 -04:00
Anurag Mukkara
fac186e3b5
[nvbug/5409417] Unwaive llava test case (#6460)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-07-30 14:38:47 -04:00
brb-nv
f6287e4498
Unwaive Gemma2 LoRA test on H100 (#6461)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 12:56:12 -04:00
Bo Deng
24e7f4eece
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-31 00:41:37 +08:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
NVShreyas
e67f4da9b5
[Perf]: Add residual, norm for nemotron_nas models (#6455)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-07-30 09:10:38 -07:00
pcastonguay
0f083b9daf
fix: Unwaive triton cpp test [nvbug 5401088] (#6412)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-30 11:25:18 -04:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
Chang Liu
b4065d8ca6
[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-07-30 10:00:15 -04:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
tomeras91
a2514d93fc
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-30 07:22:32 -04:00
Leslie Fang
d980928c96
[doc] update the doc of feature combination matrix (#6441)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-07-30 18:48:49 +08:00
Yiqing Yan
0cf2f6f154
[TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-30 17:50:05 +08:00
Yechan Kim
22b29df38c
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 17:29:55 +08:00
xinhe-nv
d9ab3fd35e
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 18:45:58 +10:00
nv-guomingz
a5540acfce
chore: add trtllm-serve json schema example into doc. (#6418)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 04:33:08 -04:00
QI JUN
2fe9cc0889
chore: remove draft_model_engine from init parameter list of PyExecutor (#6325)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-30 03:31:49 -04:00
QI JUN
1f39a11af0
chore: clean code of PyExecutor (#6445)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-30 02:11:43 -04:00
2ez4bz
d6eed1b624
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-30 14:10:36 +08:00
Jinyang Yuan
a427f5bece
[fix] Fix wide EP when using DeepEP with online EPLB (#6429)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-07-30 00:13:18 -04:00
Zheng Duan
c9ed1ab436
[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-30 10:39:40 +08:00
xinhe-nv
c00d6763b2
test: [CI] Add failed cases into waives.txt (#6457)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-30 12:36:58 +10:00
peaceh-nv
5b420ad267
Rename layer to comply with deepseek (#6393)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-30 10:00:48 +08:00
Venky
ab40369053
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 10:53:43 +10:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
Yunfan Fan
1a8e28d295
[FIX] fix bugs caused by None attention_bias during Qwen3 model convert engine (#6344)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
2025-07-30 07:13:44 +08:00
Yan Chunwei
ad662ddcdd
chore: disallow arbitrary in llm_args.Configs (#6367)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-29 16:16:52 -04:00
Yan Chunwei
1a6930986a
chore: remove unused kv_cache_dtype in api reference (#6444)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-29 14:57:20 -04:00
Michal Guzek
7efe3cb0cd
[fix] Add detokenization-based stop word logic to LLM API (#5948)
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-07-29 10:16:59 -07:00
Zhanrui Sun
c3729dbd7d
infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build (#4939)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-29 12:54:38 -04:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… (#6428)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
xinhe-nv
f1086e7d4f
test: [CI] remove closed bugs (#6381)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-29 19:01:23 +10:00
xinhe-nv
4fbb344caf
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-29 19:00:30 +10:00
Yukun He
0eee2e2850
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-29 16:41:48 +08:00