dongjiyingdjy
|
17e0d0fb1a
|
fix: fix illeagel memory access (#6437)
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
|
2025-07-31 10:01:34 +08:00 |
|
Enwei Zhu
|
4b299cb77e
|
feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 (#6408)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-31 09:53:52 +08:00 |
|
Vadim Gimpelson
|
25cd4f215e
|
[PERF] Move calculation Qwen2-VL's rotary_cos_sin to LLM worker process (#6004)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-07-31 09:35:24 +09:00 |
|
shaharmor98
|
f9cf683e39
|
add propagation of trust_remote_code to OpenAIServer (#6446)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
|
2025-07-30 15:25:41 -04:00 |
|
Wanli Jiang
|
9632dba02e
|
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-30 09:20:16 -07:00 |
|
NVShreyas
|
e67f4da9b5
|
[Perf]: Add residual, norm for nemotron_nas models (#6455)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
|
2025-07-30 09:10:38 -07:00 |
|
Chang Liu
|
b4065d8ca6
|
[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-07-30 10:00:15 -04:00 |
|
pcastonguay
|
e7ae5e2824
|
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-07-30 09:42:13 -04:00 |
|
tomeras91
|
a2514d93fc
|
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-07-30 07:22:32 -04:00 |
|
QI JUN
|
2fe9cc0889
|
chore: remove draft_model_engine from init parameter list of PyExecutor (#6325)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-30 03:31:49 -04:00 |
|
QI JUN
|
1f39a11af0
|
chore: clean code of PyExecutor (#6445)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-30 02:11:43 -04:00 |
|
2ez4bz
|
d6eed1b624
|
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-30 14:10:36 +08:00 |
|
Jinyang Yuan
|
a427f5bece
|
[fix] Fix wide EP when using DeepEP with online EPLB (#6429)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-07-30 00:13:18 -04:00 |
|
Zheng Duan
|
c9ed1ab436
|
[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-30 10:39:40 +08:00 |
|
peaceh-nv
|
5b420ad267
|
Rename layer to comply with deepseek (#6393)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
|
2025-07-30 10:00:48 +08:00 |
|
Yechan Kim
|
d6eb8e2366
|
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 08:52:31 +08:00 |
|
Yunfan Fan
|
1a8e28d295
|
[FIX] fix bugs caused by None attention_bias during Qwen3 model convert engine (#6344)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
|
2025-07-30 07:13:44 +08:00 |
|
Yan Chunwei
|
ad662ddcdd
|
chore: disallow arbitrary in llm_args.Configs (#6367)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-29 16:16:52 -04:00 |
|
Michal Guzek
|
7efe3cb0cd
|
[fix] Add detokenization-based stop word logic to LLM API (#5948)
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
|
2025-07-29 10:16:59 -07:00 |
|
Yukun He
|
0eee2e2850
|
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-07-29 16:41:48 +08:00 |
|
QI JUN
|
13e24ab1cb
|
chore: remove unused code in PyExecutor (#6351)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-29 16:24:26 +08:00 |
|
Frank
|
d2a04abb95
|
[fix] Fixes to parameter usage and low latency configuration. (#6343)
|
2025-07-29 01:36:13 -04:00 |
|
nv-guomingz
|
49044733e1
|
chore: delete useless gitkeep files. (#6400)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-28 11:38:30 -04:00 |
|
QI JUN
|
4efc6496b7
|
chore: add _prepare_and_schedule_batch function in PyExecutor (#6365)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-28 05:50:27 -04:00 |
|
Yan Chunwei
|
45d441e60c
|
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-28 15:57:07 +08:00 |
|
Zero Zeng
|
c9b8b6180f
|
Add Acceptance Rate calculation to benchmark_serving (#6240)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
|
2025-07-28 14:00:58 +08:00 |
|
Jinyang Yuan
|
97f7e12588
|
[fix] Fix perf regression caused by MoE autotuner when using DeepEPLowLatency (#6288)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-07-28 01:37:11 -04:00 |
|
Chang Liu
|
dc757799e1
|
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266)
|
2025-07-27 23:29:21 -04:00 |
|
Void
|
f172face98
|
DeepEP LL dispatch FP4 (#6296)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-07-28 11:25:42 +08:00 |
|
Yukun He
|
93a0fd0a23
|
[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-07-28 09:36:26 +08:00 |
|
YueWeng
|
2dd3186727
|
fix: remove cudaStreamSynchronize when using relaxed acceptance (#5262)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
|
2025-07-28 09:18:41 +08:00 |
|
Ziyi Xiong
|
d853811190
|
[https://nvbugs/5402719][fix]: Add cuda graph dummy requests to the spec_resource_manager (#6258)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-07-26 20:32:39 -04:00 |
|
Michal Guzek
|
08d57123f9
|
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-25 18:10:40 -04:00 |
|
ameynaik-hub
|
1e5e71aa42
|
Mtp optimizations round1 (#5689)
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
|
2025-07-25 13:48:27 -04:00 |
|
nv-guomingz
|
b8d4cb8beb
|
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
|
2025-07-25 12:55:56 -04:00 |
|
xiaoqi
|
a0aecf0476
|
[feat]: support logit_bias (#5354)
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-07-25 09:37:41 +00:00 |
|
liji-nv
|
e07fff4f78
|
[https://nvbugs/5340941] - fix: Correct custom ops used by Qwen3 Moe … (#6285)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-25 14:49:45 +08:00 |
|
Mike Iovine
|
0f2f11f90b
|
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-07-24 21:50:11 -04:00 |
|
Linda
|
9a99e6d6d7
|
fix: integration tests with nanobind (#6326)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-25 09:23:20 +08:00 |
|
Shiyu Li
|
375f74ecb2
|
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237)
Signed-off-by: Shiyu Li <shili@nvidia.com>
|
2025-07-25 08:01:40 +08:00 |
|
Frank
|
f8f5ba65fc
|
[fix] Update to remove popping of KV cache and other args. (#6310)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-07-24 15:54:33 -04:00 |
|
Stefan Niebler
|
0df758ec9f
|
[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-24 18:04:41 +02:00 |
|
bhsueh_NV
|
7b6aadc800
|
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-24 21:47:37 +08:00 |
|
liji-nv
|
14d94a3856
|
feat: Add non UB AR + Residual + Norm + Quant fusion (#6320)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-24 05:51:43 -04:00 |
|
Lizhi Zhou
|
a63a1ac7f9
|
[TRTLLM-6444] Add some UCX trouble shooting docs and print UCX related logs (#6085)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-07-24 16:21:01 +08:00 |
|
QI JUN
|
428e34080f
|
chore: remove unused variables in pyexecutor (#6280)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-24 13:16:15 +08:00 |
|
Stefan Niebler
|
2486eb778e
|
[TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLLM Sampler (#6223)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-23 12:30:50 +02:00 |
|
YueWeng
|
ed62a06eef
|
[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
|
2025-07-23 14:53:37 +08:00 |
|
QI JUN
|
a8253b942f
|
chore: remove duplicate should_stop_processing check (#6242)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-23 14:11:23 +08:00 |
|
Yechan Kim
|
83c3ed128b
|
chore: set default device to cpu on Multimodal models (#5994)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-22 21:45:31 -07:00 |
|