Pengyun Lin
|
fad000589d
|
[None][chore] Unify DS tool parser names (#10239)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-12-31 14:40:07 +08:00 |
|
JunyiXu-nv
|
55bc6a5ff8
|
[https://nvbugs/5753250][fix] Fix undefined local variable in responses utils (#10154)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-28 06:59:32 +08:00 |
|
Pengyun Lin
|
c5b0f9e436
|
[https://nvbugs/5633700][fix] Cache tiktoken vocab for gpt-oss (#10219)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-12-26 18:39:03 +08:00 |
|
Xianjie Qiao
|
871c6b435c
|
[None] [feat] skip batch_tokenize_prompts in CustomDataset (#10214)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
|
2025-12-23 17:40:57 +08:00 |
|
Harshini Komali
|
d691371eaf
|
[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf (#9310)
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-12-23 13:25:55 +08:00 |
|
Fanrong Li
|
0d2500c631
|
[TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser (#10126)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-12-23 08:46:47 +08:00 |
|
JunyiXu-nv
|
aaa87abf41
|
[TRTLLM-7906][feat] Support multiple post process for Responses API (#9908)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-22 11:33:34 -05:00 |
|
Pengyun Lin
|
ac03915dc3
|
[TRTLLM-9604][feat] DS R1 & V3.1 tool parser (#10010)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-12-19 17:20:03 +08:00 |
|
Lizhi Zhou
|
f02782a6f2
|
[https://nvbugs/5726066][fix] fix auto-scaling related failures (#9845)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
|
2025-12-18 16:37:48 -05:00 |
|
Lizhi Zhou
|
bd13957e70
|
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-12-16 05:16:32 -08:00 |
|
arekay-nv
|
4f75a31a45
|
[https://nvbugs/5540979][fix] Potential fix for 5540979 (#9716)
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
|
2025-12-15 10:49:31 -05:00 |
|
Wanli Jiang
|
3230fbe79a
|
[None][feat] Update reasoning parser for nano-v3 (#9944)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-12-15 05:39:37 -08:00 |
|
Balaram Buddharaju
|
6a6e41f802
|
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-12-12 22:29:41 -08:00 |
|
bhsueh_NV
|
e49c70f6df
|
[None][feat] Support Mistral Large3 LLM part (#9820)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-12-13 11:44:27 +08:00 |
|
JunyiXu-nv
|
2fec53dfa5
|
[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-12 23:32:39 +08:00 |
|
JunyiXu-nv
|
710c592d7c
|
[https://nvbugs/5727517][fix] Preserve ip:port for disagg (#9859)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-12 09:45:34 +08:00 |
|
Erin
|
89dabf5aa1
|
[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-12-11 09:33:25 -08:00 |
|
JunyiXu-nv
|
b210f22c7e
|
[https://nvbugs/5703953][fix] Preserving ip:port for trtllm-serve before initializing llm (#9646)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-06 20:13:48 -08:00 |
|
Lizhi Zhou
|
0d0a16fff4
|
[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-12-05 10:44:16 +08:00 |
|
JunyiXu-nv
|
6d2daec5d0
|
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-04 13:49:40 +08:00 |
|
Wanli Jiang
|
4485e516a2
|
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (#9540)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-12-04 06:47:32 +08:00 |
|
JunyiXu-nv
|
743486b2ea
|
[TRTLLM-6842][feat] Support Response API for general purpose (#9392)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-03 16:49:26 +08:00 |
|
binghanc
|
db5b876124
|
[None][feat] support for more accurate AR calculation (#9323)
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
|
2025-11-29 00:34:21 +08:00 |
|
Pengyun Lin
|
fa61825c74
|
[None][feat] Support custom chat template for tool calling (#9297)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-11-25 22:07:04 +08:00 |
|
JunyiXu-nv
|
fdb0787e85
|
[None][chore] Support json_schema in response_format (#8934)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-11-14 09:43:13 +08:00 |
|
William Zhang
|
121140cfec
|
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser (#8817)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-11-13 04:34:38 -08:00 |
|
mpikulski
|
533add5056
|
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-11-07 17:47:35 -08:00 |
|
Yechan Kim
|
00c0e6c440
|
[https://nvbugs/5523315][fix] Fix serve benchmark test (#8255)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-11-03 00:30:13 -08:00 |
|
Yilin Fan
|
f3224ccd32
|
[None][feat] Add disagg relay time to time breakdown tool (#8465)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
|
2025-10-30 18:21:45 -07:00 |
|
Fanrong Li
|
a21697ead9
|
[None][fix] fix config loading for DeepSeek-V3.2 in trtllm-bench (#8729)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-10-29 05:17:16 -07:00 |
|
Pengyun Lin
|
2aade46d18
|
[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-29 15:48:29 +08:00 |
|
Lizhi Zhou
|
24167d00eb
|
[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-28 17:04:53 -07:00 |
|
nvxuanyuc
|
d1398c05e6
|
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
|
2025-10-27 13:12:31 -04:00 |
|
zhanghaotong
|
1026069a2b
|
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-10-27 18:51:07 +08:00 |
|
Chang Liu
|
e47c787dd7
|
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-24 13:40:41 -04:00 |
|
Yechan Kim
|
2d86d6be40
|
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-10-24 12:53:40 -04:00 |
|
Zheng Duan
|
e666a704f5
|
[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-10-23 22:09:21 -04:00 |
|
Lizhi Zhou
|
23d5280a90
|
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-21 17:25:07 -04:00 |
|
Pengyun Lin
|
a4227cf1b0
|
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-21 14:08:39 +08:00 |
|
Wangjue Yao
|
9865d3d770
|
[None][feat] Support cached tokens for Openai server (#7637)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-16 20:51:37 +08:00 |
|
Lizhi Zhou
|
22471ecc67
|
[TRTLLM-7846][feat] implement etcd storage for disagg cluster (#8210)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-14 16:48:41 -04:00 |
|
Yilin Fan
|
2695d70d42
|
[None][feat] Add request timing breakdown option in benchmark_serving (#8128)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
|
2025-10-10 09:24:54 -07:00 |
|
Lizhi Zhou
|
fdf29ab8fa
|
[TRTLLM-7846][feat] Http disagg-cluster management implemention (#7869)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-09 09:44:01 +08:00 |
|
mpikulski
|
98b3af4d4e
|
[TRTLLM-8413][chore] resolve sampling defaults in OpenAI API backend (#8121)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-10-06 06:09:43 -07:00 |
|
Yilin Fan
|
01423ac183
|
[None][feat] perf_metrics endpoint functionality improvement (#8005)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
|
2025-10-02 17:43:25 -07:00 |
|
Guoming Zhang
|
202bed4574
|
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Enwei Zhu
|
a1a57e83b8
|
[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-24 18:30:23 +08:00 |
|
JunyiXu-nv
|
6654b78c94
|
[https://nvbugs/5521799][fix] Trim incorrectly generated harmony messages (#7849)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-09-24 16:38:43 +08:00 |
|
Lizhi Zhou
|
7550251988
|
[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-09-24 08:31:56 +08:00 |
|
Yilin Fan
|
7d4d6cc9e0
|
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-09-23 09:39:47 -07:00 |
|