142 Commits

Author SHA1 Message Date
Pengyun Lin
fad000589d
[None][chore] Unify DS tool parser names (#10239)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-31 14:40:07 +08:00
JunyiXu-nv
55bc6a5ff8
[https://nvbugs/5753250][fix] Fix undefined local variable in responses utils (#10154)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-28 06:59:32 +08:00
Pengyun Lin
c5b0f9e436
[https://nvbugs/5633700][fix] Cache tiktoken vocab for gpt-oss (#10219)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 18:39:03 +08:00
Xianjie Qiao
871c6b435c
[None] [feat] skip batch_tokenize_prompts in CustomDataset (#10214)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-12-23 17:40:57 +08:00
Harshini Komali
d691371eaf
[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf (#9310)
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-23 13:25:55 +08:00
Fanrong Li
0d2500c631
[TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser (#10126)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-23 08:46:47 +08:00
JunyiXu-nv
aaa87abf41
[TRTLLM-7906][feat] Support multiple post process for Responses API (#9908)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-22 11:33:34 -05:00
Pengyun Lin
ac03915dc3
[TRTLLM-9604][feat] DS R1 & V3.1 tool parser (#10010)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-19 17:20:03 +08:00
Lizhi Zhou
f02782a6f2
[https://nvbugs/5726066][fix] fix auto-scaling related failures (#9845)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
2025-12-18 16:37:48 -05:00
Lizhi Zhou
bd13957e70
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-16 05:16:32 -08:00
arekay-nv
4f75a31a45
[https://nvbugs/5540979][fix] Potential fix for 5540979 (#9716)
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
2025-12-15 10:49:31 -05:00
Wanli Jiang
3230fbe79a
[None][feat] Update reasoning parser for nano-v3 (#9944)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 05:39:37 -08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
bhsueh_NV
e49c70f6df
[None][feat] Support Mistral Large3 LLM part (#9820)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-13 11:44:27 +08:00
JunyiXu-nv
2fec53dfa5
[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 23:32:39 +08:00
JunyiXu-nv
710c592d7c
[https://nvbugs/5727517][fix] Preserve ip:port for disagg (#9859)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 09:45:34 +08:00
Erin
89dabf5aa1
[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-11 09:33:25 -08:00
JunyiXu-nv
b210f22c7e
[https://nvbugs/5703953][fix] Preserving ip:port for trtllm-serve before initializing llm (#9646)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-06 20:13:48 -08:00
Lizhi Zhou
0d0a16fff4
[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:44:16 +08:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (#9540)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose (#9392)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
binghanc
db5b876124
[None][feat] support for more accurate AR calculation (#9323)
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
2025-11-29 00:34:21 +08:00
Pengyun Lin
fa61825c74
[None][feat] Support custom chat template for tool calling (#9297)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-11-25 22:07:04 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format (#8934)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
William Zhang
121140cfec
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser (#8817)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-13 04:34:38 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
Yechan Kim
00c0e6c440
[https://nvbugs/5523315][fix] Fix serve benchmark test (#8255)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 00:30:13 -08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool (#8465)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Fanrong Li
a21697ead9
[None][fix] fix config loading for DeepSeek-V3.2 in trtllm-bench (#8729)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-10-29 05:17:16 -07:00
Pengyun Lin
2aade46d18
[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-29 15:48:29 +08:00
Lizhi Zhou
24167d00eb
[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-28 17:04:53 -07:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Yechan Kim
2d86d6be40
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-24 12:53:40 -04:00
Zheng Duan
e666a704f5
[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-23 22:09:21 -04:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server (#7637)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
Lizhi Zhou
22471ecc67
[TRTLLM-7846][feat] implement etcd storage for disagg cluster (#8210)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 16:48:41 -04:00
Yilin Fan
2695d70d42
[None][feat] Add request timing breakdown option in benchmark_serving (#8128)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-10 09:24:54 -07:00
Lizhi Zhou
fdf29ab8fa
[TRTLLM-7846][feat] Http disagg-cluster management implemention (#7869)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-09 09:44:01 +08:00
mpikulski
98b3af4d4e
[TRTLLM-8413][chore] resolve sampling defaults in OpenAI API backend (#8121)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-06 06:09:43 -07:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement (#8005)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Guoming Zhang
202bed4574
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Enwei Zhu
a1a57e83b8
[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-24 18:30:23 +08:00
JunyiXu-nv
6654b78c94
[https://nvbugs/5521799][fix] Trim incorrectly generated harmony messages (#7849)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-09-24 16:38:43 +08:00
Lizhi Zhou
7550251988
[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 08:31:56 +08:00
Yilin Fan
7d4d6cc9e0
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-09-23 09:39:47 -07:00