Pengyun Lin
|
a4227cf1b0
|
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-21 14:08:39 +08:00 |
|
Wangjue Yao
|
9865d3d770
|
[None][feat] Support cached tokens for Openai server (#7637)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-16 20:51:37 +08:00 |
|
JunyiXu-nv
|
6654b78c94
|
[https://nvbugs/5521799][fix] Trim incorrectly generated harmony messages (#7849)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-09-24 16:38:43 +08:00 |
|
Pengyun Lin
|
c2bc39af63
|
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-09-12 15:32:34 +08:00 |
|
JunyiXu-nv
|
504bb7ffa9
|
[TRTLLM-7779][feat] Support multiple postprocess workers for chat completions API (#7508)
Signed-off-by: Junyi Xu
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-09-08 11:11:35 +08:00 |
|
Zero Zeng
|
953f4fd69e
|
[None][fix] acceptance rate calculation fix in benchmark_serving (#6746)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
|
2025-08-19 17:29:36 +08:00 |
|
Yegor
|
b01d1c28f7
|
[feat] Detokenize option in /v1/completions request (#5382)
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
|
2025-07-08 19:36:04 +08:00 |
|
pansicheng
|
e84dc6b3c7
|
feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)
* add deepseek-r1 reasoning parser
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
* fix test
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
---------
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-05-06 08:13:04 +08:00 |
|
Kaiyu Xie
|
e037d3e99b
|
chore: Unify Python NVTX call (#3450)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-04-15 23:25:36 +08:00 |
|
Shunkangz
|
ea050084ad
|
feat: Add support of chat completion in PD (#2985)
* Add support of chat completion in PD
Add support of include_usage in PD
Reformat
* Remove redundant code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Add chat completion test
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-04-11 17:53:28 +08:00 |
|
Kaiyu Xie
|
0a4e1d5a55
|
breaking change: perf: Make ipc_periodically the default responses_handler (#3102)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-04-08 10:36:39 +08:00 |
|
Kaiyu Xie
|
2631f21089
|
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-03-23 16:39:35 +08:00 |
|
Kaiyu Xie
|
3aa6b11d13
|
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
|
2025-03-18 21:25:19 +08:00 |
|
Kaiyu Xie
|
77d7fe1eb2
|
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
|
2025-03-04 18:44:00 +08:00 |
|