Commit Graph

164 Commits

Author SHA1 Message Date
Kaiyu Xie
da967d0bd7
[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-23 22:29:37 -05:00
Yan Chunwei
54768f3f2c
[None][chore] refine placement group in ray executor (#10235)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-01-23 19:31:20 +08:00
Yan Chunwei
30ffa58b54
[https://nvbugs/5783876][fix] fix hmac launch (#10434)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-22 23:20:53 +08:00
Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Stefan Niebler
0cfd08745c
[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2026-01-16 10:52:41 -08:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. (#10480)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
JadoTu
82aaf98070
[None][feat] add the eos tokens in generation config to stop words in the sampler (#10389)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2026-01-06 09:24:03 +08:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests (#9939)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
William Zhang
a6a88985cf
[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758)
* Why?

Certain VLMs like the Qwen family need more than just the multimodal
embeddings in the language model, and need MRoPE position IDs and
deltas. Prior to this commit, only the embeddings could be communicated
from the encoder worker to the prefill worker.

* What?

This commit extends the `DisaggregatedParams` to include the MRoPE
information. It also adjusts several pieces of code required to
communicate that between E, P and D workers.

Closes TRTLLM-9409.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-22 06:32:49 -05:00
Yan Chunwei
ea6cd76c55
[None][refactor] simplify get_stats and get_kvcache_events with rpc (#9980)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-22 18:23:43 +08:00
Michal Guzek
e6187d8109
[https://nvbugs/5708810][fix] Fix TRTLLMSampler (#9710)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-15 23:26:52 +01:00
Yan Chunwei
355e06d66d
[None][doc] update readme for rpc (#9972)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-15 10:16:50 +08:00
Yan Chunwei
85406f9dda
[https://nvbugs/5720482][fix] Fix test rpc streaming (#9902)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-13 01:14:43 -08:00
shuyixiong
7fc720a397
[TRTLLM-9784][fix] Resolve port conflicts (#9780)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-12 22:10:01 -08:00
Erin
89dabf5aa1
[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-11 09:33:25 -08:00
Yan Chunwei
e4c707845f
[None][fix] enable hmac in RPC (#9745)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-07 08:24:46 +08:00
Shi Xiaowei
227d42e492
[https://nvbugs/5651854][fix] Fix dist-serving perf by clearing CPU affinity (#9549)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
Yan Chunwei
b86256eb54
[TRTLLM-9144][fix] enhance RPC robustness (#8711)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-12-02 21:37:59 +08:00
Venky
639c939a4f
[TRTC-1943][feat] Env vars override support in LLM API (#9104)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-01 10:04:49 -08:00
Enwei Zhu
c2562fc800
[https://nvbugs/5687820][fix] Remove self.abort() in DetokenizedGenerationResult (#9449)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-27 22:54:40 +08:00
Pengyun Lin
eca68e4465 [https://nvbugs/5564465][fix] Overwrite only if default_max_tokens is legal (#8538)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
46dccb5e2d
[None][chore] Prevent negative max_tokens passed into tllm request (#9037)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-20 09:58:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server (#8765)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Liao Lanyu
1fd11455d8
[https://nvbugs/5556998][fix] init_hf_modules in worker_main for models with trust_remote=true (#8931)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-11 10:30:37 +08:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Cao Dong
b53961e972
[None][feat] Return logprobs incrementally in torch backend (#8785)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-07 10:23:39 +08:00
dhansen-nvidia
ada93f1187
[https://nvbugs/5527655][feat] Add NUMA-aware CPU affinity autoconfig (#8805)
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-11-06 11:59:46 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Yan Chunwei
ed297d7c2e
[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api (#8415)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-03 17:59:49 -08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs (#8600)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
Erin
a966644a71
[None][fix] Change Ray submit() to use async RPC (#8636)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-28 00:56:13 -04:00
gramnarayan
88b0fbc8ff
[#8245][feat] Autodeploy: Guided Decoding Support (#8551)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension (#8482)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
Yan Chunwei
f81caf5491
[None][chore] replace print_colored_debug with logger_debug (#8417)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-22 17:54:38 +08:00
Yan Chunwei
3f9dbc76c0
[None][fix] fix rpc unique addr related issue (#8419)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-22 04:47:18 -04:00
jthomson04
852316886e
[None][fix] Fix KV event consumption (#6346)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-18 15:41:26 -07:00
QI JUN
4a8ac8dd62
[TRTLLM-8480][chore] clean create_py_executor API (#8412)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-17 23:52:02 -04:00
Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server (#7637)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
Yan Chunwei
206cf31705
[https://nvbugs/5560921][fix] GenerationExecutor RPC (#8209)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs (#7206)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
Cao Dong
d882c92a84
[None][fix] Fix EventLoopShutdownError (#8260)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-13 17:31:33 +08:00
amitz-nv
fac47e2826
[https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-12 12:29:52 -07:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (#5543)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement (#8005)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Cao Dong
62010c0ab7
[None][feat] Return topk logprobs in torch backend (#7976)
Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>
2025-09-30 09:32:37 +08:00
Tailing Yuan
b11ee868c5
[https://nvbugs/5495789][feat] Optionally disable server GC and worker GC (#7995)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-09-26 21:39:24 +08:00
Iman Tabrizian
da30d496b0
[None][fix] Revert "[None][feat] Return topk logprobs in torch backend (#7756)" (#7969)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 15:36:38 -07:00