Kaiyu Xie
|
aee35e2dbd
|
chore: Make example SLURM scripts more parameterized (#6511)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-08-01 12:53:15 +08:00 |
|
nv-guomingz
|
03e38c9087
|
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 11:11:06 -04:00 |
|
Kaiyu Xie
|
e58afa510e
|
doc: Add README for wide EP (#6356)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-29 00:36:12 -04:00 |
|
Kaiyu Xie
|
f08286c679
|
doc: Refactor documents and examples of disaggregated serving and wide ep (#6054)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-23 09:20:57 +08:00 |
|
Chuang Zhu
|
44c70c88f9
|
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-17 17:42:07 +08:00 |
|
Shi Xiaowei
|
49359574c1
|
[TRTLLM-5673] Doc: ensure the disagg doc is up to date (#5938)
|
2025-07-11 17:39:05 +09:00 |
|
nv-guomingz
|
c434147366
|
chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-04 15:39:15 +09:00 |
|
Shunkangz
|
3e75320fe8
|
Add pd dynamic scaling readme (#5540)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
|
2025-07-02 02:18:51 -04:00 |
|
nv-guomingz
|
578430e64c
|
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 11:05:40 +08:00 |
|
Chuang Zhu
|
947571c311
|
Fix buffer count (#5007)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-06-09 14:01:13 +08:00 |
|
Shunkangz
|
c835f06371
|
Refactor the first token response in PD (#4692)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-06-04 09:11:23 +08:00 |
|
Rashid Kaleem
|
19786a7961
|
[Doc] Fix readme for disaggregated serving (#4846)
Signed-off-by: Rashid K <rkaleem@nvidia.com>
Signed-off-by: Rashid Kaleem <rkaleem@nvidia.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
|
2025-06-03 11:45:26 -04:00 |
|
Yan Chunwei
|
5506f60037
|
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-05-28 18:43:04 +08:00 |
|
amirkl94
|
fbec0c3552
|
Release 0.20 to main (#4577)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-05-28 16:25:33 +08:00 |
|
pcastonguay
|
d7d455e7ea
|
[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-05-22 22:01:06 -04:00 |
|
Kaiyu Xie
|
b4e5df0ee0
|
Breaking change: perf: Enable scheduling overlap by default (#4174)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-05-15 14:27:36 +08:00 |
|
Chuang Zhu
|
09a28becae
|
fix cache buffer (#3942)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-05-07 09:49:44 +08:00 |
|
Shunkangz
|
ea050084ad
|
feat: Add support of chat completion in PD (#2985)
* Add support of chat completion in PD
Add support of include_usage in PD
Reformat
* Remove redundant code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Add chat completion test
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-04-11 17:53:28 +08:00 |
|
Chuang Zhu
|
f3237e52ed
|
update readme for disaggregated (#3323)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-04-07 21:29:15 +08:00 |
|
pcastonguay
|
b763051ba4
|
chore: Refactor disaggregated serving scripts (#3073)
* chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Updating README, removing launch script
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding scripts to populate urls section of disagg config based on SLURM env vars
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-04-03 14:55:05 -04:00 |
|
Kaiyu Xie
|
3aa6b11d13
|
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
|
2025-03-18 21:25:19 +08:00 |
|
Kaiyu Xie
|
9b931c0f63
|
Update TensorRT-LLM (#2873)
|
2025-03-11 21:13:42 +08:00 |
|
Kaiyu Xie
|
77d7fe1eb2
|
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
|
2025-03-04 18:44:00 +08:00 |
|
Kaiyu Xie
|
ab5b19e027
|
Update TensorRT-LLM (#2820)
|
2025-02-25 21:21:49 +08:00 |
|