30 Commits

Author SHA1 Message Date
Erin
8fe7bdeacf
feat: LogitsProcessor in PyTorch backend (#3145)
* support lp in pytorch backend

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

* fix tp

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

---------

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-01 14:15:30 -07:00
Erin
83f37614ef
feat: Support Top-K logprobs and prompt_logprobs in LLMAPI (#3388)
* support return logprob in llmapi

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

update and add test

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

stability test

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

* revert removal of old flag

Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

---------

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>
2025-05-01 12:47:14 -04:00
Kate Cheng
7dbe618683
feat: Add multimodal embedding field in LlmRequest (#3855)
* Add a new param to LlmRequest and Request to natively support mm

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* update comment

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Update tests to match the new LlmRequest constructor parameters

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Modify unitTest and modify mm_embedding's dict name in llama4

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Fix based on comments

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Fix comment

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Fix LlmRequest initialization in kvCacheManagerTest

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Clean up code for prompt_tuning_config

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

* Clean up prompt_tuning_config in GenerationRequest

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

---------

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-01 12:23:30 +08:00
bhsueh_NV
f77252e9ff
fix bug of create cuda stream as default parameter which will be init… (#3764)
* fix bug of create cuda stream as default parameter which will be initialized during importing

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* add torch.cuda.Stream() for the leader node

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix pre-commit issue

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-28 08:16:03 +08:00
Yuan Tong
57944206ba
feat: return logits in PyTorch flow (#3221)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:56:03 -07:00
shaharmor98
5fff8f0935
Add running E2E LoRA flow (#3648)
* add passing E2E LoRA flow

Signed-off-by: Shahar Mor <smor@nvidia.com>

* add experimental feature

Signed-off-by: Shahar Mor <smor@nvidia.com>

* fix llm_args definition

Signed-off-by: Shahar Mor <smor@nvidia.com>

* manually decreased the max LoRAs size to address OOM

Signed-off-by: Shahar Mor <smor@nvidia.com>

---------

Signed-off-by: Shahar Mor <smor@nvidia.com>
2025-04-23 11:19:41 +08:00
Yan Chunwei
2a09826ec4
fix hmac in remote mpi session (#3649)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-18 17:47:51 +08:00
Yibin Li
351808efeb
fix: Use hmac authentication for pickle encryption (#3384)
* hmac initial implementation to encrypt worker and proxy queue

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* set different hmac key for each pair of server/client queue

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* fix comments

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* fix style

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

---------

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-04-17 00:40:13 +08:00
Kaiyu Xie
e037d3e99b
chore: Unify Python NVTX call (#3450)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-15 23:25:36 +08:00
Yuan Tong
668a0335e4
fix: Proper error bubbling for PyExecutor (#3321)
* fix: Proper error bubbling for PyExecutor
* fix: Proper shutdown
* fix: multi gpu proper shutdown

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-04-15 14:49:46 +08:00
pcastonguay
fe6f14b2b1
fix: Fixing issue with first gen token being returned twice in streaming (#3427)
* fix: Fixing issue with first gen token being returned twice with streaming

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing not_expectring_strings in test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-13 22:45:09 -04:00
Yan Chunwei
b37c5c0a4d
make LLM-API slurm examples executable (#3402)
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-04-13 21:42:45 +08:00
Iman Tabrizian
c539750d42
fix: Allow context_and_generation request type in disagg overlap (#3489)
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-11 16:15:01 -07:00
Yan Chunwei
c5e803ba48
chore: code cleanup for error logging and SharedMemory in proxy.py (#3432)
* cleanup log

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove shared-memory

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove ExecutorResponse

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* add assert for postproc

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-10 21:57:06 +08:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. (#3270)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
Kaiyu Xie
0a4e1d5a55
breaking change: perf: Make ipc_periodically the default responses_handler (#3102)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-08 10:36:39 +08:00
Fanrong Li
1fe64b90be
fix: fix the acceptance rate of pytorch workflow in trtllm-bench (#3240)
* fix acceptance rate of pytorch workflow.
* revert the RequestOutput API change.

---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-03 15:12:24 +08:00
Shunkangz
dda7354d1a
Refactor return of first gen token in PD (#2986)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-01 12:28:27 +08:00
Frank
8bb3eea285
perf: Readd iteration logging for trtllm-bench. (#3039)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-01 08:13:09 +08:00
Yan Chunwei
794f61c997
fix: fix single-node cannot quit issue on slurm (#3140)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-31 10:15:27 +08:00
Yan Chunwei
87ab794aa2
fix: fix hang in mgmn with trtllm-llmapi-launch command (#3119)
* init

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* restore

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-27 18:45:43 +08:00
Kaiyu Xie
ea3739ee62
Fix: fuse message not aligned on different processes (#3067)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-26 17:15:27 +08:00
Kaiyu Xie
2631f21089
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873)
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM

---------

Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783)
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00