Yan Chunwei
f81caf5491
[None][chore] replace print_colored_debug with logger_debug ( #8417 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-22 17:54:38 +08:00
Yuan Tong
70c3b100eb
[ #7692 ][fix] recognize RequestError as per-request error in background handler ( #7726 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-09-24 11:11:17 +08:00
Yan Chunwei
3946e798db
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances ( #4727 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-19 06:13:53 +08:00
QI JUN
f899c4d294
Re-implement LlmResponse in Python to reduce host overhead of pybind ( #5224 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 21:28:09 +08:00
Yuan Tong
6bce7337a9
perf: avoid dynamic import overhead in is_llm_response with duck typing ( #5110 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-06-15 07:45:02 +08:00
Yan Chunwei
80b4026775
chore: remove request_error ipc in LLM.submit ( #4763 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-03 20:55:59 +08:00
Yan Chunwei
93c0632ee4
opt: the perormance for dist-agg streaming generation ( #4214 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-31 17:40:32 +08:00
Yan Chunwei
9199793848
fix: llmapi-launch add add trtllm-bench test with engine building ( #4091 )
...
* add trtllm-bench mgmn test
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-21 10:18:01 +08:00
Erin
83f37614ef
feat: Support Top-K logprobs and prompt_logprobs in LLMAPI ( #3388 )
...
* support return logprob in llmapi
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
update and add test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
stability test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
* revert removal of old flag
Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
---------
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>
2025-05-01 12:47:14 -04:00
Yuan Tong
57944206ba
feat: return logits in PyTorch flow ( #3221 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:56:03 -07:00
Yan Chunwei
2a09826ec4
fix hmac in remote mpi session ( #3649 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-18 17:47:51 +08:00
Yibin Li
351808efeb
fix: Use hmac authentication for pickle encryption ( #3384 )
...
* hmac initial implementation to encrypt worker and proxy queue
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* set different hmac key for each pair of server/client queue
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* fix comments
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* fix style
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
---------
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-04-17 00:40:13 +08:00
Yan Chunwei
b37c5c0a4d
make LLM-API slurm examples executable ( #3402 )
...
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-04-13 21:42:45 +08:00
Yan Chunwei
c5e803ba48
chore: code cleanup for error logging and SharedMemory in proxy.py ( #3432 )
...
* cleanup log
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove shared-memory
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove ExecutorResponse
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* add assert for postproc
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-10 21:57:06 +08:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. ( #3270 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
Kaiyu Xie
0a4e1d5a55
breaking change: perf: Make ipc_periodically the default responses_handler ( #3102 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-08 10:36:39 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
...
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00