TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Erin	8fe7bdeacf	feat: LogitsProcessor in PyTorch backend (#3145 ) * support lp in pytorch backend Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> * fix tp Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> --------- Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-05-01 14:15:30 -07:00
Erin	83f37614ef	feat: Support Top-K logprobs and prompt_logprobs in LLMAPI (#3388 ) * support return logprob in llmapi Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> update and add test Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> stability test Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> * revert removal of old flag Signed-off-by: Erin Ho <erinh@nvidia.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> --------- Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Erin Ho <erinh@nvidia.com>	2025-05-01 12:47:14 -04:00
Kate Cheng	7dbe618683	feat: Add multimodal embedding field in LlmRequest (#3855 ) * Add a new param to LlmRequest and Request to natively support mm Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * update comment Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Update tests to match the new LlmRequest constructor parameters Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Modify unitTest and modify mm_embeding's dict name in llama4 Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Fix based on comments Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Fix comment Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Fix LlmRequest initialization in kvCacheManagerTest Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Clean up code for promt_tuning_config Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> * Clean up prompt_tuning_config in GenerationRequest Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> --------- Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-01 12:23:30 +08:00
bhsueh_NV	f77252e9ff	fix bug of create cuda stream as default parameter which will be init… (#3764 ) * fix bug of create cuda stream as default parameter which will be initialized during importing Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * add torch.cuda.Stream() for the leader node Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix pre-commit issue Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-28 08:16:03 +08:00
Yuan Tong	57944206ba	feat: return logits in PyTorch flow (#3221 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-24 16:56:03 -07:00
shaharmor98	5fff8f0935	Add running E2E LoRA flow (#3648 ) * add passing E2E LoRA flow Signed-off-by: Shahar Mor <smor@nvidia.com> * add experimental feature Signed-off-by: Shahar Mor <smor@nvidia.com> * fix llma_args definition Signed-off-by: Shahar Mor <smor@nvidia.com> * decreased manually size of max loras to address OOM Signed-off-by: Shahar Mor <smor@nvidia.com> --------- Signed-off-by: Shahar Mor <smor@nvidia.com>	2025-04-23 11:19:41 +08:00
Yan Chunwei	2a09826ec4	fix hmac in remote mpi session (#3649 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-04-18 17:47:51 +08:00
Yibin Li	351808efeb	fix: Use hmac authentication for pickle encryption (#3384 ) * hmac initial implementation to encrypt worker and proxy queue Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * set different hmac key for each pair of server/client queue Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * fix comments Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * fix style Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> --------- Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-04-17 00:40:13 +08:00
Kaiyu Xie	e037d3e99b	chore: Unify Python NVTX call (#3450 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-15 23:25:36 +08:00
Yuan Tong	668a0335e4	fix: Proper error bubbling for PyExecutor (#3321 ) * fix: Proper error bubbling for PyExecutor * fix: Proper shutdown * fix: multi gpu proper shutdown Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-04-15 14:49:46 +08:00
pcastonguay	fe6f14b2b1	fix: Fixing issue with first gen token being returned twice in streaming (#3427 ) * fix: Fixing issue with first gen token being returned twice with streaming Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing not_expectring_strings in test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-13 22:45:09 -04:00
Yan Chunwei	b37c5c0a4d	make LLM-API slurm examples executable (#3402 ) Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>	2025-04-13 21:42:45 +08:00
Iman Tabrizian	c539750d42	fix: Allow context_and_generation request type in disagg overlap (#3489 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-04-11 16:15:01 -07:00
Yan Chunwei	c5e803ba48	chore: code cleanup for error logging and SharedMemory in proxy.py (#3432 ) * cleanup log Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove shared-memory Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove ExecutorResponse Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * add assert for postproc Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-10 21:57:06 +08:00
yuxianq	7b03350527	Add thread leak check and fix thread/memory leak issues. (#3270 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-08 19:03:18 +08:00
Kaiyu Xie	0a4e1d5a55	breaking change: perf: Make ipc_periodically the default responses_handler (#3102 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-08 10:36:39 +08:00
Fanrong Li	1fe64b90be	fix: fix the acceptance rate of pytorch workflow in trtllm-bench (#3240 ) * fix acceptance rate of pytorch workflow. * revert the RequestOutput API change. --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-04-03 15:12:24 +08:00
Shunkangz	dda7354d1a	Refactor return of first gen token in PD (#2986 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-01 12:28:27 +08:00
Frank	8bb3eea285	perf: Readd iteration logging for trtllm-bench. (#3039 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-04-01 08:13:09 +08:00
Yan Chunwei	794f61c997	fix: fix single-node cannot quit issue on slurm (#3140 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-03-31 10:15:27 +08:00
Yan Chunwei	87ab794aa2	fix: fix hang in mgmn with trtllm-llmapi-launch command (#3119 ) * init Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * restore Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-03-27 18:45:43 +08:00
Kaiyu Xie	ea3739ee62	Fix: fuse message not aligned on different processes (#3067 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-03-26 17:15:27 +08:00
Kaiyu Xie	2631f21089	Update (#2978 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-03-23 16:39:35 +08:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936 ) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00
Kaiyu Xie	9b931c0f63	Update TensorRT-LLM (#2873 )	2025-03-11 21:13:42 +08:00
Kaiyu Xie	77d7fe1eb2	Update TensorRT-LLM (#2849 ) * Update TensorRT-LLM --------- Co-authored-by: aotman <chenhangatm@gmail.com>	2025-03-04 18:44:00 +08:00
Kaiyu Xie	ab5b19e027	Update TensorRT-LLM (#2820 )	2025-02-25 21:21:49 +08:00
Kaiyu Xie	2ea17cdad2	Update TensorRT-LLM (#2792 ) * Update TensorRT-LLM --------- Co-authored-by: jlee <jungmoolee@clika.io>	2025-02-18 21:27:39 +08:00
Kaiyu Xie	e88da961c5	Update TensorRT-LLM (#2783 )	2025-02-13 18:40:22 +08:00
Dan Blanaru	16d2467ea8	Update TensorRT-LLM (#2755 ) * Update TensorRT-LLM --------- Co-authored-by: Denis Kayshev <topenkoff@gmail.com> Co-authored-by: akhoroshev <arthoroshev@gmail.com> Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com> Update	2025-02-11 03:01:00 +00:00

30 Commits