TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-07 03:31:58 +08:00

Author	SHA1	Message	Date
mpikulski	a39e8c5567	[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema (#9305 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-20 08:32:23 +01:00
Patrice Castonguay	9b0f45298f	[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-18 20:59:17 -05:00
mpikulski	04fb481da3	[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-18 09:41:59 -08:00
Robin Kobus	df41f220a2	[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-17 18:07:13 +01:00
JunyiXu-nv	fdb0787e85	[None][chore] Support json_schema in response_format (#8934 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-11-14 09:43:13 +08:00
William Zhang	121140cfec	[None][fixes] Add tool call parsing fixes and Qwen3 coder parser (#8817 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-11-13 04:34:38 -08:00
Yan Chunwei	4fd93bdc2c	[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming (#9118 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-11-12 19:55:09 -08:00
Yan Chunwei	8a8883bc73	[None][chore] Waive test_llm_rpc_streaming (#9113 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-11-13 11:06:26 +08:00
mpikulski	533add5056	[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 17:47:35 -08:00
Patrice Castonguay	d8ea0b967f	[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-07 07:33:51 -08:00
Yilin Fan	b7798bfab8	[None][feat] Add `trtllm_` prefix for exposed metrics (#8845 ) Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>	2025-11-06 15:27:18 +08:00
Cao Dong	dddfcdd3bf	[None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789 ) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-11-04 19:32:59 +08:00
xinhe-nv	4873ca04cc	[https://nvbugs/5521799 ][fix] add harmony channel validation (#8837 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-11-03 02:31:54 -08:00
Yan Chunwei	1551ed8e5f	[https://nvbugs/5437384 ][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests (#8567 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-11-01 06:49:33 -07:00
Anthony Chang	852e5060aa	[https://nvbugs/5558117 ][fix] Allow per-layer quant config from hf_quant_config.json (#8617 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-10-31 04:41:44 -07:00
Pengyun Lin	2aade46d18	[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-29 15:48:29 +08:00
Yechan Kim	cf8a1d2ef9	[https://nvbugs/5596377 ][fix] Fix mm dummy calculation (#8498 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-29 09:45:21 +09:00
Anish Shanbhag	a09b38a862	[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-28 09:17:26 -07:00
Yechan Kim	2d86d6be40	[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-24 12:53:40 -04:00
QI JUN	6ee1c87595	[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-24 08:55:49 +08:00
Anish Shanbhag	15de45d782	[TRTLLM-8682][chore] Remove auto_parallel module (#8329 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-22 20:53:08 -04:00
Patrice Castonguay	879039f6d5	[https://nvbugs/5429636 ][feat] Kv transfer timeout (#8459 ) Signed-off-by: raayandhar <raayan.dhar@gmail.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <raayan.dhar@gmail.com>	2025-10-22 09:29:02 -04:00
Pengyun Lin	a4227cf1b0	[None][feat] Support Qwen3 reasoning parser (#8000 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-21 14:08:39 +08:00
Anish Shanbhag	5ff4f88be6	[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-17 16:13:22 -04:00
John Calderon	46ee7acb33	[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling (#7539 ) Signed-off-by: John Calderon <johncalesp@gmail.com> Signed-off-by: John Calderon <jcalderon@nvidia.com> Signed-off-by: john calderon <jcalderon@nvidia.com> Signed-off-by: John Calderon <jcalderon@nvidia>	2025-10-16 17:49:22 +02:00
Lizhi Zhou	982d4b65e8	[https://nvbugs/5550671 ][fix] fix disagg-serving multinodes test failure (#8307 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yan Chunwei	4e51148088	[https://nvbugs/5532023 ][fix] unwaive GenerationExecutor tests (#8251 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Wangjue Yao	9865d3d770	[None][feat] Support cached tokens for Openai server (#7637 ) Signed-off-by: wjueyao <wyao123@terpmail.umd.edu> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-16 20:51:37 +08:00
Yan Chunwei	206cf31705	[https://nvbugs/5560921 ][fix] GenerationExecutor RPC (#8209 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-10-16 09:05:22 +08:00
mpikulski	93a4b7f1b6	[None][chore] update torch_dtype -> dtype in 'transformers' (#8263 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-15 17:09:30 +09:00
shuyixiong	6776caaad1	[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-10-14 23:46:30 +08:00
Robin Kobus	db8c63b9b1	[TRTLLM-4517] [feat] Additional model outputs (#7206 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-10-13 15:33:18 +02:00
amitz-nv	bbae7a05f0	[https://nvbugs/5521949 ][fix] Replace test_codellama_fp8_with_bf16_lora with test_llama_3_1_8b_fp8_with_bf16_lora (#8199 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-10-13 06:01:55 -07:00
amitz-nv	fac47e2826	[https://nvbugs/5510879 ][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-10-12 12:29:52 -07:00
amitz-nv	d560054e1b	[None][chore] Restore asserts in pytorch flow LoRA tests (#8227 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-10-09 17:10:38 +03:00
Yan Chunwei	54ab9767b5	[None][chore] fix llmargs conflict (#8152 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-06 02:34:27 -07:00
amitz-nv	8060aad239	[https://nvbugs/5521949 ][fix] Re-enable test_bielik_11b_v2_2_instruct_multi_lora, fix its API use with pytorch flow LoRA (#8146 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-10-05 04:28:20 -07:00
Yan Chunwei	fb51de6c2e	[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (#5543 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: chunweiy <chunweiy@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>	2025-10-05 17:28:20 +08:00
Jonas Yang CN	88ea2c4ee9	[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-10-04 08:12:24 +08:00
Yilin Fan	01423ac183	[None][feat] perf_metrics endpoint functionality improvement (#8005 ) Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com> Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>	2025-10-02 17:43:25 -07:00
mpikulski	fc7f78c400	[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling (#8110 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-02 10:20:32 +02:00
mpikulski	ee5ae49337	[TRTLLM-8269][fix] Revert "do not explicitly pass temperature=0 to select greedy sampling" (#8103 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-09-30 16:53:49 -04:00
Cao Dong	62010c0ab7	[None][feat] Return topk logprobs in torch backend (#7976 ) Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>	2025-09-30 09:32:37 +08:00
amitz-nv	e5f9b6aaa0	[None][fix] Fix TRT-python multi LoRA TP=2 test arguments (#8059 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-09-29 12:20:04 -04:00
mpikulski	31a1a5ff80	[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling (#7909 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-09-29 14:52:18 +01:00
Yan Chunwei	5999fab146	[https://nvbugs/5427043 ][fix] cherrypick: request length exceeds max_num_tokens (#7718 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Iman Tabrizian	da30d496b0	[None][fix] Revert "[None][feat] Return topk logprobs in torch backend (#7756 )" (#7969 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-24 15:36:38 -07:00
Enwei Zhu	a1a57e83b8	[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-24 18:30:23 +08:00
Cao Dong	2f8dc6feb0	[None][feat] Return topk logprobs in torch backend (#7756 ) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-09-24 15:30:39 +08:00
Yuan Tong	70c3b100eb	[#7692 ][fix] recognize RequestError as per-request error in background handler (#7726 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-09-24 11:11:17 +08:00

249 Commits