TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Simeng Liu	2b27810198	[https://nvbugs/5494718 ][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-10-24 19:09:07 -07:00
Erin	812bc8c954	[TRTLLM-8513][feat] Add back worker extension (#8482 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-10-24 20:30:28 -04:00
jthomson04	02081e2390	[None][feat] Support KV Connector with Disagg Prefill Worker (#8246 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-10-24 11:09:06 -07:00
Chang Liu	e47c787dd7	[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-24 13:40:41 -04:00
Yechan Kim	2d86d6be40	[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-24 12:53:40 -04:00
Aurelien Chartier	cdf0403c64	[None][feat] Pass KvCacheRetentionConfig to torch LlmRequest (#8634 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-10-24 06:44:34 -07:00
Chuang Zhu	2420918e5b	[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-24 08:58:16 -04:00
Suyog Gupta	f512ddaeef	[None][feat] add skip condition in AutoDeploy's triton fused moe kernel (#8632 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-10-24 08:46:17 -04:00
Wanli Jiang	f448043d88	[None][feat] Support base64 video input (#8458 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-10-24 10:23:13 +08:00
Zheng Duan	e666a704f5	[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-10-23 22:09:21 -04:00
QI JUN	6ee1c87595	[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-24 08:55:49 +08:00
h-guo18	23920223ab	[#4585 ][feat] Replace unified attention before export (#8303 ) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>	2025-10-23 18:02:04 -04:00
QI JUN	cc81028547	[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig (#8558 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-23 10:32:09 -04:00
Robin Kobus	3a5845e293	[TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format (#7811 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-10-23 10:27:56 +02:00
Shijie	928247a3f9	[https://nvbugs/5451205 ][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943 ) Signed-off-by: Shijie Wang <jaywan@nvidia.com>	2025-10-23 15:55:10 +08:00
Suyog Gupta	2956978da3	[None][feat] Enable rms norm fusion for Nemotron MOE (#8563 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-23 00:09:42 -04:00
sunnyqgg	ea3e0eea51	[TRTLLM-7954][feat] Target model KV cache relocation (#8421 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-10-23 09:36:50 +08:00
Anthony Chang	8a3b870e09	[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-10-23 09:14:18 +08:00
Anish Shanbhag	15de45d782	[TRTLLM-8682][chore] Remove auto_parallel module (#8329 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-22 20:53:08 -04:00
Leslie Fang	e5865de518	[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args (#8493 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-10-22 20:03:18 -04:00
Patrice Castonguay	879039f6d5	[https://nvbugs/5429636 ][feat] Kv transfer timeout (#8459 ) Signed-off-by: raayandhar <raayan.dhar@gmail.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <raayan.dhar@gmail.com>	2025-10-22 09:29:02 -04:00
Yan Chunwei	f81caf5491	[None][chore] replace print_colored_debug with logger_debug (#8417 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-22 17:54:38 +08:00
Yan Chunwei	3f9dbc76c0	[None][fix] fix rpc unique addr related issue (#8419 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-10-22 04:47:18 -04:00
Yiqing Yan	b04e51291a	[None][chore] Bump version to 1.2.0rc2 (#8562 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-10-22 14:35:05 +08:00
sunnyqgg	90080e0e09	[https://nvbugs/5556020 ][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-10-22 09:58:22 +08:00
Leslie Fang	50d4e5bc06	[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-10-22 08:33:48 +08:00
Chenghao Zhang	bac9e8c2ad	[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469 )	2025-10-21 15:32:01 -07:00
Lizhi Zhou	23d5280a90	[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-10-21 17:25:07 -04:00
Lucas Liebenwein	9b54b3bfaf	[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-21 17:07:06 -04:00
YueWeng	8dc4aac5b6	[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-10-21 11:11:04 -04:00
Pengyun Lin	a4227cf1b0	[None][feat] Support Qwen3 reasoning parser (#8000 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-21 14:08:39 +08:00
Bo Li	ebb62e17d8	[None][feat] Add alltoall to trtllm-gen MoE backend. (#8481 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-10-21 12:42:54 +08:00
mpikulski	87eb5086fb	[None][fix] restore list[list[list[int]]] in add_token (#8502 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-20 22:34:57 -04:00
Yechan Kim	85d5aa7763	[None][feat] Support kv_cache_reuse for HyperCLOVAX-Vision model (#7789 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-21 11:11:24 +09:00
Suyog Gupta	7050b1ea49	[#8272 ][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-10-20 15:31:52 -07:00
Lucas Liebenwein	55c468b218	[#8461 ][feat] AutoDeploy: trtllm-serve bug fix + unit test (#8462 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-20 16:06:39 -04:00
Pamela Peng	b818a912d7	[https://nvbugs/5540752 ][fix] Support quantized Phi4 MM models (#8190 ) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-10-20 06:36:09 -04:00
mpikulski	97ce0ecefe	[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-20 11:15:41 +02:00
ChristinaZ	c8b9998acb	[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-20 10:08:31 +08:00
Bo Deng	dd25595ae8	[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-10-19 19:24:43 +08:00
jthomson04	852316886e	[None][fix] Fix KV event consumption (#6346 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-10-18 15:41:26 -07:00
Lucas Liebenwein	41169fb20c	[None][feat] AutoDeploy: chunked prefill support (#8158 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-18 00:47:35 -07:00
QI JUN	4a8ac8dd62	[TRTLLM-8480][chore] clean create_py_executor API (#8412 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-17 23:52:02 -04:00
Wanli Jiang	58b43a6dab	[None][fix] Fix get_num_tokens_per_image for nano-v2-vlm (#8425 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-10-18 08:51:35 +08:00
Kyle McGill	136e0e6882	[None][feat] Enable CUDA graph support for KvConnectorWorker API (#8275 ) Signed-off-by: Kyle McGill <kmcgill@nvidia.com> Signed-off-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>	2025-10-17 18:09:03 -04:00
Anish Shanbhag	5ff4f88be6	[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-17 16:13:22 -04:00
h-guo18	55fed1873c	[None][chore] AutoDeploy: cleanup old inference optimizer configs (#8039 ) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-17 15:55:57 -04:00
Grzegorz Kwasniewski	bb7fdcebf4	[TRTLLM-8201][feat] Topological graph helpers (#8457 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-10-17 12:34:19 -04:00
zhhuang-nv	7a2bab93f0	[None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-10-17 02:30:33 -07:00
Tracin	dd06612d0e	[https://nvbugs/5540138 ][fix] Fix shape error when duplicating kv. (#8390 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-17 10:07:29 +08:00

1460 Commits