TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yiqing Yan	ced5512ae4	[None][chore] Bump version to 1.1.0rc4 (#7525) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-04 16:30:47 +08:00
jianweiwu	7090b286b2	[None][fix] fix hunyuan_moe init bug (#7502) Signed-off-by: sorenwu <sorenwu@tencent.com>	2025-09-04 03:06:00 -04:00
Grzegorz Kwasniewski	3755f8ab7d	[TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-09-04 02:01:27 -04:00
Yanchao Lu	c622f61609	[None][fix] Fix a typo in the Slurm CI codes (#7485) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-04 01:56:27 -04:00
Emma Qiao	931816fee1	[TRTLLM-6199][infra] Update for using open driver from BSL (#7430) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-04 11:47:40 +08:00
William Zhang	a117e7a57e	[TRTLLM-7442][model] Remove unnecessary D2H copies (#7273) * Why? Initial profiling showed there were multiple D2H / H2D copies being scheduled in the mistral 3.1 small model. * What? This commit removes those unnecessary copies by returning `image_sizes` as a simple list instead of a tensor. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-03 23:14:20 -04:00
Jin Li	2a2dfe273b	[https://nvbugs/5485102][fix] Correctly set stride for piecewise outp… (#7442) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 10:48:15 +08:00
Stanley Sun	db8eb0a447	[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-04 10:34:38 +08:00
Lizhi Zhou	d97c1e6bd9	[https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (#7338) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-04 09:11:01 +08:00
Yao Yao	c1aa7f31d9	[None][fix] Fix a numerical stability issue for XQA with spec dec (#7114) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2025-09-03 20:40:05 -04:00
Frida Hou	51a2b8729e	[#7222][autodeploy] Separate run_shape_prop as another graph utility (#7313) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-09-03 19:32:50 -04:00
Leslie Fang	bd9ba97d89	[None][chore] Remove two unused parameters in create_py_executor (#7458) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-04 07:31:31 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
Mike Iovine	64e3bfa054	[None][fix] Fix KV cache recompute in draft_target spec decode (#7348) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-03 15:04:14 -04:00
Izzy Putterman	f156221c27	[None][doc] add GPT OSS Eagle3 blog (#7140) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-09-03 12:28:01 -04:00
Lizhi Zhou	7c73c2ff4b	[https://nvbugs/5485593][fix] improve accuracy/test_disaggregated_serving.py (#7366) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-03 09:38:53 -04:00
Stanley Sun	cebbf48b74	[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-03 08:36:52 -04:00
Anurag Mukkara	ae5136831f	[https://nvbugs/5472947][fix] wait on isend handles before reusing buffers (#7462) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-09-03 13:20:02 +05:30
Mike Iovine	79d93f9419	[https://nvbugs/5488141][fix] Unwaive llama3 test_eagle3 (#7486) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-03 14:10:40 +08:00
YueWeng	9a4f60687f	[https://nvbugs/5480289][fix] release slot manager in mtp MTPHiddenStatesManager (#7340) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-09-02 19:37:51 -07:00
Wanli Jiang	4223a9aada	[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-03 10:27:42 +08:00
Jinyang Yuan	572551b586	[None][perf] Autotune TRT-LLM Gen MoE when using CUDA graphs (#7285) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-09-03 10:08:59 +08:00
Daniel Stokes	109f27265c	[None][perf] Add MOE support for dynamic cluster shapes and custom epilogue schedules (#6126) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-09-02 21:54:43 -04:00
Leslie Fang	42697ea32a	[None][chore] rm executor config in kv cache connector (#7372) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-03 08:13:13 +08:00
Martin Marciniszyn Mehringer	b4340ecb62	[None][chore] Add note about trtllm-serve to the devel container (#7483) Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-09-02 11:27:56 -07:00
Eran Geva	75c1bb6389	[https://nvbugs/5458798][fix] Disabled test_trtllm_bench_backend_comparison due to timeout (#7397) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-09-02 11:21:42 -07:00
Simeng Liu	bcc55bcdf3	[https://nvbugs/5470782][fix] Add specific test names for test_deepseek.py (#7318) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-09-02 10:31:40 -07:00
Kanghwan	f58a183c6e	[None][chore] Fix formatting error in Gemma3 readme (#7352) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-03 01:15:37 +08:00
Emma Qiao	aae5d22bfe	[None][infra] Waive failed tests on main branch 0902 (#7482) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-02 10:16:49 -04:00
peaceh-nv	90479c50fb	[https://nvbugs/5453992][unwaive] Unwaive llama quickstart test (#7242) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-02 20:28:32 +08:00
JunyiXu-nv	eefe5f2093	[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-09-02 07:08:22 -04:00
HuiGao-NV	7279297717	[None][infra] waive test case failed on post-merge (#7471) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-02 06:20:08 -04:00
aalanwyr	c3c95736a1	[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-09-02 17:21:27 +08:00
tomeras91	9c8d2161d0	[None][doc] fix example in docstring (#7410) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-09-02 11:59:49 +03:00
Ivy Zhang	3799e5d460	[None][test] auto reuse torch empty cache on qa test (#7421) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-02 04:44:47 -04:00
Yan Chunwei	f90375f37c	[https://nvbugs/5476580][fix] unwaive test_nvfp4_4gpus (#7454) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-09-02 04:17:14 -04:00
Yanchao Lu	a07bb163f7	[None][ci] Correct docker args for GPU devices and remove some stale CI codes (#7417) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-02 04:06:51 -04:00
Yiqing Yan	ff2439ff48	[None][infra] Using local variables in rerun function (#7198) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-02 13:55:26 +08:00
Jiagan Cheng	60df6b2826	[https://nvbugs/5485430][fix] Copy the nanobind file when using precompiled package (#7334) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-09-02 01:49:31 -04:00
Leslie Fang	e81c50dbd2	[None][chore] Use llm args in create_py_executor (#7239) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-01 16:27:55 -07:00
Tian Zheng	1b9c4cc2f7	[None][fix] Fix nanobind failure (#7425) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 17:26:40 -04:00
jiahanc	9f2dc3069d	[None] [doc] Update DeepSeek example doc (#7358) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-01 14:43:58 -04:00
Mike Iovine	b3c57a7042	[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-01 14:37:44 -04:00
Emma Qiao	01dfd3af1b	[None][infra] Waive failed case on main 0901 (#7447) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:27:24 +08:00
bhsueh_NV	16e9d1121c	[https://nvbugs/5481087][fix] fix bug of ci when we use mocker (#7332) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-01 16:22:45 +08:00
yuanjingx87	2b286ae613	[None][infra] Disable GB200-PyTorch-1 due to OOM issue (#7386) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-01 01:56:31 -04:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Dimitrios Bariamis	b0558c73fc	[None][fix] Fix build of tritonbuild/tritonrelease image (#7003) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Dimitrios Bariamis	44cc308e6a	[https://nvbugs/5474037][fix] Fix building tritonbuild/tritonrelease images (#7157) Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com> Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
QI JUN	ed4087a295	[https://nvbugs/5374016][fix] improve error message (#6893) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00

2626 Commits