TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Ivy Zhang	dda91b5117	tests: add QA test cases (#5959 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:14:25 +08:00
Yan Chunwei	7568deb2f1	[nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig (#6001 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-16 16:05:38 +08:00
Ivy Zhang	763012a88a	[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill (#6051 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-16 16:04:08 +08:00
peaceh-nv	f5f31beee1	feat: Add deepseek-lite tests for RTX pro 6000 (#5903 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-16 15:51:45 +08:00
Bo Deng	ec3ebae43e	[TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 (#5991 ) Signed-off-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com> Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-07-16 13:54:42 +08:00
Zheng Duan	38db4bc7fb	feat: use session abstraction in data transceiver and cache formatter (#5611 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-16 13:52:44 +08:00
Zheng Duan	385af53a4d	[nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test (#6041 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-16 13:52:13 +08:00
nv-guomingz	509dc7c831	chore: upgrade modelopt to 0.33 (#6058 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-16 13:10:48 +08:00
Yiqing Yan	e51c541617	chore: Bump version to 1.0.0rc4 (#6086 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-16 13:02:23 +08:00
Wanli Jiang	8679a058a3	fix: Unable to load phi4-model with tp_size>1 (#5962 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-16 11:39:41 +08:00
Iman Tabrizian	665b4469b3	[fix] Fix Triton build (#6076 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-16 11:17:22 +08:00
Aurelien Chartier	6a47cac981	feat: Add support for Triton request cancellation (#5898 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-15 20:52:43 -04:00
danielafrimi	edab7532dd	feat/add latency support for trtllm bench (#3730 ) Signed-off-by: Ubuntu <dafrimi@nvidia.com> Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: Daniel Afrimi <dafrimi@nvidia.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>	2025-07-15 13:13:49 -07:00
brb-nv	9214ac662a	test: Add regression tests for Gemma3 VLM (#6033 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-15 11:37:56 -07:00
Fanrong Li	7a1af1c738	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 (#5989 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-07-16 01:33:12 +09:00
Xiaodong (Vincent) Huang	0523f77b36	support TRTLLM_DEEP_EP_TOKEN_LIMIT to allow run deep-ep on memory-con… (#5684 ) Signed-off-by: Vincent Huang <vincenth@nvidia.com>	2025-07-15 18:34:21 +03:00
Jinyang Yuan	e761231c0b	[fix] Move NCCL group in all-gather and reduce-scatter OPs outside the outer loop (#6053 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-07-16 00:25:32 +09:00
Tailing Yuan	4a26bd6500	Fix: pad DeepEP fp4 recv tensors if empty (#6048 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-07-15 23:14:01 +09:00
MinaHuai	9ebc3ab9c4	[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision (#5998 ) Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>	2025-07-15 10:01:35 -04:00
Jaedeok Kim	ab1c54709d	fix: adjust window sizes of VSWA at torch backend (#5880 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>	2025-07-15 17:41:54 +08:00
Yiteng Niu	9e871ca582	[infra] add more log on reuse-uploading (#6036 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-15 17:18:38 +08:00
ruodil	2a147c4d01	test: add llama_v3.3_70b_cases in perf test (#6035 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-15 17:53:59 +10:00
ruodil	2504aa552e	test: add recursive updating pytorch config and change MOE backend format in perf test (#6046 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-15 17:53:15 +10:00
nv-guomingz	4e4d18826f	chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-15 15:50:03 +09:00
Zhanrui Sun	d811843a08	infra: [TRTLLM-6313] Fix the package sanity stage 'Host Node Name' in… (#5945 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-07-15 15:39:31 +09:00
Lucas Liebenwein	e499f6c44a	[Fix] check for ImportError or ModuleNotFoundError for deep_ep_utils (#6026 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-07-15 14:31:35 +09:00
Yiqing Yan	6b35afaf1b	[Infra][TRTLLM-6013] - Fix stage name in single stage test rerun report (#5672 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-15 12:27:21 +09:00
Zhanrui Sun	01b2def5ef	infra: [TRTLLM-6331] Support show all stage name list when stage name check failed (#5946 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-07-15 12:06:03 +09:00
jiahanc	24dfd4cd0b	Doc: Update llama-3.3-70B guide (#6028 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-07-15 11:37:26 +09:00
Daniel Stokes	dd2491f47d	fix: Fix MOE benchmark to rotate buffers to prevent L2 cache reuse (#4135 ) Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>	2025-07-15 13:40:42 +12:00
Rashid Kaleem	2ea4077993	[Model load] Fix llama min-latency model load (#5883 ) Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>	2025-07-15 09:29:19 +08:00
Yechan Kim	2320f12321	doc: update EXAONE 4.0 news (#6034 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-15 10:26:51 +09:00
ixlmar	f225f5cd2e	[nvbugs-5318143] fix: restrict PyTorch memory usage to avoid OOMs (#5964 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-15 06:49:42 +08:00
Daniel Stokes	f277afdd93	perf: Enable 128x256 tile shapes for FP4 MOE CUTLASS backend (#5986 ) Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>	2025-07-14 14:04:15 -07:00
Iman Tabrizian	c4ee535afb	[fix] fix eagle3 two model disaggregated serving test (#6014 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-15 04:26:04 +09:00
Robin Kobus	6d4b045d1f	refactor: Remove enforced sorted order of batch slots (#3502 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-14 17:23:02 +02:00
brb-nv	f5f5be9e94	enh: Bidirectional mask with multiple images for Gemma3 (#5976 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:39:18 +08:00
brb-nv	1a2d96919c	feat: Update Gemma3 Vision Encoder (#5973 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:38:10 +08:00
Alex Zhang	6c30d78b78	[TRTLLM-5653][infra] Run docs build only if PR contains only doc changes (#5184 ) Signed-off-by: Alex Zhang <13271672+zhanga5@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Alex Zhang <13271672+zhanga5@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-14 21:40:33 +08:00
Yechan Kim	63139fdcff	feat: EXAONE4.0 support (#5696 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-14 22:28:10 +09:00
Clay	dbf29184dc	fix #4974 : A thread leak issue in scaffolding unittest (#5020 ) Signed-off-by: Clay <ccs96307@gmail.com>	2025-07-14 20:22:03 +09:00
Kaiyu Xie	aa97fbb2ad	[Nvbug/5383670] fix: switch test case to non-fp4 ckpt for more GPU coverage (#5882 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-14 20:21:46 +09:00
Yiqing Yan	c720d7f779	Waive L0 test (#6002 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-14 19:55:34 +09:00
Zhanrui Sun	3a0ef73414	infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check (#5709 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-07-14 18:52:13 +09:00
Zhenhuan Chen	30608a5e6d	[https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error (#5865 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-14 17:17:30 +08:00
Robin Kobus	5a61d64b5b	[nvbugs/5345391] fix: chunked prefill + overlap scheduling (#5761 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Pengyun Lin	3fcaa8a310	[nvbug 5327706][fix] fix mgmn postprocess error (#5835 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
ruodil	347520494b	test: remove duplicate cases in perf sanity test (#5870 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yi Zhang	966e41a900	doc: Update gb200 doc (#5840 ) Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Bo Li	6d79559f3e	fix: [https://nvbugs/5351130 ][https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. (#5821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-14 17:17:30 +08:00

1846 Commits