TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yanchao Lu	e72ade33c2	[None][chore] Update commit msg for adding lock files (#8448 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-10-17 00:24:26 -07:00
Leslie Fang	023e515d33	[None][chore] Combine two documents of feature combination matrix (#8442 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-10-17 14:31:33 +08:00
yufeiwu-nv	1e1f430163	[None][test] Filter out all fp8 test case for A100. (#8420 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-10-16 20:42:50 -07:00
Ivy Zhang	70a0f5beb6	[TRTLLM-8580][test] save runtime report periodically (#8312 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-17 10:47:26 +08:00
Tracin	dd06612d0e	[https://nvbugs/5540138 ][fix] Fix shape error when duplicating kv. (#8390 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-17 10:07:29 +08:00
yuanjingx87	85deacf117	[None][infra] Update CI allowed list 2025_10_15 (#8403 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-10-16 14:17:34 -07:00
yuanjingx87	3481d03470	[None][infra] Fix for generate lockfile pipeline (#7820 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-10-16 14:17:18 -07:00
Iman Tabrizian	22eb1633ae	[None][bug] Set NCCL_GRAPH_REGISTER to false to avoid hang (#8413 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-10-16 18:59:18 +02:00
John Calderon	46ee7acb33	[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling (#7539 ) Signed-off-by: John Calderon <johncalesp@gmail.com> Signed-off-by: John Calderon <jcalderon@nvidia.com> Signed-off-by: john calderon <jcalderon@nvidia.com> Signed-off-by: John Calderon <jcalderon@nvidia>	2025-10-16 17:49:22 +02:00
Yanchao Lu	bde606f82d	Update Dockerfile.multi Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-10-16 22:46:19 +08:00
Jin Li	d594c2d0ff	[https://nvbugs/5537348 ][fix] Use device tensor index for MTP (#8062 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yiqing Yan	05dd437084	[https://nvbugs/5565541 ][fix] Add timeout threshold for H100 FHMA test (#8354 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
bhsueh_NV	69325e1aa3	[https://nvbugs/5574556 ][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI (#8351 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Lizhi Zhou	982d4b65e8	[https://nvbugs/5550671 ][fix] fix disagg-serving multinodes test failure (#8307 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Chuang Zhu	18a534d2b4	[https://nvbugs/5465642 ][fix] Increase server timeout to wait weight loading (#8297 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Jin Li	47e6eea3fa	[https://nvbugs/5543770 ][fix] Update to Cutlass v4.2.1 (#8055 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Patrice Castonguay	b7602f7bd4	[https://nvbugs/5534837 ][fix] Fix KV cache split on long context (#8247 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Enwei Zhu	526cad37d7	[https://nvbugs/5568951 ][fix] Fix guided decoding disagg tests (#8311 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Zhanrui Sun	19241626d0	[https://nvbugs/5563653 ][infra] reduce docker image layers (#8250 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yechan Kim	4230639370	[https://nvbugs/5550722 ][fix] Fix image load (#8093 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yechan Kim	9587f099ac	[https://nvbugs/5547434 ][fix] Fix Qwen2.5-VL device_path error (#8057 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Ivy Zhang	1b559ba91d	[None][chore] Update test configs for release (#8224 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Ivy Zhang	4789c1e588	[TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases in to QA test list (#8212 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Ivy Zhang	be2ab98233	[None][chore] Update constaintfor release (#8211 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yan Chunwei	4e51148088	[https://nvbugs/5532023 ][fix] unwaive GenerationExecutor tests (#8251 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yukun He	179c7dc501	[https://nvbugs/5536131 ][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Enwei Zhu	57a4ef870a	[None][fix] Fix chunked prefill state of draft request (#8067 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
sunnyqgg	dd61454d5f	[https://nvbugs/5461761 ][fix] Unwaive eagle3 test (#8363 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-10-16 09:51:48 -04:00
Wangjue Yao	9865d3d770	[None][feat] Support cached tokens for Openai server (#7637 ) Signed-off-by: wjueyao <wyao123@terpmail.umd.edu> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-16 20:51:37 +08:00
xinhe-nv	f70eff30b3	[TRTLLM-8638][fix] waive llam4 tests on H20 (#8416 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-10-16 03:14:56 -07:00
xiweny	89d03d7668	[https://nvbugs/5532789 ] [doc] Add documents about CUDA 12.9 (#8411 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-16 00:05:17 -07:00
HuiGao-NV	4e6a492aa3	[None][chore] Isolate several intermittent cases (#8408 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-10-15 23:48:31 -07:00
Yan Chunwei	42ab473bb0	[https://nvbugs/5583261 ][ci] waive test_fetch_responses_streaming_sync (#8407 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-15 23:19:31 -07:00
chinamaoge	ee588a73ac	[None][fix] Fix the error where checkpoint_dir is assigned as NONE wh… (#8401 ) Signed-off-by: maoge <maoge23@qq.com> Co-authored-by: maoge <maoge23@qq.com>	2025-10-16 13:37:43 +08:00
Min Yu	0a0159fdd8	[https://nvbugs/5378031 ] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286 ) Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>	2025-10-16 11:07:48 +08:00
Cao Dong	e75b4f9f65	[None][feat] Dev DeepConf (#8362 ) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-10-16 11:01:31 +08:00
xiweny	4143887370	[https://nvbugs/5541494 ] [fix] Remove waivers (#8353 ) Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-15 19:10:35 -07:00
Wanli Jiang	ebf0e51206	[TRTLLM-8579][feat] Support quantized model for nano-v2-vlm (#8304 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-10-16 09:44:11 +08:00
ChristinaZ	db1c271bc6	[None][feat] Revise the calculation related to TileN in routing of MOE TRTLLM backend (#8148 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-16 09:15:46 +08:00
Yan Chunwei	206cf31705	[https://nvbugs/5560921 ][fix] GenerationExecutor RPC (#8209 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-10-16 09:05:22 +08:00
Chuang Zhu	40d129a415	[None][fix] Fix cache buffer size for window (#8320 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-16 09:01:11 +08:00
HuiGao-NV	e265eb5fe9	[None][feat] reuse cudagraph memory pool in normal forward flow (#8095 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-10-16 07:08:44 +08:00
dongfengy	7a0aa64973	[None][fix] Refactor triton paddings (#6980 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com> Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>	2025-10-15 12:59:01 -07:00
QI JUN	65ec01b257	[TRTLLM-8532][chore] clean warmup method of ModelEngine (#8264 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-15 08:40:58 -07:00
Venky	7efaa5216f	[None] [chore] Add OSS compliance to CODEOWNERS (#8375 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-10-15 06:22:32 -07:00
Yukun He	56c20665a9	[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. (#6924 ) Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-10-15 21:18:11 +08:00
mpikulski	0510b34588	[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-15 02:53:57 -07:00
QI JUN	1a1c9a29ab	[None][ci] move all llama4 test cases to post merge (#8387 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-15 16:36:37 +08:00
mpikulski	93a4b7f1b6	[None][chore] update torch_dtype -> dtype in 'transformers' (#8263 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-15 17:09:30 +09:00
QI JUN	616d1df7a0	[None][chore] set the default value of max_num_tokens explicitly (#8208 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-14 23:03:02 -07:00


3222 Commits