TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
amitz-nv	750d15bfaa	[https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-09-16 16:35:13 +08:00
Kaiyu Xie	6eef19297f	[None] [chore] cherry pick changes on slurm scripts from `release/1.1.0rc2` (#7750 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-09-16 16:07:13 +08:00
Li Min	b278d06481	[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-16 14:25:26 +08:00
Guoming Zhang	085271eceb	[None][doc] Clean the doc folder and move the outdated docs into lega… (#7729 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-16 11:43:19 +08:00
Bo Li	3f4e160cba	[None][chore] Fix error when running trtllm-bench without cuda graph. (#7725 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-09-15 20:30:23 -07:00
Void	103b554734	[None][fix] Ensure that the W4A8 custom input scale remains aligned across all ranks (#7614 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-09-16 11:04:26 +08:00
xinhe-nv	cf55927064	[None][chore] Add failed cases into waives.txt (#7735 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 10:58:06 +08:00
Yanchao Lu	e5cead1eb9	[TRTLLM-6295][test] Exit as early as possible and propagate exit status correctly for multi-node testing (#7739 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-16 09:59:18 +08:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
Shi Xiaowei	809c4d20c0	[None][doc] Fix the link in the doc (#7713 ) Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>	2025-09-16 09:50:25 +08:00
Necofish	96f11b10ae	[None][feat] support attention dp for qwen3 dense model (#7618 ) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-09-16 09:33:22 +08:00
QI JUN	44d5ccfdd9	[None][ci] move qwen3 tests from GB200 to B200 (#7733 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-16 08:12:28 +08:00
Ziyi Xiong	536e8776cd	[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding (#7651 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-16 07:33:44 +08:00
Lucas Liebenwein	857c0b45be	[None][infra] AutoDeploy: codeowners for autodeploy unit tests (#7743 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-09-15 11:20:12 -07:00
Izzy Putterman	8097be7e9c	[None][feat] Eagle, use last hidden post norm (#7546 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-09-15 12:23:57 -04:00
Yanchao Lu	0c9430e5a5	[None][ci] Test waives for the main branch 09/15 (#7709 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-15 22:13:56 +08:00
jmydurant	7deefb3d2b	[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-09-15 21:43:49 +08:00
Zheng Duan	24fc1f9acf	[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow (#7553 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-09-15 07:26:01 -04:00
Wanli Jiang	e080294725	[TRTLLM-7918][feat] Revert "Support kvcache reuse for phi4mm (#7563 )" (#7722 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-15 17:19:44 +08:00
ixlmar	965a3dab90	[None][test] add test for min_tokens (#7678 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-09-15 08:59:23 +01:00
Wanli Jiang	fc9f4c9295	[TRTLLM-7918][feat] Support kvcache reuse for phi4mm (#7563 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-15 15:47:00 +08:00
HuiGao-NV	335c007df8	[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-15 15:37:58 +08:00
DylanChen-NV	d5df0af017	[https://nvbugs/5467981 ][fix] Fix Qwen2.5-VL fails with cuda graph padding (#7122 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-09-15 15:02:34 +08:00
Ivy Zhang	ddfe0320b3	[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-15 13:38:52 +08:00
JunyiXu-nv	a2c45d82c3	[None][chore] Enable multiple postprocess workers tests for chat completions api (#7602 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-09-15 12:16:44 +08:00
xinhe-nv	b69e3e9f99	[None][chore] Add failed cases into waives.txt (#7682 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-15 11:44:52 +08:00
Chang Liu	47e37755a3	[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-14 20:10:10 -07:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
Yanchao Lu	70aa4e28c1	[None][ci] Test waives for the main branch 09/14 (#7698 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-14 23:48:04 +08:00
Yanchao Lu	89fc136972	[None][ci] Some improvements for Slurm CI (#7689 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-14 16:56:32 +08:00
Zhanrui Sun	1f43854496	[TRTLLM-6791][infra] Add check for uploading stage name and avoid overriding test result tar file (#6742 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-13 01:15:33 +08:00
Zhanrui Sun	7d73a89ad0	[TRTLLM-7169][infra] Fix Slurm multi-node test showing "Submit Test Results" in the test name (#6856 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-12 18:46:19 +08:00
Pengyun Lin	c2bc39af63	[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-09-12 15:32:34 +08:00
Guoming Zhang	ef676fc71f	[https://nvbugs/5513192 ][fix] Add the missing param for kv_cache_tran… (#7679 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-11 19:00:16 +08:00
Chang Liu	3a9847eb84	[https://nvbugs/5498165 ][fix] fix permission error for config file lock (#7656 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-09-11 10:36:51 +08:00
Fan - Yunfan	e3117731b3	[None][fix] Fix the incorrect header file import in dataType.h (#7133 ) Signed-off-by: fanyunfan <2569548856@qq.com> Co-authored-by: fanyunfan <2569658856@qq.com> Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2025-09-11 08:59:04 +08:00
QI JUN	656f229b58	[None][ci] move some test cases from l40s to a30 (#7684 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-11 07:22:34 +08:00
Kanghwan	aa152ce8cf	[None][infra] Adjust labeling llm prompt for bug issues (#7385 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-11 05:10:31 +08:00
Emma Qiao	9986070044	[None][infra] Waive failed cases on main 0910 (#7676 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-11 01:43:29 +08:00
Dom Brown	fc9d426589	[https://nvbugs/5505402 ] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-10 18:30:48 +01:00
v-shobhit	0652514c6d	[None][feat] Use a shell context to install dependancies (#7383 ) Signed-off-by: Shobhit Verma <shobhitv@nvidia.com> Signed-off-by: v-shobhit <161510941+v-shobhit@users.noreply.github.com> Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>	2025-09-10 09:57:37 -07:00
nvamyt	222e01662c	[https://nvbugs/5488212 ][waive] Waive failed tests for L20 (#7664 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-10 22:32:15 +08:00
Leslie Fang	d219a4f225	[None][chore] remove executor config in kv cache creator (#7526 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-10 21:14:44 +08:00
Linda	a4312ba743	[https://nvbugs/5477359 ][fix] Nanobind: Allow none types for fields in result (#7672 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-09-10 14:13:46 +01:00
xinhe-nv	207c5258c4	[https://nvbugs/5494698 ][fix] skip gemma3 27b on blackwell (#7505 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-10 21:09:27 +08:00
Bo Deng	bf57829acf	[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-10 17:35:51 +08:00
Yiqing Yan	76c5e1a12f	[None][infra] Bump version to 1.1.0rc5 (#7668 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-10 16:06:54 +08:00
Kanghwan	758c22f832	[#7208 ][fix] Fix config type of MedusaConfig (#7320 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-09 23:25:17 -07:00
Frida Hou	bbb5ae3349	[#5861 ][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-09-10 13:00:06 +08:00
Zheyu Fu	c353ff342e	[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112 ) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-10 12:53:59 +08:00

2759 Commits