TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-16 15:55:08 +08:00

Author	SHA1	Message	Date
Yi Zhang	ada463d15d	[None][fix] Fix comments for kv cache manager v2 (#11207) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2026-02-04 23:31:29 -05:00
TensorRT LLM	4adf76d860	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-05 03:42:54 +00:00
dongfengy	0bd4630cd1	[https://nvbugs/5854860][fix] Fix cutedsl argmax on sm120 (#11181) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-02-04 17:15:31 -05:00
dongfengy	ad2d1df4a9	[https://nvbugs/5849697][fix] Refine QA Test List for SM120 (#11248) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-02-04 11:59:04 -08:00
Simeng Liu	d9fd8cc951	[https://nvbugs/5674665][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-04 12:46:31 -05:00
Gal Hubara-Agam	767b8dcab3	[None][chore] AutoDeploy: Set nanov3 and superv3 configs to use flashinfer ssm (#11183) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-02-04 09:46:15 -08:00
Grzegorz Kwasniewski	d90a8e5700	[TRTLLM-10673][feat] Improved layer classification for sharding (#10718) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-02-04 18:06:10 +01:00
Lucas Liebenwein	925d911fc0	[#10966][feat] AutoDeploy: kv cache manager integration [2/2] (#11149) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-02-04 09:44:27 -05:00
Xianjie Qiao	e2bd9cce1e	[None][feat] Support disagg slurm jobs rescheduling (#11218)	2026-02-04 22:10:36 +08:00
Yueh-Ting (eop) Chen	f6fff18142	[https://nvbugs/5624818][fix] Work around accuracy issue by enforcing paged_context_fmha on Hopper for fmha_v2 (#11192) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2026-02-04 19:21:50 +08:00
Zhenhuan Chen	3d8c1a51bd	[None][feat] move some disagg script's env configs from bash to submit.py (#10223) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2026-02-04 04:32:04 -05:00
mpikulski	f0ca62b175	[None][fix] make health_generate work with beam search (#11097) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-04 09:46:19 +01:00
xxi	02b80bfd58	[TRTLLM-9111][feat] provide the uniform test framework to test all MoE backends (#11128) Signed-off-by: xxi <xxi@nvidia.com>	2026-02-04 15:57:56 +08:00
Gal Hubara-Agam	de6931bbfd	[None][fix] Fix selective_state_update perf regression for T=1 decode path (#11194) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-02-04 09:01:34 +02:00
chenfeiz0326	04b7db3ab5	[TRTLLM-8263][feat] Add Disagg Perf Tests (#10912) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-02-04 10:16:11 +08:00
tburt-nv	588db0ed64	[None][chore] bump version to 1.3.0rc3 (#11238) Signed-off-by: Tyler Burt <tburt@nvidia.com>	2026-02-04 09:30:45 +08:00
Dmitry Barsukoff	5d522295e9	[None][fix] Set continuous_usage_stats default to False to follow OpenAI protocol (#10644) Signed-off-by: Dmitry Barsukoff <riZZZhik@gmail.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2026-02-03 16:04:54 -08:00
Taylor Yeonbok Lee	f9e6045f39	[#11086][feat] Optimize Auto Deploy weight loading by preloading weights to CPU (#11059) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-02-03 13:23:10 -08:00
Lizhi Zhou	f9c4bdf6cf	[TRTLLM-8921][feat] implement gen-first disagg_service (#11020) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-03 15:46:11 -05:00
yuanjingx87	8f90330239	[TRTLLM-10019][infra] Move 6 h100 test stage to aihub platform (#11039) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-02-03 12:42:59 -08:00
mpikulski	710d6ef668	[https://nvbugs/5739981][fix] unwaive tests using opt-125M (#11100) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-03 15:21:01 +01:00
Chenjie Luo	2532eb5adc	[None][fix] Align kv_scales with modelopt HF checkpoint (#10745) Signed-off-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>	2026-02-03 08:03:42 -05:00
xinhe-nv	20946554f6	[None][chore] Add failed cases into waives.txt (#11216) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-03 04:15:31 -05:00
Yiqing Yan	a56aaa585e	[TRTLLM-10839][infra] Set rerun report stage UNSTABLE and pipeline SUCCESS in post-merge when there are passed rerun tests (#11210) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-02-03 15:44:15 +08:00
xinhe-nv	b7767f682f	[None][chore] Add failed cases into waives.txt (#11202) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-03 02:26:02 -05:00
xinhe-nv	03f51bb767	[None][chore] Add failed cases into waives.txt (#11193) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-03 01:46:17 -05:00
Anish Shanbhag	e308eb50f4	[TRTLLM-10803][fix] Fix mocking of HuggingFace downloads in `with_mocked_hf_download` (#11200) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-02-02 21:58:15 -08:00
Taylor Yeonbok Lee	304dc6f3c0	[None][chore] Print memory usage before/after accuracy test in CI (#11155) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-02-03 00:23:14 -05:00
TensorRT LLM	12b4ebd0ad	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-02-03 03:13:38 +00:00
Abby Wei	061d7879d3	[TRTLLM-10307][infra] Add --high-priority in bot help message (#11133) Signed-off-by: Abby Wei <mengzew@nvidia.com>	2026-02-03 10:35:05 +08:00
Yiqing Yan	13420178fc	[TRTLLM-10561][infra] Fix jaraco-context and wheel vulnerability (#10901) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-02-03 09:54:11 +08:00
Venky	897eb0df2b	[None][doc] Fix GLM4-MoE Eagle support documentation (#11198) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-02-02 13:36:09 -08:00
gramnarayan	585fbb2734	[#10826][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-02-02 09:51:10 -08:00
Izzy Putterman	3ef8a4639b	[None][feat] Nemotron H: Eagle3 support (#11131) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-02-02 10:26:25 -05:00
Yanchao Lu	cd7762a2fa	[None][test] Fix an invalid test name (#11195) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-02 23:25:51 +08:00
Rundong Li	f1b85fea4c	[None][feat] Integrate cuda.tile RMS norm kernels (#9725) Signed-off-by: Rundong (David) Li <davidli@nvidia.com> Co-authored-by: Jinman Xie <jinmanx@nvidia.com> Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com> Co-authored-by: Qiqi Xiao <qiqix@nvidia.com> Co-authored-by: Biao Wang <biaow@nvidia.com> Co-authored-by: Thomas Schmid <thschmid@nvidia.com>	2026-02-02 19:44:27 +08:00
Mike Iovine	13b0ab9c0e	[None][fix] Fix MTP 1-model sampler (#10369) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Mike Iovine	d9aef94431	[https://nvbugs/5814914][fix] Fix llama sm120 spec dec (#10765) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Ivy Zhang	fa5c3ead05	[None][test] Update test list (#10883) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yukun He	de465efc5f	[https://nvbugs/5814309][fix] Use NCCL as fallback to avoid crash due to insufficient memory (#10928) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Zheyu Fu	d31482686c	[https://nvbugs/5680911][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode (#10730) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Enwei Zhu	7e5e5b90b9	[https://nvbugs/5748600][ci] Update guided decoding waive list (#10904) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yuxian Qiu	dd0a5491ba	[https://nvbugs/5701445][chore] unwaive tests. (#10913) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yuxian Qiu	40d6f23dad	[https://nvbugs/5784543][chore] unwaive test. (#10906) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Lucas Liebenwein	68a18f7a3a	[https://nvbugs/5814247][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729) (#10850) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Enwei Zhu	ccdd8461ac	[None][fix] Always reset drafting states for GuidedDecoder (#10899) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Michal Guzek	fafc22e3d4	[https://nvbugs/5691730][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 (#9808) Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
William Zhang	bc2487bc2c	[https://nvbugs/5826962][fix] Fix PD disaggregation for VLMs that use mrope (#10865) * Why? Commit `a6a8898` enabled EPD disaggregation for VLMs that use mrope (e.g. qwen). However, this broke PD disaggregation for these sames models. * What? This commit fixes this, and adds a unit test that guards against it. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Lizhi Zhou	4d282bd7c1	[https://nvbugs/5821433][fix] fix test_auto_scaling for 2 GPUs (#10866) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Zhenhuan Chen	6c2ecad2fe	[https://nvbugs/5769425][fix] add syncthreads for tinygemm to resolve intermittent accuracy problem (#10873) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
4976 Commits