TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Matthias Jouanneaux	eda1467061	[TRTLLM-5966][feat] Helix: add alltoall op (#6815 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-09-25 07:18:29 -07:00
PeganovAnton	396c0ea677	[None][chore] relax version constraints on fastapi (#7935 ) Signed-off-by: Anton Peganov <apeganov@nvidia.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-09-25 21:58:53 +08:00
Yueh-Ting (eop) Chen	c5012423f5	[None][chore] Remove developer name in comment (#7981 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-09-25 06:43:38 -07:00
Yan Chunwei	40c6103ef8	[None][doc] add Llama PP known issue to release note (#7959 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	663ce3a4de	[None][doc] fix invalid links in perf benchmarking. (#7933 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	202bed4574	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
QI JUN	961418908c	[https://nvbugs/5531963 ][fix] cherry pick #7725 (#7907 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	5999fab146	[https://nvbugs/5427043 ][fix] cherrypick: request length exceeds max_num_tokens (#7718 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	cb466a846d	[None][fix] api stability bug in status label (#7861 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	9d48898def	[None][doc] add stable label to all the un-labelled arguments in LLM class (#7863 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Zac Patel	c38d4cf6a6	[None][doc] Update Perf-Overview.md for release/1.0 (#7848 ) Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	57c098956e	[None][doc] add a guide for modifying APIs (#7866 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	9f0f52249e	[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	5ecc8d0ee2	[None][doc] Replace the main in the examples' link with commit id. (#7837 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	5342c607cd	[https://nvbugs/5516710 ][fix] fix Llama 3.3 TP PP case (#7717 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Tao Li @ NVIDIA	44d7c3b245	[https://nvbugs/1234567 ][fix] Revert https://github.com/NVIDIA/TensorRT-LLM/pull/7768/files (#7813 ) Signed-off-by: Tao Li Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	4a09be40f0	[None][doc] Update docker cmd in quick start guide and trtllm-serve … (#7787 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
xinhe-nv	e30d9aced9	[https://nvbugs/4955671 ][fix] update test list (#7980 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-25 02:58:09 -07:00
Chuang Zhu	791e73edf6	[https://nvbugs/5536141 ][fix] fix_disagg_single_gpu_test (#7990 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-25 02:07:22 -07:00
Jinyang Yuan	b622cde5d5	[None][perf] Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices (#7419 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-09-25 10:27:57 +02:00
Emma Qiao	cb53261aaf	[None][infra] Unwaive some tests since dev already have a PR to collect more info (#7984 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-25 01:03:13 -07:00
Wanli Jiang	22b45ff9c7	[TRTLLM-7758][feat] Phi4-mm image modality inference optimization (#7918 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-25 15:58:29 +08:00
WeiHaocheng	259cc66c34	[None][doc] scaffolding tech blog part one (#7835 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: zheyuf <zheyuf@NVIDIA.com> Co-authored-by: zheyuf <zheyuf@NVIDIA.com>	2025-09-25 14:41:59 +08:00
fredricz-20070104	0945403174	[TRTLLM-6541][test] Add NIM perf test cases (#7924 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-25 13:15:26 +08:00
Guoming Zhang	bb6067176f	[None][chroe] Update the cuda and tensorrt version in homepage icons. (#7963 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-24 19:20:04 -07:00
Aurelien Chartier	98726a3bed	[None][chore] Update trtllm-bench documentation on setting FP8 KV cache (#7885 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-09-25 09:28:53 +08:00
Void	336c2ef540	[None][feat] DeepEP LL fp8 dispatch/combine (#7927 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-09-25 09:20:24 +08:00
Iman Tabrizian	be7e51727e	[https://nvbugs/5456485 ][bug] unwaive triton test (#7966 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-24 17:02:55 -07:00
Leslie Fang	342014069e	[None][chore] Validate features combination (#7630 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-25 08:01:13 +08:00
Iman Tabrizian	da30d496b0	[None][fix] Revert "[None][feat] Return topk logprobs in torch backend (#7756 )" (#7969 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-24 15:36:38 -07:00
sychen52	5a65af24cd	[OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels (#7821 ) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2025-09-24 12:14:35 -07:00
Iman Tabrizian	6d45cd163e	[None][bug] Fix transformers version for Triton backend (#7964 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-09-24 12:55:52 -04:00
Mike Iovine	42c2ec3239	[https://nvbugs/5473781 ][fix] Fix llama 4 FP8 for PP>1 (#7220 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-24 12:16:27 -04:00
Pamela Peng	b1dc84b4a3	[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 (#7662 ) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-09-24 11:40:26 -04:00
Yuxian Qiu	48fda86c56	[None][fix] Fix dummy load format for DeepSeek. (#7874 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-24 23:03:16 +08:00
Macrocell	6e5e8b8a3b	[None][fix] fix get_iteration_stats IndexError (#7216 ) Signed-off-by: yuhongwei <yumiao.yhw@antgroup.com> Co-authored-by: yuhongwei <yumiao.yhw@antgroup.com>	2025-09-24 22:43:03 +08:00
Eran Geva	603517f72a	[#7675 ][feat] CapturedGraph to support max_batch_size > max(cuda_graph_batch_sizes) (#7888 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-09-24 10:11:44 -04:00
Yuan Tong	51bef1beb0	[None][chore] cleanup build script (#7865 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-09-24 21:15:01 +08:00
Perkz Zheng	60101eb8a5	[None][fix] trtllm-gen cubins compiled with wrong arch. (#7953 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-24 04:13:36 -07:00
HuiGao-NV	c8bda4b3a9	[None][ci] Waive some intermittent failures (#7955 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-24 19:00:38 +08:00
Necofish	cfbcf9b9e8	[None][feat] Support Seed-OSS model in pytorch backend (#7496 ) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-09-24 03:57:12 -07:00
Enwei Zhu	a1a57e83b8	[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve (#7925 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-24 18:30:23 +08:00
xinhe-nv	b8bfa63197	[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… (#7944 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-24 03:25:17 -07:00
QI JUN	18ff1e31b8	[None][ci] remove duplicate test cases (#7956 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-24 17:47:22 +08:00
yufeiwu-nv	f323b74d42	[None][test] Update llm_models_root to improve path handling on BareMetal environment (#7876 ) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-24 17:35:57 +08:00
HuiGao-NV	29e63d3bc2	[https://nvbugs/5532248 ][fix] Fix fused_moe OOM (#7931 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-24 02:22:38 -07:00
JunyiXu-nv	6654b78c94	[https://nvbugs/5521799 ][fix] Trim incorrectly generated harmony messages (#7849 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-09-24 16:38:43 +08:00
Li Min	0252cee4c3	[None][chore] Recover cutlass-dsl pkg install and dsl op testing. (#7945 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-24 15:45:18 +08:00
QI JUN	946ffcd2eb	[None][ci] optimize test cases of dgx b200 (#7948 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-24 00:39:45 -07:00
Cao Dong	2f8dc6feb0	[None][feat] Return topk logprobs in torch backend (#7756 ) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-09-24 15:30:39 +08:00

2974 Commits