TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yechan Kim	ed81173c55	[None][ci] Add test on waives (#8915) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-05 08:42:08 +08:00
Yibin Li	871ea244a3	[None][chore] Design diagram review process change (#8748) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-11-04 16:38:34 -08:00
Patrice Castonguay	782824533e	[https://nvbugs/5587574][fix] Increase server timeout to wait for weight loading (#8806) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-04 12:11:08 -08:00
Frida Hou	11ded113cd	[#8389][fix] Update group attention matching to first map to custom torch attention (#8638) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-04 12:00:43 -08:00
shuyixiong	70e4d72ffa	[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Jonas Yang CN <joyang@nvidia.com>	2025-11-04 10:19:24 -08:00
Yanchao Lu	e2b2675120	[None][fix] Remove duplicated test waives (#8914) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-04 23:04:33 +08:00
Bo Li	e4bf29bc66	[None][feat] Integrate MnnvlThroughput into TRTLLM MoE. (#8728) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-04 21:36:29 +08:00
Robin Kobus	7e4b87b17c	[None][ci] Remove outdated test entries (#8909) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-04 05:32:46 -08:00
Cao Dong	dddfcdd3bf	[None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-11-04 19:32:59 +08:00
xiweny	cae468cc8e	[https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases (#8904) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-04 03:00:00 -08:00
Zhanrui Sun	4de31bece2	[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-04 18:59:34 +08:00
CarstyYou	4296c9553d	[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-11-04 18:10:36 +08:00
Ivy Zhang	23717cdb3f	[TRTLLM-8580][test] save runtime report periodically (#8312) (#8455) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
danielafrimi	2b58dba0f6	[https://nvbugs/5524714][fix] Fix TP sharding of fused-QKV weight scales in W4A16 AWQ (#8432) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
xiweny	ce23e24123	[https://nvbugs/5565565] [fix] Remove waiver (#8450) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Yukun He	6c8ba3be27	[None][chore] Remove duplicate log outputs in test_perf.py (#8418) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
ruodil	102e556863	[None][test] cherry-pick: add test-model-suites in integration conftest.py (#8388) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Yukun He	2225745782	[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870) Because we have encountered some perf regression due to using a one-shot kernel instead of NCCL on A100/H100, it will be beneficial if we can have a solid benchmarking of allreduce Op and analyze the data collected from it. Implemented new AllreduceOp heuristic: - Added Linear programming-based heuristic implementation. - Added LUT-based heuristic implementation and corresponding code generation script. AllreduceOp minor fixing: - Fixed a minor issue in AllreduceOp, that the strategy can not be overridden when ONESHOT or TWOSHOT is set. - Fixed a minor TWOSHOT kernel perf issue. - Cleaned up Dispatching code in AllReduceOp. This PR will fix the perf gaps reported in: https://nvbugspro.nvidia.com/bug/5517023 For Deepseek-R1, it shows a performance gain of about 3-4% in concurrency levels of 256 and 512. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Zhenhuan Chen	34fbc7052c	[https://nvbugs/5545522][fix] move PREEXIT in UB kernels to fix accuracy issue (#8318) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Patrice Castonguay	65c138108e	[https://nvbugs/5552889][fix] fix: Prevent empty batch when using attention DP with disagg (#8372) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Ivy Zhang	9bcd2e6c0a	[None][chore] Update nim test list (#8356) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Stanley Sun	def9c0004d	[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357) Signed-off-by: Stanley Sun <stsun@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
xiweny	fcac2022e2	[https://nvbugs/5565565] [fix] fp8 wideep support sm103 (#8228) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Yueh-Ting (eop) Chen	bd1c9c0af4	[https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager (#8829) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-11-04 16:35:45 +08:00
Yechan Kim	67208f1512	[None][fix] InputProcessor config naming convention fix (#8705) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 22:29:21 -08:00
Emma Qiao	4fe47faf47	[None][infra] Waive failed tests for main branch (#8897) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-03 22:21:28 -08:00
Zhanrui Sun	9ec6a6b68f	[None][infra] waive failed test on main 11/4 (#8896) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-11-03 21:37:09 -08:00
HuiGao-NV	97674c3114	[TRTLLM-8690][feat] add more tensors to share buffers (#8691) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-11-03 21:08:01 -08:00
Yan Chunwei	ed297d7c2e	[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api (#8415) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-11-03 17:59:49 -08:00
Anish Shanbhag	6a6317727b	[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-11-03 17:42:41 -08:00
Matthias Jouanneaux	d0f107e4dd	[TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-11-04 09:06:58 +08:00
Mike Iovine	5e6f1bcd24	[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-03 10:12:10 -08:00
Matt Lefebvre	0f6763680a	[TRTINFRA-7215][infra] - Move half of the DGX H100 premerge tests to SLURM (#8849) Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>	2025-11-04 00:11:26 +08:00
Kaiyu Xie	db2a42f641	[None][chore] Add sample yaml for wide-ep example and minor fixes (#8825) Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>	2025-11-03 07:48:34 -08:00
Li Min	89336fbf07	[None][fix] Fix cute dsl nvfp4 gemm autotune issue (#8761) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-11-03 22:55:45 +08:00
Yechan Kim	f48968b6cc	[TRTLLM-6928][fix] Refactor multimodal unittest (#8453) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 06:01:07 -08:00
Emma Qiao	14bc8571ae	[TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d (#8319) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-03 05:26:17 -08:00
Emma Qiao	d7176768cd	[None][infra] Waive the failed test for main on 11/3 (#8875) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com>	2025-11-03 02:52:52 -08:00
Tailing Yuan	8303cfa477	[None][fix] Fix import issues in layer-wise benchmarks (#8827) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-11-03 02:32:48 -08:00
xinhe-nv	4873ca04cc	[https://nvbugs/5521799][fix] add harmony channel validation (#8837) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-11-03 02:31:54 -08:00
Guoming Zhang	65b793c77e	[None][doc] Add the missing content for model support section and fix valid links for long_sequence.md (#8869) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-11-03 02:06:04 -08:00
Yan Chunwei	271a981f1f	[None][doc] Add LLM-API API change principle (#8350) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-11-03 01:47:15 -08:00
xinhe-nv	64540451e7	[None][chore] Add failed cases into waives.txt (#8872) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-11-03 01:19:04 -08:00
Fanrong Li	e9f78c687a	[https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests (#8853) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-03 00:34:52 -08:00
Yechan Kim	00c0e6c440	[https://nvbugs/5523315][fix] Fix serve benchmark test (#8255) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 00:30:13 -08:00
chenfeiz0326	cc4ab8d9d1	[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-11-03 16:23:13 +08:00
Cao Dong	2ff772ef71	[None][feat] Add benchmark to DeepConf (#8776) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-11-03 16:05:50 +08:00
Perkz Zheng	497a07021d	[None][update] optimized sparse mla kernels && fix unspecified cuda launch (#8866) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-11-02 22:26:59 -08:00
yufeiwu-nv	b4d17d1a4c	[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-11-03 13:34:06 +08:00
Chang Liu	f57dc01e6f	[https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input (#8846)	2025-11-02 17:44:08 -08:00

3495 Commits