TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
liji-nv	a04b5851b9	fix: update test_user_buffers_mm_add_prologue atol (#3711 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-04-21 10:56:29 +08:00
QI JUN	fb8ddfaf86	tests: waive test_llm_multi_node (#3664 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-20 16:14:32 +00:00
Kaiyu Xie	422c1b30e9	doc: Update DeepSeek perf docs (#3693 ) * Update DeepSeek perf docs Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * update Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> --------- Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-04-19 13:11:00 +08:00
nv-guomingz	c70b24c087	test:restore fp8 kv cache testing for L0 (#3671 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-19 12:28:35 +08:00
nv-guomingz	07688cd86f	test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-19 01:22:38 +08:00
Zongfei Jing	1c6e85b133	Fix: nvbugs/5222698 variable not defined (#3630 ) * Fix: nvbugs/5222698 variable not defined Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Tidy code Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-18 21:41:20 +08:00
xiweny	5bf8fdca65	fix: nvbugs/5231298: pytorch allreduce issue (#3673 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-18 15:50:40 +08:00
Yanchao Lu	56c9dd42b7	infra: Add PR approval protection for the release branch (#3634 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-18 09:16:58 +08:00
Enwei Zhu	c8cea3001a	fix: FP8 quantized lm_head (NvBug 5214229) (#3567 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-17 15:18:34 +08:00
ruodil	b1a65c0e90	tests: change qa perf test to trtllm-bench (#3619 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 13:58:38 +08:00
Tao Li @ NVIDIA	458203d805	update fp8 doc (#3647 ) Signed-off-by: taoli <litaotju@users.noreply.github.com> Co-authored-by: taoli <litaotju@users.noreply.github.com>	2025-04-17 13:16:07 +08:00
xiweny	5cc1d38958	fix: nvbugs/5187237: fix deterministic mode crash (#3448 ) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> * revert ar fusion Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> --------- Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-17 12:01:57 +08:00
Enwei Zhu	e36092bd40	squash (#3642 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-17 10:41:56 +08:00
Ivy Zhang	715428cca9	test: add test cases for 0.19 release (#3608 ) * fix test name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add quickstart test for nemotron-ultra Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * add rcca multi-node test case for deepseek-v3 Signed-off-by: Ivy Zhang <yanzh@nvidia.com> * add rcca info Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Ivy Zhang <yanzh@nvidia.com>	2025-04-16 16:19:06 +08:00
Zhanrui Sun	3471d6ccf0	chore: bump version to 0.19.0 (#3598 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-16 12:15:19 +08:00
narutolhy	ccd73c71a5	feat: Add stream generation task scaffolding examples (#3527 ) * stream generation task/controller Signed-off-by: narutolhy <582909902@qq.com> * edit README Signed-off-by: narutolhy <582909902@qq.com> * rename README Signed-off-by: narutolhy <582909902@qq.com> --------- Signed-off-by: narutolhy <582909902@qq.com>	2025-04-16 11:33:55 +08:00
Yan Chunwei	409c294c4e	fix trtllm-bench mgmn (#3563 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-16 11:04:09 +08:00
Yan Chunwei	63f3fba679	waive test_llm_multi_node_pytorch (#3592 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-16 10:49:07 +08:00
Enwei Zhu	44da0e8d60	fix: LLM API _hf_model_dir for non-cached case (#3562 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-16 10:39:34 +08:00
Daniel Cámpora	41ce5440fe	chore: Mass integration of release/0.18 (#3421 ) * [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265) * [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1) * [None][Doc] - Update docs for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88) * [Infra] - Fix or WAR issues in the package sanity check stages Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a) * cherry-pick 'test: Updat cluster and multi node test lists and trtllm-bench' test to fix perf drop issue Signed-off-by: Ruodi Lu <ruodil@nvidia.com> (cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8) * Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'" Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f) * [Infra]Restrict setuptools version to avoid sasb pip install issue Signed-off-by: Emma Qiao <qqiao@nvidia.com> (cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac) * WAR for bug 5173448 Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com> (cherry picked from commit 
b6528b2ba15322b6c6a4c81a8b74c04d4973de4f) * [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / DLFW 25.03 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9) * [Docs] - Doc changes for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4) * [Doc] - Doc change for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235) * [Infra] update version to 0.18.1 Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit 59e8326c75639275837d34de8e140358737a3365) * Add back nemotron file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix recurrentgemma reqs. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Adding WAR for bug 5173448. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Formatting. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove duplicated file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update examples/prompt_lookup/requirements.txt Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Remove glm-4-9b from model dir in chatglm test. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove indent change. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Revert changes on l0_test.groovy. 
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update dev images Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> * Remove duplicated import. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix custom op Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Fix flashinfer & vanilla backend Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Skip problematic case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-04-16 10:03:29 +08:00
xiweny	da47d5f27e	fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 (#3585 ) * fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> * remove waiver Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> --------- Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-16 08:31:33 +08:00
Kaiyu Xie	f5f68ded26	Minor fixes for documents (#3577 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-16 07:47:18 +08:00
Robin Kobus	fffb403125	fix: disable KV cache reuse if using attention sink (#3021 ) * fix: disable KV cache reuse if using attention sink Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: disable KV cache reuse if sink bubble Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * add comment Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-16 03:07:32 +08:00
Pengyun Lin	1899e71364	doc: add genai-perf benchmark & slurm multi-node for trtllm-serve doc (#3407 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-04-16 00:11:58 +08:00
Kaiyu Xie	e037d3e99b	chore: Unify Python NVTX call (#3450 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-15 23:25:36 +08:00
Kaiyu Xie	258ae9c58c	Revert "infra: move nvrtc_wrapper to conan (#3282 )" (#3573 ) This reverts commit `c0dd6cbce0`. Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-15 22:45:13 +08:00
HuiGao-NV	d35db254e2	test: Enable 4 multi-gpu test cases for deepseek (#3569 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Hui Gao <huig@nvidia.com>	2025-04-15 22:01:52 +08:00
Yan Chunwei	c27e130be0	unwaive test (#3559 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-15 19:42:06 +08:00
jiahanc	1d3b98b920	perf: Optimize quantization kernels used in DeepSeek on Hopper (#3466 ) Signed-off-by: jiahanc <jiahanc@nvidia.com>	2025-04-15 17:49:57 +08:00
xinhe-nv	5cfa927132	update waive list (#3503 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-15 16:53:53 +08:00
bhsueh_NV	3aa37e6b72	fix bug (#3570 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-15 16:50:22 +08:00
Yuan Tong	d4c0423cdb	refactor: collect executor and decoder states into dataclass (#3234 ) * fix: Proper error bubbling for PyExecutor Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-15 16:31:45 +08:00
Robin Kobus	b7a38feb14	chore: Clean up cpp runtime (#3537 ) * add space in test output Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * perf: reduce executor lock scope Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Move TokenRangeRetentionConfig implementation to cpp file Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: Improve finished steps handling for external draft tokens - Fixed a bug where the whole finished steps tensor was being zeroed instead of the slices. - Replaced the creation of a temporary tensor for finished steps with a direct slice from the input tensor, improving efficiency and readability. - Updated the tensor management logic to streamline the process of setting zero values for finished steps during batch processing. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * chore: Clean up includes Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-15 16:06:14 +08:00
shaharmor98	ede7058544	Feat/ Integrate peftCacheManager in PyExecutor creation (#3372 ) * integrate peftCacheManager in PyExecutor creation Signed-off-by: Shahar Mor <smor@nvidia.com>	2025-04-15 15:14:43 +08:00
hlu1	5881a65374	Fix test_fp4_quantize_gemm_torch (#3551 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-04-14 23:58:31 -07:00
Yuan Tong	668a0335e4	fix: Proper error bubbling for PyExecutor (#3321 ) * fix: Proper error bubbling for PyExecutor * fix: Proper shutdown * fix: multi gpu proper shutdown Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-04-15 14:49:46 +08:00
xinhe-nv	0e152910f5	update waive list (#3498 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-15 14:33:49 +08:00
Yukun He	cfc6f242dd	Chore: Remove profile test. (#3565 ) Because it is duplicated with test_fp4_linear. Also, cpp profiler has been unified with the new AutoTuner already. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-04-14 23:17:51 -07:00
Jinyang Yuan	0305942808	chore: Modifications that should have been included but were mistakenly overwritten in PR #3467 (#3557 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-04-15 14:08:07 +08:00
nv-guomingz	39bdb1fe1c	docs:update llm api examples and customizations sections' links. (#3566 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-15 13:55:22 +08:00
yuxianq	0e7e949feb	refactor: Split llama4 model from llama model. (#3530 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-15 13:41:05 +08:00
tburt-nv	e1e068d4f3	fix local user (#3550 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-04-15 13:20:34 +08:00
Bo Li	5eae397b3b	doc: Update instructions to enable FP8 MLA for Deepseek. (#3488 ) * doc: Update doc to enable FP8 MLA for Deepseek. Signed-off-by: Bo Li <bobboli0202@gmail.com> * Update. Signed-off-by: Bo Li <bobboli0202@gmail.com> * Update. Signed-off-by: Bo Li <bobboli0202@gmail.com> * Update the status on Hopper and Blackwell. Signed-off-by: Bo Li <bobboli0202@gmail.com> * Update. Signed-off-by: Bo Li <bobboli0202@gmail.com> * Update table of contents. Signed-off-by: Bo Li <bobboli0202@gmail.com> --------- Signed-off-by: Bo Li <bobboli0202@gmail.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-04-15 13:12:33 +08:00
Zheng Duan	b0cb963199	test: torch-flow conditional disagg test (#3410 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-15 10:54:14 +08:00
Jinyang Yuan	175adb94ab	chore: Log memory sizes of weights and activations separately (#3467 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-04-15 09:48:35 +08:00
nv-guomingz	b32ae7ac92	test:add fp8_kv_cache functionality test case. (#3457 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-15 09:16:46 +08:00
QI JUN	112f716155	chore: move all distributed related codes into _torch.distributed directory (#3511 ) * move all distributed related codes into _torch.distributed directory Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-15 08:39:17 +08:00
brb-nv	098ca7f68c	test: Fix breaking Phi3 multimodal tests (#3544 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-15 08:02:34 +08:00
Iman Tabrizian	bad55e99bb	test: Add MTP + overlap + Attention DP disaggregated test (#3542 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-15 07:46:03 +08:00
Pamela Peng	6cdfc54883	feat: Add FP8 support for SM 120 (#3248 ) * Allow FP8 on SM120 Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * fix sm121 Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * fix Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * fix pre-commit Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> * review update Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> --------- Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-04-14 16:05:41 -07:00

489 Commits