TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
pcastonguay	fbe325ce57	[https://nvbugs/5471108 ][chore] Unwaiving disagg acc test (#7686 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-09-19 08:56:09 -04:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) (#7797 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
Liao Lanyu	18095a7cb8	[https://nvbugs/5503440 ][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-19 18:13:33 +08:00
xinhe-nv	efb763402f	[None][chore] Add failed cases into waives.txt (#7841 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-19 17:59:47 +08:00
Ivy Zhang	0ac51487f4	[None][chore] remove cli cases for rtx6k (#7833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:33:59 +08:00
Ivy Zhang	6b33bcced2	[None][test] Add accuracy benchmark in stress test (#7561 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:09:46 +08:00
dominicshanshan	451475e0dc	[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956 . (#7853 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-19 14:54:59 +08:00
Emma Qiao	ea079fa530	[None][infra] Waive failed tests in post-merge (#7859 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-19 14:16:12 +08:00
ruodil	c5453103d6	[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-19 11:12:53 +08:00
fredricz-20070104	fc4e6d3702	[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-19 10:12:55 +08:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
Ziyi Xiong	420f0fbcf5	[https://nvbugs/5522851 ][fix] Correct the logic to update kv_lens_cuda (#7790 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-19 08:11:29 +08:00
QI JUN	7646da2d85	[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 07:19:50 +08:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
Li Min	d921fc3352	[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-18 21:20:04 +08:00
xinhe-nv	d3a907131a	[https://nvbugs/5519462 ][fix] Add failed cases into waives.txt (#7817 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 20:01:06 +08:00
Wanli Jiang	fe104dc20d	[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 17:37:16 +08:00
xinhe-nv	d909f80379	[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 17:13:07 +08:00
Wanli Jiang	a7ca0fff54	[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 16:26:20 +08:00
dongfengy	2ae08bd1b8	[https://nvbugs/5519530 ][fix] Fix gptoss 2-gpu test (#7819 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-18 16:01:53 +08:00
xinhe-nv	236f71ea05	[None][chore] Add failed cases into waives.txt (#7801 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 14:48:16 +08:00
Leslie Fang	870cfcf9a0	[None][chore] Remove executor config in create_py_executor (#7599 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-18 14:24:58 +08:00
Li Min	14e455da3e	[None][fix] Fix CI issue for dsl pkg install (#7784 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-09-18 13:58:20 +08:00
Ivy Zhang	26d50eb539	[TRTLLM-8070][test] add generation logits case for llama3 (#7759 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-18 13:33:16 +08:00
Yukun He	cd80e0a7f1	[None][fix] Make tile_tokens_dim calculation just in time before kernel launching. (#7529 ) tile_tokens_dim directly depends on the num_token, which is a dynamic shape during tuning and inference. When AutoTuner prepares dummy tensors with different num_tokens, it does not update the value of tile_tokens_dim automatically. Therefore, the value stored in the AutoTuner cache is misaligned, which will introduce a lot of cache misses during inference, which hurts perf a lot. To avoid this issue, we move the calculation of tile_tokens_dim right before kernel launching, so that the value of tile_tokens_dim is always up to date with the num_tokens of the current input tensor used for the kernel runner. Also, the tile_tokens_dim is calculated based on the number of tokens of a tuned bucket, instead of the original token number. Because we only tune the value for the buckets, not for the raw input token number, to avoid unexpected misalignment between tile_tokens_dim and the token number. This PR also removes the warmup requests with the extra input shapes, which are triggered in the CUDA graph warmup phase. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-09-18 10:58:52 +08:00
Yan Chunwei	327e5e5eed	[None][ci] restore unwaive list (#7802 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-18 10:50:34 +08:00
Lucas Liebenwein	39eb120b96	[#7308 ] [feat] AutoDeploy: graph-less transformers mode for HF (#7635 ) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>	2025-09-18 10:44:24 +08:00
Netanel Haber	a5cfc8368f	[https://nvbugs/5508536 ][fix] Revert #7041 : Move stop_criteria to sample_async (#7041 ) (#7796 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-09-17 21:27:01 -04:00
yunruis	7c03eb9ea2	[https://nvbugs/5516661 ][fix] Drop waive case 5516661 (#7791 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-09-18 08:55:32 +08:00
Emma Qiao	c4abca323e	[None][infra] Waive failed tests on main (#7812 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-17 23:44:36 +08:00
William Zhang	2614d71994	[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-17 08:11:16 -07:00
xinhe-nv	f918302b3a	[TRTLLM-7250][fix] waive block tests (#7782 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-17 15:31:03 +08:00
ruodil	e6073b3911	[None][test] add gpt oss model for trtllm perf test (#7328 ) Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-09-17 15:23:21 +08:00
xinhe-nv	7801d0992b	[None][chore] Remove closed bugs (#7697 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-17 15:14:09 +08:00
QI JUN	d3e680b3c3	[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] (#7788 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 15:12:55 +08:00
Fanrong Li	523a17d990	[https://nvbugs/5485325 ][fix] Cherry-pick #7373 : fix the CUDA graph warmup issue when using speculative decoding (#7734 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-17 13:57:39 +08:00
QI JUN	bd7aad4988	[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 10:48:31 +08:00
Lucas Liebenwein	4c3dc89f84	[None][chore] AutoDeploy: clean up of model unit test configuration (#7742 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-09-17 10:42:01 +08:00
Kaiyu Xie	62042a9733	[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128 ) (#7571 ) Signed-off-by: Cheng Hang <chang@nvidia.com> Co-authored-by: Cheng Hang <chang@nvidia.com>	2025-09-17 09:41:32 +08:00
Iman Tabrizian	a91453de34	[None][waive] Waive tests (#7775 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-09-16 19:42:32 -04:00
HuiGao-NV	a49cfb3e68	[https://nvbugs/5516666 ][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding (#7737 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com> Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-17 06:24:20 +08:00
xinhe-nv	e7c1569456	[None][chore] Add failed cases into waives.txt (#7746 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 18:43:40 +08:00
Ziyi Xiong	905bb26bbd	[https://nvbugs/5471106 ][fix] Remove the waivers (#7711 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-16 17:43:39 +08:00
xinhe-nv	c6ab2072b5	[None][fix] waive hang tests on main (#7720 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 17:05:15 +08:00
xinhe-nv	1fbea497ff	[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 16:39:31 +08:00
amitz-nv	750d15bfaa	[https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-09-16 16:35:13 +08:00
Li Min	b278d06481	[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-16 14:25:26 +08:00
xinhe-nv	cf55927064	[None][chore] Add failed cases into waives.txt (#7735 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 10:58:06 +08:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
QI JUN	44d5ccfdd9	[None][ci] move qwen3 tests from GB200 to B200 (#7733 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-16 08:12:28 +08:00


1507 Commits