TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-11 13:33:40 +08:00

Author	SHA1	Message	Date
xinhe-nv	9c1b75e978	[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-22 00:12:43 -07:00
Wanli Jiang	f5bfd68a50	[https://nvbugs/5509024 ][fix] Print full parsed outputs and update keywords for multimodal model (#7670 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yi Zhang	f9c9c3f50a	[https://nvbugs/5355219 ][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Ivy Zhang	022bc96fb6	[https://nvbugs/5512734 ][fix] Update kv cache config for maverick (#7710 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
bhsueh_NV	ef557f880b	[https://nvbugs/5437405 ][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yanchao Lu	5c8b022d1e	[None][ci] Test waives for the release/1.0 branch 09/15 (#7700 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Simeng Liu	99995846b3	[https://nvbugs/5470782 ][chore] Remove the skip statement in 1.0 rele… (#7573 ) Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
peaceh-nv	541b7fda89	[https://nvbugs/5503423 ][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yan Chunwei	afca2fcbe0	[https://nvbugs/5351244 ][fix] test_mpi_session (#7501 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yuxian Qiu	2d46dda6a7	[https://nvbugs/5448754 ][fix] Download HF model for all nodes. (#6824 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Lizhi Zhou	293d9fb612	[https://nvbugs/5448767 ][fix] disable kv cache reuse for disagg pp>1 tests (#7354 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Stefan Niebler	8aead224fb	[https://nvbugs/5513423 ][fix] Correctly respect min_tokens in PyTorch Workflow (#7808 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>	2025-09-21 22:15:18 -07:00
peaceh-nv	9dc7316b7f	[https://nvbugs/5512556 ][unwaive] Unwaive DeepSeek PP tests (#7828 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-22 10:26:30 +08:00
dongxuy04	9eb8084ca9	[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-21 11:01:51 -07:00
Ziyi Xiong	897c4dd23b	[https://nvbugs/5517404 ][fix] Use the correct cuda graph for dynamic spec dec (#7728 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-21 08:20:48 +08:00
Yan Chunwei	4509d97780	[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing (#7840 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-20 06:24:22 -07:00
Chang Liu	2e317a7db6	[https://nvbugs/5520490 ][fix] Fix intermittent test failures by avoiding external web data pulls (#7879 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-19 17:24:13 -07:00
Mike Iovine	8030b540ac	[https://nvbugs/5522462 ][fix] Fix FP8 scout illegal memory access (#7845 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-19 10:30:37 -04:00
pcastonguay	fbe325ce57	[https://nvbugs/5471108 ][chore] Unwaiving disagg acc test (#7686 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-09-19 08:56:09 -04:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) (#7797 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
Liao Lanyu	18095a7cb8	[https://nvbugs/5503440 ][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-19 18:13:33 +08:00
xinhe-nv	efb763402f	[None][chore] Add failed cases into waives.txt (#7841 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-19 17:59:47 +08:00
Ivy Zhang	0ac51487f4	[None][chore] remove cli cases for rtx6k (#7833 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:33:59 +08:00
Ivy Zhang	6b33bcced2	[None][test] Add accuracy benchmark in stress test (#7561 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:09:46 +08:00
dominicshanshan	451475e0dc	[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956 . (#7853 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-19 14:54:59 +08:00
Emma Qiao	ea079fa530	[None][infra] Waive failed tests in post-merge (#7859 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-19 14:16:12 +08:00
ruodil	c5453103d6	[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-19 11:12:53 +08:00
fredricz-20070104	fc4e6d3702	[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-19 10:12:55 +08:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
Ziyi Xiong	420f0fbcf5	[https://nvbugs/5522851 ][fix] Correct the logic to update kv_lens_cuda (#7790 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-19 08:11:29 +08:00
QI JUN	7646da2d85	[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 07:19:50 +08:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
Li Min	d921fc3352	[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-18 21:20:04 +08:00
xinhe-nv	d3a907131a	[https://nvbugs/5519462 ][fix] Add failed cases into waives.txt (#7817 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 20:01:06 +08:00
Wanli Jiang	fe104dc20d	[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 17:37:16 +08:00
xinhe-nv	d909f80379	[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 17:13:07 +08:00
Wanli Jiang	a7ca0fff54	[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 16:26:20 +08:00
dongfengy	2ae08bd1b8	[https://nvbugs/5519530 ][fix] Fix gptoss 2-gpu test (#7819 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-18 16:01:53 +08:00
xinhe-nv	236f71ea05	[None][chore] Add failed cases into waives.txt (#7801 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 14:48:16 +08:00
Leslie Fang	870cfcf9a0	[None][chore] Remove executor config in create_py_executor (#7599 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-18 14:24:58 +08:00
Li Min	14e455da3e	[None][fix] Fix CI issue for dsl pkg install (#7784 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-09-18 13:58:20 +08:00
Ivy Zhang	26d50eb539	[TRTLLM-8070][test] add generation logits case for llama3 (#7759 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-18 13:33:16 +08:00
Yukun He	cd80e0a7f1	[None][fix] Make tile_tokens_dim calculation just in time before kernel launching. (#7529 ) tile_tokens_dim directly depends on the num_token, which is a dynamic shape during tuning and inference. When AutoTuner prepares dummy tensors with different num_tokens, it does not update the value of tile_tokens_dim automatically. Therefore, the value stored in the AutoTuner cache is misaligned, which will introduce a lot of cache misses during inference, which hurts perf a lot. To avoid this issue, we move the calculation of tile_tokens_dim right before kernel launching, so that the value of tile_tokens_dim is always up to date with the num_tokens of the current input tensor used for the kernel runner. Also, the tile_tokens_dim is calculated based on the number of tokens of a tuned bucket, instead of the original token number. Because we only tune the value for the buckets, not for the raw input token number, to avoid unexpected misalignment between tile_tokens_dim and the token number. This PR also removes the warmup requests with the extra input shapes, which are triggered in the CUDA graph warmup phase. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-09-18 10:58:52 +08:00
Yan Chunwei	327e5e5eed	[None][ci] restore unwaive list (#7802 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-18 10:50:34 +08:00
Lucas Liebenwein	39eb120b96	[#7308 ] [feat] AutoDeploy: graph-less transformers mode for HF (#7635 ) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>	2025-09-18 10:44:24 +08:00
Netanel Haber	a5cfc8368f	[https://nvbugs/5508536 ][fix] Revert #7041 : Move stop_criteria to sample_async (#7041 ) (#7796 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-09-17 21:27:01 -04:00
yunruis	7c03eb9ea2	[https://nvbugs/5516661 ][fix] Drop waive case 5516661 (#7791 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-09-18 08:55:32 +08:00
Emma Qiao	c4abca323e	[None][infra] Waive failed tests on main (#7812 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-17 23:44:36 +08:00
William Zhang	2614d71994	[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-17 08:11:16 -07:00
xinhe-nv	f918302b3a	[TRTLLM-7250][fix] waive block tests (#7782 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-17 15:31:03 +08:00
ruodil	e6073b3911	[None][test] add gpt oss model for trtllm perf test (#7328 ) Signed-off-by: Ruodi Lu <ruodil@nvidia.com> Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-09-17 15:23:21 +08:00
xinhe-nv	7801d0992b	[None][chore] Remove closed bugs (#7697 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-17 15:14:09 +08:00
QI JUN	d3e680b3c3	[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] (#7788 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 15:12:55 +08:00
Fanrong Li	523a17d990	[https://nvbugs/5485325 ][fix] Cherry-pick #7373 : fix the CUDA graph warmup issue when using speculative decoding (#7734 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-17 13:57:39 +08:00
QI JUN	bd7aad4988	[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-17 10:48:31 +08:00
Lucas Liebenwein	4c3dc89f84	[None][chore] AutoDeploy: clean up of model unit test configuration (#7742 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-09-17 10:42:01 +08:00
Kaiyu Xie	62042a9733	[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128 ) (#7571 ) Signed-off-by: Cheng Hang <chang@nvidia.com> Co-authored-by: Cheng Hang <chang@nvidia.com>	2025-09-17 09:41:32 +08:00
Iman Tabrizian	a91453de34	[None][waive] Waive tests (#7775 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-09-16 19:42:32 -04:00
HuiGao-NV	a49cfb3e68	[https://nvbugs/5516666 ][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding (#7737 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com> Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-17 06:24:20 +08:00
xinhe-nv	e7c1569456	[None][chore] Add failed cases into waives.txt (#7746 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 18:43:40 +08:00
Ziyi Xiong	905bb26bbd	[https://nvbugs/5471106 ][fix] Remove the waivers (#7711 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-16 17:43:39 +08:00
xinhe-nv	c6ab2072b5	[None][fix] waive hang tests on main (#7720 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 17:05:15 +08:00
xinhe-nv	1fbea497ff	[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 16:39:31 +08:00
amitz-nv	750d15bfaa	[https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-09-16 16:35:13 +08:00
Li Min	b278d06481	[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-16 14:25:26 +08:00
xinhe-nv	cf55927064	[None][chore] Add failed cases into waives.txt (#7735 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-16 10:58:06 +08:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
QI JUN	44d5ccfdd9	[None][ci] move qwen3 tests from GB200 to B200 (#7733 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-16 08:12:28 +08:00
Ziyi Xiong	536e8776cd	[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding (#7651 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-16 07:33:44 +08:00
Yanchao Lu	0c9430e5a5	[None][ci] Test waives for the main branch 09/15 (#7709 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-15 22:13:56 +08:00
jmydurant	7deefb3d2b	[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-09-15 21:43:49 +08:00
ixlmar	965a3dab90	[None][test] add test for min_tokens (#7678 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-09-15 08:59:23 +01:00
HuiGao-NV	335c007df8	[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-15 15:37:58 +08:00
Ivy Zhang	ddfe0320b3	[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-15 13:38:52 +08:00
JunyiXu-nv	a2c45d82c3	[None][chore] Enable multiple postprocess workers tests for chat completions api (#7602 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-09-15 12:16:44 +08:00
xinhe-nv	b69e3e9f99	[None][chore] Add failed cases into waives.txt (#7682 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-15 11:44:52 +08:00
Chang Liu	47e37755a3	[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-14 20:10:10 -07:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
Yanchao Lu	70aa4e28c1	[None][ci] Test waives for the main branch 09/14 (#7698 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-14 23:48:04 +08:00
Pengyun Lin	c2bc39af63	[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-09-12 15:32:34 +08:00
Guoming Zhang	ef676fc71f	[https://nvbugs/5513192 ][fix] Add the missing param for kv_cache_tran… (#7679 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-11 19:00:16 +08:00
QI JUN	656f229b58	[None][ci] move some test cases from l40s to a30 (#7684 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-11 07:22:34 +08:00
Emma Qiao	9986070044	[None][infra] Waive failed cases on main 0910 (#7676 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-11 01:43:29 +08:00
Dom Brown	fc9d426589	[https://nvbugs/5505402 ] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-10 18:30:48 +01:00
nvamyt	222e01662c	[https://nvbugs/5488212 ][waive] Waive failed tests for L20 (#7664 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-10 22:32:15 +08:00
xinhe-nv	207c5258c4	[https://nvbugs/5494698 ][fix] skip gemma3 27b on blackwell (#7505 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-10 21:09:27 +08:00
Bo Deng	bf57829acf	[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-10 17:35:51 +08:00
Frida Hou	bbb5ae3349	[#5861 ][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-09-10 13:00:06 +08:00
Zheyu Fu	c353ff342e	[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112 ) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-10 12:53:59 +08:00
fredricz-20070104	ef620f3579	[https://nvbugs/5410687 ][test] Add deepseek r1-w4afp8 quickstart (#7645 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-10 10:21:01 +08:00
Guoming Zhang	beefd6413e	[None][fix] fix post-merge issue raised by #5488 (#7655 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-10 09:26:27 +08:00
Chang Liu	faa2f46554	[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-09 14:51:36 -04:00
Jin Li	d49374bc45	[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-09 12:18:56 -04:00
QI JUN	a0e1604898	[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-09 11:06:32 -04:00
Liao Lanyu	af403848d7	[https://nvbugs/5445466 ][fix] unwaive DS R1 test cases with bug already fixed (#7429 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-09 17:25:49 +08:00
Perkz Zheng	da6cb541a2	[None][feat] Optimize MLA kernels with separate reduction kernels (#7597 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-09 16:58:44 +08:00
xinhe-nv	8a52015f50	[None][chore] Remove closed bugs (#7591 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-09 04:08:42 -04:00
William Zhang	c53d1814a7	[None][feat] Extend VLM factory and add Mistral3 factory (#7583 ) This commit: * extends existing factory interfaces to enable Mistral3 in AutoDeploy. * adds a Mistral3 VLM factory. * adds various model patches for pixtral (the vision model) and mistral3 to make the VLM export compliant. * adjusts checkpoint loading code to take possible parameter name conversions into account. * fixes a sampling bug (the `end_id` needs to be taken into account when sampling, but it is not included in the stop words' token IDs). Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-09 02:47:18 -04:00
Yiqing Yan	5c616da2fd	[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-09 11:33:09 +08:00
Wanli Jiang	1e0669d27a	[https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL (#7152 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-09 10:38:20 +08:00
Iman Tabrizian	d96c54d8ae	[None][test] Skip eagle3 test (#7627 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-09-08 17:23:53 -04:00
dongfengy	fdd5bd49fc	[https://nvbugs/5481080 ][fix] Fix GPTOSS W4A16 reference (#7323 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-08 13:59:28 -07:00
Chuang Zhu	77657a1c12	[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-08 13:37:46 -04:00
Eran Geva	5f2a42b3df	[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored (#7219 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-09-08 08:45:58 -04:00
Chang Liu	4a1e13897f	[None][feat] Update multimodal utility `get_num_tokens_per_image` for better generalization (#7544 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-08 07:42:46 -04:00
bhsueh_NV	219e95569a	[https://nvbugs/5506683 ][fix] adjust the CI (#7604 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-08 15:41:41 +08:00
dominicshanshan	c9dca69e1b	[None][chore] Mass integration of release/1.0 - 3rd (#7519 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com> Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com> Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: milesial <milesial@users.noreply.github.com> Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-08 14:03:04 +08:00
JunyiXu-nv	504bb7ffa9	[TRTLLM-7779][feat] Support multiple postprocess workers for chat completions API (#7508 ) Signed-off-by: Junyi Xu Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-09-08 11:11:35 +08:00
Raayan Dhar	8f3121ac81	[None][fix] chore: fixing the math on asymmetric tp+pp tests (#7098 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-09-07 14:27:46 -04:00
Netanel Haber	0fee8cd028	[TRTLLM-7153] [feat] Move stop_criteria to sample_async (#7041 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-09-07 17:36:49 +03:00
Raayan Dhar	bae9560e62	[https://nvbugs/5448767 ][fix] sync termination of requests across PP ranks (#7455 ) Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-07 08:45:49 -04:00
Emma Qiao	aea8ac1649	[TRTLLM-5950][infra] Removing remaining turtle keywords from the code base (#7086 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-07 14:26:18 +08:00
Mike Iovine	45390402fc	[https://nvbugs/5502352 ][fix] Fix 2-model CDL path (#7543 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-06 23:53:27 -04:00
Chang Liu	99b98f1374	[TRTLLM-7440][fix] Split `fused_input_embed` to separate out host sync (#7280 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-06 23:11:39 -04:00
Chang Liu	23500b55c3	[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-09-06 17:58:32 -04:00
QI JUN	12ecb864c2	[None][chore] share input_ids buffers among different cuda graphs (#7236 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-06 17:49:42 -04:00
dominicshanshan	9a97f0a3b7	[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 (#7585 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-06 21:29:16 +08:00
QI JUN	525bb806a9	[None][ci] move some test cases of DGX H100 to post merge (#7569 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-06 01:03:38 -04:00
QI JUN	b8183cac2b	[None][ci] Revert "[https://nvbugs/5461761 ][fix] Remove the waiver (#7476 )" (#7584 )	2025-09-05 22:02:09 -07:00
Lucas Liebenwein	74105a45d9	[#6120 ][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-09-05 22:10:48 -04:00
peaceh-nv	25389c9fe2	[https://nvbugs/5453806 ][unwaive] Unwaive fp8 kvcache attention test (#7243 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-05 12:13:57 -04:00
Emma Qiao	d8ec546b73	[None][infra] Waive failed tests on main branch 0905 (#7564 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-05 22:46:46 +08:00
Ziyi Xiong	79e0296ca0	[https://nvbugs/5461761 ][fix] Remove the waiver (#7476 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-05 15:29:54 +08:00
xinhe-nv	8e3962d278	[TRTLLM-6642][feat] add gptoss 20g tests (#7361 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:20:28 -04:00
xinhe-nv	b3ba3d98d2	[None][chore] Remove closed bugs (#7408 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-05 02:11:16 -04:00
QI JUN	ff3704897b	[None][ci] remove unnecessary test_modeling_deepseek.py (#7542 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-04 20:05:27 -07:00
Jin Li	2189a2f3ff	[https://nvbugs/5483615 ][fix] Remove unnecessary assertion to let mai… (#7441 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-05 10:56:21 +08:00
Shunkangz	bddf183e15	[None][feat] Add Request specific exception (#6931 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-09-04 18:43:42 -04:00
Chang Liu	08a0e06621	[TRTLLM-7410][feat] Support hashing and KV cache reuse for videos (#7360 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-09-04 14:39:23 -04:00
Yuxian Qiu	48a5270868	[https://nvbugs/5492485 ][fix] Use offline dataset from llm-models instead. (#7435 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-04 09:58:16 -07:00
sychen52	98a1bffb7c	[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809 ) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2025-09-04 09:03:38 -07:00
Enwei Zhu	1745102e72	[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 23:30:14 +08:00
Izzy Putterman	26b133f3a7	[None][feat] MultiLayer Eagle (#7234 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-09-04 10:49:13 -04:00
Ivy Zhang	b46e0ae5d4	[None][test] update nim and full test list (#7468 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-04 09:06:01 -04:00
QI JUN	d38b8e3dd9	[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests (#7489 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-04 06:04:51 -07:00
kris1025	cce9556858	[https://nvbugs/5485886 ][fix] Fix resource free of Eagle3ResourceManager (#7437 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2025-09-04 17:38:13 +08:00
Grzegorz Kwasniewski	3755f8ab7d	[TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-09-04 02:01:27 -04:00
Jin Li	2a2dfe273b	[https://nvbugs/5485102 ][fix] Correctly set stride for piecewise outp… (#7442 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 10:48:15 +08:00
Stanley Sun	db8eb0a447	[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-04 10:34:38 +08:00
Lizhi Zhou	d97c1e6bd9	[https://nvbugs/5470769 ][fix] fix disagg-serving accuracy test case (#7338 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-04 09:11:01 +08:00
Enwei Zhu	5ff3a65b23	[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-03 15:16:11 -07:00
Lizhi Zhou	7c73c2ff4b	[https://nvbugs/5485593 ][fix] improve accuracy/test_disaggregated_serving.py (#7366 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-03 09:38:53 -04:00
Stanley Sun	cebbf48b74	[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-09-03 08:36:52 -04:00
Mike Iovine	79d93f9419	[https://nvbugs/5488141 ][fix] Unwaive llama3 test_eagle3 (#7486 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-03 14:10:40 +08:00
Wanli Jiang	4223a9aada	[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-03 10:27:42 +08:00
Daniel Stokes	109f27265c	[None][perf] Add MOE support for dynamic cluster shapes and custom epilogue schedules (#6126 ) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-09-02 21:54:43 -04:00
Eran Geva	75c1bb6389	[https://nvbugs/5458798 ][fix] Disabled test_trtllm_bench_backend_comparison due to timeout (#7397 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-09-02 11:21:42 -07:00
Simeng Liu	bcc55bcdf3	[https://nvbugs/5470782 ][fix] Add specific test names for test_deepseek.py (#7318 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-09-02 10:31:40 -07:00
Emma Qiao	aae5d22bfe	[None][infra] Waive failed tests on main branch 0902 (#7482 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-02 10:16:49 -04:00
peaceh-nv	90479c50fb	[https://nvbugs/5453992 ][unwaive] Unwaive llama quickstart test (#7242 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-02 20:28:32 +08:00
JunyiXu-nv	eefe5f2093	[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341 ) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-09-02 07:08:22 -04:00
HuiGao-NV	7279297717	[None][infra] waive test case failed on post-merge (#7471 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-02 06:20:08 -04:00
aalanwyr	c3c95736a1	[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-09-02 17:21:27 +08:00
Ivy Zhang	3799e5d460	[None][test] auto reuse torch empty cache on qa test (#7421 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-02 04:44:47 -04:00
Yan Chunwei	f90375f37c	[https://nvbugs/5476580 ][fix] unwaive test_nvfp4_4gpus (#7454 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-09-02 04:17:14 -04:00
Mike Iovine	b3c57a7042	[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-01 14:37:44 -04:00
Emma Qiao	01dfd3af1b	[None][infra] Waive failed case on main 0901 (#7447 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:27:24 +08:00
bhsueh_NV	16e9d1121c	[https://nvbugs/5481087 ][fix] fix bug of ci when we use mocker (#7332 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-01 16:22:45 +08:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Yiqing Yan	21291f3d8e	[None][chore] Remove duplicate test waives (#6999 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Emma Qiao	09bca7ca82	[None][infra] Waive failed tests for release branch 0818 (#6993 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
peaceh-nv	f4dc1ed39c	[https://nvbugs/5449218 ][fix] Fix KvCacheConfig error in test_perf (#6937 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	29cdcdb56a	[None][fix] update skip config (#6891 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	d5bc5cd4f2	[https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 (#6847 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
William Zhang	d15dcdc4ae	[https://nvbugs/5448525 ][fix] Mistral Small 3.1 accuracy tests (#6909 ) This commit lowers the GPU memory allocated for KV cache in accuracy tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	ac07418968	[None][ci] unwaive test_ptp_star_attention_example (#6943 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	b4d41d6604	[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	612c26be22	[None][doc] add legacy section for tensorrt engine (#6724 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	cf0c47ca2d	[None][fix] Fix batching bug in Mistral3 model (#6841 ) Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	2480aedb73	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731 ) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does not support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	3e99744201	[https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case (#6838 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	deba2885c1	[None][fix] fix Llama3 eagle3 test case OOM (#6832 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	7841ea6255	[None][chore] waive GB300 known issues (#6812 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466 ][fix] Eliminate race when loading HF dynamic modules (#7268 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
Yuan Tong	ccb800f909	[TRTLLM-7457][ci] Update unittest parallel config (#7297 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-29 09:28:04 +08:00
Emma Qiao	1e644fa28a	[None][infra] Waive failed tests on main branch 08/26 (#7346 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 00:24:08 +08:00
Neta Zmora	08f935681d	[https://nvbugs/5474453 ][fix] fix path to tested model (#7272 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-08-28 08:01:48 -04:00
Zongfei Jing	53163bf1df	[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-28 18:26:16 +08:00
QI JUN	ae89163368	[None][ci] skip TestGPTOSS (#7333 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-28 05:01:49 -04:00
William Zhang	4541655e5f	[https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests (#7274 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-28 00:03:32 -04:00
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests (#7293 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Eran Geva	462169bfc9	[https://nvbugs/5458798 ][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-08-27 07:57:46 -07:00
QI JUN	d09add5ede	[None][ci] parallelize unit tests of auto deploy in B200 (#7291 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:32:11 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
bhsueh_NV	f167b1fd99	[https://nvbugs/5453727 ][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 15:26:10 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Yuan Tong	6c7813e821	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-08-27 00:45:58 -04:00

1675 Commits