TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-01-28 14:44:24 +08:00

Author	SHA1	Message	Date
xiweny	cae468cc8e	[https://nvbugs/5596343 ] [test] Waive flaky GPT-OSS cases (#8904 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-04 03:00:00 -08:00
Ivy Zhang	23717cdb3f	[TRTLLM-8580][test] save runtime report periodically (#8312 ) (#8455 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Yukun He	6c8ba3be27	[None][chore] Remove duplicate log outputs in test_perf.py (#8418 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
ruodil	102e556863	[None][test] cherry-pick: add test-model-suites in integration conftest.py (#8388 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Patrice Castonguay	65c138108e	[https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg (#8372 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Stanley Sun	def9c0004d	[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357 ) Signed-off-by: Stanley Sun <stsun@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
xiweny	fcac2022e2	[https://nvbugs/5565565 ] [fix] fp8 wideep support sm103 (#8228 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Yueh-Ting (eop) Chen	bd1c9c0af4	[https://nvbugs/5625990 ][chore] Add test coverage for current incapability of the KV cache manager (#8829 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-11-04 16:35:45 +08:00
Mike Iovine	5e6f1bcd24	[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-03 10:12:10 -08:00
Yechan Kim	f48968b6cc	[TRTLLM-6928][fix] Refactor multimodal unittest (#8453 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 06:01:07 -08:00
Tailing Yuan	8303cfa477	[None][fix] Fix import issues in layer-wise benchmarks (#8827 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-11-03 02:32:48 -08:00
Fanrong Li	e9f78c687a	[https://nvbugs/5625962 ][chore] unwaive DS-v32-fp4 tests (#8853 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-03 00:34:52 -08:00
chenfeiz0326	cc4ab8d9d1	[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-11-03 16:23:13 +08:00
yufeiwu-nv	b4d17d1a4c	[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753 ) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>	2025-11-03 13:34:06 +08:00
dongfengy	6d6797c792	[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2025-11-02 16:44:02 -08:00
Fanrong Li	f0dc746738	[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-31 14:38:31 -07:00
Yuxian Qiu	025d2926df	[https://nvbugs/5599515 ][fix] Fix PP bubbles. (#8687 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-10-31 10:13:56 +08:00
Mike Iovine	b87448b009	[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-30 15:47:04 -04:00
Tailing Yuan	ec31363a86	[None][fix] Layer wise benchmarks: use local models, lint (#8799 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-10-30 09:47:46 -07:00
Emma Qiao	a5cc9fe0aa	[TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. (#6256 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com>	2025-10-30 01:56:04 -07:00
Yuxian Qiu	3176bd3815	[None][fix] Fix UnboundLocalError. (#8756 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-10-29 19:41:37 -07:00
HuiGao-NV	ae57738bae	[https://nvbugs/5547414 ][fix] Use cached models (#8755 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-10-29 19:10:10 -07:00
Iman Tabrizian	ae6875fe10	[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-10-29 08:04:26 -07:00
Chang Liu	81eb861df0	[None][chore] Enable GPQA in CI for DeepSeek V3.2 (#8712 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-10-29 04:22:22 -07:00
Zheng Duan	d626d13d37	[https://nvbugs/5607238 ][test] fix working dir in disagg worker test (#8648 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-10-29 16:13:52 +08:00
Pengyun Lin	2aade46d18	[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-29 15:48:29 +08:00
Zheng Duan	fea5bfbda7	[None][feat] add detailed KV cache transfer time breakdown (#8521 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-10-29 10:11:09 +08:00
ruodil	f444fe2deb	[None][test] fix a typo in perf test sampler config (#8726 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-10-29 09:53:53 +08:00
Lizhi Zhou	24167d00eb	[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-10-28 17:04:53 -07:00
dongfengy	083f3637f1	[https://nvbugs/5596343 ][test] Update test waive to get back some coverage (#8702 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2025-10-28 14:05:48 -07:00
Anish Shanbhag	a09b38a862	[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-28 09:17:26 -07:00
dongfengy	5a01f382c1	[https://nvbugs/5575913 ][fix] Use separate thresholds for 120b/20b gptoss (#8664 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2025-10-28 10:35:07 -04:00
Robin Kobus	e8e2b0697a	[None][chore] Revert "[TRTLLM-7835][test] add default sample config for perf test (#8523 )" (#8725 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-10-28 14:23:38 +01:00
ruodil	bf72eb045e	[TRTLLM-7835][test] add default sample config for perf test (#8523 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-10-28 02:22:47 -04:00
yufeiwu-nv	0e36484fba	[None][test] Add gpt_oss_20b Model to Sanity Perf Test (#8265 )	2025-10-28 13:36:28 +08:00
Aurelien Chartier	0a02f5f25d	[None][chore] Use a cached model path for Ray integration test (#8660 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-10-27 19:16:06 -07:00
gramnarayan	88b0fbc8ff	[#8245 ][feat] Autodeploy: Guided Decoding Support (#8551 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-28 09:29:57 +08:00
Yechan Kim	a6017f6266	[https://nvbugs/5608723 ][fix] Use local data on multimodal tests and unwaive tests (#8673 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-28 09:20:02 +09:00
Bo Li	9c4432f8a4	[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-27 13:23:06 -04:00
xinhe-nv	0ac5cbcac4	[None][chore] Add failed cases into waives.txt (#8669 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-10-27 02:36:28 -04:00
Chenghao Zhang	a6d20f6f9b	[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-10-25 15:26:45 -04:00
Simeng Liu	2b27810198	[https://nvbugs/5494718 ][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-10-24 19:09:07 -07:00
jthomson04	02081e2390	[None][feat] Support KV Connector with Disagg Prefill Worker (#8246 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-10-24 11:09:06 -07:00
Chang Liu	e47c787dd7	[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-24 13:40:41 -04:00
Chuang Zhu	2420918e5b	[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-24 08:58:16 -04:00
xinhe-nv	2aaedd08cd	[TRTLLM-8638][fix] fix test issues (#8557 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-10-24 02:16:55 -04:00
ruodil	07a957e5cb	[None][test] remove redunctant runtime backend in perf test (#8358 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-10-24 02:01:34 -04:00
Stanley Sun	6b793d5c3d	[TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests (#8580 ) Signed-off-by: Stanley Sun <stsun@nvidia.com>	2025-10-24 13:23:47 +08:00
xinhe-nv	59375e8bed	[TRTLLM-8638][fix] Add failed cases into waives.txt (#8590 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-10-24 00:02:42 -04:00
xinhe-nv	04e2b2752a	[None][feat] add Nemotron-Ultra multi nodes eval tests (#8577 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-10-23 02:44:26 -04:00

751 Commits