TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Venky	8aa5a4be43	[https://nvbugs/5464088 ] [fix] dequantize fp8 activation input to lora forward; update perf test config (#7014 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-09-14 15:04:50 +00:00
Mike Iovine	b3c57a7042	[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-01 14:37:44 -04:00
Emma Qiao	01dfd3af1b	[None][infra] Waive failed case on main 0901 (#7447 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:27:24 +08:00
bhsueh_NV	16e9d1121c	[https://nvbugs/5481087 ][fix] fix bug of ci when we use mocker (#7332 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-01 16:22:45 +08:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302 ) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Yiqing Yan	21291f3d8e	[None][chore] Remove duplicate test waives (#6999 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Emma Qiao	09bca7ca82	[None][infra] Waive failed tests for release branch 0818 (#6993 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
peaceh-nv	f4dc1ed39c	[https://nvbugs/5449218 ][fix] Fix KvCacheConfig error in test_perf (#6937 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	29cdcdb56a	[None][fix] update skip config (#6891 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	d5bc5cd4f2	[https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 (#6847 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
William Zhang	d15dcdc4ae	[https://nvbugs/5448525 ][fix] Mistral Small 3.1 accuracy tests (#6909 ) This commit lowers the GPU memory allocated for KV cache in accuracy tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	ac07418968	[None][ci] unwaive test_ptp_star_attention_example (#6943 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	b4d41d6604	[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	612c26be22	[None][doc] add legacy section for tensorrt engine (#6724 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	cf0c47ca2d	[None][fix] Fix batching bug in Mistral3 model (#6841 ) Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	2480aedb73	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731 ) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	3e99744201	[https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case (#6838 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	deba2885c1	[None][fix] fix Llama3 eagle3 test case OOM (#6832 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	7841ea6255	[None][chore] waive GB300 known issues (#6812 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466 ][fix] Eliminate race when loading HF dynamic modules (#7268 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
Yuan Tong	ccb800f909	[TRTLLM-7457][ci] Update unittest parallel config (#7297 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-29 09:28:04 +08:00
Emma Qiao	1e644fa28a	[None][infra] Waive failed tests on main branch 08/26 (#7346 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 00:24:08 +08:00
Neta Zmora	08f935681d	[https://nvbugs/5474453 ][fix] fix path to tested model (#7272 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-08-28 08:01:48 -04:00
Zongfei Jing	53163bf1df	[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-28 18:26:16 +08:00
QI JUN	ae89163368	[None][ci] skip TestGPTOSS (#7333 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-28 05:01:49 -04:00
William Zhang	4541655e5f	[https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests (#7274 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-28 00:03:32 -04:00
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests (#7293 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Eran Geva	462169bfc9	[https://nvbugs/5458798 ][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-08-27 07:57:46 -07:00
QI JUN	d09add5ede	[None][ci] parallelize unit tests of auto deploy in B200 (#7291 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:32:11 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
bhsueh_NV	f167b1fd99	[https://nvbugs/5453727 ][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 15:26:10 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Yuan Tong	6c7813e821	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-08-27 00:45:58 -04:00
Zhenhuan Chen	d0d8903a7f	[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable config (#7089 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-26 20:58:33 -07:00
Shunkangz	ff4047414b	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-27 11:16:12 +08:00
Zhou Yuxin	ccb6aadea8	[https://nvbugs/5412456 ][fix] Remove from waives.txt (#7248 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-27 10:05:53 +08:00
Jin Li	028235404b	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-26 18:31:33 -04:00

1 2 3 4 5 ...

1371 Commits