TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Zhang Ge	49df731b96	[#6507 ][fix] Fix precision issue due to KV layout mismatch for split/concat kernels (#6917 ) Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-13 12:14:58 +08:00
Yan Chunwei	4fd93bdc2c	[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming (#9118 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-11-12 19:55:09 -08:00
Yan Chunwei	8a8883bc73	[None][chore] Waive test_llm_rpc_streaming (#9113 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-11-13 11:06:26 +08:00
Zhenhuan Chen	943b05e2d3	[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2025-11-13 10:34:17 +08:00
QI JUN	3416efbc29	[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill (#9111 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-13 10:06:32 +08:00
dongxuy04	9241ccaf27	[None][feat] Enable EPLB for trtllm-gen and cutlass backend (#8886 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-11-12 12:30:27 -08:00
Chenghao Zhang	5f26c31954	[https://nvbugs/5636912 ][fix] AutoDeploy: Unwaive the test (#9018 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-11-12 12:26:38 -08:00
Patrice Castonguay	8a751a0e56	[None][chore] Remove is_disaggregated param in executor request queue (#9049 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-12 13:37:15 -05:00
Fanrong Li	780d4f9dc5	[None][feat] Add MTP>1 support for DS-v3.2 (#9045 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-12 09:56:12 -08:00
Iman Tabrizian	cdde15b275	[TRTLLM-8540][feat] Add support for disagg in DSv3.2 (#8735 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-11-12 08:21:11 -08:00
mpikulski	264d38e6c5	[TRTLLM-9175][test] ensure sampling is async (#9076 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-12 15:27:52 +01:00
yufeiwu-nv	b7a2574c60	[https://nvbugs/5568991 ][test] Remove Phi-3 models (#9066 ) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>	2025-11-12 03:16:36 -08:00
QI JUN	4003dc7574	[None][ci] waive some test cases of disaggregated serving (#9085 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-12 15:06:21 +08:00
Emma Qiao	bb6eb9510d	[None][infra] Waive a failed case of disaggregated/test_disaggregated.py (#9074 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-11 19:38:32 -08:00
QI JUN	fd703fbb7b	[None][ci] run speculative unit tests serially (#9080 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-11 19:06:44 -08:00
Lucas Liebenwein	aca56097cb	[None][fix] AutoDeploy: update nano3 accuracy test (#9061 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-11 12:26:31 -08:00
QI JUN	524754b6fd	[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-11 10:13:45 -08:00
Chenghao Zhang	ec9cf715a2	[None][feat] AutoDeploy: Perf improvement for mamba layers (#8991 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-11-11 08:27:07 -08:00
Wanli Jiang	ebdd1cc8e0	[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-11 07:48:23 -08:00
mpikulski	b151de4a8f	[TRTLLM-8377][test] unit tests for TorchSampler batched sampling (#9012 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-11 07:16:42 -08:00
HuiGao-NV	23c388c58b	[https://nvbugs/5616189 ][fix] Make more cases use local cached models (#8935 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-11-11 03:14:05 -08:00
QI JUN	0ce22ce928	[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] (#9069 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-11 02:11:15 -08:00
Yiqing Yan	b7d51c5549	[None][chore] Remove duplicated waive test (#9067 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-11-11 16:49:49 +08:00
Emma Qiao	da1f0e2465	[None][infra] Waive failed tests on main 11/11 (#9058 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-11 13:19:30 +08:00
xinhe-nv	fac522056c	[None][chore] Add failed cases into waives.txt (#8998 ) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2025-11-11 12:40:59 +08:00
Chang Liu	7ceb5e5ab6	[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-11 12:33:30 +08:00
shuyixiong	1ccb799c9a	[None][chore] Relocate rlhf_utils.py (#8938 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-11-10 19:03:23 -08:00
dongfengy	972c21c142	[None][chore] Clean up unused and confusing code in moe test (#9019 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-11-10 18:52:21 -08:00
Yechan Kim	0938a3ad2a	[https://nvbugs/5644187 ][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-11 10:24:31 +09:00
Frida Hou	f40e1f7496	[https://nvbugs/5625972 ][fix] Add context manager to fix FakeTensorProp (#9047 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-10 16:25:58 -08:00
xiweny	50c486367a	[https://nvbugs/5619396 ][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-10 08:12:14 -08:00
ChristinaZ	2e7769d1e8	[None][feat] Add customized topk and related unit tests for DSA (#8882 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-10 03:35:35 -08:00
xinhe-nv	f848d844d9	[None][chore] Add failed cases into waives.txt (#9030 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-11-09 23:36:05 -08:00
Fanrong Li	a7033a9193	[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-10 12:16:01 +08:00
Yiqing Yan	78fac1f665	[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-10 10:34:06 +08:00
Bo Li	67af7c15a5	[https://nvbugs/5637037 ][fix] Update unwaive list. (#9001 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-10 08:53:07 +08:00
Emma Qiao	183778d58a	[None][infra] Waive failed tests for main 11/07 (#9008 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-08 08:51:35 -08:00
Emma Qiao	2af6a537ad	[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com>	2025-11-08 06:34:24 -08:00
mpikulski	533add5056	[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 17:47:35 -08:00
Chang Liu	7081f254cf	[None][perf] Add custom indexer k cache scatter op (#8960 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 11:24:26 -08:00
Patrice Castonguay	d8ea0b967f	[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-07 07:33:51 -08:00
Yuxian Qiu	7b82ba90da	[https://nvbugs/5629790 ][chore] unwaive test. (#8967 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-07 18:41:32 +08:00
Stefan Niebler	326a201473	[https://nvbugs/5508536 ][fix] Take Over (#8627 ): Reintroduce: Move stop_criteria to sample_async (#7041 ) (#8794 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-11-07 09:01:15 +01:00
QI JUN	1c6e490894	[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-06 22:37:03 -08:00
Lizhi Zhou	b26e1617f2	[https://nvbugs/5633340 ][fix] kill processes properly after test (#8970 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-11-06 21:45:38 -08:00
xiweny	ee20e679a9	[https://nvbugs/5636986 ][fix] Fix DeepGemmMoe get_buffer calls (#8939 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-06 19:57:19 -08:00
Simeng Liu	9f8d93f89a	[https://nvbugs/5606136 ][ci] Remove tests for deprecating triton multimodal models. (#8926 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-11-06 17:58:42 -08:00
jthomson04	fcae852cef	[None][fix] Fix KV cache clearing with KV Connector API (#8750 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-11-06 14:28:27 -08:00
Chenghao Zhang	ddf2d010e2	[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear (#8820 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-06 11:00:10 -08:00
DylanChen-NV	b275635a9a	[https://nvbugs/5498478 ][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill (#8910 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-11-06 07:41:21 -08:00

1967 Commits