TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Lucas Liebenwein	aca56097cb	[None][fix] AutoDeploy: update nano3 accuracy test (#9061 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-11 12:26:31 -08:00
QI JUN	524754b6fd	[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-11 10:13:45 -08:00
Chenghao Zhang	ec9cf715a2	[None][feat] AutoDeploy: Perf improvement for mamba layers (#8991 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-11-11 08:27:07 -08:00
Wanli Jiang	ebdd1cc8e0	[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-11 07:48:23 -08:00
mpikulski	20fd305bb6	[None][fix] type annotation (#9071 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-11 07:20:20 -08:00
mpikulski	b151de4a8f	[TRTLLM-8377][test] unit tests for TorchSampler batched sampling (#9012 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-11 07:16:42 -08:00
Guoming Zhang	b894dc2d70	[None][fix] Display the GPU memory information in GiB unit. (#9070 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-11-11 06:24:59 -08:00
mpikulski	979b3ae9ce	[TRTLLM-7723][feat] sampling using FlashInfer.sampling (#8581 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-11 03:21:19 -08:00
HuiGao-NV	23c388c58b	[https://nvbugs/5616189 ][fix] Make more cases use local cached models (#8935 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-11-11 03:14:05 -08:00
Emma Qiao	22f1523f9e	[None][infra] Only print and don't fail the check if there are duplicated items in waives.txt (#9068 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-11 03:04:59 -08:00
QI JUN	0ce22ce928	[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] (#9069 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-11 02:11:15 -08:00
elvischenv	62a30bca25	[None][chore] Add tensorrt_llm/scripts to .gitignore (#8895 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-11-11 11:10:02 +01:00
Yiqing Yan	b7d51c5549	[None][chore] Remove duplicated waive test (#9067 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-11-11 16:49:49 +08:00
Yuxian Qiu	7aeac97e4e	[https://nvbugs/5622938 ][fix] Use async send_requests_to_next_pp. (#9041 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-11 14:19:44 +08:00
Lucas Liebenwein	6bf4e59267	[#8763 ][feature] AutoDeploy: configurable dtype for caching (#8812 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-10 22:17:14 -08:00
jiahanc	de6088e363	[None][doc] update llama and llama4 example doc (#9048 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 22:04:26 -08:00
Bo Deng	0b9bc5aae8	[None][infra] install mooncake in docker images (#8447 ) Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com> Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> Co-authored-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-11-11 13:34:27 +08:00
Emma Qiao	da1f0e2465	[None][infra] Waive failed tests on main 11/11 (#9058 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-11 13:19:30 +08:00
xinhe-nv	fac522056c	[None][chore] Add failed cases into waives.txt (#8998 ) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2025-11-11 12:40:59 +08:00
Chang Liu	7ceb5e5ab6	[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-11 12:33:30 +08:00
TensorRT LLM	c61b44e594	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-11-11 03:36:08 +00:00
shuyixiong	1ccb799c9a	[None][chore] Relocate rlhf_utils.py (#8938 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-11-10 19:03:23 -08:00
dongfengy	972c21c142	[None][chore] Clean up unused and confusing code in moe test (#9019 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-11-10 18:52:21 -08:00
Liao Lanyu	1fd11455d8	[https://nvbugs/5556998 ][fix] init_hf_modules in worker_main for models with trust_remote=true (#8931 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-11-11 10:30:37 +08:00
Yechan Kim	0938a3ad2a	[https://nvbugs/5644187 ][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-11 10:24:31 +09:00
Frida Hou	f40e1f7496	[https://nvbugs/5625972 ][fix] Add context manager to fix FakeTensorProp (#9047 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-10 16:25:58 -08:00
xiweny	50c486367a	[https://nvbugs/5619396 ][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-10 08:12:14 -08:00
mpikulski	edc91ba819	[None][fix] Improve type annotations on ResourceManager.get_resource_manager (#9013 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-10 15:06:16 +01:00
ChristinaZ	2e7769d1e8	[None][feat] Add customized topk and related unit tests for DSA (#8882 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-10 03:35:35 -08:00
xinhe-nv	f848d844d9	[None][chore] Add failed cases into waives.txt (#9030 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-11-09 23:36:05 -08:00
bhsueh_NV	e8d4a56dd0	[None][fix] fix eagle3 accuracy issue on sm120 (#8944 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-11-10 14:02:03 +08:00
Fanrong Li	a7033a9193	[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-10 12:16:01 +08:00
Yiqing Yan	78fac1f665	[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-10 10:34:06 +08:00
Bo Li	67af7c15a5	[https://nvbugs/5637037 ][fix] Update unwaive list. (#9001 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-10 08:53:07 +08:00
Emma Qiao	183778d58a	[None][infra] Waive failed tests for main 11/07 (#9008 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-08 08:51:35 -08:00
Emma Qiao	2af6a537ad	[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com>	2025-11-08 06:34:24 -08:00
mpikulski	533add5056	[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 17:47:35 -08:00
hvagadia	6ff82ea24e	[None][feat] Allow env variable to specify spawn process IPC address (#8922 ) Signed-off-by: hvagadia <hvagadia@nvidia.com>	2025-11-07 15:45:57 -08:00
yuanjingx87	748c56a036	[None][infra] Update allowed list 2025.11.06 (#8987 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-11-07 12:02:38 -08:00
Chang Liu	7081f254cf	[None][perf] Add custom indexer k cache scatter op (#8960 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 11:24:26 -08:00
Guoming Zhang	c232ffd122	[None][doc] Replace the relative links with absolute links in README.md. (#8995 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-11-08 00:23:42 +08:00
Patrice Castonguay	d8ea0b967f	[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-07 07:33:51 -08:00
Yuxian Qiu	7b82ba90da	[https://nvbugs/5629790 ][chore] unwaive test. (#8967 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-07 18:41:32 +08:00
Zhanrui Sun	e53be1564a	[TRTLLM-9213][infra] Fix boost issue (#8996 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-11-07 01:27:05 -08:00
Yiqing Yan	c836ae5aaa	[None][chore] Bump version to 1.2.0rc3 (#9004 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-11-07 01:24:32 -08:00
mpikulski	1944fb15af	[None][fix] add missing CLI option in multimodal example (#8977 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 09:06:08 +01:00
mpikulski	5ef65872a3	[None][fix] type annotations in fuse_input_embeds (#8976 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 09:04:08 +01:00
Stefan Niebler	326a201473	[https://nvbugs/5508536 ][fix] Take Over (#8627 ): Reintroduce: Move stop_criteria to sample_async (#7041 ) (#8794 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-11-07 09:01:15 +01:00
QI JUN	1c6e490894	[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-06 22:37:03 -08:00
Lizhi Zhou	b26e1617f2	[https://nvbugs/5633340 ][fix] kill processes properly after test (#8970 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-11-06 21:45:38 -08:00

3577 Commits