TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Lucas Liebenwein	6bf4e59267	[#8763][feature] AutoDeploy: configurable dtype for caching (#8812) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-10 22:17:14 -08:00
jiahanc	de6088e363	[None][doc] update llama and llama4 example doc (#9048) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 22:04:26 -08:00
Bo Deng	0b9bc5aae8	[None][infra] install mooncake in docker images (#8447) Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com> Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> Co-authored-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-11-11 13:34:27 +08:00
Emma Qiao	da1f0e2465	[None][infra] Waive failed tests on main 11/11 (#9058) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-11 13:19:30 +08:00
xinhe-nv	fac522056c	[None][chore] Add failed cases into waives.txt (#8998) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2025-11-11 12:40:59 +08:00
Chang Liu	7ceb5e5ab6	[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-11 12:33:30 +08:00
TensorRT LLM	c61b44e594	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-11-11 03:36:08 +00:00
shuyixiong	1ccb799c9a	[None][chore] Relocate rlhf_utils.py (#8938) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-11-10 19:03:23 -08:00
dongfengy	972c21c142	[None][chore] Clean up unused and confusing code in moe test (#9019) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-11-10 18:52:21 -08:00
Liao Lanyu	1fd11455d8	[https://nvbugs/5556998][fix] init_hf_modules in worker_main for models with trust_remote=true (#8931) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-11-11 10:30:37 +08:00
Yechan Kim	0938a3ad2a	[https://nvbugs/5644187][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-11 10:24:31 +09:00
Frida Hou	f40e1f7496	[https://nvbugs/5625972][fix] Add context manager to fix FakeTensorProp (#9047) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-10 16:25:58 -08:00
xiweny	50c486367a	[https://nvbugs/5619396][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-10 08:12:14 -08:00
mpikulski	edc91ba819	[None][fix] Improve type annotations on ResourceManager.get_resource_manager (#9013) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-10 15:06:16 +01:00
ChristinaZ	2e7769d1e8	[None][feat] Add customized topk and related unit tests for DSA (#8882) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-10 03:35:35 -08:00
xinhe-nv	f848d844d9	[None][chore] Add failed cases into waives.txt (#9030) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-11-09 23:36:05 -08:00
bhsueh_NV	e8d4a56dd0	[None][fix] fix eagle3 accuracy issue on sm120 (#8944) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-11-10 14:02:03 +08:00
Fanrong Li	a7033a9193	[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-10 12:16:01 +08:00
Yiqing Yan	78fac1f665	[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-10 10:34:06 +08:00
Bo Li	67af7c15a5	[https://nvbugs/5637037][fix] Update unwaive list. (#9001) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-10 08:53:07 +08:00
Emma Qiao	183778d58a	[None][infra] Waive failed tests for main 11/07 (#9008) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-08 08:51:35 -08:00
Emma Qiao	2af6a537ad	[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com>	2025-11-08 06:34:24 -08:00
mpikulski	533add5056	[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 17:47:35 -08:00
hvagadia	6ff82ea24e	[None][feat] Allow env variable to specify spawn process IPC address (#8922) Signed-off-by: hvagadia <hvagadia@nvidia.com>	2025-11-07 15:45:57 -08:00
yuanjingx87	748c56a036	[None][infra] Update allowed list 2025.11.06 (#8987) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-11-07 12:02:38 -08:00
Chang Liu	7081f254cf	[None][perf] Add custom indexer k cache scatter op (#8960) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 11:24:26 -08:00
Guoming Zhang	c232ffd122	[None][doc] Replace the relative links with absolute links in README.md. (#8995) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-11-08 00:23:42 +08:00
Patrice Castonguay	d8ea0b967f	[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-07 07:33:51 -08:00
Yuxian Qiu	7b82ba90da	[https://nvbugs/5629790][chore] unwaive test. (#8967) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-07 18:41:32 +08:00
Zhanrui Sun	e53be1564a	[TRTLLM-9213][infra] Fix boost issue (#8996) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-11-07 01:27:05 -08:00
Yiqing Yan	c836ae5aaa	[None][chore] Bump version to 1.2.0rc3 (#9004) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-11-07 01:24:32 -08:00
mpikulski	1944fb15af	[None][fix] add missing CLI option in multimodal example (#8977) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 09:06:08 +01:00
mpikulski	5ef65872a3	[None][fix] type annotations in fuse_input_embeds (#8976) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 09:04:08 +01:00
Stefan Niebler	326a201473	[https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-11-07 09:01:15 +01:00
QI JUN	1c6e490894	[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-06 22:37:03 -08:00
Lizhi Zhou	b26e1617f2	[https://nvbugs/5633340][fix] kill processes properly after test (#8970) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-11-06 21:45:38 -08:00
Eran Geva	990e674b71	[None][fix] Switch AD AllReduce strategy to NCCL (#8979) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-11-07 06:49:44 +02:00
xiweny	ee20e679a9	[https://nvbugs/5636986][fix] Fix DeepGemmMoe get_buffer calls (#8939) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-06 19:57:19 -08:00
Cao Dong	b53961e972	[None][feat] Return logprobs incrementally in torch backend (#8785) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-11-07 10:23:39 +08:00
Simeng Liu	9f8d93f89a	[https://nvbugs/5606136][ci] Remove tests for deprecating triton multimodal models. (#8926) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-11-06 17:58:42 -08:00
Chang Liu	1c19fd6868	[https://nvbugspro.nvidia.com/bug/5637012][fix] Bugfix when config is None for MLA (#8978) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 09:37:19 +08:00
jthomson04	fcae852cef	[None][fix] Fix KV cache clearing with KV Connector API (#8750) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-11-06 14:28:27 -08:00
Chenghao Zhang	1a78e7a3d6	[None][feat] AutoDeploy: Support Latent MOE for Nemotron (#8955) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-11-06 12:40:19 -08:00
dhansen-nvidia	ada93f1187	[https://nvbugs/5527655][feat] Add NUMA-aware CPU affinity autoconfig (#8805) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>	2025-11-06 11:59:46 -08:00
Chenghao Zhang	ddf2d010e2	[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear (#8820) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-06 11:00:10 -08:00
DylanChen-NV	b275635a9a	[https://nvbugs/5498478][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill (#8910) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-11-06 07:41:21 -08:00
shuyixiong	c73efe12e7	[None][chore] Use cached model in all ray tests (#8962) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-11-06 15:14:15 +01:00
Fanrong Li	d246f62868	[https://nvbugs/5630345] [chore] skip deepseek-v3.2 fp8 kv tests on pre-Blackwell architectures (#8973) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-06 03:41:37 -08:00
yunruis	51545560da	[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-11-06 17:39:57 +08:00
Yilin Fan	b7798bfab8	[None][feat] Add `trtllm_` prefix for exposed metrics (#8845) Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>	2025-11-06 15:27:18 +08:00

3563 Commits