3577 Commits

Author SHA1 Message Date
Lucas Liebenwein
aca56097cb
[None][fix] AutoDeploy: update nano3 accuracy test (#9061)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-11 12:26:31 -08:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
Chenghao Zhang
ec9cf715a2
[None][feat] AutoDeploy: Perf improvement for mamba layers (#8991)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-11 08:27:07 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
mpikulski
20fd305bb6
[None][fix] type annotation (#9071)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 07:20:20 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling (#9012)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
Guoming Zhang
b894dc2d70
[None][fix] Display the GPU memory information in GiB unit. (#9070)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-11 06:24:59 -08:00
mpikulski
979b3ae9ce
[TRTLLM-7723][feat] sampling using FlashInfer.sampling (#8581)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 03:21:19 -08:00
HuiGao-NV
23c388c58b
[https://nvbugs/5616189][fix] Make more cases use local cached models (#8935)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-11 03:14:05 -08:00
Emma Qiao
22f1523f9e
[None][infra] Only print and don't fail the check if there are duplicated items in waives.txt (#9068)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 03:04:59 -08:00
QI JUN
0ce22ce928
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] (#9069)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 02:11:15 -08:00
elvischenv
62a30bca25
[None][chore] Add tensorrt_llm/scripts to .gitignore (#8895)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-11-11 11:10:02 +01:00
Yiqing Yan
b7d51c5549
[None][chore] Remove duplicated waive test (#9067)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-11 16:49:49 +08:00
Yuxian Qiu
7aeac97e4e
[https://nvbugs/5622938][fix] Use async send_requests_to_next_pp. (#9041)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-11 14:19:44 +08:00
Lucas Liebenwein
6bf4e59267
[#8763][feature] AutoDeploy: configurable dtype for caching (#8812)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-10 22:17:14 -08:00
jiahanc
de6088e363
[None][doc] update llama and llama4 example doc (#9048)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 22:04:26 -08:00
Bo Deng
0b9bc5aae8
[None][infra] install mooncake in docker images (#8447)
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
Co-authored-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-11-11 13:34:27 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 (#9058)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt (#8998)
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
TensorRT LLM
c61b44e594
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-11 03:36:08 +00:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py (#8938)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test (#9019)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Liao Lanyu
1fd11455d8
[https://nvbugs/5556998][fix] init_hf_modules in worker_main for models with trust_remote=true (#8931)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-11 10:30:37 +08:00
Yechan Kim
0938a3ad2a
[https://nvbugs/5644187][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[https://nvbugs/5625972][fix] Add context manager to fix FakeTensorProp (#9047)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00
xiweny
50c486367a
[https://nvbugs/5619396][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-10 08:12:14 -08:00
mpikulski
edc91ba819
[None][fix] Improve type annotations on ResourceManager.get_resource_manager (#9013)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-10 15:06:16 +01:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA (#8882)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
xinhe-nv
f848d844d9
[None][chore] Add failed cases into waives.txt (#9030)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-09 23:36:05 -08:00
bhsueh_NV
e8d4a56dd0
[None][fix] fix eagle3 accuracy issue on sm120 (#8944)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-11-10 14:02:03 +08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Bo Li
67af7c15a5
[https://nvbugs/5637037][fix] Update unwaive list. (#9001)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-10 08:53:07 +08:00
Emma Qiao
183778d58a
[None][infra] Waive failed tests for main 11/07 (#9008)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-08 08:51:35 -08:00
Emma Qiao
2af6a537ad
[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-08 06:34:24 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
hvagadia
6ff82ea24e
[None][feat] Allow env variable to specify spawn process IPC address (#8922)
Signed-off-by: hvagadia <hvagadia@nvidia.com>
2025-11-07 15:45:57 -08:00
yuanjingx87
748c56a036
[None][infra] Update allowed list 2025.11.06 (#8987)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-07 12:02:38 -08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op (#8960)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
Guoming Zhang
c232ffd122
[None][doc] Replace the relative links with absolute links in README.md. (#8995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-08 00:23:42 +08:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Yuxian Qiu
7b82ba90da
[https://nvbugs/5629790][chore] unwaive test. (#8967)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-07 18:41:32 +08:00
Zhanrui Sun
e53be1564a
[TRTLLM-9213][infra] Fix boost issue (#8996)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-07 01:27:05 -08:00
Yiqing Yan
c836ae5aaa
[None][chore] Bump version to 1.2.0rc3 (#9004)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-07 01:24:32 -08:00
mpikulski
1944fb15af
[None][fix] add missing CLI option in multimodal example (#8977)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:06:08 +01:00
mpikulski
5ef65872a3
[None][fix] type annotations in fuse_input_embeds (#8976)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:04:08 +01:00
Stefan Niebler
326a201473
[https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Lizhi Zhou
b26e1617f2
[https://nvbugs/5633340][fix] kill processes properly after test (#8970)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-06 21:45:38 -08:00