Commit Graph

3563 Commits

Author SHA1 Message Date
Lucas Liebenwein
6bf4e59267
[#8763][feature] AutoDeploy: configurable dtype for caching (#8812)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-10 22:17:14 -08:00
jiahanc
de6088e363
[None][doc] update llama and llama4 example doc (#9048)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 22:04:26 -08:00
Bo Deng
0b9bc5aae8
[None][infra] install mooncake in docker images (#8447)
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
Co-authored-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-11-11 13:34:27 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 (#9058)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt (#8998)
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
TensorRT LLM
c61b44e594 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-11 03:36:08 +00:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py (#8938)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test (#9019)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Liao Lanyu
1fd11455d8
[https://nvbugs/5556998][fix] init_hf_modules in worker_main for models with trust_remote=true (#8931)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-11 10:30:37 +08:00
Yechan Kim
0938a3ad2a
[https://nvbugs/5644187][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[https://nvbugs/5625972][fix] Add context manager to fix FakeTensorProp (#9047)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00
xiweny
50c486367a
[https://nvbugs/5619396][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-10 08:12:14 -08:00
mpikulski
edc91ba819
[None][fix] Improve type annotations on ResourceManager.get_resource_manager (#9013)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-10 15:06:16 +01:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA (#8882)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
xinhe-nv
f848d844d9
[None][chore] Add failed cases into waives.txt (#9030)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-09 23:36:05 -08:00
bhsueh_NV
e8d4a56dd0
[None][fix] fix eagle3 accuracy issue on sm120 (#8944)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-11-10 14:02:03 +08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Bo Li
67af7c15a5
[https://nvbugs/5637037][fix] Update unwaive list. (#9001)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-10 08:53:07 +08:00
Emma Qiao
183778d58a
[None][infra] Waive failed tests for main 11/07 (#9008)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-08 08:51:35 -08:00
Emma Qiao
2af6a537ad
[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-08 06:34:24 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
hvagadia
6ff82ea24e
[None][feat] Allow env variable to specify spawn process IPC address (#8922)
Signed-off-by: hvagadia <hvagadia@nvidia.com>
2025-11-07 15:45:57 -08:00
yuanjingx87
748c56a036
[None][infra] Update allowed list 2025.11.06 (#8987)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-07 12:02:38 -08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op (#8960)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
Guoming Zhang
c232ffd122
[None][doc] Replace the relative links with absolute links in README.md. (#8995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-08 00:23:42 +08:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Yuxian Qiu
7b82ba90da
[https://nvbugs/5629790][chore] unwaive test. (#8967)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-07 18:41:32 +08:00
Zhanrui Sun
e53be1564a
[TRTLLM-9213][infra] Fix boost issue (#8996)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-07 01:27:05 -08:00
Yiqing Yan
c836ae5aaa
[None][chore] Bump version to 1.2.0rc3 (#9004)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-07 01:24:32 -08:00
mpikulski
1944fb15af
[None][fix] add missing CLI option in multimodal example (#8977)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:06:08 +01:00
mpikulski
5ef65872a3
[None][fix] type annotations in fuse_input_embeds (#8976)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:04:08 +01:00
Stefan Niebler
326a201473
[https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Lizhi Zhou
b26e1617f2
[https://nvbugs/5633340][fix] kill processes properly after test (#8970)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-06 21:45:38 -08:00
Eran Geva
990e674b71
[None][fix] Switch AD AllReduce strategy to NCCL (#8979)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-07 06:49:44 +02:00
xiweny
ee20e679a9
[https://nvbugs/5636986][fix] Fix DeepGemmMoe get_buffer calls (#8939)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-06 19:57:19 -08:00
Cao Dong
b53961e972
[None][feat] Return logprobs incrementally in torch backend (#8785)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-07 10:23:39 +08:00
Simeng Liu
9f8d93f89a
[https://nvbugs/5606136][ci] Remove tests for deprecating triton multimodal models. (#8926)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-11-06 17:58:42 -08:00
Chang Liu
1c19fd6868
[https://nvbugspro.nvidia.com/bug/5637012][fix] Bugfix when config is None for MLA (#8978)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 09:37:19 +08:00
jthomson04
fcae852cef
[None][fix] Fix KV cache clearing with KV Connector API (#8750)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-06 14:28:27 -08:00
Chenghao Zhang
1a78e7a3d6
[None][feat] AutoDeploy: Support Latent MOE for Nemotron (#8955)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-06 12:40:19 -08:00
dhansen-nvidia
ada93f1187
[https://nvbugs/5527655][feat] Add NUMA-aware CPU affinity autoconfig (#8805)
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-11-06 11:59:46 -08:00
Chenghao Zhang
ddf2d010e2
[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear (#8820)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-06 11:00:10 -08:00
DylanChen-NV
b275635a9a
[https://nvbugs/5498478][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill (#8910)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-11-06 07:41:21 -08:00
shuyixiong
c73efe12e7
[None][chore] Use cached model in all ray tests (#8962)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-06 15:14:15 +01:00
Fanrong Li
d246f62868
[https://nvbugs/5630345] [chore] skip deepseek-v3.2 fp8 kv tests on pre-Blackwell architectures (#8973)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-06 03:41:37 -08:00
yunruis
51545560da
[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-11-06 17:39:57 +08:00
Yilin Fan
b7798bfab8
[None][feat] Add trtllm_ prefix for exposed metrics (#8845)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-11-06 15:27:18 +08:00