1634 Commits

Author SHA1 Message Date
Netanel Haber
aa72d39b72
MTP and derivatives: Align sample state with trtllm sampler sample state (#5675)
This PR moves MTPSampler and derivatives to use the universal seq_slot indexing for sampling.
This is the last piece of the puzzle: After this, all of the samplers will use this format.
See: 6ee94c7
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-07-03 19:55:48 +02:00
Po-Wei (Vincent)
0566fa1697
[None][infra] Update the auto-community label action to be triggered every hour (#5658)
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
2025-07-03 09:56:30 -07:00
Zhenhuan Chen
528ff52ef4
[https://nvbugs/5365714] fix(scaffolding): use default LLM rather than trt backend LLM (#5705)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-03 23:54:20 +09:00
Rashid Kaleem
2b0c87e613
[ModelLoad] Concurrent load model (#5291)
Signed-off-by: Rashid K <rkaleem@nvidia.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
2025-07-03 22:18:04 +08:00
nv-guomingz
8dad22cbe7
chore: refine the default value by using pydantic default instead of … (#5695)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-03 22:41:29 +09:00
Robin Kobus
1a3bd140ed
chore: Remove unused isFullContextRequest method (#5666)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-03 15:08:09 +02:00
Netanel Haber
f91379b7e8
delete duplicate eagle3 and ngram tests (#5711)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-07-03 15:47:26 +03:00
Omer Ullman Argov
c72856188c
[ci] small multigpu speedups (#5643)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-03 08:06:10 -04:00
WeiHaocheng
dccbfc8b1e
fix: Set init value for moe expert id (#5660)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-07-03 07:05:31 -04:00
Emma Qiao
530897388c
[Infra] - Waive a failed case on main (#5702)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-03 06:09:27 -04:00
Yiqing Yan
de0b522dfd
[Infra] - Fix test stage check for the package sanity check stage (#5694)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-03 16:39:46 +08:00
tomeras91
7dbecf7272
[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-03 11:07:51 +03:00
Yiqing Yan
3c9dd5cd66
chore: bump version to 1.0.0rc2 (#5645)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-03 12:35:28 +08:00
Emma Qiao
2a5fdebf10
[Infra] - Waive failed tests for main 0702 (#5671)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 22:05:07 -04:00
Enwei Zhu
3a46cf275b
fix: Fix missing arg to alltoall_prepare_maybe_dispatch (#5669)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-02 21:41:55 -04:00
Fridah-nv
afef5127f0
feat:[AutoDeploy] E2E build example for llama4 VLM (#3922)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-07-02 19:29:34 -04:00
ixlmar
04fa6c0cfc
[TRTLLM-6143] feat: Improve dev container tagging (#5551)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-02 14:56:34 +02:00
Emma Qiao
31699cbeb1
[Infra] - Set default timeout to 1hr and remove some specific settings (#5667)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 08:37:54 -04:00
Jhao-Ting Chen
77082cde38
[https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-07-02 04:54:43 -04:00
Robin Kobus
4cd8543d8c
[TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic from postProcessRequest (#5489)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-02 10:13:31 +02:00
qixiang-99
ca7b6ec8d8
Feat/pytorch vswa kvcachemanager (#5151)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-02 15:58:00 +08:00
Yan Chunwei
2d69b55fe8
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-02 14:21:37 +08:00
Shunkangz
3e75320fe8
Add pd dynamic scaling readme (#5540)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
2025-07-02 02:18:51 -04:00
Yiteng Niu
caf27ca0f6
[chore] 2025-07-02 update github CI allowlist (#5661)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-07-02 13:57:24 +08:00
Xiaowei Wang
32dfdfba30
feat: fuse w4a8 moe pre-quant scale on Hopper (#5613)
Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-07-01 23:02:41 -04:00
Void
7992869798
perf: better heuristic for allreduce (#5432)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-07-01 22:56:06 -04:00
HuiGao-NV
10c50515c2
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-07-02 09:49:20 +08:00
Perkz Zheng
ba2ab5098b
[Bug] attention DP doesn't work with embedding TP (#5642)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-07-02 08:57:46 +08:00
Aurelien Chartier
efef911f5e
fix: add missing self. from PR #5346 (#5653)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-01 20:38:55 -04:00
Po-Wei (Vincent)
1341ffdfaa
[TRTLLM-5644][infra] Update the community action to more appropriate api (#4883)
Don't need blossom-ci
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
2025-07-01 14:44:16 -07:00
Aurelien Chartier
fa95e402a5
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-01 12:16:09 -07:00
liji-nv
c345f5876c
[feat] Support torch compile for attention dp (#5086)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-01 13:48:52 -04:00
Kaiyu Xie
f9a455651b
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-01 09:35:25 -04:00
Robin Kobus
d68fa728d8
refactor: Clean up DecodingInput and DecodingOutput (#5617)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-01 14:31:42 +02:00
Yan Chunwei
3bc703d450
ci: unwaive llmapi launch test (#5281)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Emma Qiao
178fc3f655
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-01 20:12:55 +08:00
ixlmar
48eee338bf
fix: constrain grepping in docker/Makefile (#5493)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
ixlmar
4b3f2dbb45
fix: fix regression in LOCAL_USER (#5517)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Anurag Mukkara
93edfea2b8
[nvbug/5354825] Fix nougat test image url (#5496)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Yan Chunwei
ee7fcbf20e
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Martin Marciniszyn Mehringer
be5ddb0533
Fix permission for local user issues in NGC docker container. (#5373)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
ruodil
ded203d8aa
test: set enable_attention_dp=True in default deepseek settings (#5461)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Wanli Jiang
3789ba1d37
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
brb-nv
4ef60d5fbb
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Ivy Zhang
61213e3562
tests: fix typos in qa test (#5421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Martin Marciniszyn Mehringer
872610a048
doc: cherry pick #5334 (#5368)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Yan Chunwei
a5eff139f1
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-01 19:06:41 +08:00
杨凯旋
61c5a53642
[#5403][perf] Conditionally enable SWAP AB for speculative decoding (#5404)
Signed-off-by: zoheth <z0heth@outlook.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-07-01 18:32:37 +08:00
Emma Qiao
65c2b93284
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-01 05:01:32 -04:00
Pamela Peng
071ad758c4
[https://nvbugs/5318059][test] Unwaive test (#5624)
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
2025-07-01 04:54:44 -04:00