Rashid Kaleem
|
2b0c87e613
|
[ModelLoad] Concurrent load model (#5291)
Signed-off-by: Rashid K <rkaleem@nvidia.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
|
2025-07-03 22:18:04 +08:00 |
|
nv-guomingz
|
8dad22cbe7
|
chore: refine the default value by using pydantic default instead of … (#5695)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-03 22:41:29 +09:00 |
|
Robin Kobus
|
1a3bd140ed
|
chore: Remove unused isFullContextRequest method (#5666)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-03 15:08:09 +02:00 |
|
Netanel Haber
|
f91379b7e8
|
delete duplicate eagle3 and ngram tests (#5711)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-07-03 15:47:26 +03:00 |
|
Omer Ullman Argov
|
c72856188c
|
[ci] small multigpu speedups (#5643)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-07-03 08:06:10 -04:00 |
|
WeiHaocheng
|
dccbfc8b1e
|
fix: Set init value for moe expert id (#5660)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
|
2025-07-03 07:05:31 -04:00 |
|
Emma Qiao
|
530897388c
|
[Infra] - Waive a failed case on main (#5702)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-03 06:09:27 -04:00 |
|
Yiqing Yan
|
de0b522dfd
|
[Infra] - Fix test stage check for the package sanity check stage (#5694)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-03 16:39:46 +08:00 |
|
tomeras91
|
7dbecf7272
|
[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-07-03 11:07:51 +03:00 |
|
Yiqing Yan
|
3c9dd5cd66
|
chore: bump version to 1.0.0rc2 (#5645)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-03 12:35:28 +08:00 |
|
Emma Qiao
|
2a5fdebf10
|
[Infra] - Waive failed tests for main 0702 (#5671)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-02 22:05:07 -04:00 |
|
Enwei Zhu
|
3a46cf275b
|
fix: Fix missing arg to alltoall_prepare_maybe_dispatch (#5669)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-02 21:41:55 -04:00 |
|
Fridah-nv
|
afef5127f0
|
feat:[AutoDeploy] E2E build example for llama4 VLM (#3922)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
|
2025-07-02 19:29:34 -04:00 |
|
ixlmar
|
04fa6c0cfc
|
[TRTLLM-6143] feat: Improve dev container tagging (#5551)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-07-02 14:56:34 +02:00 |
|
Emma Qiao
|
31699cbeb1
|
[Infra] - Set default timeout to 1hr and remove some specific settings (#5667)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-02 08:37:54 -04:00 |
|
Jhao-Ting Chen
|
77082cde38
|
[https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-07-02 04:54:43 -04:00 |
|
Robin Kobus
|
4cd8543d8c
|
[TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic from postProcessRequest (#5489)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-02 10:13:31 +02:00 |
|
qixiang-99
|
ca7b6ec8d8
|
Feat/pytorch vswa kvcachemanager (#5151)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
|
2025-07-02 15:58:00 +08:00 |
|
Yan Chunwei
|
2d69b55fe8
|
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-02 14:21:37 +08:00 |
|
Shunkangz
|
3e75320fe8
|
Add pd dynamic scaling readme (#5540)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
|
2025-07-02 02:18:51 -04:00 |
|
Yiteng Niu
|
caf27ca0f6
|
[chore] 2025-07-02 update github CI allowlist (#5661)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
|
2025-07-02 13:57:24 +08:00 |
|
Xiaowei Wang
|
32dfdfba30
|
feat: fuse w4a8 moe pre-quant scale on Hopper (#5613)
Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
|
2025-07-01 23:02:41 -04:00 |
|
Void
|
7992869798
|
perf: better heuristic for allreduce (#5432)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-07-01 22:56:06 -04:00 |
|
HuiGao-NV
|
10c50515c2
|
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-07-02 09:49:20 +08:00 |
|
Perkz Zheng
|
ba2ab5098b
|
[Bug] attention DP doesn't work with embedding TP (#5642)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-07-02 08:57:46 +08:00 |
|
Aurelien Chartier
|
efef911f5e
|
fix: add missing self. from PR #5346 (#5653)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-01 20:38:55 -04:00 |
|
Po-Wei (Vincent)
|
1341ffdfaa
|
[TRTLLM-5644][infra] Update the community action to more appropriate api (#4883)
Don't need blossom-ci
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
|
2025-07-01 14:44:16 -07:00 |
|
Aurelien Chartier
|
fa95e402a5
|
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-01 12:16:09 -07:00 |
|
liji-nv
|
c345f5876c
|
[feat] Support torch compile for attention dp (#5086)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-01 13:48:52 -04:00 |
|
Kaiyu Xie
|
f9a455651b
|
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-01 09:35:25 -04:00 |
|
Robin Kobus
|
d68fa728d8
|
refactor: Clean up DecodingInput and DecodingOutput (#5617)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-01 14:31:42 +02:00 |
|
Yan Chunwei
|
3bc703d450
|
ci: unwaive llmapi launch test (#5281)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Emma Qiao
|
178fc3f655
|
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-01 20:12:55 +08:00 |
|
ixlmar
|
48eee338bf
|
fix: constrain grepping in docker/Makefile (#5493)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
ixlmar
|
4b3f2dbb45
|
fix: fix regression in LOCAL_USER (#5517)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Anurag Mukkara
|
93edfea2b8
|
[nvbug/5354825] Fix nougat test image url (#5496)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Yan Chunwei
|
ee7fcbf20e
|
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Martin Marciniszyn Mehringer
|
be5ddb0533
|
Fix permission for local user issues in NGC docker container. (#5373)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
ruodil
|
ded203d8aa
|
test: set enable_attention_dp=True in default deepseek settings (#5461)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Wanli Jiang
|
3789ba1d37
|
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
brb-nv
|
4ef60d5fbb
|
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Ivy Zhang
|
61213e3562
|
tests: fix typos in qa test (#5421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Martin Marciniszyn Mehringer
|
872610a048
|
doc: cherry pick #5334 (#5368)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Yan Chunwei
|
a5eff139f1
|
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-01 19:06:41 +08:00 |
|
杨凯旋
|
61c5a53642
|
[#5403][perf] Conditionally enable SWAP AB for speculative decoding (#5404)
Signed-off-by: zoheth <z0heth@outlook.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
|
2025-07-01 18:32:37 +08:00 |
|
Emma Qiao
|
65c2b93284
|
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-01 05:01:32 -04:00 |
|
Pamela Peng
|
071ad758c4
|
[https://nvbugs/5318059][test] Unwaive test (#5624)
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
|
2025-07-01 04:54:44 -04:00 |
|
Robin Kobus
|
5f77d212ef
|
test: Reduce number of C++ test cases (#5437)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-01 09:40:49 +02:00 |
|
danielafrimi
|
7a617ad1fe
|
feat: W4A16 GEMM (#4232)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-01 10:36:05 +03:00 |
|
xinhe-nv
|
19c56f0374
|
test: [CI] Add failed cases into waives.txt (#5582)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-01 14:57:03 +08:00 |
|