Commit Graph

1596 Commits

Author SHA1 Message Date
Anurag Mukkara
93edfea2b8
[nvbug/5354825] Fix nougat test image url (#5496)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Yan Chunwei
ee7fcbf20e
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Martin Marciniszyn Mehringer
be5ddb0533
Fix permission for local user issues in NGC docker container. (#5373)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
ruodil
ded203d8aa
test: set enable_attention_dp=True in default deepseek settings (#5461)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Wanli Jiang
3789ba1d37
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
brb-nv
4ef60d5fbb
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Ivy Zhang
61213e3562
tests: fix typos in qa test (#5421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Martin Marciniszyn Mehringer
872610a048
doc: cherry pick #5334 (#5368)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Yan Chunwei
a5eff139f1
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-01 19:06:41 +08:00
杨凯旋
61c5a53642
[#5403][perf] Conditionally enable SWAP AB for speculative decoding (#5404)
Signed-off-by: zoheth <z0heth@outlook.com>
Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-07-01 18:32:37 +08:00
Emma Qiao
65c2b93284
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-01 05:01:32 -04:00
Pamela Peng
071ad758c4
[https://nvbugs/5318059][test] Unwaive test (#5624)
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
2025-07-01 04:54:44 -04:00
Robin Kobus
5f77d212ef
test: Reduce number of C++ test cases (#5437)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-01 09:40:49 +02:00
danielafrimi
7a617ad1fe
feat: W4A16 GEMM (#4232)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-07-01 10:36:05 +03:00
xinhe-nv
19c56f0374
test: [CI] Add failed cases into waives.txt (#5582)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-01 14:57:03 +08:00
Vivian Chen
34212e2e36
[TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend (#5554)
Signed-off-by: Vivian Chen <140748220+xuanzic@users.noreply.github.com>
2025-06-30 21:34:42 -07:00
Stanley Sun
7135b27284
rcca: test default kv_cache_reuse option for pytorch multimodal (#5544)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-01 12:12:48 +08:00
xinhe-nv
a8cf611baa
test: [CI] Add failed cases into waives.txt (#5569)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-01 11:02:56 +08:00
xinhe-nv
9b17b29b6e
test: [CI] remove closed bugs (#5572)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-01 10:15:43 +08:00
QI JUN
82547f733d
add feature support matrix for PyTorch backend (#5037)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-01 10:09:54 +08:00
Erin
8caaf6871d
chores: [TRTLLM-6072] 1.0 LLMAPI doc updates (#5629)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-06-30 21:58:45 -04:00
Yi Zhang
7cf1209a19
[fix]: Fix main test skip issue (#5503)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-30 21:39:49 -04:00
Netanel Haber
6ee94c7ac8
Reintroduce with perf fixes: feature: unify new_tokens format sample state to trtllm samper tokens format (#5513)
58a8a8f: these changes were previously merged to main.
6aef149: the changes were temporarily reverted in main because of a significant perf regression in models using the TorchSampler (observed by @byshiue).
This PR re-merges these changes along with a fix that prevents the regression.

The first commit of this PR is just the revert of that revert; filter it out of the diff to see the changes that were not previously merged.

Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-06-30 11:58:59 -07:00
Wei-Ming Chen
f28cd3056e
feat: AutoDeploy fp8 quantization support for bmm (#3849)
Signed-off-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
2025-06-30 12:36:34 -04:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
Li Min
16fc99391f
refactor: [TRTLLM-6150] Refactor moe permute and finalize op by removing duplicated code (#5557)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-06-30 08:48:04 -07:00
Omer Ullman Argov
3b19634a5c
[fix][ci] missing class names in post-merge test reports (#5603)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 22:13:29 +08:00
Yan Chunwei
98a7c24062
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-30 20:40:23 +08:00
Omer Ullman Argov
42134b8b84
[ci] move eagle1 and medusa tests to post-merge (#5604)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 19:32:28 +08:00
ixlmar
38a39772ce
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490) (#5605)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-30 13:27:49 +02:00
Emma Qiao
b8a568d3c6
[Infra][main] Cherry-pick from release/0.21: Update nccl to 2.27.5 (#5539) (#5587)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-30 18:12:08 +08:00
Robin Kobus
9bdc5951f8
refactor: decoder state setup (#5093)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-30 11:09:43 +02:00
Fanrong Li
6cbc9a5297
[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-30 15:59:12 +08:00
Kaiyu Xie
2ce200fbbb
doc: Minor update to DeepSeek R1 best practice (#5600)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-30 15:49:06 +08:00
WeiHaocheng
42a9385d02
[TRTLLM-5331] perf: Replace allgaher with AllToAllPrepare (#5570)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-06-30 13:06:09 +08:00
dongjiyingdjy
852b79053d
feat : support duplicate_kv_weight for qwen3 blockwise scale (#5459)
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
2025-06-30 11:49:22 +08:00
Omer Ullman Argov
1db63c2546
[fix] speedup modeling unittests (#5579)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 06:30:45 +03:00
Yiqing Yan
4fef14da56
Deduplicate waive list (#5546)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-30 11:12:26 +08:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Omer Ullman Argov
2780fc27a7
[ci] remove MMLU if followed by GSM8K (#5578)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 05:29:54 +03:00
Cheng Hang
64db7d27f6
[feat] Optimizations on weight-only batched gemv kernel (#5420)
Signed-off-by: Cheng Hang <chang@nvidia.com>
2025-06-30 10:20:16 +08:00
Enwei Zhu
b4dab23e7b
[TRTLLM-5965] perf: Optimize MoE sort kernels for large-scale EP (#5435)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-30 01:02:07 +08:00
Omer Ullman Argov
94dc97ab10
[feat][test] reuse MPI pool executor across tests (#5566)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-29 17:23:12 +03:00
Bo Li
6000380a0c
perf: Avoid reswizzle_sf after allgather. (#5504)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-29 21:25:50 +08:00
tomeras91
a1c1c6b504
[CI] reduce mamba2 ssm test parameterization (#5571)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-29 15:56:23 +03:00
Talor Abramovich
70e34a3291
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
Signed-off-by: Talor Abramovich <talora@nvidia.com>
2025-06-29 12:46:30 +00:00
amirkl94
de9779900c
feat: Add support for YARN in NemotronNAS models (#4906)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-06-29 09:45:49 +03:00
amirkl94
a985c0b7e6
tests: Move stress tests to be Post-Merge only (#5166)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-06-29 09:44:47 +03:00
Emma Qiao
9db769ee62
[Infra] - Add import pytest (#5565)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-29 11:06:14 +08:00
Lucas Liebenwein
619709fc33
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-29 03:52:14 +08:00