d0b3d2ac65 | nv-guomingz | 2025-07-04 13:14:13 +08:00
fix:https://nvbugs/5362398 (#5609)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

77288d3671 | Yan Chunwei | 2025-07-04 13:14:13 +08:00
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

24ac9b5f69 | Lucas Liebenwein | 2025-07-04 10:21:18 +09:00
[AutoDeploy] merge feat/ad-2025-06-29 (#5737)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

c72856188c | Omer Ullman Argov | 2025-07-03 08:06:10 -04:00
[ci] small multigpu speedups (#5643)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

7dbecf7272 | tomeras91 | 2025-07-03 11:07:51 +03:00
[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

2a5fdebf10 | Emma Qiao | 2025-07-02 22:05:07 -04:00
[Infra] - Waive failed tests for main 0702 (#5671)
Signed-off-by: qqiao <qqiao@nvidia.com>

afef5127f0 | Fridah-nv | 2025-07-02 19:29:34 -04:00
feat:[AutoDeploy] E2E build example for llama4 VLM (#3922)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

77082cde38 | Jhao-Ting Chen | 2025-07-02 04:54:43 -04:00
[https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

2d69b55fe8 | Yan Chunwei | 2025-07-02 14:21:37 +08:00
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

32dfdfba30 | Xiaowei Wang | 2025-07-01 23:02:41 -04:00
feat: fuse w4a8 moe pre-quant scale on Hopper (#5613)
Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>

10c50515c2 | HuiGao-NV | 2025-07-02 09:49:20 +08:00
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
Signed-off-by: Hui Gao <huig@nvidia.com>

fa95e402a5 | Aurelien Chartier | 2025-07-01 12:16:09 -07:00
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

c345f5876c | liji-nv | 2025-07-01 13:48:52 -04:00
[feat] Support torch compile for attention dp (#5086)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

f9a455651b | Kaiyu Xie | 2025-07-01 09:35:25 -04:00
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

178fc3f655 | Emma Qiao | 2025-07-01 20:12:55 +08:00
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>

ee7fcbf20e | Yan Chunwei | 2025-07-01 20:12:55 +08:00
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

65c2b93284 | Emma Qiao | 2025-07-01 05:01:32 -04:00
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>

7a617ad1fe | danielafrimi | 2025-07-01 10:36:05 +03:00
feat: W4A16 GEMM (#4232)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

f28cd3056e | Wei-Ming Chen | 2025-06-30 12:36:34 -04:00
feat: AutoDeploy fp8 quantization support for bmm (#3849)
Signed-off-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>

6e48ac25a6 | nv-guomingz | 2025-06-30 12:23:14 -04:00
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

98a7c24062 | Yan Chunwei | 2025-06-30 20:40:23 +08:00
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

42a9385d02 | WeiHaocheng | 2025-06-30 13:06:09 +08:00
[TRTLLM-5331] perf: Replace allgaher with AllToAllPrepare (#5570)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>

1db63c2546 | Omer Ullman Argov | 2025-06-30 06:30:45 +03:00
[fix] speedup modeling unittests (#5579)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

578430e64c | nv-guomingz | 2025-06-30 11:05:40 +08:00
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

64db7d27f6 | Cheng Hang | 2025-06-30 10:20:16 +08:00
[feat] Optimizations on weight-only batched gemv kernel (#5420)
Signed-off-by: Cheng Hang <chang@nvidia.com>

94dc97ab10 | Omer Ullman Argov | 2025-06-29 17:23:12 +03:00
[feat][test] reuse MPI pool executor across tests (#5566)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

a1c1c6b504 | tomeras91 | 2025-06-29 15:56:23 +03:00
[CI] reduce mamba2 ssm test parameterization (#5571)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

70e34a3291 | Talor Abramovich | 2025-06-29 12:46:30 +00:00
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
Signed-off-by: Talor Abramovich <talora@nvidia.com>

9db769ee62 | Emma Qiao | 2025-06-29 11:06:14 +08:00
[Infra] - Add import pytest (#5565)
Signed-off-by: qqiao <qqiao@nvidia.com>

619709fc33 | Lucas Liebenwein | 2025-06-29 03:52:14 +08:00
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

6021a439ab | Li Min | 2025-06-27 15:48:33 -07:00
Make moe permute and final as custom op (#5412)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>

833c0dea4a | Aurelien Chartier | 2025-06-27 17:03:05 +02:00
[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI (#5497)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

56cdfe5c6c | wili | 2025-06-27 23:00:17 +08:00
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

7f1893f54c | Enwei Zhu | 2025-06-27 19:16:07 +08:00
ci: waive flaky test test_llama_eagle3 (#5548)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

980030c816 | Emma Qiao | 2025-06-27 13:55:49 +08:00
[Infra] - Waive failed case in post-merge (#5536)
Signed-off-by: qqiao <qqiao@nvidia.com>

0f3bd7800e | Yibin Li | 2025-06-27 09:58:41 +08:00
[TRTLLM-4971]: Use safe deserialization in ParallelConfig (#4630)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

8dfa31c71d | Robin Kobus | 2025-06-26 19:45:52 +08:00
refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

6bae76d7ca | Omer Ullman Argov | 2025-06-26 14:31:38 +03:00
[fix][ci] move torch tests to run under torch stage (#5473)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

1bab9000a6 | Bo Li | 2025-06-26 14:03:56 +08:00
perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

490d2e5819 | dongxuy04 | 2025-06-25 22:25:13 -07:00
feat: large-scale EP(part 8: Online EP load balancer integration for PCIe fp8) (#5226)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

61bb71fd1b | Omer Ullman Argov | 2025-06-25 23:42:26 +03:00
[fix][test] remove test in global scope (#5470)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

3a2c4ca77b | QI JUN | 2025-06-26 04:32:46 +08:00
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

314f15f0a7 | HuiGao-NV | 2025-06-25 22:24:26 +08:00
Fix: fix nvbug 5356427 (#5464)
Signed-off-by: Hui Gao <huig@nvidia.com>

2901c5a5bc | QI JUN | 2025-06-25 16:44:42 +08:00
CI: waive test_ad_build_small_multi (#5471)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

3ca2f6ac51 | Netanel Haber | 2025-06-25 15:52:06 +08:00
start OAIServer with max_beam_width=1 for TorchSampler (#5427)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>

fc7a81ceb0 | Enwei Zhu | 2025-06-25 14:12:56 +08:00
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

d5354897c0 | Shunkangz | 2025-06-25 09:50:04 +08:00
feat: Dynamically remove servers in PD (#5270)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

5cffb7e0ec | Lucas Liebenwein | 2025-06-25 09:30:13 +08:00
[AutoDeploy] Merge feat/ad_2025_06_13 feature branch (#5454)
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

241f921800 | QI JUN | 2025-06-25 09:14:44 +08:00
waive test_moe.py::test_moe_fp8[autotune] (#5455)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

846bbf1edc | Iman Tabrizian | 2025-06-24 11:09:27 -07:00
Fix test Pytorch model engine (#5416)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>