Author | Commit | Message | Date
Wanli Jiang | 3f7cedec7c | Update transformers to 4.53.0 (#5747) | 2025-07-09 09:32:24 -07:00
  Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Omer Ullman Argov | a32f7083b4 | [ci] parallelize torch unittests (#5714) | 2025-07-09 11:05:57 +03:00
  Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Dom Brown | 3e3b1769ad | [TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764) | 2025-07-09 08:21:58 +01:00
  Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Erin | e277766f0d | chores: merge examples for v1.0 doc (#5736) | 2025-07-08 21:00:42 -07:00
  Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Lucas Liebenwein | d14dd2f597 | [AutoDeploy] re-enable waive for flaky AD test (#5867) | 2025-07-09 11:47:48 +09:00
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
brb-nv | 2bd09ed2d4 | fix: Skip rope scaling for local layers in Gemma3 VLM (#5857) | 2025-07-09 10:10:33 +08:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Fridah-nv | a79b73f577 | fix: [5376140] [AutoDeploy] Update unit tests: skip all_close assert for dropout in attention, increase tolerance for rope op test (#5855) | 2025-07-09 09:13:31 +09:00
  Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Dom Brown | e3ccca06e1 | test: reduce redundant test cases for TRTLLM Gen FP8 MoE (#5845) | 2025-07-09 00:40:33 +09:00
  Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Kaiyu Xie | bb5b16fcb9 | feat: Return context response immediately when stream_interval > 1 (#5836) | 2025-07-09 00:19:57 +09:00
  Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Yegor | b01d1c28f7 | [feat] Detokenize option in /v1/completions request (#5382) | 2025-07-08 19:36:04 +08:00
  Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
  Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
Yiqing Yan | ec0d7e64b9 | [Infra] - Waive L0 test (#5837) | 2025-07-08 17:54:06 +08:00
  Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Enwei Zhu | 55f86ce7ab | [NvBug 5362426] fix: Fix prompt adapter TP2 case (#5782) | 2025-07-08 16:01:36 +09:00
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Venky | 9258187e98 | Waive some test_llama_eagle3 unittests (#5811) | 2025-07-08 15:35:27 +09:00
  Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
nv-guomingz | 0be41b6524 | Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818) | 2025-07-08 13:15:30 +09:00
Yechan Kim | 5bc3a15f10 | feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522) | 2025-07-07 18:03:12 -07:00
  Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
nv-guomingz | 5a8173c121 | chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795) | 2025-07-08 08:52:36 +08:00
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Omer Ullman Argov | 1191555cce | [ci] speedup fused moe tests (#5726) | 2025-07-07 18:03:15 +03:00
  Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Robin Kobus | 30a19fcf7c | [TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204) | 2025-07-07 16:30:43 +02:00
  Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
DylanChen-NV | 5ca2b9bb15 | [TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615) | 2025-07-07 18:04:57 +08:00
  Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
Yan Chunwei | dfce61f4b9 | [TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751) | 2025-07-07 17:05:14 +08:00
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Bo Li | 9db2e9ee47 | fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772) | 2025-07-07 14:58:32 +08:00
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Yanchao Lu | 2013034948 | [Test] - Waive or fix few known test failures (#5769) | 2025-07-06 21:14:16 +08:00
  Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Stefan Niebler | d1112aac37 | [TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333) | 2025-07-05 01:35:13 +09:00
  Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Shunkangz | 32339d1b20 | Raise shut down error for each request (#4936) | 2025-07-04 18:58:24 +09:00
  Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
Emma Qiao | a0135c0f6f | [Infra] - Waive failed cases on release/0.21 (#5674) | 2025-07-04 13:14:13 +08:00
  Signed-off-by: qqiao <qqiao@nvidia.com>
brb-nv | cdaa6abce7 | fix: Investigate Gemma3 1B decoder output discrepancy (#5564) | 2025-07-04 13:14:13 +08:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
nv-guomingz | d0b3d2ac65 | fix: https://nvbugs/5362398 (#5609) | 2025-07-04 13:14:13 +08:00
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Yan Chunwei | 77288d3671 | fix [nvbug5351244]: test_mpi_session submit sync/async (#5608) | 2025-07-04 13:14:13 +08:00
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Lucas Liebenwein | 24ac9b5f69 | [AutoDeploy] merge feat/ad-2025-06-29 (#5737) | 2025-07-04 10:21:18 +09:00
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
  Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
  Co-authored-by: Neta Zmora <nzmora@nvidia.com>
  Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Omer Ullman Argov | c72856188c | [ci] small multigpu speedups (#5643) | 2025-07-03 08:06:10 -04:00
  Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
tomeras91 | 7dbecf7272 | [TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646) | 2025-07-03 11:07:51 +03:00
  Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Emma Qiao | 2a5fdebf10 | [Infra] - Waive failed tests for main 0702 (#5671) | 2025-07-02 22:05:07 -04:00
  Signed-off-by: qqiao <qqiao@nvidia.com>
Fridah-nv | afef5127f0 | feat: [AutoDeploy] E2E build example for llama4 VLM (#3922) | 2025-07-02 19:29:34 -04:00
  Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Jhao-Ting Chen | 77082cde38 | [https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146) | 2025-07-02 04:54:43 -04:00
  Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Yan Chunwei | 2d69b55fe8 | chore: enhance yaml loading arbitrary options in LlmArgs (#5610) | 2025-07-02 14:21:37 +08:00
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Xiaowei Wang | 32dfdfba30 | feat: fuse w4a8 moe pre-quant scale on Hopper (#5613) | 2025-07-01 23:02:41 -04:00
  Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
HuiGao-NV | 10c50515c2 | fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637) | 2025-07-02 09:49:20 +08:00
  Signed-off-by: Hui Gao <huig@nvidia.com>
|
Aurelien Chartier
|
fa95e402a5
|
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-01 12:16:09 -07:00 |
|
liji-nv
|
c345f5876c
|
[feat] Support torch compile for attention dp (#5086)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-01 13:48:52 -04:00 |
|
Kaiyu Xie
|
f9a455651b
|
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-01 09:35:25 -04:00 |
|
Emma Qiao
|
178fc3f655
|
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-01 20:12:55 +08:00 |
|
Yan Chunwei
|
ee7fcbf20e
|
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Emma Qiao
|
65c2b93284
|
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-01 05:01:32 -04:00 |
|
danielafrimi
|
7a617ad1fe
|
feat: W4A16 GEMM (#4232)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-01 10:36:05 +03:00 |
|
Wei-Ming Chen
|
f28cd3056e
|
feat: AutoDeploy fp8 quantization support for bmm (#3849)
Signed-off-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
|
2025-06-30 12:36:34 -04:00 |
|
nv-guomingz
|
6e48ac25a6
|
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 12:23:14 -04:00 |
|
Yan Chunwei
|
98a7c24062
|
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-30 20:40:23 +08:00 |
|
WeiHaocheng
|
42a9385d02
|
[TRTLLM-5331] perf: Replace allgaher with AllToAllPrepare (#5570)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
|
2025-06-30 13:06:09 +08:00 |
|
Omer Ullman Argov
|
1db63c2546
|
[fix] speedup modeling unittests (#5579)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-30 06:30:45 +03:00 |
|
nv-guomingz
|
578430e64c
|
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 11:05:40 +08:00 |
|