Venky
|
9538c8d0e5
|
Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019)
|
2025-07-22 19:42:45 -07:00 |
|
wili
|
8ecdeee300
|
[refactor] Simplification of Speculative decoding configs - Part 2 (#5936)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-23 09:20:27 +08:00 |
|
Iman Tabrizian
|
bc2fb29c5e
|
[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-23 05:27:16 +08:00 |
|
Lucas Liebenwein
|
41fb8aa8b1
|
[AutoDeploy] merge feat/ad-2025-07-07 (#6196)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
|
2025-07-23 05:11:04 +08:00 |
|
2ez4bz
|
ab7434ac62
|
[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-22 11:06:41 -07:00 |
|
John Calderon
|
b7c8a672da
|
[Issue 6193] Fix gemma3vl weight loader (#6233)
Signed-off-by: John Calderon <johncalesp@gmail.com>
|
2025-07-22 10:32:18 -07:00 |
|
Linda
|
60073731ca
|
fix: bindings unit tests for nanobind (#6221)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-22 14:51:43 +01:00 |
|
Stanley Sun
|
04f2d4b2eb
|
test: update test list for RTX6KD (#6213)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
|
2025-07-22 18:55:24 +08:00 |
|
Pengyun Lin
|
48ddc3d4b9
|
[fix]: Revert commit 388b491 (#6143)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
pcastonguay
|
310bdd9830
|
fix: Fix triton backend build [nvbug 5396469] (#6098)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Yi Zhang
|
eb7d0f84b5
|
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Nikita Korobov
|
9d26b7891a
|
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Yan Chunwei
|
f194b65f3e
|
fix [nvbug/5351244]: address remote mpi session submit (#5664)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Bo Li
|
537757e669
|
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. (#5896)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Bo Li
|
db77d83a2a
|
bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-07-22 12:28:38 +08:00 |
|
2ez4bz
|
37d0b68442
|
[fix] Fix flaky mistral E2E test (#6230)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-22 11:55:28 +08:00 |
|
WeiHaocheng
|
fddb7f1141
|
feat: moe prepare support topk % 4 != 0 (#5742)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
|
2025-07-22 10:42:46 +08:00 |
|
Ivy Zhang
|
eb5cb5b642
|
tests: add timeout_manager to tensorrt flow test cases (#5942)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-22 10:23:41 +08:00 |
|
Shunkangz
|
ee45e0c63f
|
feat: Refactor the fetching request logic (#5786)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-07-22 09:16:28 +08:00 |
|
Chang Liu
|
7381f1dba7
|
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)
Only supports qwen in this PR
|
2025-07-21 16:11:58 -07:00 |
|
Simeng Liu
|
4a0951f85c
|
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] (#5859)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-07-21 15:46:37 -07:00 |
|
Mike Iovine
|
9645814bdf
|
[chore] Clean up quickstart_advanced.py (#6021)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-07-21 15:00:59 -04:00 |
|
Yi Zhang
|
f9b0a911fb
|
test: Enable GB200 torch compile multi gpu tests (#6145)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-21 22:17:13 +08:00 |
|
Pengyun Lin
|
9832bef07d
|
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-07-21 21:09:43 +08:00 |
|
Emma Qiao
|
e41507a253
|
[Infra] - Waive failed cases on recent post-merge (#6212)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-21 21:00:18 +08:00 |
|
liji-nv
|
3e0fb60e50
|
[TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-21 19:10:22 +08:00 |
|
Linda
|
3efad2e58c
|
feat: nanobind bindings (#6185)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-21 08:56:57 +01:00 |
|
xinhe-nv
|
b46fd41026
|
test: [CI] remove closed bugs (#6201)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-21 15:40:30 +08:00 |
|
Yuening Li
|
e8c068b4b1
|
[TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
Co-authored-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
|
2025-07-21 15:17:35 +08:00 |
|
brb-nv
|
ca9bc5727e
|
fix: Flush stale PlanParams with custom attention mask (#6163)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-21 09:55:09 +08:00 |
|
ruodil
|
6a3c9f8061
|
test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-21 11:29:19 +10:00 |
|
danielafrimi
|
5300a99bd8
|
W4A8 GEMM (#6005)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-20 17:34:57 +03:00 |
|
amitz-nv
|
98428f330e
|
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-07-20 08:00:14 +03:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|
Ziyi Xiong
|
66030ef815
|
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133)
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-07-19 13:17:15 +08:00 |
|
wili
|
82d3587bb8
|
[refactor] Unify name of NGram speculative decoding (#5937)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-19 12:59:57 +08:00 |
|
xiaoqi
|
28858c8711
|
feat(eagle3):support qwen3 dense model (#5879)
Signed-off-by: xq25478 <xq25478@qq.com>
|
2025-07-19 01:24:32 +08:00 |
|
Venky
|
22d4a8c48a
|
enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-19 00:50:40 +08:00 |
|
Bo Deng
|
2c6fa145ee
|
[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-19 00:48:44 +08:00 |
|
Stefan Niebler
|
fd6ce7f20e
|
[ci] Speedup beam search unit tests with fixtures for LLM (#5843)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-18 22:54:49 +08:00 |
|
Erin
|
9522cde464
|
fix: NVBug 5385576 py_batch_idx issue (#6153)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-18 22:36:43 +08:00 |
|
Emma Qiao
|
77acb4f753
|
[Infra] - Waive failed tests in post-merge (#6176)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-18 17:34:34 +08:00 |
|
Chuang Zhu
|
c0e416535e
|
fix single_disagg_test (#6166)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-18 13:18:37 +08:00 |
|
Zhenhuan Chen
|
992b273045
|
[https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
|
2025-07-18 10:34:37 +08:00 |
|
Iman Tabrizian
|
b75e53ab69
|
Revert "feat: nanobind bindings (#5961)" (#6160)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-18 10:12:54 +08:00 |
|
2ez4bz
|
8480c120b1
|
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-17 11:04:17 -07:00 |
|
Linda
|
5bff317abf
|
feat: nanobind bindings (#5961)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-17 22:42:52 +08:00 |
|
Stanley Sun
|
9518e14f69
|
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
|
2025-07-17 20:55:04 +10:00 |
|
Yi Zhang
|
a718486900
|
fix: Fix DeepSeek R1 CI (#6129)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-17 18:24:49 +08:00 |
|
nv-guomingz
|
9b45499caa
|
test: update max_beam_width to 1 due to torchsampler changes. (#6101)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-17 18:05:45 +08:00 |
|