Iman Tabrizian | bc2fb29c5e | 2025-07-23 05:27:16 +08:00
[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

Lucas Liebenwein | 41fb8aa8b1 | 2025-07-23 05:11:04 +08:00
[AutoDeploy] merge feat/ad-2025-07-07 (#6196)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>

Raayan Dhar | 5234502717 | 2025-07-22 11:28:23 -07:00
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222)
Signed-off-by: raayandhar <rdhar@nvidia.com>

yuanjingx87 | ef4878db05 | 2025-07-22 11:27:54 -07:00
set NVIDIA_IMEX_CHANNELS for dlcluster slurm job only (#6234)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

2ez4bz | ab7434ac62 | 2025-07-22 11:06:41 -07:00
[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

John Calderon | b7c8a672da | 2025-07-22 10:32:18 -07:00
[Issue 6193] Fix gemma3vl weight loader (#6233)
Signed-off-by: John Calderon <johncalesp@gmail.com>

danielafrimi | ff9963978a | 2025-07-22 16:59:55 +03:00
Add register_fake for finegrained_mixed_dtype_gemm torch_op (#6255)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

Linda | 60073731ca | 2025-07-22 14:51:43 +01:00
fix: bindings unit tests for nanobind (#6221)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

Stanley Sun | 04f2d4b2eb | 2025-07-22 18:55:24 +08:00
test: update test list for RTX6KD (#6213)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

Lizhi Zhou | 3e1a0fbac4 | 2025-07-22 16:57:06 +08:00
[TRTLLM-6537][infra] extend multi-gpu tests related file list (#6139)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

Yiqing Yan | 3e18ee5fe1 | 2025-07-22 16:24:28 +08:00
chore: bump version to 1.0.0rc5 (#6252)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

Yechan Kim | b85ab139f9 | 2025-07-22 14:32:41 +08:00
doc: add supported data modality and types on multimodal serve (#5988)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

Pengyun Lin | 48ddc3d4b9 | 2025-07-22 12:48:00 +08:00
[fix]: Revert commit 388b491 (#6143)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

bhsueh_NV | 24ce6b9517 | 2025-07-22 12:48:00 +08:00
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

pcastonguay | 310bdd9830 | 2025-07-22 12:48:00 +08:00
fix: Fix triton backend build [nvbug 5396469] (#6098)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

QI JUN | a03c680581 | 2025-07-22 12:48:00 +08:00
add release notes for 0.21 release (#6049)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

nv-guomingz | 34dd071bd6 | 2025-07-22 12:48:00 +08:00
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation. (#6039)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yi Zhang | eb7d0f84b5 | 2025-07-22 12:48:00 +08:00
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Fanrong Li | c66941036f | 2025-07-22 12:48:00 +08:00
fix: fix index out of bounds error in spec decoding (#5954)

Nikita Korobov | 9d26b7891a | 2025-07-22 12:48:00 +08:00
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>

Yan Chunwei | f194b65f3e | 2025-07-22 12:48:00 +08:00
fix [nvbug/5351244]: address remote mpi session submit (#5664)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

amirkl94 | f4f2176cd5 | 2025-07-22 12:48:00 +08:00
chore: Port leftover 0.20 (#5907)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>

Bo Li | 537757e669 | 2025-07-22 12:48:00 +08:00
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. (#5896)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

Bo Li | db77d83a2a | 2025-07-22 12:28:38 +08:00
bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

2ez4bz | 37d0b68442 | 2025-07-22 11:55:28 +08:00
[fix] Fix flaky mistral E2E test (#6230)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

WeiHaocheng | fddb7f1141 | 2025-07-22 10:42:46 +08:00
feat: moe prepare support topk % 4 != 0 (#5742)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>

Ivy Zhang | eb5cb5b642 | 2025-07-22 10:23:41 +08:00
tests: add timeout_manager to tensorrt flow test cases (#5942)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Shunkangz | ee45e0c63f | 2025-07-22 09:16:28 +08:00
feat: Refactor the fetching request logic (#5786)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>

Chang Liu | 7381f1dba7 | 2025-07-21 16:11:58 -07:00
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)
Only supports Qwen in this PR.

Simeng Liu | 4a0951f85c | 2025-07-21 15:46:37 -07:00
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] (#5859)
Signed-off-by: Simeng Liu <simengl@nvidia.com>

Mike Iovine | 9645814bdf | 2025-07-21 15:00:59 -04:00
[chore] Clean up quickstart_advanced.py (#6021)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

Ziyi Xiong | d7f0b0ab68 | 2025-07-21 11:38:59 -04:00
[fix] Correct the returned value of has_spec_drafter (#6178)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

Yi Zhang | f9b0a911fb | 2025-07-21 22:17:13 +08:00
test: Enable GB200 torch compile multi gpu tests (#6145)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Pengyun Lin | 9832bef07d | 2025-07-21 21:09:43 +08:00
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

Emma Qiao | e41507a253 | 2025-07-21 21:00:18 +08:00
[Infra] - Waive failed cases on recent post-merge (#6212)
Signed-off-by: qqiao <qqiao@nvidia.com>

liji-nv | 3e0fb60e50 | 2025-07-21 19:10:22 +08:00
[TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

QI JUN | aea91b2541 | 2025-07-21 18:47:22 +08:00
doc: add Deprecation Policy section (#5784)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Zhanrui Sun | 3cbc23f783 | 2025-07-21 16:06:43 +08:00
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) (#4656)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

Linda | 3efad2e58c | 2025-07-21 08:56:57 +01:00
feat: nanobind bindings (#6185)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

xinhe-nv | b46fd41026 | 2025-07-21 15:40:30 +08:00
test: [CI] remove closed bugs (#6201)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

|
Yuening Li
|
e8c068b4b1
|
[TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
Co-authored-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
|
2025-07-21 15:17:35 +08:00 |
|
Jinyang Yuan
|
88076eecd0
|
[fix] Fix can_use_alltoall in fused_moe_wide_ep.py (#6173)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-07-21 10:53:07 +08:00 |
|
nv-guomingz
|
b4c7e8c9a5
|
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-21 10:49:29 +08:00 |
|
brb-nv
|
ca9bc5727e
|
fix: Flush stale PlanParams with custom attention mask (#6163)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-21 09:55:09 +08:00 |
|
ruodil
|
6a3c9f8061
|
test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-21 11:29:19 +10:00 |
|
brb-nv
|
a433ebad2b
|
enh: Lift expectation of single image per sample in Gemma3 VLM (#6195)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-21 08:43:07 +08:00 |
|
danielafrimi
|
5300a99bd8
|
W4A8 GEMM (#6005)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-20 17:34:57 +03:00 |
|
amitz-nv
|
98428f330e
|
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-07-20 08:00:14 +03:00 |
|
Martin Marciniszyn Mehringer
|
943fd418dd
|
fix: Ensure mlx5 library is installed for deep_ep and remove deprecated python bindings (#6189)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-07-20 10:38:51 +08:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|