Robin Kobus
|
30a19fcf7c
|
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-07 16:30:43 +02:00 |
|
Tailing Yuan
|
85b4a6808d
|
Refactor: move DeepEP from Docker images to wheel building (#5534)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2025-07-07 22:57:03 +09:00 |
|
Daniel Cámpora
|
1260e2f33f
|
feat: Optimize TRTLLM Sampler perf single beam single step (#5550)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-07-07 15:44:47 +02:00 |
|
DylanChen-NV
|
5ca2b9bb15
|
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
|
2025-07-07 18:04:57 +08:00 |
|
Yi Zhang
|
ed1b3c884a
|
fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1 tests (#5774)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-07 18:38:54 +09:00 |
|
Yan Chunwei
|
dfce61f4b9
|
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-07 17:05:14 +08:00 |
|
xinhe-nv
|
ded38ebdbd
|
test: [CI] remove closed bugs (#5770)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-07 18:06:07 +10:00 |
|
ChristinaZ
|
12d8c7d129
|
Refactor the topk parallelization part for the routing kernels (#5567)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-07-07 15:53:25 +08:00 |
|
Bo Li
|
9db2e9ee47
|
fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-07-07 14:58:32 +08:00 |
|
Zheng Duan
|
de10774c2e
|
chore: log stack trace on error in openai server (#5749)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-07 14:54:36 +08:00 |
|
Yanchao Lu
|
092e0eb86a
|
[Infra] - Fix a syntax issue in the image check (#5775)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-07 11:19:59 +09:00 |
|
bhsueh_NV
|
85e934a7fe
|
[Doc] update the document of qwen3 and cuda_graph usage (#5703)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-07 09:44:25 +08:00 |
|
Daniel Stokes
|
ec6c7dff1a
|
feat: Add support for MXFP8xMXFP4 in pytorch (#5535)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
|
2025-07-06 15:32:06 -07:00 |
|
Yiteng Niu
|
66f299a205
|
[TRTLLM-5878] add stage for image registration to nspect (#5699)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-06 23:52:54 +08:00 |
|
Yanchao Lu
|
2013034948
|
[Test] - Waive or fix few known test failures (#5769)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-06 21:14:16 +08:00 |
|
Robin Kobus
|
ae27261094
|
refactor: decoding inputs (#5679)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-06 08:21:02 +02:00 |
|
Yanchao Lu
|
d95ae1378b
|
[Infra] - Always use x86 image for the Jenkins agent and few clean-ups (#5753)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-06 10:25:57 +08:00 |
|
Julien Debache
|
6bddaf6df6
|
chore: Improve documentation of Kv_block_array (#5765)
Signed-off-by: Julien Debache <julien.debache@hotmail.com>
|
2025-07-05 22:25:27 +02:00 |
|
Xianjie Qiao
|
b1976c2add
|
Add wide-ep benchmarking scripts (#5760)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-07-05 19:29:39 +08:00 |
|
Xianjie Qiao
|
089fd55eda
|
Add dummy all_reduce for kernel breakdown (#5745)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
|
2025-07-05 13:08:58 +09:00 |
|
jthomson04
|
1b588f8390
|
feat: KV events for sliding window attention (#5580)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
|
2025-07-05 06:05:20 +08:00 |
|
Frank
|
d61893dc77
|
[fix] Update to properly set cuda graphs in trtllm-bench overrides. (#5634)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-07-05 05:19:16 +09:00 |
|
Stefan Niebler
|
d1112aac37
|
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-05 01:35:13 +09:00 |
|
Chuang Zhu
|
ffc0b8f5da
|
Cache transceiver support VSWA (#5505)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-07-05 01:18:42 +09:00 |
|
HuiGao-NV
|
3ed3bbcb5d
|
Fix: pass allreduce strategy to pytorchConfig (#5746)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-07-04 21:32:13 +09:00 |
|
Yiqing Yan
|
7f3ea058f0
|
[Infra] - Waive L0 flaky test (#5759)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-04 19:25:12 +09:00 |
|
Shunkangz
|
32339d1b20
|
Raise shut down error for each request (#4936)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-07-04 18:58:24 +09:00 |
|
ixlmar
|
471bf0b4fc
|
fix: check file exists in dev container script (#5755)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-07-04 10:29:17 +02:00 |
|
xinhe-nv
|
3869b969a6
|
test: [CI] Add failed cases into waives.txt (#5718)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-04 17:24:48 +09:00 |
|
Faraz
|
81c0764012
|
Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5724)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
|
2025-07-04 16:53:20 +09:00 |
|
Robin Kobus
|
07f9cf1519
|
fix: Improve chunking test and skip empty kernel calls (#5710)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-04 09:08:15 +02:00 |
|
Yiqing Yan
|
b8fef809ae
|
[Infra] - Waive L0 test (#5748)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-04 15:04:49 +08:00 |
|
Tailing Yuan
|
e134a52e07
|
Perf: reduce DeepEPLowLatency memory and time (#5712)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2025-07-04 14:46:28 +08:00 |
|
nv-guomingz
|
c434147366
|
chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-04 15:39:15 +09:00 |
|
Yuan Tong
|
32b244af38
|
feat: reduce unnecessary kernel generation (#5476)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-07-04 14:37:49 +08:00 |
|
Shunkangz
|
a79d8c9f5e
|
Fix none response in PD (#5422)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-07-04 14:25:10 +08:00 |
|
Netanel Haber
|
134b2383ff
|
[fix: nvbugs/5355493] Correctly clamp max sequence len to max attention window (#5720)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-07-04 08:16:25 +02:00 |
|
Linda
|
94f0252b46
|
Doc: Update invalid hugging face URLs (#5683)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Emma Qiao
|
a0135c0f6f
|
[Infra] - Waive failed cases on release/0.21 (#5674)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-04 13:14:13 +08:00 |
|
brb-nv
|
cdaa6abce7
|
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Frank
|
819ae903de
|
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation. (#5625)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Kaiyu Xie
|
ab488a5a5d
|
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Yi Zhang
|
73d30a23c7
|
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Zheng Duan
|
cb9f596dbe
|
[nvbug 5300551] test: increase block count in eviction test (#5465)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
nv-guomingz
|
d0b3d2ac65
|
fix:https://nvbugs/5362398 (#5609)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Yan Chunwei
|
77288d3671
|
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
xinhe-nv
|
7f837b6e8b
|
tests: waive failures on main (#5704)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-04 12:39:12 +09:00 |
|
Venky
|
4762e0b244
|
Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example (#5740)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
|
2025-07-04 11:01:08 +09:00 |
|
Clay
|
7a319524da
|
feat: support more parameters in openai worker of scaffolding (#5115)
Signed-off-by: Clay <ccs96307@gmail.com>
|
2025-07-04 09:35:34 +08:00 |
|
Lucas Liebenwein
|
24ac9b5f69
|
[AutoDeploy] merge feat/ad-2025-06-29 (#5737)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
|
2025-07-04 10:21:18 +09:00 |
|