fddb7f1141  WeiHaocheng  2025-07-22 10:42:46 +08:00
    feat: moe prepare support topk % 4 != 0 (#5742)
    Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>

ee45e0c63f  Shunkangz  2025-07-22 09:16:28 +08:00
    feat: Refactor the fetching request logic (#5786)
    Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

7381f1dba7  Chang Liu  2025-07-21 16:11:58 -07:00
    [TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)
    Only supports qwen in this PR

9832bef07d  Pengyun Lin  2025-07-21 21:09:43 +08:00
    [BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

e41507a253  Emma Qiao  2025-07-21 21:00:18 +08:00
    [Infra] - Waive failed cases on recent post-merge (#6212)
    Signed-off-by: qqiao <qqiao@nvidia.com>

3e0fb60e50  liji-nv  2025-07-21 19:10:22 +08:00
    [TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847)
    Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

3efad2e58c  Linda  2025-07-21 08:56:57 +01:00
    feat: nanobind bindings (#6185)
    Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

e8c068b4b1  Yuening Li  2025-07-21 15:17:35 +08:00
    [TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850)
    Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
    Co-authored-by: Yuening Li <62227368+yueningl@users.noreply.github.com>

ca9bc5727e  brb-nv  2025-07-21 09:55:09 +08:00
    fix: Flush stale PlanParams with custom attention mask (#6163)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

5300a99bd8  danielafrimi  2025-07-20 17:34:57 +03:00
    W4A8 GEMM (#6005)
    Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

98428f330e  amitz-nv  2025-07-20 08:00:14 +03:00
    [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
    Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>

66030ef815  Ziyi Xiong  2025-07-19 13:17:15 +08:00
    [TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133)
    Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
    Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

22d4a8c48a  Venky  2025-07-19 00:50:40 +08:00
    enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
    Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
    Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
    Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

fd6ce7f20e  Stefan Niebler  2025-07-18 22:54:49 +08:00
    [ci] Speedup beam search unit tests with fixtures for LLM (#5843)
    Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>

77acb4f753  Emma Qiao  2025-07-18 17:34:34 +08:00
    [Infra] - Waive failed tests in post-merge (#6176)
    Signed-off-by: qqiao <qqiao@nvidia.com>

992b273045  Zhenhuan Chen  2025-07-18 10:34:37 +08:00
    [https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140)
    Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>

b75e53ab69  Iman Tabrizian  2025-07-18 10:12:54 +08:00
    Revert "feat: nanobind bindings (#5961)" (#6160)
    Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

5bff317abf  Linda  2025-07-17 22:42:52 +08:00
    feat: nanobind bindings (#5961)
    Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

9b45499caa  nv-guomingz  2025-07-17 18:05:45 +08:00
    test: update max_beam_width to 1 due to torchsampler changes. (#6101)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

21efb50068  Enwei Zhu  2025-07-17 17:46:10 +08:00
    [TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

44c70c88f9  Chuang Zhu  2025-07-17 17:42:07 +08:00
    chore: [BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
    Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

2d2b8bae32  Wanli Jiang  2025-07-17 06:30:58 +08:00
    feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644)
    Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

e0836f9ca9  shaharmor98  2025-07-17 00:50:30 +08:00
    [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372)
    Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>

9354114f68  Wanli Jiang  2025-07-16 12:41:45 -04:00
    fix: Update trtllm args issues with extra nested config (#5996)
    Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

a02606a9e2  Yan Chunwei  2025-07-16 16:42:59 +08:00
    [TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

7568deb2f1  Yan Chunwei  2025-07-16 16:05:38 +08:00
    [nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig (#6001)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

ab1c54709d  Jaedeok Kim  2025-07-15 17:41:54 +08:00
    fix: adjust window sizes of VSWA at torch backend (#5880)
    Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>

4e4d18826f  nv-guomingz  2025-07-15 15:50:03 +09:00
    chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

f5f5be9e94  brb-nv  2025-07-14 22:39:18 +08:00
    enh: Bidirectional mask with multiple images for Gemma3 (#5976)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

1a2d96919c  brb-nv  2025-07-14 22:38:10 +08:00
    feat: Update Gemma3 Vision Encoder (#5973)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

dbf29184dc  Clay  2025-07-14 20:22:03 +09:00
    fix #4974: A thread leak issue in scaffolding unittest (#5020)
    Signed-off-by: Clay <ccs96307@gmail.com>

aa97fbb2ad  Kaiyu Xie  2025-07-14 20:21:46 +09:00
    [Nvbug/5383670] fix: switch test case to non-fp4 ckpt for more GPU coverage (#5882)
    Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

c720d7f779  Yiqing Yan  2025-07-14 19:55:34 +09:00
    Waive L0 test (#6002)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

30608a5e6d  Zhenhuan Chen  2025-07-14 17:17:30 +08:00
    [https://nvbugs/5355316] fix: update torch.compile option to fix triton store_cubin error (#5865)
    Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>

3fcaa8a310  Pengyun Lin  2025-07-14 17:17:30 +08:00
    [nvbug 5327706][fix] fix mgmn postprocess error (#5835)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

3e1fd983c3  Yan Chunwei  2025-07-14 17:17:30 +08:00
    [nvbug 5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

388b4919b8  Pengyun Lin  2025-07-14 17:17:30 +08:00
    [nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

6992616c1f  Pengyun Lin  2025-07-14 17:17:30 +08:00
    [nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

c9e7f831dc  dominicshanshan  2025-07-14 16:42:23 +08:00
    [BREAKING CHANGE] perf: [TRTLLM-4662] Enable cuda graph by default (#5480)
    Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

ce39409530  QI JUN  2025-07-14 10:23:20 +08:00
    fix: cancel request logic (#5800)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

3dfc819849  wili  2025-07-12 23:48:57 +09:00
    [BUG5374319][fix] WAR for draft-target-model unit tests error (#5958)
    Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
    Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
    Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

bc1d4fb5da  Enwei Zhu  2025-07-12 15:50:31 +09:00
    [NvBug 5378370] fix: Fix alltoall for llama4 (apply_router_weight_on_input=True) (#5902)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

0385f89abc  brb-nv  2025-07-10 17:24:10 -07:00
    test: Fix Gemma3 unit tests due to transformers upgrade (#5921)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

c19840235d  2ez4bz  2025-07-10 10:45:27 -07:00
    [fix] Fix mistral unit tests due to transformers upgrade (#5904)
    Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

2e3cf42e03  wili  2025-07-10 11:37:30 -04:00
    [refactor] Simplification of Speculative decoding configs (#5639)
    Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

3aa53ec36c  Yiqing Yan  2025-07-10 18:33:17 +08:00
    [None] - Waive L0 tests (#5915)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

055c4a9fe6  Enwei Zhu  2025-07-10 16:30:00 +08:00
    [NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

dc32f9ae73  CarstyYou  2025-07-10 15:16:18 +08:00
    [fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531)
    Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>

7d21b55b5a  Anthony Chang  2025-07-10 14:06:50 +08:00
    [feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723)
    Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

07f6da763d  Yan Chunwei  2025-07-10 11:31:35 +08:00
    [TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>