5300a99bd8 | danielafrimi | 2025-07-20 17:34:57 +03:00
    W4A8 GEMM (#6005)
    Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

98428f330e | amitz-nv | 2025-07-20 08:00:14 +03:00
    [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
    Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>

66030ef815 | Ziyi Xiong | 2025-07-19 13:17:15 +08:00
    [TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133)
    Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
    Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

22d4a8c48a | Venky | 2025-07-19 00:50:40 +08:00
    enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
    Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
    Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
    Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

fd6ce7f20e | Stefan Niebler | 2025-07-18 22:54:49 +08:00
    [ci] Speedup beam search unit tests with fixtures for LLM (#5843)
    Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>

77acb4f753 | Emma Qiao | 2025-07-18 17:34:34 +08:00
    [Infra] - Waive failed tests in post-merge (#6176)
    Signed-off-by: qqiao <qqiao@nvidia.com>

992b273045 | Zhenhuan Chen | 2025-07-18 10:34:37 +08:00
    [https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140)
    Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>

b75e53ab69 | Iman Tabrizian | 2025-07-18 10:12:54 +08:00
    Revert "feat: nanobind bindings (#5961)" (#6160)
    Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

5bff317abf | Linda | 2025-07-17 22:42:52 +08:00
    feat: nanobind bindings (#5961)
    Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

9b45499caa | nv-guomingz | 2025-07-17 18:05:45 +08:00
    test: update max_beam_width to 1 due to torchsampler changes. (#6101)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

21efb50068 | Enwei Zhu | 2025-07-17 17:46:10 +08:00
    [TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

44c70c88f9 | Chuang Zhu | 2025-07-17 17:42:07 +08:00
    chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
    Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

2d2b8bae32 | Wanli Jiang | 2025-07-17 06:30:58 +08:00
    feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644)
    Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

e0836f9ca9 | shaharmor98 | 2025-07-17 00:50:30 +08:00
    [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372)
    Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>

9354114f68 | Wanli Jiang | 2025-07-16 12:41:45 -04:00
    fix: Update trtllm args issues with extra nested config (#5996)
    Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

a02606a9e2 | Yan Chunwei | 2025-07-16 16:42:59 +08:00
    [TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

7568deb2f1 | Yan Chunwei | 2025-07-16 16:05:38 +08:00
    [nvbug/5387226] chore: add propogation for trust_remote_code to AutoConfig (#6001)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

ab1c54709d | Jaedeok Kim | 2025-07-15 17:41:54 +08:00
    fix: adjust window sizes of VSWA at torch backend (#5880)
    Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>

4e4d18826f | nv-guomingz | 2025-07-15 15:50:03 +09:00
    chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

f5f5be9e94 | brb-nv | 2025-07-14 22:39:18 +08:00
    enh: Bidirectional mask with multiple images for Gemma3 (#5976)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

1a2d96919c | brb-nv | 2025-07-14 22:38:10 +08:00
    feat: Update Gemma3 Vision Encoder (#5973)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

dbf29184dc | Clay | 2025-07-14 20:22:03 +09:00
    fix #4974: A thread leak issue in scaffolding unittest (#5020)
    Signed-off-by: Clay <ccs96307@gmail.com>

aa97fbb2ad | Kaiyu Xie | 2025-07-14 20:21:46 +09:00
    [Nvbug/5383670] fix: switch test case to non-fp4 ckpt for more GPU coverage (#5882)
    Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

c720d7f779 | Yiqing Yan | 2025-07-14 19:55:34 +09:00
    Waive L0 test (#6002)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

30608a5e6d | Zhenhuan Chen | 2025-07-14 17:17:30 +08:00
    [https://nvbugs/5355316] fix: update torch.compile option to fix triton store_cubin error (#5865)
    Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>

3fcaa8a310 | Pengyun Lin | 2025-07-14 17:17:30 +08:00
    [nvbug 5327706][fix] fix mgmn postprocess error (#5835)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

3e1fd983c3 | Yan Chunwei | 2025-07-14 17:17:30 +08:00
    [nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

388b4919b8 | Pengyun Lin | 2025-07-14 17:17:30 +08:00
    [nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

6992616c1f | Pengyun Lin | 2025-07-14 17:17:30 +08:00
    [nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

c9e7f831dc | dominicshanshan | 2025-07-14 16:42:23 +08:00
    Breaking change: perf: [TRTLLM-4662] Enable cuda graph by default (#5480)
    Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

ce39409530 | QI JUN | 2025-07-14 10:23:20 +08:00
    fix cancel request logic (#5800)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

3dfc819849 | wili | 2025-07-12 23:48:57 +09:00
    [BUG5374319][fix] WAR for draft-target-model unit tests error (#5958)
    Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
    Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
    Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

bc1d4fb5da | Enwei Zhu | 2025-07-12 15:50:31 +09:00
    [NvBug 5378370] fix: Fix alltoall for llama4 (apply_router_weight_on_input=True) (#5902)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

0385f89abc | brb-nv | 2025-07-10 17:24:10 -07:00
    test: Fix Gemma3 unit tests due to transformers upgrade (#5921)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

c19840235d | 2ez4bz | 2025-07-10 10:45:27 -07:00
    [fix] Fix mistral unit tests due to transformers upgrade (#5904)
    Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

2e3cf42e03 | wili | 2025-07-10 11:37:30 -04:00
    [refactor] Simplification of Speculative decoding configs (#5639)
    Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

3aa53ec36c | Yiqing Yan | 2025-07-10 18:33:17 +08:00
    [None] - Waive L0 tests (#5915)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

055c4a9fe6 | Enwei Zhu | 2025-07-10 16:30:00 +08:00
    [NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

dc32f9ae73 | CarstyYou | 2025-07-10 15:16:18 +08:00
    [fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531)
    Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>

7d21b55b5a | Anthony Chang | 2025-07-10 14:06:50 +08:00
    [feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723)
    Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

07f6da763d | Yan Chunwei | 2025-07-10 11:31:35 +08:00
    [TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
    Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

f57b3d6829 | Venky | 2025-07-10 09:53:31 +08:00
    Waive unittest failures introduced by PR#5345 (removal of ScaffoldingOutput class) (#5886)
    Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

3209b31665 | brb-nv | 2025-07-10 06:18:04 +09:00
    feat: Custom masking utils for Gemma3 VLM (#5853)
    Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

87fe44fd29 | 2ez4bz | 2025-07-09 13:17:40 -07:00
    feat(models): Mistral3.1 VLM pytorch backend support (#5529)
    Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

b61a717275 | Chang Liu | 2025-07-10 05:12:53 +09:00
    [1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396)

3f7cedec7c | Wanli Jiang | 2025-07-09 09:32:24 -07:00
    Update transformers to 4.53.0 (#5747)
    Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
    Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

a32f7083b4 | Omer Ullman Argov | 2025-07-09 11:05:57 +03:00
    [ci] parallelize torch unittests (#5714)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

3e3b1769ad | Dom Brown | 2025-07-09 08:21:58 +01:00
    [TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764)
    Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

e277766f0d | Erin | 2025-07-08 21:00:42 -07:00
    chores: merge examples for v1.0 doc (#5736)
    Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

d14dd2f597 | Lucas Liebenwein | 2025-07-09 11:47:48 +09:00
    [AutoDeploy] re-enable waive for flaky AD test (#5867)
    Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>