Wanli Jiang
|
9632dba02e
|
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-30 09:20:16 -07:00 |
|
NVShreyas
|
e67f4da9b5
|
[Perf]: Add residual, norm for nemotron_nas models (#6455)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
|
2025-07-30 09:10:38 -07:00 |
|
pcastonguay
|
0f083b9daf
|
fix: Unwaive triton cpp test [nvbug 5401088] (#6412)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-07-30 11:25:18 -04:00 |
|
nv-guomingz
|
03e38c9087
|
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 11:11:06 -04:00 |
|
Chang Liu
|
b4065d8ca6
|
[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-07-30 10:00:15 -04:00 |
|
pcastonguay
|
e7ae5e2824
|
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-07-30 09:42:13 -04:00 |
|
tomeras91
|
a2514d93fc
|
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-07-30 07:22:32 -04:00 |
|
Leslie Fang
|
d980928c96
|
[doc] update the doc of feature combination matrix (#6441)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-07-30 18:48:49 +08:00 |
|
Yiqing Yan
|
0cf2f6f154
|
[TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-30 17:50:05 +08:00 |
|
Yechan Kim
|
22b29df38c
|
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 17:29:55 +08:00 |
|
xinhe-nv
|
d9ab3fd35e
|
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 18:45:58 +10:00 |
|
nv-guomingz
|
a5540acfce
|
chore: add trtllm-serve json schema example into doc. (#6418)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 04:33:08 -04:00 |
|
QI JUN
|
2fe9cc0889
|
chore: remove draft_model_engine from init parameter list of PyExecutor (#6325)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-30 03:31:49 -04:00 |
|
QI JUN
|
1f39a11af0
|
chore: clean code of PyExecutor (#6445)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-30 02:11:43 -04:00 |
|
2ez4bz
|
d6eed1b624
|
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-30 14:10:36 +08:00 |
|
Jinyang Yuan
|
a427f5bece
|
[fix] Fix wide EP when using DeepEP with online EPLB (#6429)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-07-30 00:13:18 -04:00 |
|
Zheng Duan
|
c9ed1ab436
|
[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-30 10:39:40 +08:00 |
|
xinhe-nv
|
c00d6763b2
|
test: [CI] Add failed cases into waives.txt (#6457)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-30 12:36:58 +10:00 |
|
peaceh-nv
|
5b420ad267
|
Rename layer to comply with deepseek (#6393)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
|
2025-07-30 10:00:48 +08:00 |
|
Venky
|
ab40369053
|
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 10:53:43 +10:00 |
|
Yechan Kim
|
d6eb8e2366
|
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 08:52:31 +08:00 |
|
Yunfan Fan
|
1a8e28d295
|
[FIX] fix bugs caused by None attention_bias during Qwen3 model convert engine (#6344)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
|
2025-07-30 07:13:44 +08:00 |
|
Yan Chunwei
|
ad662ddcdd
|
chore: disallow arbitrary in llm_args.Configs (#6367)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-29 16:16:52 -04:00 |
|
Yan Chunwei
|
1a6930986a
|
chore: remove unused kv_cache_dtype in api reference (#6444)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-29 14:57:20 -04:00 |
|
Michal Guzek
|
7efe3cb0cd
|
[fix] Add detokenization-based stop word logic to LLM API (#5948)
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
|
2025-07-29 10:16:59 -07:00 |
|
Zhanrui Sun
|
c3729dbd7d
|
infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build (#4939)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-07-29 12:54:38 -04:00 |
|
nv-guomingz
|
7231134996
|
doc: remove backend parameter for trtllm-bench when backend is set to… (#6428)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-29 11:01:21 -04:00 |
|
xinhe-nv
|
f1086e7d4f
|
test: [CI] remove closed bugs (#6381)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-29 19:01:23 +10:00 |
|
xinhe-nv
|
4fbb344caf
|
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-29 19:00:30 +10:00 |
|
Yukun He
|
0eee2e2850
|
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-07-29 16:41:48 +08:00 |
|
QI JUN
|
13e24ab1cb
|
chore: remove unused code in PyExecutor (#6351)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-29 16:24:26 +08:00 |
|
ruodil
|
e11255e9d0
|
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-07-29 15:52:45 +10:00 |
|
Frank
|
d2a04abb95
|
[fix] Fixes to parameter usage and low latency configuration. (#6343)
|
2025-07-29 01:36:13 -04:00 |
|
Kaiyu Xie
|
e58afa510e
|
doc: Add README for wide EP (#6356)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-29 00:36:12 -04:00 |
|
Zhanrui Sun
|
64ba483656
|
infra: [TRTLLM-6499] Split L0_Test into two pipeline by single GPU and multi GPU(For SBSA) (#6132)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-07-28 22:54:37 -04:00 |
|
Frank
|
ee3cbb073e
|
[fix] Add trust_remote_code option to prepare_dataset. (#6338)
|
2025-07-28 14:49:45 -07:00 |
|
Venky
|
2d21bca25e
|
[infra] Remove auto_apply_labels option from .coderabbit.yaml reviews section (#6416)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
|
2025-07-28 14:16:45 -07:00 |
|
Michal Guzek
|
2573bb729d
|
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-28 14:02:14 -07:00 |
|
Aurelien Chartier
|
738ab61593
|
[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs (#6339)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-28 12:36:44 -07:00 |
|
Po-Wei (Vincent)
|
bca14157a9
|
[infra] Add an auto-labeling github action to TRTLLM (#6373)
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
|
2025-07-28 12:25:51 -07:00 |
|
yuanjingx87
|
608ed89f96
|
[None][infra]Update slurm config keys (#6370)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2025-07-28 11:56:37 -07:00 |
|
2ez4bz
|
cdca541148
|
[test] Unwaive mistral3.1 small E2E test (#6352)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-28 14:37:42 -04:00 |
|
2ez4bz
|
60e4d3a9d4
|
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-28 09:41:44 -07:00 |
|
nv-guomingz
|
49044733e1
|
chore: delete useless gitkeep files. (#6400)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-28 11:38:30 -04:00 |
|
ruodil
|
03632a679f
|
test: organize perf cases and add missing perflab cases in qa test list (#6283)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-07-28 20:33:32 +10:00 |
|
xinhe-nv
|
971be1fe86
|
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-28 20:31:43 +10:00 |
|
QI JUN
|
4efc6496b7
|
chore: add _prepare_and_schedule_batch function in PyExecutor (#6365)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-28 05:50:27 -04:00 |
|
Yuan Tong
|
413a83ff80
|
fix: compatibility with CUDA < 12.9 on __CUDA_ARCH_SPECIFIC__ macro (#5917)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-07-28 16:02:26 +08:00 |
|
Yan Chunwei
|
45d441e60c
|
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-28 15:57:07 +08:00 |
|
Ivy Zhang
|
2945817cae
|
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-28 15:33:30 +08:00 |
|