c9ed1ab436 | Zheng Duan | 2025-07-30 10:39:40 +08:00
[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>

c00d6763b2 | xinhe-nv | 2025-07-30 12:36:58 +10:00
test: [CI] Add failed cases into waives.txt (#6457)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

5b420ad267 | peaceh-nv | 2025-07-30 10:00:48 +08:00
Rename layer to comply with deepseek (#6393)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

ab40369053 | Venky | 2025-07-30 10:53:43 +10:00
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

d6eb8e2366 | Yechan Kim | 2025-07-30 08:52:31 +08:00
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

1a8e28d295 | Yunfan Fan | 2025-07-30 07:13:44 +08:00
[FIX] fix bugs caused by None attention_bias during Qwen3 model convert engine (#6344)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>

ad662ddcdd | Yan Chunwei | 2025-07-29 16:16:52 -04:00
chore: disallow arbitrary in llm_args.Configs (#6367)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

1a6930986a | Yan Chunwei | 2025-07-29 14:57:20 -04:00
chore: remove unused kv_cache_dtype in api reference (#6444)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

7efe3cb0cd | Michal Guzek | 2025-07-29 10:16:59 -07:00
[fix] Add detokenization-based stop word logic to LLM API (#5948)
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>

c3729dbd7d | Zhanrui Sun | 2025-07-29 12:54:38 -04:00
infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build (#4939)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

7231134996 | nv-guomingz | 2025-07-29 11:01:21 -04:00
doc: remove backend parameter for trtllm-bench when backend is set to… (#6428)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

f1086e7d4f | xinhe-nv | 2025-07-29 19:01:23 +10:00
test: [CI] remove closed bugs (#6381)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

4fbb344caf | xinhe-nv | 2025-07-29 19:00:30 +10:00
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

0eee2e2850 | Yukun He | 2025-07-29 16:41:48 +08:00
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

13e24ab1cb | QI JUN | 2025-07-29 16:24:26 +08:00
chore: remove unused code in PyExecutor (#6351)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

e11255e9d0 | ruodil | 2025-07-29 15:52:45 +10:00
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>

d2a04abb95 | Frank | 2025-07-29 01:36:13 -04:00
[fix] Fixes to parameter usage and low latency configuration. (#6343)

e58afa510e | Kaiyu Xie | 2025-07-29 00:36:12 -04:00
doc: Add README for wide EP (#6356)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

64ba483656 | Zhanrui Sun | 2025-07-28 22:54:37 -04:00
infra: [TRTLLM-6499] Split L0_Test into two pipeline by single GPU and multi GPU(For SBSA) (#6132)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

ee3cbb073e | Frank | 2025-07-28 14:49:45 -07:00
[fix] Add trust_remote_code option to prepare_dataset. (#6338)

2d21bca25e | Venky | 2025-07-28 14:16:45 -07:00
[infra] Remove auto_apply_labels option from .coderabbit.yaml reviews section (#6416)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

2573bb729d | Michal Guzek | 2025-07-28 14:02:14 -07:00
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>

738ab61593 | Aurelien Chartier | 2025-07-28 12:36:44 -07:00
[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs (#6339)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

bca14157a9 | Po-Wei (Vincent) | 2025-07-28 12:25:51 -07:00
[infra] Add an auto-labeling github action to TRTLLM (#6373)
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>

608ed89f96 | yuanjingx87 | 2025-07-28 11:56:37 -07:00
[None][infra]Update slurm config keys (#6370)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

cdca541148 | 2ez4bz | 2025-07-28 14:37:42 -04:00
[test] Unwaive mistral3.1 small E2E test (#6352)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

60e4d3a9d4 | 2ez4bz | 2025-07-28 09:41:44 -07:00
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

49044733e1 | nv-guomingz | 2025-07-28 11:38:30 -04:00
chore: delete useless gitkeep files. (#6400)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

03632a679f | ruodil | 2025-07-28 20:33:32 +10:00
test: organize perf cases and add missing perflab cases in qa test list (#6283)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>

971be1fe86 | xinhe-nv | 2025-07-28 20:31:43 +10:00
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

4efc6496b7 | QI JUN | 2025-07-28 05:50:27 -04:00
chore: add _prepare_and_schedule_batch function in PyExecutor (#6365)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

413a83ff80 | Yuan Tong | 2025-07-28 16:02:26 +08:00
fix: compatibility with CUDA < 12.9 on __CUDA_ARCH_SPECIFIC__ macro (#5917)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>

45d441e60c | Yan Chunwei | 2025-07-28 15:57:07 +08:00
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

2945817cae | Ivy Zhang | 2025-07-28 15:33:30 +08:00
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

b3ca159787 | Emma Qiao | 2025-07-28 02:06:57 -04:00
[Infa] - waive failed cases and fix a typo (#6384)
Signed-off-by: qqiao <qqiao@nvidia.com>

c9b8b6180f | Zero Zeng | 2025-07-28 14:00:58 +08:00
Add Acceptance Rate calculation to benchmark_serving (#6240)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>

97f7e12588 | Jinyang Yuan | 2025-07-28 01:37:11 -04:00
[fix] Fix perf regression caused by MoE autotuner when using DeepEPLowLatency (#6288)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

dc757799e1 | Chang Liu | 2025-07-27 23:29:21 -04:00
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266)

f172face98 | Void | 2025-07-28 11:25:42 +08:00
DeepEP LL dispatch FP4 (#6296)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>

93a0fd0a23 | Yukun He | 2025-07-28 09:36:26 +08:00
[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

2dd3186727 | YueWeng | 2025-07-28 09:18:41 +08:00
fix: remove cudaStreamSynchronize when using relaxed acceptance (#5262)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

908f49a4ad | Yan Chunwei | 2025-07-28 09:01:10 +08:00
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

d853811190 | Ziyi Xiong | 2025-07-26 20:32:39 -04:00
[https://nvbugs/5402719][fix]: Add cuda graph dummy requests to the spec_resource_manager (#6258)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

96d004d800 | Liana Koleva | 2025-07-26 11:27:10 -04:00
doc: fix invalid link in llama 4 example documentation (#6340)
Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>

54f68287fc | Jhao-Ting Chen | 2025-07-25 20:45:53 -04:00
fix precompiled multi_query_token kernel not having is_fp8_out hash key (#6279)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

08d57123f9 | Michal Guzek | 2025-07-25 18:10:40 -04:00
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)
Signed-off-by: moraxu <mguzek@nvidia.com>

c35c78ff58 | Iman Tabrizian | 2025-07-25 12:47:01 -07:00
[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

1e5e71aa42 | ameynaik-hub | 2025-07-25 13:48:27 -04:00
Mtp optimizations round1 (#5689)
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>

7bff341553 | Simeng Liu | 2025-07-25 10:26:33 -07:00
[doc] Add NGram tech blog (#6311)
Signed-off-by: Simeng Liu <simengl@nvidia.com>

b8d4cb8beb | nv-guomingz | 2025-07-25 12:55:56 -04:00
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>