ruodil
|
22f45a0e19
|
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-07 22:57:04 -04:00 |
|
xinhe-nv
|
88ced50ca7
|
[TRTQA-2920][fix] Add failed cases into waives.txt (#6719)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-08 12:54:13 +10:00 |
|
Daniel Cámpora
|
efca359b66
|
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-08-07 22:19:37 -04:00 |
|
Iman Tabrizian
|
82276167e6
|
[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-08-07 17:28:14 -07:00 |
|
Haohang Huang
|
980929e1a9
|
[https://nvbugs/5410687][fix] Hopper w4a8 groupwise MoE interleave (#6708)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
|
2025-08-07 15:30:16 -07:00 |
|
Yuan Tong
|
db8dc97b7b
|
[None][fix] Migrate to new cuda binding package name (#6700)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-08-07 16:29:55 -04:00 |
|
Andrew Chen
|
4ecda91ecc
|
[https://nvbugs/5423962][fix] Address broken links (#6531)
|
2025-08-07 16:00:05 -04:00 |
|
pcastonguay
|
3b2dd40d50
|
[None][chore] Remove py_executor from disagg gh team (#6716)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-08-07 13:29:01 -04:00 |
|
Mike Iovine
|
e968f98b43
|
[None][feat] Clean up ngram auto mode, add max_concurrency to configs (#6676)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-08-07 12:51:47 -04:00 |
|
Raayan Dhar
|
4055b764db
|
[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
|
2025-08-07 11:18:02 -04:00 |
|
Guoming Zhang
|
0223de0727
|
[None][doc] Add deployment guide section for VDR task (#6669)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-07 10:30:47 -04:00 |
|
Yiqing Yan
|
46357e7869
|
[None][package] Pin cuda-python version to >=12,<13 (#6702)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-07 10:01:04 -04:00 |
|
Emma Qiao
|
3c44b44e45
|
[None][infra] Fix guardwords (#6711)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-08-07 21:06:47 +08:00 |
|
pcastonguay
|
453a06e6ab
|
[TRTLLM-6881][feat] Include attention dp rank info with KV cache events (#6563)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-08-07 14:17:07 +02:00 |
|
Enwei Zhu
|
1b9781e8e7
|
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-07 05:53:48 -04:00 |
|
shaharmor98
|
c23e8e7b05
|
[TRTLLM-6092][doc] Add LoRA feature usage doc (#6603)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
|
2025-08-07 05:24:12 -04:00 |
|
peaceh-nv
|
8ec3b1de10
|
[None][feat] : Add FP8 context MLA support for SM120 (#6059)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
|
2025-08-07 16:16:34 +08:00 |
|
xinhe-nv
|
0a467b00cc
|
[https://nvbugs/5409414][fix] fix Not registered specs (#6660)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-07 17:55:53 +10:00 |
|
hlu1
|
8207d5fd39
|
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-08-07 03:04:18 -04:00 |
|
ruodil
|
6c1f7d8b91
|
[None][test] correct test-db context for perf yaml file (#6686)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-07 02:47:10 -04:00 |
|
amitz-nv
|
85af62184b
|
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6510)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-08-07 09:05:36 +03:00 |
|
Yiqing Yan
|
5fa1914cab
|
[None][chore] Bump version to 1.1.0rc0 (#6651)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-08-07 13:39:49 +08:00 |
|
Chuang Zhu
|
ee471df07c
|
[None][chore] optimize kv cache transfer for context TEP and gen DEP (#6657)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-08-07 11:36:05 +08:00 |
|
Yiqing Yan
|
3e41e6c077
|
[TRTLLM-6892][infra] Run guardwords scan first in Release Check stage (#6659)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-06 23:00:15 -04:00 |
|
YueWeng
|
157ea77549
|
[https://nvbugs/5375966][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
|
2025-08-07 10:25:17 +08:00 |
|
Guoming Zhang
|
f7f46a5017
|
doc: remove the outdated features which marked as Experimental (#5995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-06 22:01:42 -04:00 |
|
Pengbo Wang @ NVIDIA
|
2e90b0b550
|
[None][fix] Explicitly add tiktoken as required by kimi k2 (#6663)
|
2025-08-07 09:47:45 +08:00 |
|
ruodil
|
780d7507f9
|
[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-07 10:02:13 +10:00 |
|
ruodil
|
f30398470d
|
[None][chore] update readme for perf release test (#6664)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-07 10:00:45 +10:00 |
|
Yibin Li
|
2a946859a7
|
[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
|
2025-08-06 14:21:03 -07:00 |
|
Izzy Putterman
|
7e0158b583
|
Qwen3: Fix eagle hidden states (#6199)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-08-06 17:05:18 -04:00 |
|
chenfeiz0326
|
a16ba6445c
|
[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 (#6550)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
|
2025-08-06 22:15:24 +08:00 |
|
Yuxian Qiu
|
3a71ddfe09
|
[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. (#6579)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-08-06 22:13:54 +08:00 |
|
Yan Chunwei
|
5eae3184fa
|
[None][chore] add missing tests to test list (#6590)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-08-06 22:12:27 +08:00 |
|
Yechan Kim
|
1aed7511fe
|
[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-06 06:58:58 -07:00 |
|
Iman Tabrizian
|
13ecb4aced
|
[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-08-06 09:08:29 -04:00 |
|
Pengyun Lin
|
79fc2f48c0
|
[None][chore] Enhance trtllm-serve example test (#6604)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-08-06 20:30:35 +08:00 |
|
Yanchao Lu
|
b7347ce7d1
|
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-06 18:50:53 +08:00 |
|
Yiqing Yan
|
98424f3186
|
[TRTLLM-5633][infra] Change the TOT repo to default-llm-repo for merge waive list (#6605)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-06 06:19:03 -04:00 |
|
Hanjun Cho
|
80f918cc22
|
[None][feat] Add Qwen3 MoE support to TensorRT backend (#6470)
Signed-off-by: gkswns0531 <gkswns0531@gmail.com>
Signed-off-by: hanjuncho <gkswns0531@gmail.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
|
2025-08-06 17:02:35 +08:00 |
|
Zongfei Jing
|
0ff8df95b7
|
[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
|
2025-08-06 16:44:21 +08:00 |
|
ruodil
|
907c180eb2
|
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-06 02:25:57 -04:00 |
|
Iman Tabrizian
|
43bd861ce1
|
Update allreduce benchmark for torch (#6271)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-08-05 23:25:23 -07:00 |
|
Netanel Haber
|
83ee91e17b
|
[None][fix] Fix 6522 mpi.pkl5.intracomm.Request has wait not Wait (#6646)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-08-06 14:18:09 +08:00 |
|
Guoming Zhang
|
3036d49071
|
[None][doc] Unify the tech blogs naming. (#6649)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-06 01:45:40 -04:00 |
|
ruodil
|
0bd99b5d6d
|
[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-06 01:45:13 -04:00 |
|
jiahanc
|
3170039e36
|
[None][doc] Add llama4 hybrid guide (#6640)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-08-06 01:25:38 -04:00 |
|
juney-nvidia
|
da072277d1
|
[None][doc] Exposing the GPT OSS model support blog (#6647)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
|
2025-08-05 23:50:34 -04:00 |
|
JunyiXu-nv
|
13e0214fe0
|
[TRTLLM-6263][feat] Enable fp8 SwiGLU to minimize host overhead (#6540)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
|
2025-08-06 10:42:19 +08:00 |
|
brb-nv
|
9a01934dbf
|
[None][feat] Switch to internal version of MMProjector in Gemma3 (#6572)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-05 21:48:23 -04:00 |
|