Yibin Li
|
2a946859a7
|
[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
|
2025-08-06 14:21:03 -07:00 |
|
Izzy Putterman
|
7e0158b583
|
Qwen3: Fix eagle hidden states (#6199)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-08-06 17:05:18 -04:00 |
|
chenfeiz0326
|
a16ba6445c
|
[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 (#6550)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
|
2025-08-06 22:15:24 +08:00 |
|
Yuxian Qiu
|
3a71ddfe09
|
[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. (#6579)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-08-06 22:13:54 +08:00 |
|
Yan Chunwei
|
5eae3184fa
|
[None][chore] add missing tests to test list (#6590)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-08-06 22:12:27 +08:00 |
|
Yechan Kim
|
1aed7511fe
|
[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-06 06:58:58 -07:00 |
|
Iman Tabrizian
|
13ecb4aced
|
[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-08-06 09:08:29 -04:00 |
|
Pengyun Lin
|
79fc2f48c0
|
[None][chore] Enhance trtllm-serve example test (#6604)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-08-06 20:30:35 +08:00 |
|
Yanchao Lu
|
b7347ce7d1
|
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-06 18:50:53 +08:00 |
|
Yiqing Yan
|
98424f3186
|
[TRTLLM-5633][infra] Change the TOT repo to default-llm-repo for merge waive list (#6605)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-06 06:19:03 -04:00 |
|
Hanjun Cho
|
80f918cc22
|
[None][feat] Add Qwen3 MoE support to TensorRT backend (#6470)
Signed-off-by: gkswns0531 <gkswns0531@gmail.com>
Signed-off-by: hanjuncho <gkswns0531@gmail.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
|
2025-08-06 17:02:35 +08:00 |
|
Zongfei Jing
|
0ff8df95b7
|
[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
|
2025-08-06 16:44:21 +08:00 |
|
ruodil
|
907c180eb2
|
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-06 02:25:57 -04:00 |
|
Iman Tabrizian
|
43bd861ce1
|
Update allreduce benchmark for torch (#6271)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-08-05 23:25:23 -07:00 |
|
Netanel Haber
|
83ee91e17b
|
[None][fix] Fix 6522 mpi.pkl5.intracomm.Request has wait not Wait (#6646)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-08-06 14:18:09 +08:00 |
|
Guoming Zhang
|
3036d49071
|
[None][doc] Unify the tech blogs naming. (#6649)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-06 01:45:40 -04:00 |
|
ruodil
|
0bd99b5d6d
|
[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-06 01:45:13 -04:00 |
|
jiahanc
|
3170039e36
|
[None][doc] Add llama4 hybrid guide (#6640)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-08-06 01:25:38 -04:00 |
|
juney-nvidia
|
da072277d1
|
[None][doc] Exposing the GPT OSS model support blog (#6647)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
|
2025-08-05 23:50:34 -04:00 |
|
JunyiXu-nv
|
13e0214fe0
|
[TRTLLM-6263][feat] Enable fp8 SwiGLU to minimize host overhead (#6540)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
|
2025-08-06 10:42:19 +08:00 |
|
brb-nv
|
9a01934dbf
|
[None][feat] Switch to internal version of MMProjector in Gemma3 (#6572)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-05 21:48:23 -04:00 |
|
yunruis
|
3ff4f503ad
|
[None][opt] ADP schedule balance optimization (#6061)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
|
2025-08-06 09:38:02 +08:00 |
|
Ransiki
|
19b7524ff6
|
[None][feat] Add vLLM KV Pool support for XQA kernel (#6013)
Signed-off-by: Ransiki Zhang <ransikiz@nvidia.com>
|
2025-08-06 09:29:37 +08:00 |
|
Yechan Kim
|
c17f4984e2
|
[None][feat] Refactor Llava-Next (#6478)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-05 17:53:53 -07:00 |
|
Venky
|
f92397493e
|
[TRTLLM-5500][infra] Update CODEOWNERS with new ownership rules for additional paths (#6564)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2025-08-05 15:54:24 -04:00 |
|
Aurelien Chartier
|
6da95f29a9
|
[None][feat] Add support for fused gate_up_proj scales for FP8 blockwise (#6496)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-08-05 11:22:32 -07:00 |
|
Wanli Jiang
|
46df8712c8
|
[https://nvbugs/5355007][fix] Set enable_chunked_context as True by default in trtllm bench (#6582)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-08-05 11:11:36 -07:00 |
|
ixlmar
|
1ebceb790d
|
[TRTLLM-5508][feat] check input tokens + improve error handling (#5170)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-08-05 18:27:43 +01:00 |
|
Farshad Ghodsian
|
6af1514dc3
|
[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
|
2025-08-05 19:19:48 +02:00 |
|
liji-nv
|
dcbfa7e509
|
[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-08-05 10:31:29 -04:00 |
|
Venky
|
61da2daeb4
|
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2025-08-05 07:14:24 -07:00 |
|
Zhanrui Sun
|
6a9b4b11be
|
[https://nvbugs/5433581][infra] Temporarily disable Docker Image use wheel from build stage (#6630)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-08-05 09:33:11 -04:00 |
|
Emma Qiao
|
78a75c2990
|
[None][Infra] - Split gb200 stages for each test (#6594)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-08-05 07:10:00 -04:00 |
|
xinhe-nv
|
c32584125e
|
[TRTQA-2920][fix] Add failed cases into waives.txt (#6600)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-05 20:12:55 +10:00 |
|
Pengbo Wang @ NVIDIA
|
c289880afb
|
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-08-05 18:05:33 +08:00 |
|
Ivy Zhang
|
08ed9d7305
|
[None][doc] add introduction doc on qa test (#6535)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-05 17:02:17 +08:00 |
|
Ivy Zhang
|
d101a6cebc
|
[https://nvbugs/5410279][test] resubmit timeout refactor (#6337)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-05 16:39:25 +08:00 |
|
Zhanrui Sun
|
7cbe30e17d
|
[TRTLLM-6893][infra] fix Build Docker Image tag issue (#6555)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-08-05 04:33:36 -04:00 |
|
amitz-nv
|
dc84695520
|
[TRTLLM-6826][feat] Allow sending more than 2GiB through MPI by using mpi4py.util.pkl5 (#6522)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-08-05 11:28:26 +03:00 |
|
danielafrimi
|
ed801ff74b
|
[None][fix] Remove expand configuration from mamba2 mixer (#6521)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-08-05 04:18:25 -04:00 |
|
Haohang Huang
|
c9eebcb454
|
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
|
2025-08-05 07:47:41 +00:00 |
|
Chuang Zhu
|
4d040b50b7
|
[None][chore] ucx establish connection with zmq (#6090)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-08-05 02:50:45 -04:00 |
|
Leslie Fang
|
164acfa31e
|
[None][infra] Skip test_eagle3 test with device memory check (#6617)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-05 02:36:03 -04:00 |
|
ruodil
|
7625845365
|
test: add README_release_test.md for perf test (#6443)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-05 02:07:42 -04:00 |
|
Guoming Zhang
|
db51ab11a9
|
[TRTLLM-5990][doc] trtllm-serve doc improvement. (#5220)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-08-05 13:04:01 +08:00 |
|
Yanchao Lu
|
d53cc2374b
|
[https://nvbugs/5433581][infra] Update install docs and CI script for SBSA deep_gemm workaround (#6607)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-08-04 23:36:38 -04:00 |
|
xinhe-nv
|
a178cea324
|
[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-05 12:47:53 +10:00 |
|
xinhe-nv
|
fe3d607c4b
|
[TRTQA-2920][fix] Add failed cases into waives.txt (#6581)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-05 12:41:23 +10:00 |
|
Enwei Zhu
|
899b74c357
|
[None][doc] Fix blog4 typo (#6612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-05 10:20:37 +08:00 |
|
kris1025
|
6a3a921284
|
[TRTLLM-6685][feat] Add speculative metrics for trt llm bench (#6476)
Signed-off-by: linquanh <linquanh@nvidia.com>
|
2025-08-04 15:22:57 -07:00 |
|