Commit Graph

121 Commits

Author SHA1 Message Date
Li Min
1e82ff7a0c
[TRTLLM-9989][fix] Fix tvm_ffi aaarch64 issue. (#10199)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-23 10:20:40 +08:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
Zhanrui Sun
b757ea73ba
[TRTLLM-9641][infra] Use public triton 3.5.0 in SBSA (#9652)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-15 18:58:59 -08:00
bhsueh_NV
e49c70f6df
[None][feat] Support Mistral Large3 LLM part (#9820)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-13 11:44:27 +08:00
Li Min
a422d70be6
[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (#9690)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-08 13:28:11 +08:00
Li Min
1797e91dfd
[TRTLLM-6222][feat] Extend cute_dsl_nvfp4_gemm to sm103. (#9543)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-01 10:19:36 +08:00
Enwei Zhu
1bf2d750a2
[None][chore] Upgrade CuteDSL to 4.3.0 (#9444)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-26 14:53:09 +08:00
tburt-nv
f8dd52621d
[None][chore] Upgrade starlette and FastAPI (#9319)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-11-20 11:21:14 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Zhanrui Sun
bdcf837784
[TRTLLM-9079][infra] upgrade tritonserver DLFW 25.10 (#8929)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-14 20:22:10 -08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Zhanrui Sun
4de31bece2
[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 18:59:34 +08:00
Yanchao Lu
da73410d3b
[None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package (#8857)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-02 09:57:37 +08:00
Pengyun Lin
2aade46d18
[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-29 15:48:29 +08:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
yuanjingx87
d90b4c57cc
[None][infra] Pin numexpr in requirements.txt (#8343)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-13 21:09:08 -07:00
Yilin Fan
1a9044949f
[None][fix] Fix bench_serving import error (#8296)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-12 22:46:31 -07:00
Pengbo Wang
7da4b05289
[https://nvbugs/5501820][fix] Add requirements for numba-cuda version to WAR mem corruption (#7992)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-10-10 10:18:27 +08:00
QI JUN
e10121345e
[None][ci] pin flashinfer-python version (#8217)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-09 02:48:49 -07:00
Bo Deng
e107749a69
[None][fix] fix patchelf version issue (#8112)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-01 16:39:22 -04:00
Yanchao Lu
7e2521a7f0
[None][chore] Some clean-ups for CUDA 13.0 dependencies (#7979)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-26 08:46:11 +08:00
PeganovAnton
396c0ea677
[None][chore] relax version constraints on fastapi (#7935)
Signed-off-by: Anton Peganov <apeganov@nvidia.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-25 21:58:53 +08:00
Li Min
0252cee4c3
[None][chore] Recover cutlass-dsl pkg install and dsl op testing. (#7945)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-24 15:45:18 +08:00
Enwei Zhu
8330d5363a
[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 09:10:09 +08:00
Wanli Jiang
2a30f11d63
[None][chore] Upgrade transformers to 4.56.0 (#7523)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-22 22:20:16 +08:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
Li Min
14e455da3e
[None][fix] Fix CI issue for dsl pkg install (#7784)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-09-18 13:58:20 +08:00
Li Min
b278d06481
[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-16 14:25:26 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Fridah-nv
f03053b4dd
[None][fix] update accelerate dependency to 1.7+ for AutoDeploy (#7077)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-20 19:52:37 -07:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation (#5785)
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Yiqing Yan
46357e7869
[None][package] Pin cuda-python version to >=12,<13 (#6702)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-07 10:01:04 -04:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Pengbo Wang @ NVIDIA
2e90b0b550
[None][fix] Explicitly add tiktoken as required by kimi k2 (#6663) 2025-08-07 09:47:45 +08:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
Zongfei Jing
0ff8df95b7
[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-06 16:44:21 +08:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Yanchao Lu
f39d621c3b
[None][infra] Pin the version for triton to 3.3.1 (#6508) (#6519) (#6549)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-01 07:33:24 -04:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell (#6486)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
Emma Qiao
baece56758
[None][infra] Pin the version for triton to 3.3.1 (#6508)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-31 19:25:15 +08:00
Enwei Zhu
4b299cb77e
feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 (#6408)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-31 09:53:52 +08:00
Lucas Liebenwein
41fb8aa8b1
[AutoDeploy] merge feat/ad-2025-07-07 (#6196)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou  <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-07-23 05:11:04 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
nv-guomingz
509dc7c831
chroe: upgrade modelopt to 0.33 (#6058)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-16 13:10:48 +08:00
Wanli Jiang
3f7cedec7c
Update transformers to 4.53.0 (#5747)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-09 09:32:24 -07:00
Wanli Jiang
e1fb1de4d9
feat: TRTLLM-6224 update xgrammar version to 0.1.19 (#5830)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-09 09:59:14 +08:00
Yanchao Lu
d95ae1378b
[Infra] - Always use x86 image for the Jenkins agent and few clean-ups (#5753)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 10:25:57 +08:00