Commit Graph

104 Commits

Author SHA1 Message Date
Pengbo Wang
7da4b05289
[https://nvbugs/5501820][fix] Add requirements for numba-cuda version to WAR mem corruption (#7992)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-10-10 10:18:27 +08:00
QI JUN
e10121345e
[None][ci] pin flashinfer-python version (#8217)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-09 02:48:49 -07:00
Bo Deng
e107749a69
[None][fix] fix patchelf version issue (#8112)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-01 16:39:22 -04:00
Yanchao Lu
7e2521a7f0
[None][chore] Some clean-ups for CUDA 13.0 dependencies (#7979)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-26 08:46:11 +08:00
PeganovAnton
396c0ea677
[None][chore] relax version constraints on fastapi (#7935)
Signed-off-by: Anton Peganov <apeganov@nvidia.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-25 21:58:53 +08:00
Li Min
0252cee4c3
[None][chore] Recover cutlass-dsl pkg install and dsl op testing. (#7945)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-24 15:45:18 +08:00
Enwei Zhu
8330d5363a
[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 09:10:09 +08:00
Wanli Jiang
2a30f11d63
[None][chore] Upgrade transformers to 4.56.0 (#7523)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-22 22:20:16 +08:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
Li Min
14e455da3e
[None][fix] Fix CI issue for dsl pkg install (#7784)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-09-18 13:58:20 +08:00
Li Min
b278d06481
[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-16 14:25:26 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Fridah-nv
f03053b4dd
[None][fix] update accelerate dependency to 1.7+ for AutoDeploy (#7077)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-20 19:52:37 -07:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation (#5785)
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Yiqing Yan
46357e7869
[None][package] Pin cuda-python version to >=12,<13 (#6702)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-07 10:01:04 -04:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Pengbo Wang @ NVIDIA
2e90b0b550
[None][fix] Explicitly add tiktoken as required by kimi k2 (#6663) 2025-08-07 09:47:45 +08:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
Zongfei Jing
0ff8df95b7
[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-06 16:44:21 +08:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Yanchao Lu
f39d621c3b
[None][infra] Pin the version for triton to 3.3.1 (#6508) (#6519) (#6549)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-01 07:33:24 -04:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell (#6486)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
Emma Qiao
baece56758
[None][infra] Pin the version for triton to 3.3.1 (#6508)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-31 19:25:15 +08:00
Enwei Zhu
4b299cb77e
feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 (#6408)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-31 09:53:52 +08:00
Lucas Liebenwein
41fb8aa8b1
[AutoDeploy] merge feat/ad-2025-07-07 (#6196)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou  <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-07-23 05:11:04 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
nv-guomingz
509dc7c831
chroe: upgrade modelopt to 0.33 (#6058)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-16 13:10:48 +08:00
Wanli Jiang
3f7cedec7c
Update transformers to 4.53.0 (#5747)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-09 09:32:24 -07:00
Wanli Jiang
e1fb1de4d9
feat: TRTLLM-6224 update xgrammar version to 0.1.19 (#5830)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-09 09:59:14 +08:00
Yanchao Lu
d95ae1378b
[Infra] - Always use x86 image for the Jenkins agent and few clean-ups (#5753)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 10:25:57 +08:00
Wanli Jiang
3789ba1d37 feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Lucas Liebenwein
619709fc33
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-29 03:52:14 +08:00
jellysnack
0623ffe3bc
feat: Add LLGuidance Support for PyTorch Backend (#5214)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-18 19:33:34 +08:00
Emma Qiao
ff32caf4d7
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 23:48:34 +08:00
Chang Liu
f70815c945
[TRTLLM-5007][feat] Add multimodal hashing support (image hashing) (#4145)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-06-10 01:59:56 +08:00
nv-guomingz
786e32d56f
chore:update modelopt to 0.31 (#5003)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-08 15:55:33 +08:00
nv-guomingz
d8abb91dc8
chore:set the flashinfer to 0.2.5. (#5004)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-07 20:42:09 +08:00
Shunkangz
ae9a6cf24f
feat: Add integration of etcd (#3738)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Batsheva Black <bblack@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
2025-06-03 20:01:44 +08:00
Chuang Zhu
44cfd757b2
Agent interface impl for NIXL (#4125)
* agentConnection

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

recv

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

agentState

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

NIXL interfaces

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

update cmakelists

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

nixl improve

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

remove cppzmq

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

fix

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

transferAgent remove register

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

work for cache Test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

reduce sleep time

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

fix test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

intergarte

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

nixl env

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

fix rebase error

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

cpp test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

stash for send metaData

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

loadRemoteMD after fetchRemoteMD

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

workaround for mixed gen and context

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

test_env

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

avoid port conflict in test

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* format

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* use std::string

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* typo

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* fix transferAgentTest

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 09:09:41 +08:00
Martin Marciniszyn Mehringer
3485347584
doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and document (#4400)
* doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and documentation

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* WAR against https://github.com/advisories/GHSA-vqfr-h8mv-ghfj

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Fix default assignment for CUDA architectures in SBSA build

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Push new docker images

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Handle constraints.txt in setup.py

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

---------

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-19 23:45:01 -07:00
Yanchao Lu
504f4bf779
[Infra] - Update the upstream PyTorch dependency to 2.7.0 (#4235)
[Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-14 22:28:13 +08:00
Yiqing Yan
fda8b0277a
[Infra][TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 (#4049)
* [TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix review

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* update images

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* Update jenkins/L0_Test.groovy

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* update image name

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-13 14:59:12 +08:00
nv-guomingz
420048205f
chore:update modelopt to 0.29 (#4150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-12 10:32:19 +08:00
Tracin
446f62bbab
chore: Deprecate evaltool (#4173)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-09 20:31:53 +08:00
Zhenhuan Chen
19da82d68f
fix(requirements): fix neither 'setup.py' nor 'pyproject.toml' found (#3906)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-04-28 18:35:19 +08:00
milesial
362a8272f8
feat: llama4 input processor (#3383)
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-25 16:47:14 -07:00
rakib-hasan
ff3b741045
feat: adding multimodal (only image for now) support in trtllm-bench (#3490)
* feat: adding multimodal (only image for now) support in trtllm-bench

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* fix: add  in load_dataset() calls to maintain the v2.19.2 behavior

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* re-adding prompt_token_ids and using that for prompt_len

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* updating the datasets version in examples as well

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* api changes are not needed

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* moving datasets requirement and removing a missed api change

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* addressing review comments

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* refactoring the quickstart example

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

---------

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-04-18 07:06:16 +08:00