Yiqing Yan
907bc22fcb
[None][chore] Bump version to 1.1.0rc2 ( #7167 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-22 22:02:28 +08:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 ( #6864 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Zhenhuan Chen
20f54cb272
[None][fix] fix scaffolding dynasor test ( #7070 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-08-20 15:20:46 +08:00
zhhuang-nv
7e135d2ea7
[None][feat] Use Separate QKV Input Layout for Context MLA ( #6538 )
...
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-08-19 22:04:48 +08:00
Xianjie Qiao
19667304b5
[None] [chore] Update wide-ep genonly scripts ( #6995 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 07:44:07 -04:00
Kaiyu Xie
9a74ee9dae
[None] [doc] Add more documents for large scale EP ( #7029 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 19:04:39 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
Yiqing Yan
ec3d9f8052
[None][chore] Bump version to 1.1.0rc1 ( #6953 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-16 10:32:47 +08:00
Xianjie Qiao
c2fe8b03a2
[ https://nvbugs/5405041 ][fix] Update wide-ep doc ( #6933 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-08-15 05:32:32 -04:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context ( #6929 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix ( #6908 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
qianbiao
5c2f0fd03d
[None] [feat] Add Tencent HunYuanMoEV1 model support ( #5521 )
...
Signed-off-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-15 06:56:44 +08:00
Matthias Jouanneaux
69574ad730
[TRTLLM-5966][feat] Helix: extend mapping to support different CP types ( #6816 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-08-14 09:00:02 -07:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands ( #6323 )
2025-08-14 03:48:57 -04:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving ( #6766 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Aurelien Chartier
2e0081b53e
[ #6530 ][fix] Fix script when using calibration tensors from modelopt ( #6803 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-12 20:41:10 -07:00
Kaiyu Xie
47806f09d9
feat: Support custom repo_dir for SLURM script ( #6546 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
2025-08-12 22:06:59 -04:00
rakib-hasan
2923eb88a1
[None][fix] Refactoring input prep to allow out-of-tree models ( #6497 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-12 20:29:10 -04:00
nvchenghaoz
81f0ded1c4
[None][feat] Add GPT OSS support for AutoDeploy ( #6641 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-08-12 14:03:22 -04:00
Chang Liu
be9dd4713c
[ https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version ( #6673 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
Liao Lanyu
a2e9153cb0
[None][doc] Add K2 tool calling examples ( #6667 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-11 16:25:41 +08:00
Yibin Li
97787883c3
[TRTLLM-6420][feat] add support for Eclairv2 model - cherry-pick changes and minor fix ( #6493 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-08 21:40:48 -04:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Guoming Zhang
0223de0727
[None][doc] Add deployment guide section for VDR task ( #6669 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-07 10:30:47 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Yiqing Yan
5fa1914cab
[None][chore] Bump version to 1.1.0rc0 ( #6651 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-07 13:39:49 +08:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental ( #5995 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability ( #6506 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
chenfeiz0326
a16ba6445c
[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 ( #6550 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-06 22:15:24 +08:00
Yuxian Qiu
3a71ddfe09
[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. ( #6579 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-06 22:13:54 +08:00
Pengyun Lin
79fc2f48c0
[None][chore] Enhance trtllm-serve example test ( #6604 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-06 20:30:35 +08:00
jiahanc
3170039e36
[None][doc] Add llama4 hybrid guide ( #6640 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-08-06 01:25:38 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization ( #6061 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Olya Kozlova
13cc1c4878
[TRTLLM-5271][feat] best_of/n for pytorch workflow ( #5997 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-08-04 14:08:06 +02:00
Yiqing Yan
3916dbd98b
[None][chore] Bump version to 1.0.0rc6 ( #6597 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-04 04:39:15 -04:00
Pengyun Lin
a15e33351d
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens ( #6259 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-04 15:09:51 +08:00
Kaiyu Xie
aee35e2dbd
chore: Make example SLURM scripts more parameterized ( #6511 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-01 12:53:15 +08:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell ( #6486 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control ( #6220 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI ( #6477 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. ( #6419 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
nv-guomingz
a5540acfce
chore: add trtllm-serve json schema example into doc. ( #6418 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 04:33:08 -04:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts ( #6345 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… ( #6428 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
Kaiyu Xie
e58afa510e
doc: Add README for wide EP ( #6356 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-29 00:36:12 -04:00
Liana Koleva
96d004d800
doc: fix invalid link in llama 4 example documentation ( #6340 )
...
Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
2025-07-26 11:27:10 -04:00
Michal Guzek
08d57123f9
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache ( #5974 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-25 18:10:40 -04:00
Simeng Liu
7bff341553
[doc] Add NGram tech blog ( #6311 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-25 10:26:33 -07:00