Tailing Yuan
51ef0379d2
[None][feat] Add a parser to layer-wise benchmarks (#9440)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-25 05:45:16 -08:00
Suyog Gupta
efd503751f
[#9271][perf] Enable multi-stream MOE optimization in AutoDeploy (#9322)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-24 19:50:10 -08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve (#9269)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
Yiqing Yan
8cd3b496e9
[None][chore] Bump version to 1.2.0rc4 (#9363)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-21 18:28:12 +08:00
cheshirekow
1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile (#8986)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
jiahanc
255e4ea9f0
[None][doc] Update DS-R1 example doc (#9231)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-18 21:10:02 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example (#8434)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Zero Zeng
43896af1b1
[None][chore] benchmark refactor (#9207)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-17 23:29:28 -08:00
Stanley Sun
96cfdd8a72
[None][chore] Change trt-server to trtlllm-server in opentelemetry readme (#9173)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-17 22:02:24 -08:00
Zero Zeng
c6cce398f5
[TRTLLM-9053][feat] Support accuracy test and install from wheel (#9038)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-13 23:34:47 -08:00
dongxuy04
84483a238a
[None][doc] update docs for EPLB (#9166)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-13 22:24:29 -08:00
Fanrong Li
25bd2e6917
[None][doc] Add DeepSeek-V3.2-Exp document (#9141)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-13 22:01:58 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module (#8682)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
Timothy Gao
96132b4274
[None] [doc] Add Mixed Precision Context and Generation section to Disagg (#8769)
Signed-off-by: Timothy Gao <35588167+timothygao8710@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-11 23:46:12 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
Lucas Liebenwein
6bf4e59267
[#8763][feature] AutoDeploy: configurable dtype for caching (#8812)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-10 22:17:14 -08:00
jiahanc
de6088e363
[None][doc] update llama and llama4 example doc (#9048)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 22:04:26 -08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py (#8938)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
c836ae5aaa
[None][chore] Bump version to 1.2.0rc3 (#9004)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-07 01:24:32 -08:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
shuyixiong
c73efe12e7
[None][chore] Use cached model in all ray tests (#8962)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-06 15:14:15 +01:00
Yi Sun
cc12d33393
[None][feat] Deep Research Implemented with Scaffolding (#8452)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2025-11-06 10:33:28 +08:00
JadoTu
6bbb43f2b9
[None][feat] Add qwen3-next nvfp4 support (#8526)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-06 09:45:44 +08:00
fredricz-20070104
fdd9e4fe00
[TRTLLM-7251][test] Get submit eplb slots empty key work (#8945)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-05 05:21:02 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Anish Shanbhag
6a6317727b
[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-11-03 17:42:41 -08:00
Kaiyu Xie
db2a42f641
[None][chore] Add sample yaml for wide-ep example and minor fixes (#8825)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-03 07:48:34 -08:00
Cao Dong
2ff772ef71
[None][feat] Add benchmark to DeepConf (#8776)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-03 16:05:50 +08:00
Robin Kobus
1b3ad7259d
[None][feat] Use ruff for formatting and linting new files by default (#8629)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-01 16:11:40 +01:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint (#8799)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks (#8777)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
WeiHaocheng
cc286687c4
[None][feat] Refactor scaffolding streaming feature and fix openai wo… (#8622)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-10-30 16:02:40 +08:00
Lizhi Zhou
24167d00eb
[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-28 17:04:53 -07:00
Anish Shanbhag
a09b38a862
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-28 09:17:26 -07:00
Aurelien Chartier
0a02f5f25d
[None][chore] Use a cached model path for Ray integration test (#8660)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-27 19:16:06 -07:00
gramnarayan
88b0fbc8ff
[#8245][feat] Autodeploy: Guided Decoding Support (#8551)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Robin Kobus
990b0c0c47
[TRTLLM-7159][docs] Add documentation for additional outputs (#8325)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-27 09:52:04 +01:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Yechan Kim
2d86d6be40
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-24 12:53:40 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
Anish Shanbhag
15de45d782
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-22 20:53:08 -04:00
Patrice Castonguay
879039f6d5
[https://nvbugs/5429636][feat] Kv transfer timeout (#8459)
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
2025-10-22 09:29:02 -04:00
Yiqing Yan
b04e51291a
[None][chore] Bump version to 1.2.0rc2 (#8562)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-22 14:35:05 +08:00
Shi Xiaowei
50149ac2bd
[None][doc] Fix the incorrect doc figure (#8536)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 10:08:55 +08:00
Zero Zeng
4545700fcf
[None][chore] Move submit.sh to python and use yaml configuration (#8003)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-10-20 22:36:50 -04:00