Kaiyu Xie
aee35e2dbd
chore: Make example SLURM scripts more parameterized ( #6511 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-01 12:53:15 +08:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell ( #6486 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control ( #6220 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI ( #6477 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. ( #6419 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
nv-guomingz
a5540acfce
chore: add trtllm-serve json schema example into doc. ( #6418 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 04:33:08 -04:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts ( #6345 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… ( #6428 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
Kaiyu Xie
e58afa510e
doc: Add README for wide EP ( #6356 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-29 00:36:12 -04:00
Liana Koleva
96d004d800
doc: fix invalid link in llama 4 example documentation ( #6340 )
...
Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
2025-07-26 11:27:10 -04:00
Michal Guzek
08d57123f9
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache ( #5974 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-25 18:10:40 -04:00
Simeng Liu
7bff341553
[doc] Add NGram tech blog ( #6311 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-25 10:26:33 -07:00
nv-guomingz
31d3eff24b
doc: fix invalid links related with llm api example ( #6317 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-24 00:46:51 -04:00
Yechan Kim
83c3ed128b
chore: set default device to cpu on Multimodal models ( #5994 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-22 21:45:31 -07:00
Kaiyu Xie
f08286c679
doc: Refactor documents and examples of disaggregated serving and wide ep ( #6054 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-23 09:20:57 +08:00
Lucas Liebenwein
41fb8aa8b1
[AutoDeploy] merge feat/ad-2025-07-07 ( #6196 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-07-23 05:11:04 +08:00
Yiqing Yan
3e18ee5fe1
chore: bump version to 1.0.0rc5 ( #6252 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-22 16:24:28 +08:00
Mike Iovine
9645814bdf
[chore] Clean up quickstart_advanced.py ( #6021 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-21 15:00:59 -04:00
Linda
3efad2e58c
feat: nanobind bindings ( #6185 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
nv-guomingz
b4c7e8c9a5
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… ( #6150 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-21 10:49:29 +08:00
bhsueh_NV
2e14c8f443
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 ( #6065 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-20 10:25:25 +08:00
wili
82d3587bb8
[refactor] Unify name of NGram speculative decoding ( #5937 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-19 12:59:57 +08:00
Zhenhuan Chen
992b273045
[ https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e ( #6140 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-18 10:34:37 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler ( #6000 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support ( #5644 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Yiqing Yan
e51c541617
chore: Bump version to 1.0.0rc4 ( #6086 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-16 13:02:23 +08:00
Xiaodong (Vincent) Huang
0523f77b36
support TRTLLM_DEEP_EP_TOKEN_LIMIT to allow run deep-ep on memory-con… ( #5684 )
...
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
2025-07-15 18:34:21 +03:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
jiahanc
24dfd4cd0b
Doc: Update llama-3.3-70B guide ( #6028 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-15 11:37:26 +09:00
Yechan Kim
2320f12321
doc: update EXAONE 4.0 news ( #6034 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-15 10:26:51 +09:00
Yechan Kim
63139fdcff
feat: EXAONE4.0 support ( #5696 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-14 22:28:10 +09:00
Zhanrui Sun
3a0ef73414
infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check ( #5709 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-14 18:52:13 +09:00
Yan Chunwei
9c673e9707
[TRTLLM-6160] chore: add sampling examples for pytorch ( #5951 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 15:28:32 +09:00
Yan Chunwei
c30eead09f
[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch ( #5956 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 14:09:39 +08:00
Xianjie Qiao
c7ffadf692
Fix errors in wide-ep scripts ( #5992 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-07-14 14:07:27 +09:00
Shi Xiaowei
49359574c1
[TRTLLM-5673] Doc: ensure the disagg doc is up to date ( #5938 )
2025-07-11 17:39:05 +09:00
William Tambellini
fbb4cc7379
[TRTLLM-4770][feat] Enhance cpp executor cmake to listen to ENABLE_MU… ( #5104 )
...
...LTI_DEVICE
Signed-off-by: William Tambellini <wtambellini@sdl.com>
2025-07-11 10:59:44 +08:00
Iman Tabrizian
c32c9e2fad
doc: Add instructions for running gemma in disaggregated serving ( #5922 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-10 10:21:19 -07:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs ( #5639 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner ( #5876 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
Erin
e277766f0d
chores: merge examples for v1.0 doc ( #5736 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
jiahanc
c24eb67054
Doc: fix link in llama4 Maverick example ( #5864 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 11:09:58 +09:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide ( #5810 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
Yan Chunwei
e50d95c40d
chore [TRTLLM-6161]: add LLM speculative decoding example ( #5706 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-09 07:33:11 +08:00
Yiqing Yan
5203a0f6df
chore: bump version to 1.0.0rc3 ( #5819 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-08 16:04:40 +09:00