Tao Li @ NVIDIA
458203d805
update fp8 doc ( #3647 )
...
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-04-17 13:16:07 +08:00
Zhanrui Sun
3471d6ccf0
chore: bump version to 0.19.0 ( #3598 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-16 12:15:19 +08:00
narutolhy
ccd73c71a5
feat: Add stream generation task scaffolding examples ( #3527 )
...
* stream generation task/controller
Signed-off-by: narutolhy <582909902@qq.com>
* edit README
Signed-off-by: narutolhy <582909902@qq.com>
* rename README
Signed-off-by: narutolhy <582909902@qq.com>
---------
Signed-off-by: narutolhy <582909902@qq.com>
2025-04-16 11:33:55 +08:00
Kaiyu Xie
f5f68ded26
Minor fixes for documents ( #3577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-16 07:47:18 +08:00
Pengyun Lin
1899e71364
doc: add genai-perf benchmark & slurm multi-node for trtllm-serve doc ( #3407 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-04-16 00:11:58 +08:00
nv-guomingz
39bdb1fe1c
docs:update llm api examples and customizations sections' links. ( #3566 )
...
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-15 13:55:22 +08:00
Bo Li
5eae397b3b
doc: Update instructions to enable FP8 MLA for Deepseek. ( #3488 )
...
* doc: Update doc to enable FP8 MLA for Deepseek.
Signed-off-by: Bo Li <bobboli0202@gmail.com>
* Update.
Signed-off-by: Bo Li <bobboli0202@gmail.com>
* Update.
Signed-off-by: Bo Li <bobboli0202@gmail.com>
* Update the status on Hopper and Blackwell.
Signed-off-by: Bo Li <bobboli0202@gmail.com>
* Update.
Signed-off-by: Bo Li <bobboli0202@gmail.com>
* Update table of contents.
Signed-off-by: Bo Li <bobboli0202@gmail.com>
---------
Signed-off-by: Bo Li <bobboli0202@gmail.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-04-15 13:12:33 +08:00
Zhanrui Sun
714ff3eedd
chore: bump version to 0.19.0rc0 ( #3535 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-14 18:11:20 +08:00
Zhanrui Sun
ee4ce0379d
chore: bump version to 0.19.0rc0 ( #3514 )
...
* chore: bump version to 0.19.0.rc0
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Update README
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-14 17:32:30 +08:00
Kaiyu Xie
f99be2726f
doc: Add example section for multi-node DeepSeek R1 benchmark on GB200 ( #3519 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-14 16:45:55 +08:00
brb-nv
44090a5388
Add support for Phi-4-MM ( #3296 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-14 14:24:10 +08:00
Yan Chunwei
b37c5c0a4d
make LLM-API slurm examples executable ( #3402 )
...
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-04-13 21:42:45 +08:00
QI JUN
d167cbd5bb
refactor: remove ParallelConfig in tensorrt_llm._torch.distributed module ( #3370 )
...
* remove tensorrt_llm._torch.distributed.ParallelConfig
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix embedding test
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix comments
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* polish
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* rebase
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-04-11 15:34:20 -07:00
Shunkangz
ea050084ad
feat: Add support of chat completion in PD ( #2985 )
...
* Add support of chat completion in PD
Add support of include_usage in PD
Reformat
* Remove redundant code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Add chat completion test
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-11 17:53:28 +08:00
amitz-nv
a6a2ae6cc1
chore: Rename nvsmall to nemotron nas ( #3447 )
...
* Rename nvsmall to nemotron NAS
* Revert nvsmall to nemotron_nas rename in paths in tests that access llm_models_root/nvsmall/tests
* Add NemotronNAS to pytorch supported models table
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-04-10 23:16:52 +08:00
wm2012011492
af05749e90
feat: add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa… ( #3369 )
...
* add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa_llmapi.py
Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
* fix coding style
Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
* add unittest
Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
---------
Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
Co-authored-by: mengw <12670782+wm2012011492@users.noreply.github.com>
2025-04-10 22:45:57 +08:00
Kefeng-Duan
67949f7c39
Update README and add benchmarking blog for DeepSeek-R1 ( #3232 )
...
- Added a new entry in the README for the published benchmarking best practices for DeepSeek-R1.
- Introduced a new blog post detailing performance benchmarking configurations and procedures for DeepSeek-R1 in TensorRT-LLM, including installation, dataset preparation, and benchmarking steps for both B200 and H200 GPUs.
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-04-10 17:00:49 +08:00
brb-nv
c59abae436
feat: Add Gemma3 text-only model support ( #3247 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-10 12:34:58 +08:00
Yechan Kim
943218b54a
feat: Add Qwen2.5-VL and refactor Qwen2-VL ( #3156 )
...
* feat: Add Qwen2.5-VL and refactor Qwen2-VL
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix yapf and codespell
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix test_e2e
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* generalize get_rope_index
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix qwen2.5-vl on README
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix image test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-10 04:09:03 +08:00
WeiHaocheng
6eee15900e
feat: Enhance the integrated robustness of scaffolding with __init__.py #3305 ( #3312 )
...
Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>
2025-04-09 21:13:47 +08:00
wili
6f1b2cdb83
Doc: update steps of using Draft-Target-Model (DTM) in the documents. ( #3366 )
...
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-09 17:35:01 +08:00
Mike Iovine
5bdf997963
Add Llama 4 ( #3302 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-04-09 03:35:21 +08:00
wili
54ad95eaa8
Feat: Variable-Beam-Width-Search (VBWS) part3 ( #3338 )
...
* feat/Variable-Beam-Width-Search-Part3, v1.0
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat/Variable-Beam-Width-Search-Part3, v1.1
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat/Variable-Beam-Width-Search-Part3, v1.2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-08 23:51:27 +08:00
sugunav14
84fc07b011
feat: [TRTLLM-3510] DeepseekV3 support in AutoDeploy ( #3281 )
...
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
2025-04-08 21:47:57 +08:00
Zhanrui Sun
63b0194c50
chore: bump version to 0.19.0.dev2025041500 ( #3360 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-08 20:45:27 +08:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. ( #3270 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
amirkl94
e04f6a1b9b
fix: Fix p-tuning test bug ( #3326 )
...
* fix: Fix p-tuning test bug
* A change in the vocab_size calculation for T5Tokenizer,
introduced in transformers version 4.34, caused addition of incorrect vtokens for ptuning.
In general, instead of adding tokens which are outside the vocabulary, tokens inside the vocabulary were added.
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-04-08 17:14:00 +08:00
Gabriel Wu
f1655afb0d
feat: enable DeepGEMM by default ( #3341 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-08 13:58:57 +08:00
Chuang Zhu
1c88af1378
feat: use cudaMalloc to allocate kvCache ( #3303 )
2025-04-08 10:59:14 +08:00
Chuang Zhu
f3237e52ed
update readme for disaggregated ( #3323 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-07 21:29:15 +08:00
Gabriel Wu
376731013d
feat: use NVRTC for DeepGEMM JIT compilation ( #3239 )
...
* feat: use NVRTC for DeepGEMM JIT compilation
Signed-off-by: Zihua Wu
* fix: add license
Signed-off-by: Zihua Wu
* feat: store NVRTC JIT results in memory by default
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* test: set timeout to 7200
Signed-off-by: Zihua Wu
---------
Signed-off-by: Zihua Wu
2025-04-07 20:29:23 +08:00
tburt-nv
7a659885e3
chore: remove usernames from comments ( #3291 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-04-05 13:44:28 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python ( #3025 )
...
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve disagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
Pengyun Lin
f25c7cefb4
doc: refactor trtllm-serve examples and doc ( #3187 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-04 11:40:43 +08:00
pcastonguay
b763051ba4
chore: Refactor disaggregated serving scripts ( #3073 )
...
* chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Updating README, removing launch script
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding scripts to populate urls section of disagg config based on SLURM env vars
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-03 14:55:05 -04:00
Kaiyu Xie
385a01055c
doc: Add serving section for DS V3 document ( #3262 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-03 21:57:48 +08:00
Fanrong Li
11624a8e96
fix deepseek-v3 mtp doc. ( #3272 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
2025-04-03 21:12:17 +08:00
Yechan Kim
c7533d271f
doc: add supported-models on PyTorch example ( #3179 )
...
* doc: add supported-models on PyTorch example
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove vision support from Llama3.2
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
2025-04-03 21:09:25 +08:00
Enwei Zhu
d3948cd9b2
fix: GPT-Next convert failure ( #3220 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-02 17:14:39 +08:00
WeiHaocheng
e64c565750
doc: add a directory for scaffolding contributors ( #3224 )
...
Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
2025-04-02 16:08:00 +08:00
brb-nv
1fe3e30356
Add support for Phi-4-mini ( #2990 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-02 08:34:39 +08:00
Zhanrui Sun
42963baacd
chore: bump version to 0.19.0.dev2025040800 ( #3171 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-02 08:21:55 +08:00
Fridah-nv
a5f32f46fd
fix: [AutoDeploy] Update README.md ( #3072 )
...
* update support matrix and add toggle list
Signed-off-by: fridah <201670829+Fridah-nv@users.noreply.github.com>
* Update README.md
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
* Update README.md
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
---------
Signed-off-by: fridah <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-04-01 16:16:36 -07:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval ( #3167 )
...
* add eval_llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
tmp commit
port to CLI tool
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
setup llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix spec_dec_algo
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
_update_from_hf_quant_config
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
migrate test_pytorch.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 block scales
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 rowwise
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
adj alpha
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move test_pytorch.py cases
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
rename test_accuracy.py to test_cli.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix cnn_dailymail
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* renaming to cli flow
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename MMLU
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add error
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
WeiHaocheng
ff35af77ea
feat: refactor scaffolding worker and support openai api worker ( #3166 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
2025-04-01 18:31:52 +08:00
brb-nv
727d78e785
Support prequantized fp8 ckpt for nemotron-mini-4b-instruct ( #3046 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 14:52:09 +08:00
Yan Chunwei
7575dd00e7
add slurm script examples for llm-api ( #3135 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-01 14:31:57 +08:00
Zhanrui Sun
36ac5e78ed
chore: bump version to 0.19.0.dev2025040100 ( #3152 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-03-31 16:36:06 +08:00
bhsueh_NV
322ac565fc
chore: clean some ci of qa test ( #3083 )
...
* move some models to examples/models/contrib
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* update the document
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove arctic, blip2, cogvlm, dbrx from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove tests of dit, mmdit and stdit from qa test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove grok, jais, sdxl, skywork, smaug from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* re-organize the glm examples
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix issues after running pre-commit
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix some typo in glm_4_9b readme
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-31 14:30:41 +08:00
musvaage
88e1c90fd0
doc: use alert formatting ( #3153 )
...
Signed-off-by: musvaage <musvaage@users.noreply.github.com>
Co-authored-by: musvaage <musvaage@users.noreply.github.com>
2025-03-31 07:30:52 +08:00