Anthony Chang
|
852e5060aa
|
[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-10-31 04:41:44 -07:00 |
|
Anish Shanbhag
|
a09b38a862
|
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-28 09:17:26 -07:00 |
|
Anish Shanbhag
|
15de45d782
|
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-22 20:53:08 -04:00 |
|
Yan Chunwei
|
f81caf5491
|
[None][chore] replace print_colored_debug with logger_debug (#8417)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-10-22 17:54:38 +08:00 |
|
Guoming Zhang
|
202bed4574
|
[None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Pengbo Wang
|
5792464d37
|
[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-09-23 10:47:03 +08:00 |
|
Yuxian Qiu
|
2d46dda6a7
|
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Tian Zheng
|
e257cb3533
|
[None][feat] Support NVFP4 KV Cache (#6244)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
|
2025-09-01 09:24:52 +08:00 |
|
hlu1
|
8207d5fd39
|
[None][feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-08-07 03:04:18 -04:00 |
|
Aurelien Chartier
|
812243bdd6
|
feat: add support for Modelopt fp8_pb_wo quantization scheme (#6106)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
|
2025-07-18 10:35:12 +08:00 |
|
Yan Chunwei
|
a02606a9e2
|
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-16 16:42:59 +08:00 |
|
Yan Chunwei
|
7568deb2f1
|
[nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig (#6001)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-16 16:05:38 +08:00 |
|
wili
|
2e3cf42e03
|
[refactor] Simplification of Speculative decoding configs (#5639)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-10 11:37:30 -04:00 |
|
Robin Kobus
|
30a19fcf7c
|
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-07 16:30:43 +02:00 |
|
nv-guomingz
|
578430e64c
|
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 11:05:40 +08:00 |
|
Lucas Liebenwein
|
619709fc33
|
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-06-29 03:52:14 +08:00 |
|
Izzy Putterman
|
e607768e45
|
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-06-17 02:26:08 +08:00 |
|
Lucas Liebenwein
|
743fb0a159
|
[AutoDeploy] _AutoDeployLlmArgs as primary config object (#4891)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-06-05 17:20:55 +08:00 |
|
pcastonguay
|
d7d455e7ea
|
[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-05-22 22:01:06 -04:00 |
|
Yan Chunwei
|
4798d088d9
|
chore: Partition LlmArgs into TorchLlmArgs and TrtLlmArgs (#3823)
* partition LlmArgs
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* update backend
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-05-22 09:40:56 +08:00 |
|
Thor Johnsen
|
5d438be59a
|
[TRTLLM-5000][feat] Pytorch implementation of ngram drafter (#3936)
* v1.5
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
v1.5.4 Add back draft_overhead to spec dec stats
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v1.5.5: fix CI error
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.6: fix CI error 8196 > 8192
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* precommit run
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v2.0: Address reviewer concerns
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v2.1: add fix from wili
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Revert changes that require use of TypeAlias because that requires python version >= 3.10
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
---------
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-05-21 10:40:00 +08:00 |
|
Yan Chunwei
|
ad4226d946
|
fix: trtllm-bench build trt engine on slurm (#3825)
* add submit_sync to RemoteMpiSessionClient
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
add barrier
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
fix comment
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
disable test
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-04-27 22:26:23 +08:00 |
|
Enwei Zhu
|
44da0e8d60
|
fix: LLM API _hf_model_dir for non-cached case (#3562)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-04-16 10:39:34 +08:00 |
|
Enwei Zhu
|
cf9ceea890
|
test: Add DeepSeek-V3-Lite PP=4 cases (#3454)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-04-12 00:09:12 +08:00 |
|
Yan Chunwei
|
b21cfcfed1
|
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve disagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-04-05 13:31:48 +08:00 |
|
Enwei Zhu
|
b2f69db507
|
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval (#3167)
* add eval_llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
tmp commit
port to CLI tool
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
setup llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix spec_dec_algo
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
_update_from_hf_quant_config
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
migrate test_pytorch.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 block scales
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 rowwise
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
adj alpha
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move test_pytorch.py cases
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
rename test_accuracy.py to test_cli.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix cnn_dailymail
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* renaming to cli flow
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename MMLU
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add error
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-04-01 22:20:29 +08:00 |
|
Kaiyu Xie
|
2631f21089
|
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-03-23 16:39:35 +08:00 |
|
Kaiyu Xie
|
9b931c0f63
|
Update TensorRT-LLM (#2873)
|
2025-03-11 21:13:42 +08:00 |
|
Kaiyu Xie
|
77d7fe1eb2
|
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
|
2025-03-04 18:44:00 +08:00 |
|
Kaiyu Xie
|
ab5b19e027
|
Update TensorRT-LLM (#2820)
|
2025-02-25 21:21:49 +08:00 |
|
Kaiyu Xie
|
2ea17cdad2
|
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
|
2025-02-18 21:27:39 +08:00 |
|
Kaiyu Xie
|
e88da961c5
|
Update TensorRT-LLM (#2783)
|
2025-02-13 18:40:22 +08:00 |
|
Dan Blanaru
|
16d2467ea8
|
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
|
2025-02-11 03:01:00 +00:00 |
|
Denis Kayshev
|
d93a2dde84
|
Fix kwarg name (#2691)
|
2025-01-20 12:18:26 +08:00 |
|
Kaiyu Xie
|
be17881062
|
Update TensorRT-LLM (#2582)
|
2024-12-16 21:50:47 -08:00 |
|
Kaiyu Xie
|
aaacc9bd68
|
Update TensorRT-LLM (#2562)
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
|
2024-12-11 00:31:05 -08:00 |
|
石晓伟
|
548b5b7310
|
Update TensorRT-LLM (#2532)
* blossom-ci.yml: run vulnerability scan on blossom
* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec
---------
Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2024-12-04 21:16:56 +08:00 |
|
Kaiyu Xie
|
385626572d
|
Update TensorRT-LLM (#2502)
* Update TensorRT-LLM
---------
Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
|
2024-11-26 16:51:34 +08:00 |
|
Kaiyu Xie
|
535c9cc673
|
Update TensorRT-LLM (#2460)
|
2024-11-19 18:30:34 +08:00 |
|
Kaiyu Xie
|
c629546ce4
|
Update TensorRT-LLM (#2436)
|
2024-11-12 15:27:49 +08:00 |
|
Kaiyu Xie
|
b7868dd1bd
|
Update TensorRT-LLM (#2413)
|
2024-11-05 16:27:06 +08:00 |
|
Kaiyu Xie
|
f14d1d433c
|
Update TensorRT-LLM (#2389)
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
|
2024-10-29 22:24:38 +08:00 |
|
Kaiyu Xie
|
1730a587d8
|
Update TensorRT-LLM (#2363)
* Update TensorRT-LLM
---------
Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>
|
2024-10-22 20:27:35 +08:00 |
|
Kaiyu Xie
|
75057cd036
|
Update TensorRT-LLM (#2333)
* Update TensorRT-LLM
---------
Co-authored-by: Puneesh Khanna <puneesh.khanna@tii.ae>
Co-authored-by: Ethan Zhang <26497102+ethnzhng@users.noreply.github.com>
|
2024-10-15 15:28:40 +08:00 |
|