Commit Graph

171 Commits

Author SHA1 Message Date
Shunkangz
8ee840159b
Add updateKVCacheTransfer (#2984)
Add kv cache transfer measurement
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-03-25 21:45:35 +08:00
Chuang Zhu
110c6fc0f0
wait long time for disagg test (#2998)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-03-25 20:52:38 +08:00
Yuan Tong
53adb3cb4e
test: waive flaky test_kv_cache_event_async_api (#3062)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-03-25 18:41:30 +08:00
Xiaowei Wang
d9acce72bb
doc: Update DeepSeekV3 doc (#3052)
* Update DeepGEMM and flashMLA related content

* Add single-node command for deepgemm

* Fix spelling

---------

Signed-off-by: xiaoweiw-nv <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-03-25 18:17:26 +08:00
Perkz Zheng
e9df23f815
fix: [MLA] fix the bug with fp8 MLA kernels on Blackwell. (#3008)
* update cubins
* update error message

---------

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-03-25 18:03:29 +08:00
bhsueh_NV
5724c61934
chore: fix bug of model paths in confset.py (#3011)
* fix bugs of model paths of models in examples/models/contrib/

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix bug of code layout

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix bug of test_multimodal.py

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* add gptj_example_root back

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 17:00:44 +08:00
xiweny
aacb8d66f4
doc: document running CI stage locally (#3060)
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-03-25 16:18:17 +08:00
QI JUN
a8ec1cc4ea
remove examples/test_gptj.py::test_llm_gptj_fp8_manage_weights_summary test case (#3057)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-25 15:41:27 +08:00
Yan Chunwei
69feafc947
fix: amend the test list (#3056)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 14:17:36 +08:00
WeiHaocheng
7ac04ada2a
doc: Add README.md for scaffolding (#3048)
* Add README.md for scaffolding

Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>

* Update tensorrt_llm/scaffolding/README.md

Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com>

---------

Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com>
Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>
2025-03-25 13:58:01 +08:00
bhsueh_NV
ed84f8f923
fix bug of test_phi (#3050)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 13:12:06 +08:00
Aurelien Chartier
ef78518310
Only gather responses on rank 0 (#3040)
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-24 21:54:51 -07:00
Aurelien Chartier
a33c595c88
Fix logits dtype in assert (#3038)
Remove extra methods in trtGptModelInflightBatching.h. The methods were moved out of that class during a previous refactoring, but the definitions have been left behind.

Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-25 10:35:21 +08:00
Zhanrui Sun
c2ffce7dbd
chore: bump version to "0.19.0.dev2025032500" (#3019)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-03-25 10:04:17 +08:00
Yan Chunwei
c29cebf79d
Deprecate model_api examples (#2999)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 09:37:20 +08:00
bhsueh_NV
11f9ecb2fd
chore: remove useless param (#3023)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 08:36:45 +08:00
Kaiyu Xie
59deb8b06e
doc: Update CONTRIBUTING.md (#3033)
* Update CONTRIBUTING.md

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* Update pre-commit example message

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

---------

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-25 08:06:23 +08:00
Enwei Zhu
705eef68c2
test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite (#2982)
* Accuracy test improvement (Part 2)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* WAR OOM

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

update

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-25 07:34:10 +08:00
nv-guomingz
dc0463b0e2
doc:add version.txt for internal cutlass library and nvrtc_wrapper so files (#3030)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 23:44:21 +08:00
Pradeep Raj Prabhu Raj
5b4a5014d1
Fix: wrong path to constraints.txt in bloom/requirements.txt (#3003)
Signed-off-by: Pradeep Raj Prabhu Raj <pradeepraj18062002@gmail.com>
2025-03-24 23:03:40 +08:00
Netanel Haber
da0b0e0ee3
fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983)
* fix variable window size reuse - disable when *min attention window* starts sliding, not max

* isPreCyclic -> isCyclic, and invert logic, for clarity

* getDecoderState()

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-03-24 22:49:52 +08:00
Yan Chunwei
531b98ed62
feat: Add several pure python configs to LlmArgs (#2997)
* add SchedulerConfig
* add PeftCacheConfig
2025-03-24 16:16:17 +08:00
Yiteng Niu
cb11c10719
add ratelimit in workflow (#3001)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-03-24 15:54:11 +08:00
QI JUN
832ea997f6
chore: Simplify quickstart of PyTorch flow (#3000)
* simplify quickstart of PyTorch flow

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-24 14:32:17 +08:00
nv-guomingz
ec4f43a0ab
test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… (#2987)
* test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases.

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>

* update test case per review comments

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>

---------

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 14:18:06 +08:00
Michael Gschwind
08b45d1bb9
Update README.md (#2862)
fix various typos

Signed-off-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
2025-03-24 13:46:09 +08:00
bhsueh_NV
7413cb555a
relax the limitation of setuptools (#2992)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-24 13:36:10 +08:00
Oguz Vuruskaner
c3c5a07dca
Update setup.py (#2876)
update path for the script.

Signed-off-by: Oguz Vuruskaner <ovuruska@outlook.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
2025-03-24 13:10:53 +08:00
Laikh Tewari
456a850e66
Claim support for QwQ 32B (#2877)
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
2025-03-24 13:05:15 +08:00
Yiteng Niu
37644e22bc
update approver list (#2994)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-03-24 12:51:27 +08:00
Enwei Zhu
c03d59817f
fix: LLM API logits processor example comments (#2962)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-24 12:22:12 +08:00
juney-nvidia
a570578c7f
Update the CONTRIBUTING.md as the ramp-up for TensorRT-LLM github firstly (#2980)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-03-23 19:58:16 +08:00
Kaiyu Xie
2631f21089
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
tburt-nv
c2ac9e6269
update github workflow (#2943)
cherry-picks aa1c52f

Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-03-18 22:20:46 -04:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
niukuo
aa1c52fa26
update github workflow
2025-03-17 23:11:07 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873)
2025-03-11 21:13:42 +08:00
Yiteng Niu
c384d26736
migrate to l0-test.yml (#2858)
Signed-off-by: niukuo <6831097+niukuo@users.noreply.github.com>
2025-03-06 15:24:40 +08:00
Kaiyu Xie
225b77667c
Fix .gitmodules (#2852)
2025-03-04 22:34:09 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
tburt-nv
0bcfdca6aa
Use NVIDIA-gha runners to collect test results (#2830)
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2025-02-27 23:02:02 -05:00
Laikh Tewari
d2b7b64b25
Add R1 perf data to latest news page (#2823)
* Update README.md

Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>

* add r1 perf chart to repo

Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>

* Delete docs/source/blogs/media/r1-perf.jpeg

Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>

* add file to correct media dir

Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>

* Update README.md with local img + remove old img

Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>

---------

Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
2025-02-25 16:50:19 -08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820)
2025-02-25 21:21:49 +08:00
tburt-nv
5c794e3714
allow build command arguments (#2808)
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2025-02-21 10:38:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM

---------

Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783)
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00
Denis Kayshev
d93a2dde84
Fix kwarg name (#2691)
2025-01-20 12:18:26 +08:00
Kaiyu Xie
0d0583a639
Update README.md (#2668)
2025-01-08 14:40:59 +08:00
Kaiyu Xie
be17881062
Update TensorRT-LLM (#2582)
2024-12-16 21:50:47 -08:00