Shunkangz
8ee840159b
Add updateKVCacheTransfer ( #2984 )
Add kv cache transfer measurement
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-03-25 21:45:35 +08:00
Chuang Zhu
110c6fc0f0
wait long time for disagg test ( #2998 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-03-25 20:52:38 +08:00
Yuan Tong
53adb3cb4e
test: waive flaky test_kv_cache_event_async_api ( #3062 )
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-03-25 18:41:30 +08:00
Xiaowei Wang
d9acce72bb
doc: Update DeepSeekV3 doc ( #3052 )
* Update DeepGEMM and flashMLA related content
* Add single-node command for deepgemm
* Fix spelling
---------
Signed-off-by: xiaoweiw-nv <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-03-25 18:17:26 +08:00
Perkz Zheng
e9df23f815
fix: [MLA] fix the bug with fp8 MLA kernels on Blackwell. ( #3008 )
* update cubins
* update error message
---------
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-03-25 18:03:29 +08:00
bhsueh_NV
5724c61934
chore: fix bug of model paths in confset.py ( #3011 )
* fix bugs of model paths of models in examples/models/contrib/
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug of code layout
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug of test_multimodal.py
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* add gptj_example_root back
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 17:00:44 +08:00
xiweny
aacb8d66f4
doc: document running CI stage locally ( #3060 )
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-03-25 16:18:17 +08:00
QI JUN
a8ec1cc4ea
remove examples/test_gptj.py::test_llm_gptj_fp8_manage_weights_summary test case ( #3057 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-25 15:41:27 +08:00
Yan Chunwei
69feafc947
fix: amend the test list ( #3056 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 14:17:36 +08:00
WeiHaocheng
7ac04ada2a
doc: Add README.md for scaffolding ( #3048 )
* Add README.md for scaffolding
Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
* Update tensorrt_llm/scaffolding/README.md
Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com>
---------
Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com>
Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>
2025-03-25 13:58:01 +08:00
bhsueh_NV
ed84f8f923
fix bug of test_phi ( #3050 )
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 13:12:06 +08:00
Aurelien Chartier
ef78518310
Only gather responses on rank 0 ( #3040 )
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-24 21:54:51 -07:00
Aurelien Chartier
a33c595c88
Fix logits dtype in assert ( #3038 )
Remove extra methods in trtGptModelInflightBatching.h. The methods were moved out of that class during a previous refactoring, but the definitions have been left behind.
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-25 10:35:21 +08:00
Zhanrui Sun
c2ffce7dbd
chore: bump version to "0.19.0.dev2025032500" ( #3019 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-03-25 10:04:17 +08:00
Yan Chunwei
c29cebf79d
Deprecate model_api examples ( #2999 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 09:37:20 +08:00
bhsueh_NV
11f9ecb2fd
chore: remove useless param ( #3023 )
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 08:36:45 +08:00
Kaiyu Xie
59deb8b06e
doc: Update CONTRIBUTING.md ( #3033 )
* Update CONTRIBUTING.md
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Update pre-commit example message
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
---------
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-25 08:06:23 +08:00
Enwei Zhu
705eef68c2
test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite ( #2982 )
* Accuracy test improvement (Part 2)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* WAR OOM
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-25 07:34:10 +08:00
nv-guomingz
dc0463b0e2
doc:add version.txt for internal cutlass library and nvrtc_wrapper so files ( #3030 )
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 23:44:21 +08:00
Pradeep Raj Prabhu Raj
5b4a5014d1
Fix: wrong path to constraints.txt in bloom/requirements.txt ( #3003 )
Signed-off-by: Pradeep Raj Prabhu Raj <pradeepraj18062002@gmail.com>
2025-03-24 23:03:40 +08:00
Netanel Haber
da0b0e0ee3
fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size ( #2983 )
* fix variable window size reuse - disable when *min attention window* starts sliding, not max
* isPreCyclic -> isCyclic, and invert logic, for clarity
* getDecoderState()
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-03-24 22:49:52 +08:00
Yan Chunwei
531b98ed62
feat: Add several pure python configs to LlmArgs ( #2997 )
* add SchedulerConfig
* add PeftCacheConfig
2025-03-24 16:16:17 +08:00
Yiteng Niu
cb11c10719
add ratelimit in workflow ( #3001 )
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-03-24 15:54:11 +08:00
QI JUN
832ea997f6
chore: Simplify quickstart of PyTorch flow ( #3000 )
* simplify quickstart of PyTorch flow
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-24 14:32:17 +08:00
nv-guomingz
ec4f43a0ab
test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… ( #2987 )
* test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases.
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* update test case per review comments
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
---------
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 14:18:06 +08:00
Michael Gschwind
08b45d1bb9
Update README.md ( #2862 )
fix various typos
Signed-off-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
2025-03-24 13:46:09 +08:00
bhsueh_NV
7413cb555a
relax the limitation of setuptools ( #2992 )
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-24 13:36:10 +08:00
Oguz Vuruskaner
c3c5a07dca
Update setup.py ( #2876 )
update path for the script.
Signed-off-by: Oguz Vuruskaner <ovuruska@outlook.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
2025-03-24 13:10:53 +08:00
Laikh Tewari
456a850e66
Claim support for QwQ 32B ( #2877 )
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
2025-03-24 13:05:15 +08:00
Yiteng Niu
37644e22bc
update approver list ( #2994 )
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-03-24 12:51:27 +08:00
Enwei Zhu
c03d59817f
fix: LLM API logits processor example comments ( #2962 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-24 12:22:12 +08:00
juney-nvidia
a570578c7f
Update the CONTRIBUTING.md as the ramp-up for TensorRT-LLM github firstly ( #2980 )
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-03-23 19:58:16 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
tburt-nv
c2ac9e6269
update github workflow ( #2943 )
cherry-picks aa1c52f
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-03-18 22:20:46 -04:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
niukuo
aa1c52fa26
update github workflow
2025-03-17 23:11:07 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Yiteng Niu
c384d26736
migrate to l0-test.yml ( #2858 )
Signed-off-by: niukuo <6831097+niukuo@users.noreply.github.com>
2025-03-06 15:24:40 +08:00
Kaiyu Xie
225b77667c
Fix .gitmodules ( #2852 )
2025-03-04 22:34:09 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
tburt-nv
0bcfdca6aa
Use NVIDIA-gha runners to collect test results ( #2830 )
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2025-02-27 23:02:02 -05:00
Laikh Tewari
d2b7b64b25
Add R1 perf data to latest news page ( #2823 )
* Update README.md
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* add r1 perf chart to repo
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* Delete docs/source/blogs/media/r1-perf.jpeg
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* add file to correct media dir
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* Update README.md with local img + remove old img
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
---------
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
2025-02-25 16:50:19 -08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
tburt-nv
5c794e3714
allow build command arguments ( #2808 )
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2025-02-21 10:38:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Denis Kayshev
d93a2dde84
Fix kwarg name ( #2691 )
2025-01-20 12:18:26 +08:00
Kaiyu Xie
0d0583a639
Update README.md ( #2668 )
2025-01-08 14:40:59 +08:00
Kaiyu Xie
be17881062
Update TensorRT-LLM ( #2582 )
2024-12-16 21:50:47 -08:00