Kaiyu Xie
59deb8b06e
doc: Update CONTRIBUTING.md ( #3033 )
* Update CONTRIBUTING.md
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Update pre-commit example message
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
---------
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-25 08:06:23 +08:00
Enwei Zhu
705eef68c2
test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite ( #2982 )
* Accuracy test improvement (Part 2)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Work around (WAR) OOM
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
update
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-25 07:34:10 +08:00
nv-guomingz
dc0463b0e2
doc: add version.txt for internal cutlass library and nvrtc_wrapper so files ( #3030 )
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 23:44:21 +08:00
Pradeep Raj Prabhu Raj
5b4a5014d1
Fix: wrong path to constraints.txt in bloom/requirements.txt ( #3003 )
Signed-off-by: Pradeep Raj Prabhu Raj <pradeepraj18062002@gmail.com>
2025-03-24 23:03:40 +08:00
Netanel Haber
da0b0e0ee3
fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size ( #2983 )
* fix variable window size reuse - disable when *min attention window* starts sliding, not max
* isPreCyclic -> isCyclic, and invert logic, for clarity
* getDecoderState()
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-03-24 22:49:52 +08:00
Yan Chunwei
531b98ed62
feat: Add several pure python configs to LlmArgs ( #2997 )
* add SchedulerConfig
* add PeftCacheConfig
2025-03-24 16:16:17 +08:00
Yiteng Niu
cb11c10719
add ratelimit in workflow ( #3001 )
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-03-24 15:54:11 +08:00
QI JUN
832ea997f6
chore: Simplify quickstart of PyTorch flow ( #3000 )
* simplify quickstart of PyTorch flow
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-24 14:32:17 +08:00
nv-guomingz
ec4f43a0ab
test: remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… ( #2987 )
* test: remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases.
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* update test cases per review comments
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
---------
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 14:18:06 +08:00
Michael Gschwind
08b45d1bb9
Update README.md ( #2862 )
fix various typos
Signed-off-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
2025-03-24 13:46:09 +08:00
bhsueh_NV
7413cb555a
relax the limitation of setuptools ( #2992 )
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-24 13:36:10 +08:00
Oguz Vuruskaner
c3c5a07dca
Update setup.py ( #2876 )
update path for the script.
Signed-off-by: Oguz Vuruskaner <ovuruska@outlook.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
2025-03-24 13:10:53 +08:00
Laikh Tewari
456a850e66
Claim support for QwQ 32B ( #2877 )
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
2025-03-24 13:05:15 +08:00
Yiteng Niu
37644e22bc
update approver list ( #2994 )
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-03-24 12:51:27 +08:00
Enwei Zhu
c03d59817f
fix: LLM API logits processor example comments ( #2962 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-24 12:22:12 +08:00
juney-nvidia
a570578c7f
Update CONTRIBUTING.md as the first step of the TensorRT-LLM GitHub ramp-up ( #2980 )
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-03-23 19:58:16 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
tburt-nv
c2ac9e6269
update github workflow ( #2943 )
cherry-picks aa1c52f
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-03-18 22:20:46 -04:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
niukuo
aa1c52fa26
update github workflow
2025-03-17 23:11:07 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Yiteng Niu
c384d26736
migrate to l0-test.yml ( #2858 )
Signed-off-by: niukuo <6831097+niukuo@users.noreply.github.com>
2025-03-06 15:24:40 +08:00
Kaiyu Xie
225b77667c
Fix .gitmodules ( #2852 )
2025-03-04 22:34:09 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
tburt-nv
0bcfdca6aa
Use NVIDIA-gha runners to collect test results ( #2830 )
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2025-02-27 23:02:02 -05:00
Laikh Tewari
d2b7b64b25
Add R1 perf data to latest news page ( #2823 )
* Update README.md
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* add r1 perf chart to repo
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* Delete docs/source/blogs/media/r1-perf.jpeg
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* add file to correct media dir
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
* Update README.md with local img + remove old img
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
---------
Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>
2025-02-25 16:50:19 -08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
tburt-nv
5c794e3714
allow build command arguments ( #2808 )
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2025-02-21 10:38:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Denis Kayshev
d93a2dde84
Fix kwarg name ( #2691 )
2025-01-20 12:18:26 +08:00
Kaiyu Xie
0d0583a639
Update README.md ( #2668 )
2025-01-08 14:40:59 +08:00
Kaiyu Xie
be17881062
Update TensorRT-LLM ( #2582 )
2024-12-16 21:50:47 -08:00
Kaiyu Xie
b171e87956
Add issue triage workflows ( #2566 )
2024-12-11 09:27:40 -08:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM ( #2562 )
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
Kevin Chen
340a1b62fc
Add issue triage workflows ( #2498 )
Signed-off-by: Kevin Chen <kevinch@nvidia.com>
2024-12-04 23:50:46 +08:00
石晓伟
548b5b7310
Update TensorRT-LLM ( #2532 )
* blossom-ci.yml: run vulnerability scan on blossom
* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec
---------
Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00
Kyungmin Lee
4420547017
Fix typo ( #2473 )
2024-12-02 10:11:27 +08:00
niukuo
c994b69731
blossom-ci.yml: run vulnerability scan on ubuntu
2024-11-29 00:47:11 -08:00
niukuo
af3d49ce53
update blossom-ci.yml
2024-11-28 23:43:11 -08:00
niukuo
ae640fd376
Add blossom-ci.yml ( #2512 )
2024-11-29 15:01:26 +08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM ( #2502 )
* Update TensorRT-LLM
---------
Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00
Kaiyu Xie
535c9cc673
Update TensorRT-LLM ( #2460 )
2024-11-19 18:30:34 +08:00
Kaiyu Xie
c629546ce4
Update TensorRT-LLM ( #2436 )
2024-11-12 15:27:49 +08:00
Kaiyu Xie
b7868dd1bd
Update TensorRT-LLM ( #2413 )
2024-11-05 16:27:06 +08:00
Kaiyu Xie
f6821ee393
Update the latest news ( #2391 )
2024-10-29 23:23:02 +08:00
Kaiyu Xie
f14d1d433c
Update TensorRT-LLM ( #2389 )
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
2024-10-29 22:24:38 +08:00
Laikh Tewari
3c46c2794e
Specify Llama 3.x information in example ( #2343 )
2024-10-25 16:10:57 +08:00
Kaiyu Xie
1730a587d8
Update TensorRT-LLM ( #2363 )
* Update TensorRT-LLM
---------
Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>
2024-10-22 20:27:35 +08:00