TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Shunkangz	8ee840159b	Add updateKVCacheTransfer (#2984 ) Add kv cache transfer measurement Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-03-25 21:45:35 +08:00
Chuang Zhu	110c6fc0f0	wait long time for disagg test (#2998 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-03-25 20:52:38 +08:00
Yuan Tong	53adb3cb4e	test: waive flaky test_kv_cache_event_async_api (#3062 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-03-25 18:41:30 +08:00
Xiaowei Wang	d9acce72bb	doc: Update DeepSeekV3 doc (#3052 ) * Update DeepGEMM and flashMLA related content * Add single-node command for deepgemm * Fix spelling --------- Signed-off-by: xiaoweiw-nv <100599594+xiaoweiw-nv@users.noreply.github.com>	2025-03-25 18:17:26 +08:00
Perkz Zheng	e9df23f815	fix: [MLA] fix the bug with fp8 MLA kernels on Blackwell. (#3008 ) * update cubins * update error message --------- Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-03-25 18:03:29 +08:00
bhsueh_NV	5724c61934	chore: fix bug of model paths in confset.py (#3011 ) * fix bugs of model paths of models in examples/models/contrib/ Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix bug of code layout Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix bug of test_multimodal.py Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * add gptj_example_root back Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-03-25 17:00:44 +08:00
xiweny	aacb8d66f4	doc: document running CI stage locally (#3060 ) Signed-off-by: Xiwen Yu <xiweny@nvidia.com>	2025-03-25 16:18:17 +08:00
QI JUN	a8ec1cc4ea	remove examples/test_gptj.py::test_llm_gptj_fp8_manage_weights_summary test case (#3057 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-03-25 15:41:27 +08:00
Yan Chunwei	69feafc947	fix: amend the test list (#3056 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-03-25 14:17:36 +08:00
WeiHaocheng	7ac04ada2a	doc: Add README.md for scaffolding (#3048 ) * Add README.md for scaffolding Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com> * Update tensorrt_llm/scaffolding/README.md Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com> Signed-off-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com> --------- Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com> Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>	2025-03-25 13:58:01 +08:00
bhsueh_NV	ed84f8f923	fix bug of test_phi (#3050 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-03-25 13:12:06 +08:00
Aurelien Chartier	ef78518310	Only gather responses on rank 0 (#3040 ) Signed-off-by: Aurelien Chartier <achartier@nvidia.com>	2025-03-24 21:54:51 -07:00
Aurelien Chartier	a33c595c88	Fix logits dtype in assert (#3038 ) Remove extra methods in trtGptModelInflightBatching.h. The methods were moved out of that class during a previous refactoring, but the definitions have been left behind. Signed-off-by: Aurelien Chartier <achartier@nvidia.com>	2025-03-25 10:35:21 +08:00
Zhanrui Sun	c2ffce7dbd	chore: bump version to "0.19.0.dev2025032500" (#3019 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-03-25 10:04:17 +08:00
Yan Chunwei	c29cebf79d	Deprecate model_api examples (#2999 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-03-25 09:37:20 +08:00
bhsueh_NV	11f9ecb2fd	chore: remove useless param (#3023 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-03-25 08:36:45 +08:00
Kaiyu Xie	59deb8b06e	doc: Update CONTRIBUTING.md (#3033 ) * Update CONTRIBUTING.md Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> * Update pre-commit example message Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> --------- Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-03-25 08:06:23 +08:00
Enwei Zhu	705eef68c2	test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite (#2982 ) * Accuracy test improvement (Part 2) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * WAR OOM Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> update Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-03-25 07:34:10 +08:00
nv-guomingz	dc0463b0e2	doc:add version.txt for internal cutlass library and nvrtc_wrapper so files (#3030 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-03-24 23:44:21 +08:00
Pradeep Raj Prabhu Raj	5b4a5014d1	Fix: wrong path to constraints.txt in bloom/requirements.txt (#3003 ) Signed-off-by: Pradeep Raj Prabhu Raj <pradeepraj18062002@gmail.com>	2025-03-24 23:03:40 +08:00
Netanel Haber	da0b0e0ee3	fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983 ) * fix variable window size reuse - disable when min attention window starts sliding, not max * isPreCyclic -> isCyclic, and invert logic, for clarity * getDecoderState() Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-03-24 22:49:52 +08:00
Yan Chunwei	531b98ed62	feat: Add several pure python configs to LlmArgs (#2997 ) * add SchedulerConfig * add PeftCacheConfig	2025-03-24 16:16:17 +08:00
Yiteng Niu	cb11c10719	add ratelimit in workflow (#3001 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-03-24 15:54:11 +08:00
QI JUN	832ea997f6	chore: Simplify quickstart of PyTorch flow (#3000 ) * simplify quickstart of PyTorch flow Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-03-24 14:32:17 +08:00
nv-guomingz	ec4f43a0ab	test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… (#2987 ) * test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases. Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> * updatet test case per review comments Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> --------- Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-03-24 14:18:06 +08:00
Michael Gschwind	08b45d1bb9	Update README.md (#2862 ) fix various typos Signed-off-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>	2025-03-24 13:46:09 +08:00
bhsueh_NV	7413cb555a	relax the limitation of setuptools (#2992 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-03-24 13:36:10 +08:00
Oguz Vuruskaner	c3c5a07dca	Update setup.py (#2876 ) update path for the script. Signed-off-by: Oguz Vuruskaner <ovuruska@outlook.com> Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>	2025-03-24 13:10:53 +08:00
Laikh Tewari	456a850e66	Claim support for QwQ 32B (#2877 ) Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>	2025-03-24 13:05:15 +08:00
Yiteng Niu	37644e22bc	update approver list (#2994 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-03-24 12:51:27 +08:00
Enwei Zhu	c03d59817f	fix: LLM API logits processor example comments (#2962 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-03-24 12:22:12 +08:00
juney-nvidia	a570578c7f	Update the CONTRIBUTING.md as the ramp-up for TensorRT-LLM github firstly (#2980 ) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-03-23 19:58:16 +08:00
Kaiyu Xie	2631f21089	Update (#2978 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-03-23 16:39:35 +08:00
tburt-nv	c2ac9e6269	update github workflow (#2943 ) cherry-picks `aa1c52f` Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-03-18 22:20:46 -04:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936 ) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00
niukuo	aa1c52fa26	update github workflow	2025-03-17 23:11:07 +08:00
Kaiyu Xie	9b931c0f63	Update TensorRT-LLM (#2873 )	2025-03-11 21:13:42 +08:00
Yiteng Niu	c384d26736	migrate to l0-test.yml (#2858 ) Signed-off-by: niukuo <6831097+niukuo@users.noreply.github.com>	2025-03-06 15:24:40 +08:00
Kaiyu Xie	225b77667c	Fix .gitmodules (#2852 )	2025-03-04 22:34:09 +08:00
Kaiyu Xie	77d7fe1eb2	Update TensorRT-LLM (#2849 ) * Update TensorRT-LLM --------- Co-authored-by: aotman <chenhangatm@gmail.com>	2025-03-04 18:44:00 +08:00
tburt-nv	0bcfdca6aa	Use NVIDIA-gha runners to collect test results (#2830 ) Signed-off-by: Tyler Burt <tburt@nvidia.com>	2025-02-27 23:02:02 -05:00
Laikh Tewari	d2b7b64b25	Add R1 perf data to latest news page (#2823 ) * Update README.md Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com> * add r1 perf chart to repo Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com> * Delete docs/source/blogs/media/r1-perf.jpeg Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com> * add file to correct media dir Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com> * Update README.md with local img + remove old img Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com> --------- Signed-off-by: Laikh Tewari <laikhtewari1@gmail.com>	2025-02-25 16:50:19 -08:00
Kaiyu Xie	ab5b19e027	Update TensorRT-LLM (#2820 )	2025-02-25 21:21:49 +08:00
tburt-nv	5c794e3714	allow build command arguments (#2808 ) Signed-off-by: Tyler Burt <tburt@nvidia.com>	2025-02-21 10:38:49 +08:00
Kaiyu Xie	2ea17cdad2	Update TensorRT-LLM (#2792 ) * Update TensorRT-LLM --------- Co-authored-by: jlee <jungmoolee@clika.io>	2025-02-18 21:27:39 +08:00
Kaiyu Xie	e88da961c5	Update TensorRT-LLM (#2783 )	2025-02-13 18:40:22 +08:00
Dan Blanaru	16d2467ea8	Update TensorRT-LLM (#2755 ) * Update TensorRT-LLM --------- Co-authored-by: Denis Kayshev <topenkoff@gmail.com> Co-authored-by: akhoroshev <arthoroshev@gmail.com> Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com> Update	2025-02-11 03:01:00 +00:00
Denis Kayshev	d93a2dde84	Fix kwarg name (#2691 )	2025-01-20 12:18:26 +08:00
Kaiyu Xie	0d0583a639	Update README.md (#2668 )	2025-01-08 14:40:59 +08:00
Kaiyu Xie	be17881062	Update TensorRT-LLM (#2582 )	2024-12-16 21:50:47 -08:00

171 Commits