TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Kaiyu Xie	385a01055c	doc: Add serving section for DS V3 document (#3262 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-03 21:57:48 +08:00
Zhanrui Sun	bd75ec02f2	Fix bot check error when triggered by pull request (#3268 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-03 21:47:05 +08:00
Fanrong Li	11624a8e96	fix deepseek-v3 mtp doc. (#3272 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>	2025-04-03 21:12:17 +08:00
Yechan Kim	c7533d271f	doc: add supported-models on PyTorch example (#3179 ) * doc: add supported-models on PyTorch example Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove vision support from Llama3.2 Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>	2025-04-03 21:09:25 +08:00
Yukun He	d138795485	Fix minor issues in test_autotuner.py and loose the cache check for test gemms. (#3261 ) This test can cause nondeterministic failures on CI with unexpected kernel profiling results. Given longer delay time or cache clear will not solve the issue. Thus, loose the test checks to avoid these false alarms. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-04-03 18:24:08 +08:00
Zhanrui Sun	67e9f99d46	infra: [TRTLLM-4308] Add Bot help (#3192 ) * Add bot command help and check bot command Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix permission error Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix add comment Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Update bot-command.yml Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> * Update .github/workflows/bot-command.yml Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix pre-commit Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-03 17:48:25 +08:00
Zhanrui Sun	587a36db96	infra: [TRTLLM-4370] Fix the build error when build GH200 image (#3229 ) * infra: Fix the build error when build GH200 image Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * remove and update checkoutSource function Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-03 17:33:50 +08:00
xinhe-nv	2005e5aaaf	remove tests from qa test lists (#3256 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-03 16:06:39 +08:00
xiweny	174a5af779	doc: refine integration test guide (#3215 ) * doc: refine integration test guide Signed-off-by: Xiwen Yu <xiweny@nvidia.com>	2025-04-03 15:36:13 +08:00
Fanrong Li	1fe64b90be	fix: fix the acceptance rate of pytorch workflow in trtllm-bench (#3240 ) * fix acceptance rate of pytorch workflow. * revert the RequestOutput API change. --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-04-03 15:12:24 +08:00
Frank	2d80db4c36	chore: Remove build config from Pytorch kwargs. (#3210 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-04-03 15:00:29 +08:00
Zhanrui Sun	7f03125098	test: [TRTLLM-3994] Support only run pytorch tests (#3013 ) * [TRTLLM-3994] Support only run pytorch tests Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Move perf test to TensorRT backend Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-03 13:46:09 +08:00
Zongfei Jing	dcc0ebd273	Fix warning (#3254 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-03 13:30:23 +08:00
pcastonguay	b5b83009ff	chore: Reenabling get_stats_async test which seems to have been fixed by recent commit (#3246 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-04-02 20:57:31 -07:00
Jinyang Yuan	2fdfa39ea8	fix: Fix an error related to dummy request when MTP is used (#3146 )	2025-04-03 11:08:12 +08:00
QI JUN	664f428476	set test timeout threshold to 5400 second (#3249 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-03 10:14:00 +08:00
Ming Wei	ca6615d800	Remove gen_cuda_headers_for_xqa.py (#3222 ) No longer needed.	2025-04-03 07:13:22 +08:00
Chuang Zhu	f5bf74bc7f	enable some disagg test (#3203 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-03 06:10:48 +08:00
Anurag Mukkara	d998339855	Raise error for PP + MTP (#3244 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-04-03 04:45:31 +08:00
Lucas Liebenwein	5fc2f63fec	infra: Devcontainer productivity improvements (#3075 ) * devcontainer improvements Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * docker compose path Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * updated some more devcontainer settings Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * clean devcontainer name Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> --------- Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-04-03 02:23:38 +08:00
QI JUN	abcb0486dc	fix deepseek failure with pipeline parallelism (#3225 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-02 22:56:39 +08:00
Robin Kobus	b5bc0a9fcd	chore: Add output of first token to additional generation outputs (#3205 ) - Updated the first dimension of additional output tensors to match mMaxNewTokens. - Copy output of last context token to generation outputs. - Adjusted the expected output size calculations in unit tests to reflect the correct maximum output length. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-02 20:14:16 +08:00
Zheng Duan	c9e94ec807	fix: remove test relies on timing (#3228 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-02 18:38:37 +08:00
WeiHaocheng	228e453780	doc: add doc ahout developent on cloud or runpod (#3194 ) Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-02 18:10:56 +08:00
Enwei Zhu	3cf7066350	test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) (#3219 ) * remove test_llm_models_multi_gpu.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * qwen 2.5 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * upgrade Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-02 17:29:57 +08:00
Enwei Zhu	d3948cd9b2	fix: GPT-Next convert failure (#3220 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-02 17:14:39 +08:00
WeiHaocheng	e64c565750	doc: add a directory for scaffolding contributors (#3224 ) Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-02 16:08:00 +08:00
Zheng Duan	5a72945eec	fix: conditional disagg test name (#3161 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-02 15:34:30 +08:00
William Tambellini	dbc0496f37	fix: upgrade cmake minimum from 3.18 to 3.27 (#3208 ) Required to correctly support recent archs like 90a, ... Fix issue #3173 Signed-off-by: William Tambellini <wtambellini@sdl.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-04-02 15:14:36 +08:00
Julien Debache	76a6a62073	fix: segfault in cudaDriverWrapper (#3017 ) * fix segmentation fault in cudaDriverWrapper Signed-off-by: jdebache <jdebache@nvidia.com> * replace cuGetErrorMessage with cuGetErrorString and added tests Signed-off-by: jdebache <jdebache@nvidia.com> --------- Signed-off-by: jdebache <jdebache@nvidia.com>	2025-04-02 08:55:19 +02:00
Zongfei Jing	8d48b96545	reduce test cases for deepseek (#3211 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-02 13:57:55 +08:00
wili	34e63d07e6	feat: Variable-Beam-Width-Search (VBWS) Part2 (#3133 ) * feat: Variable-Beam-Width-Search Part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part2, fix CPP tests Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part3, simplify CPP tests Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part4, move beam_width_array param Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search, fix CI error Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2, fix pre-commit Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2, fix review Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>	2025-04-02 12:31:28 +08:00
Gabriel Wu	05b50b297f	[feat] open source fp8_blockscale_gemm (#3071 ) Signed-off-by: Zihua Wu <zihuaw@nvidia.com>	2025-04-02 12:12:52 +08:00
Yiqing Yan	c19b7f7c2a	waive L0 test (#3217 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-02 11:16:22 +08:00
QI JUN	bb10cdcfb8	chore: refine fetch new requests method (#3213 ) * refine broadcast new requests method Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * refine fetch new requests method Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-02 10:46:00 +08:00
Zhanrui Sun	c5199c0b3d	infra: Support get file change for github PR (#3098 ) * Support get file change for github PR Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix reviews Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Check if only pytorch related file changed Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix global var Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Use globalVars for global values Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-02 10:35:33 +08:00
Zheng Duan	35b828ca2d	fix streaming in dist-serving (#3087 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-02 10:08:07 +08:00
Chuang Zhu	bc5811da65	chore: Ucx ip port remove mpi depend (#3101 ) * initial ucx support Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fixes to support dynloading and ucx connection establishment - not stable yet Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * update Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * more connection bringup fixes - faillig on connection vector build Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * executor test pass Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * update Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * passed full benchmark Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * changing to TLLM_THROW and removing cout Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * stoping progress thread at ucxComm destructor Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fixing build with ENABLE_UCX=0 to not build ucx traget at all and removing includes for ucxConnection for cache transceiver, also delete commented cold code Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fix copyrights Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * adding ucx flavor to cache transceiver test and insertto the CI pipeline Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * allowing sending non ib interfaces IPs Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * setting UCX port reuse for the tests in pipeline Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * code review fixes Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * querying ep after GID message is sent to avoid UCX Errors Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fixing more CR issues Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * querying ep to not fail is ep_not_connected yet Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * remove mpi dependency and debug Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * debug to info Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * mpirun n 2 Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * remove mpi comm split when disaggOrchestrator mode Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * waive disagg_mtp test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * use future instead of thread Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * use future_promise instead of cv wait Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * connectionId type Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * improve test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * imporve test 2 Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * gtest_skip Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>	2025-04-02 09:42:29 +08:00
Zongfei Jing	c7548ad72c	perf: Add optimizations for deepseek in min latency mode (#3093 ) * Add optimizations for deepseek min latency Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Fix compile error Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Update internal cutlass kernel libs Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Format code Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Resolve conflicts Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-02 09:05:24 +08:00
brb-nv	1fe3e30356	Add support for Phi-4-mini (#2990 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-02 08:34:39 +08:00
Zhanrui Sun	42963baacd	chore: bump version to 0.19.0.dev2025040800 (#3171 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-02 08:21:55 +08:00
QI JUN	8fe2e5865e	refine broadcast new requests method (#3198 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-02 08:05:20 +08:00
Fridah-nv	a5f32f46fd	fix: [AutoDeploy] Update README.md (#3072 ) * update support matrix and add toggle list Signed-off-by: fridah <201670829+Fridah-nv@users.noreply.github.com> * Update README.md Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> * Update README.md Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> --------- Signed-off-by: fridah <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-04-01 16:16:36 -07:00
Chang Liu	1d3a5d38af	fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions (#3144 ) * Update fp8 sf layout for blackwell and enable fp8 gemm e2e * Add test case when m needs to be padded * Better comment Signed-off-by: Chang Liu <liuc@nvidia.com> * Add TODO for fp8 quant kernel Signed-off-by: Chang Liu <liuc@nvidia.com> * Enable DCO check Signed-off-by: Chang Liu <liuc@nvidia.com> * Fix lint --------- Signed-off-by: Chang Liu <liuc@nvidia.com>	2025-04-01 13:08:29 -07:00
Robin Kobus	d880f4a7c6	chore: Cursor ignore cubin in headers (#3202 ) Add `*cubin.h` to ignore-file. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-01 23:42:19 +08:00
Enwei Zhu	b2f69db507	test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of `trtllm-eval` (#3167 ) * add eval_llmapi Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> tmp commit port to CLI tool Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> move Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> setup llmapi Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix spec_dec_algo Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> _update_from_hf_quant_config Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> migrate test_pytorch.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix fp8 block scales Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix fp8 rowwise Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> adj alpha Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> move test_pytorch.py cases Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> move Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> rename test_accuracy.py to test_cli.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> clean Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix cnn_dailymail Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * renaming to cli flow Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * rename MMLU Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * rename Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add error Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-01 22:20:29 +08:00
amirkl94	bf02b9144f	feature: Add LoRA support for gemma (#3068 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-04-01 19:15:55 +08:00
Robin Kobus	d7386d14a8	refactor: Simplify disableLookahead and improve numDecodingEngineTokens handling (#3103 ) * refactor: Simplifiy disableLookahead method Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * Update DecoderBuffers comments Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Move numDecodingEngineTokens to DecoderState This commit introduces new methods in the DecoderState class to manage the number of tokens for each request in a batch. The following changes were made: - Added `getNumDecodingEngineTokens()` to retrieve the number of tokens for all requests. - Added `getNumDecodingEngineTokens(SizeType32 batchIdx)` to get the token count for a specific request. - Added `setNumDecodingEngineTokens(SizeType32 batchIdx, SizeType32 numTokens)` to set the token count for a specific request. - Updated the setup method to initialize the token count vector based on the maximum batch size. - Refactored the `CreateNewDecoderRequests` class to utilize the new token management methods, improving clarity and maintainability. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Improve shape variables in DecoderState Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-01 18:47:31 +08:00
WeiHaocheng	ff35af77ea	feat: refactor scaffolding worker and support openai api worker (#3166 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-01 18:31:52 +08:00
bhsueh_NV	d34202273b	fix bug of glm-4-9b ci (#3184 ) bug nvbug_5196515 Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-01 16:58:42 +08:00

283 Commits