TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Ming Wei	ca6615d800	Remove gen_cuda_headers_for_xqa.py (#3222 ) No longer needed.	2025-04-03 07:13:22 +08:00
Chuang Zhu	f5bf74bc7f	enable some disagg test (#3203 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-03 06:10:48 +08:00
Anurag Mukkara	d998339855	Raise error for PP + MTP (#3244 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-04-03 04:45:31 +08:00
Lucas Liebenwein	5fc2f63fec	infra: Devcontainer productivity improvements (#3075 ) * devcontainer improvements Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * docker compose path Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * updated some more devcontainer settings Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * clean devcontainer name Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> --------- Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-04-03 02:23:38 +08:00
QI JUN	abcb0486dc	fix deepseek failure with pipeline parallelism (#3225 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-02 22:56:39 +08:00
Robin Kobus	b5bc0a9fcd	chore: Add output of first token to additional generation outputs (#3205 ) - Updated the first dimension of additional output tensors to match mMaxNewTokens. - Copy output of last context token to generation outputs. - Adjusted the expected output size calculations in unit tests to reflect the correct maximum output length. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-02 20:14:16 +08:00
Zheng Duan	c9e94ec807	fix: remove test relies on timing (#3228 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-02 18:38:37 +08:00
WeiHaocheng	228e453780	doc: add doc about development on cloud or runpod (#3194 ) Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-02 18:10:56 +08:00
Enwei Zhu	3cf7066350	test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) (#3219 ) * remove test_llm_models_multi_gpu.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * qwen 2.5 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * upgrade Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-02 17:29:57 +08:00
Enwei Zhu	d3948cd9b2	fix: GPT-Next convert failure (#3220 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-02 17:14:39 +08:00
WeiHaocheng	e64c565750	doc: add a directory for scaffolding contributors (#3224 ) Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-02 16:08:00 +08:00
Zheng Duan	5a72945eec	fix: conditional disagg test name (#3161 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-02 15:34:30 +08:00
William Tambellini	dbc0496f37	fix: upgrade cmake minimum from 3.18 to 3.27 (#3208 ) Required to correctly support recent archs like 90a, ... Fix issue #3173 Signed-off-by: William Tambellini <wtambellini@sdl.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-04-02 15:14:36 +08:00
Julien Debache	76a6a62073	fix: segfault in cudaDriverWrapper (#3017 ) * fix segmentation fault in cudaDriverWrapper Signed-off-by: jdebache <jdebache@nvidia.com> * replace cuGetErrorMessage with cuGetErrorString and added tests Signed-off-by: jdebache <jdebache@nvidia.com> --------- Signed-off-by: jdebache <jdebache@nvidia.com>	2025-04-02 08:55:19 +02:00
Zongfei Jing	8d48b96545	reduce test cases for deepseek (#3211 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-02 13:57:55 +08:00
wili	34e63d07e6	feat: Variable-Beam-Width-Search (VBWS) Part2 (#3133 ) * feat: Variable-Beam-Width-Search Part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part2, fix CPP tests Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part3, simplify CPP tests Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part4, move beam_width_array param Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search, fix CI error Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2, fix pre-commit Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2, fix review Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>	2025-04-02 12:31:28 +08:00
Gabriel Wu	05b50b297f	[feat] open source fp8_blockscale_gemm (#3071 ) Signed-off-by: Zihua Wu <zihuaw@nvidia.com>	2025-04-02 12:12:52 +08:00
Yiqing Yan	c19b7f7c2a	waive L0 test (#3217 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-02 11:16:22 +08:00
QI JUN	bb10cdcfb8	chore: refine fetch new requests method (#3213 ) * refine broadcast new requests method Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * refine fetch new requests method Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-02 10:46:00 +08:00
Zhanrui Sun	c5199c0b3d	infra: Support get file change for github PR (#3098 ) * Support get file change for github PR Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix reviews Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Check if only pytorch related file changed Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix global var Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Use globalVars for global values Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-02 10:35:33 +08:00
Zheng Duan	35b828ca2d	fix streaming in dist-serving (#3087 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-02 10:08:07 +08:00
Chuang Zhu	bc5811da65	chore: Ucx ip port remove mpi depend (#3101 ) * initial ucx support Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fixes to support dynloading and ucx connection establishment - not stable yet Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * update Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * more connection bringup fixes - failing on connection vector build Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * executor test pass Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * update Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * passed full benchmark Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * changing to TLLM_THROW and removing cout Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * stopping progress thread at ucxComm destructor Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fixing build with ENABLE_UCX=0 to not build ucx target at all and removing includes for ucxConnection for cache transceiver, also delete commented cold code Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fix copyrights Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * adding ucx flavor to cache transceiver test and insert to the CI pipeline Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * allowing sending non ib interfaces IPs Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * setting UCX port reuse for the tests in pipeline Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * code review fixes Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * querying ep after GID message is sent to avoid UCX Errors Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * fixing more CR issues Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * querying ep to not fail is ep_not_connected yet Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> * remove mpi dependency and debug Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * debug to info Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * mpirun n 2 Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * remove mpi comm split when disaggOrchestrator mode Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * waive disagg_mtp test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * use future instead of thread Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * use future_promise instead of cv wait Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * connectionId type Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * improve test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * improve test 2 Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * gtest_skip Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com> Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>	2025-04-02 09:42:29 +08:00
Zongfei Jing	c7548ad72c	perf: Add optimizations for deepseek in min latency mode (#3093 ) * Add optimizations for deepseek min latency Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Fix compile error Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Update internal cutlass kernel libs Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Format code Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Resolve conflicts Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-02 09:05:24 +08:00
brb-nv	1fe3e30356	Add support for Phi-4-mini (#2990 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-02 08:34:39 +08:00
Zhanrui Sun	42963baacd	chore: bump version to 0.19.0.dev2025040800 (#3171 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-02 08:21:55 +08:00
QI JUN	8fe2e5865e	refine broadcast new requests method (#3198 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-02 08:05:20 +08:00
Fridah-nv	a5f32f46fd	fix: [AutoDeploy] Update README.md (#3072 ) * update support matrix and add toggle list Signed-off-by: fridah <201670829+Fridah-nv@users.noreply.github.com> * Update README.md Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> * Update README.md Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> --------- Signed-off-by: fridah <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-04-01 16:16:36 -07:00
Chang Liu	1d3a5d38af	fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions (#3144 ) * Update fp8 sf layout for blackwell and enable fp8 gemm e2e * Add test case when m needs to be padded * Better comment Signed-off-by: Chang Liu <liuc@nvidia.com> * Add TODO for fp8 quant kernel Signed-off-by: Chang Liu <liuc@nvidia.com> * Enable DCO check Signed-off-by: Chang Liu <liuc@nvidia.com> * Fix lint --------- Signed-off-by: Chang Liu <liuc@nvidia.com>	2025-04-01 13:08:29 -07:00
Robin Kobus	d880f4a7c6	chore: Cursor ignore cubin in headers (#3202 ) Add `*cubin.h` to ignore-file. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-01 23:42:19 +08:00
Enwei Zhu	b2f69db507	test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of `trtllm-eval` (#3167 ) * add eval_llmapi Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> tmp commit port to CLI tool Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> move Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> setup llmapi Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix spec_dec_algo Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> _update_from_hf_quant_config Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> migrate test_pytorch.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix fp8 block scales Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> fix fp8 rowwise Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> adj alpha Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> move test_pytorch.py cases Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> move Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> rename test_accuracy.py to test_cli.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> clean Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix cnn_dailymail Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * renaming to cli flow Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * rename MMLU Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * rename Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add error Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-01 22:20:29 +08:00
amirkl94	bf02b9144f	feature: Add LoRA support for gemma (#3068 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-04-01 19:15:55 +08:00
Robin Kobus	d7386d14a8	refactor: Simplify disableLookahead and improve numDecodingEngineTokens handling (#3103 ) * refactor: Simplify disableLookahead method Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * Update DecoderBuffers comments Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Move numDecodingEngineTokens to DecoderState This commit introduces new methods in the DecoderState class to manage the number of tokens for each request in a batch. The following changes were made: - Added `getNumDecodingEngineTokens()` to retrieve the number of tokens for all requests. - Added `getNumDecodingEngineTokens(SizeType32 batchIdx)` to get the token count for a specific request. - Added `setNumDecodingEngineTokens(SizeType32 batchIdx, SizeType32 numTokens)` to set the token count for a specific request. - Updated the setup method to initialize the token count vector based on the maximum batch size. - Refactored the `CreateNewDecoderRequests` class to utilize the new token management methods, improving clarity and maintainability. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Improve shape variables in DecoderState Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-01 18:47:31 +08:00
WeiHaocheng	ff35af77ea	feat: refactor scaffolding worker and support openai api worker (#3166 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-01 18:31:52 +08:00
bhsueh_NV	d34202273b	fix bug of glm-4-9b ci (#3184 ) bug nvbug_5196515 Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-01 16:58:42 +08:00
Yiteng Niu	c725f1043f	update user list (#3193 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-04-01 16:41:15 +08:00
Jinyang Yuan	992d513bc6	feat: Optionally split MoE inputs into chunks to reduce GPU memory usage (#3104 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> Co-authored-by: raccoonliukai <raccoonliu@tencent.com>	2025-04-01 16:07:02 +08:00
brb-nv	727d78e785	Support prequantized fp8 ckpt for nemotron-mini-4b-instruct (#3046 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-01 14:52:09 +08:00
Yan Chunwei	7575dd00e7	add slurm script examples for llm-api (#3135 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-01 14:31:57 +08:00
Yuan Tong	2994527110	chore: cutlass cleanup (#3165 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-04-01 13:57:38 +08:00
dongjiyingdjy	22ff81b047	fix: fix illegal memory access when mtp >= 2 (#3006 ) * fix - fix illegal memory access when mtp > 2 --------- Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-04-01 13:36:45 +08:00
QI JUN	75495730bc	Revert "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183 ) This reverts commit `3ee4332fb1`. Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-01 12:49:27 +08:00
Shunkangz	dda7354d1a	Refactor return of first gen token in PD (#2986 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-01 12:28:27 +08:00
brb-nv	1901bfcf76	test: Add Eagle tests with untrained heads (#2991 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-01 11:41:59 +08:00
jiahanc	c4ee14e43a	fix: Reverse cuda graph size order (#3116 ) Signed-off-by: jiahanc <jiahanc@nvidia.com>	2025-04-01 11:28:36 +08:00
Erin	68bcd0ac07	doc: update README (#3162 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-04-01 10:37:06 +08:00
Aurelien Chartier	14e194433c	chore: cleanup py_executor code (#3132 ) * chore: cleanup py_executor code * Add common loop cleanup function * Remove checks for attention DP if nothing to queue * Remove extra return statements * Remove extra variables * Remove commented debug print Signed-off-by: Aurelien Chartier <achartier@nvidia.com> * rename cleanup function Signed-off-by: Aurelien Chartier <achartier@nvidia.com> --------- Signed-off-by: Aurelien Chartier <achartier@nvidia.com>	2025-04-01 09:27:04 +08:00
Anurag Mukkara	435cd2983d	perf: Optimisations for PP + attention DP (#3134 ) * Minor tp_rank fix Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Delete unused function Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * PP broadcast for ADP new requests Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Sync request finish point for intermediate and last pp ranks Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> * Use local PP layers only for KV cache estimation Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> --------- Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-04-01 08:59:16 +08:00
Frank	8bb3eea285	perf: Readd iteration logging for trtllm-bench. (#3039 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-04-01 08:13:09 +08:00
Iman Tabrizian	e8731ba3b7	fix: disable cuda graph and MTP for overlap tests (#3155 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-03-31 11:35:35 -07:00
WeiHaocheng	f665f83256	feat: improve scaffolding shutdown process (#3084 )	2025-03-31 20:39:20 +08:00


417 Commits