TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Dom Brown	8709fe8b53	chore: bump version to 0.19.0 (#3598) (#3841) test: add test cases for 0.19 release (#3608) * fix test name * add quickstart test for nemotron-ultra * add rcca multi-node test case for deepseek-v3 * add rcca info --------- squash (#3642) fix: nvbugs/5187237: fix deterministic mode crash (#3448) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. * revert ar fusion --------- update fp8 doc (#3647) tests: change qa perf test to trtllm-bench (#3619) fix: FP8 quantized lm_head (NvBug 5214229) (#3567) infra: Add PR approval protection for the release branch (#3634) fix: nvbugs/5231298: pytorch allreduce issue (#3673) Fix: nvbugs/5222698 variable not defined (#3630) * Fix: nvbugs/5222698 variable not defined * Tidy code --------- test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685) test:restore fp8 kv cache testing for L0 (#3671) doc: Update DeepSeek perf docs (#3693) * Update DeepSeek perf docs * update * Apply suggestions from code review --------- tests: waive test_llm_multi_node (#3664) fix: update test_user_buffers_mm_add_prologue atol (#3711) Fix: cherry-pick hmac encryption from main branch (#3635) * security fix cherry-pick changes from main * fix hmac in remote mpi session (#3649) --------- Un-waive DS-V3-Lite tests. (#3621) fix: FP8 kv accuracy (#3675) * fix FP8 kv accuracy * update doc --------- Fix script options for engines. (#3622) unwaive multi-node test (#3721) chore : Split more tests out of gpt tests (#3524) (#3674) doc:add torch examples link into torch backend documentation (#3749) test: Get Eagle tests working (#3593) (#3722) Waive L0 test (#3756) waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656) Update ds v3 parameters in stress test. (#3676) waive gemma on L20 (#3766) https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758) Include Qwen2VLDecoderLayer in the smooth_qwen2_model function. fix: PP4 fixes and cleanup (#3688) remove benchmark test list (#3643) skip disagg deepseek test if sm!=90 (#3720) test: skip failed cases on B200 (#3710) * add skip condition to tests * fix error --------- test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718) * skip_pre_ada for fp8 cases * update * update after rebase --------- add know issue to deepseek doc. (#3800) Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761) Waive L0 tests (#3826) fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793) * Reduce memory usage in fused moe op associated with AutoTuning. * Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens. * Add free_memory logic of workspace in min_latency_mode fused moe path. * Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression. --------- [doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797) Fix pre-commit Fix again Address some review comments for the MI Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-29 16:57:22 +08:00
bhsueh_NV	c4d86b267c	chore: add pull request template (#3760) * add pull request template Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix pre-commit issue Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-04-23 10:21:31 +08:00
Yiteng Niu	ca88674210	update user list (#3614) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-04-16 15:13:29 +08:00
tburt-nv	5616c0d232	add precommit check to github actions (#3129) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-04-11 06:40:53 +08:00
tburt-nv	8d164f40d7	update allowlist (#3428) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-04-10 06:41:40 +08:00
tburt-nv	3a8443f1e1	extend allowlist (#3379) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-04-09 11:10:42 +08:00
Zhanrui Sun	c692474b59	infra: Fix bot help error when " in bot command (#3314) * Fix bot help error when " in bot command Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Delete a.txt Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-08 18:16:05 +08:00
Zhanrui Sun	bd75ec02f2	Fix bot check error when triggered by pull request (#3268) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-03 21:47:05 +08:00
Zhanrui Sun	67e9f99d46	infra: [TRTLLM-4308] Add Bot help (#3192) * Add bot command help and check bot command Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix permission error Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix add comment Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Update bot-command.yml Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> * Update .github/workflows/bot-command.yml Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix pre-commit Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-03 17:48:25 +08:00
Yiteng Niu	c725f1043f	update user list (#3193) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-04-01 16:41:15 +08:00
Yiteng Niu	3aae124a00	infra: update concurrency control (#3120) * update concurrency control Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> * Update .github/workflows/blossom-ci.yml Co-authored-by: tburt-nv <195370667+tburt-nv@users.noreply.github.com> Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> --------- Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> Co-authored-by: tburt-nv <195370667+tburt-nv@users.noreply.github.com>	2025-03-30 23:28:50 +08:00
tburt-nv	e68749ca1e	2025-03-25 update CI allowlist (#3074) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com> Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>	2025-03-26 08:13:01 +08:00
Yiteng Niu	cb11c10719	add ratelimit in workflow (#3001) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-03-24 15:54:11 +08:00
Yiteng Niu	37644e22bc	update approver list (#2994) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-03-24 12:51:27 +08:00
Kaiyu Xie	2631f21089	Update (#2978) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-03-23 16:39:35 +08:00
tburt-nv	c2ac9e6269	update github workflow (#2943) cherry-picks `aa1c52f` Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-03-18 22:20:46 -04:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00
niukuo	aa1c52fa26	update github workflow	2025-03-17 23:11:07 +08:00
Yiteng Niu	c384d26736	migrate to l0-test.yml (#2858) Signed-off-by: niukuo <6831097+niukuo@users.noreply.github.com>	2025-03-06 15:24:40 +08:00
Kaiyu Xie	77d7fe1eb2	Update TensorRT-LLM (#2849) * Update TensorRT-LLM --------- Co-authored-by: aotman <chenhangatm@gmail.com>	2025-03-04 18:44:00 +08:00
tburt-nv	0bcfdca6aa	Use NVIDIA-gha runners to collect test results (#2830) Signed-off-by: Tyler Burt <tburt@nvidia.com>	2025-02-27 23:02:02 -05:00
Kaiyu Xie	ab5b19e027	Update TensorRT-LLM (#2820)	2025-02-25 21:21:49 +08:00
tburt-nv	5c794e3714	allow build command arguments (#2808) Signed-off-by: Tyler Burt <tburt@nvidia.com>	2025-02-21 10:38:49 +08:00
Dan Blanaru	16d2467ea8	Update TensorRT-LLM (#2755) * Update TensorRT-LLM --------- Co-authored-by: Denis Kayshev <topenkoff@gmail.com> Co-authored-by: akhoroshev <arthoroshev@gmail.com> Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com> Update	2025-02-11 03:01:00 +00:00
Kaiyu Xie	be17881062	Update TensorRT-LLM (#2582)	2024-12-16 21:50:47 -08:00
Kaiyu Xie	b171e87956	Add issue triage workflows (#2566)	2024-12-11 09:27:40 -08:00
Kaiyu Xie	aaacc9bd68	Update TensorRT-LLM (#2562) * Update TensorRT-LLM --------- Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>	2024-12-11 00:31:05 -08:00
Kevin Chen	340a1b62fc	Add issue triage workflows (#2498) Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2024-12-04 23:50:46 +08:00
niukuo	c994b69731	blossom-ci.yml: run vulnerability scan on ubuntu	2024-11-29 00:47:11 -08:00
niukuo	af3d49ce53	update blossom-ci.yml	2024-11-28 23:43:11 -08:00
niukuo	ae640fd376	Add blossom-ci.yml (#2512)	2024-11-29 15:01:26 +08:00
Kaiyu Xie	bf0a5afc92	Update TensorRT-LLM (#1598) * Update TensorRT-LLM	2024-05-14 16:43:41 +08:00
Kaiyu Xie	89ba1b1a67	Update TensorRT-LLM (#1554)	2024-05-07 23:34:28 +08:00
Kaiyu Xie	06c0e9b1ec	Update TensorRT-LLM (#1530)	2024-04-30 17:19:10 +08:00
Kaiyu Xie	c89653021e	Update TensorRT-LLM (20240116) (#891) * Update TensorRT-LLM --------- Co-authored-by: Eddie-Wang1120 <81598289+Eddie-Wang1120@users.noreply.github.com> Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-01-16 20:03:11 +08:00
juney-nvidia	6cc5e177ff	Update issue templates	2024-01-03 16:22:51 +08:00
juney-nvidia	a413d132b8	Update issue templates	2024-01-03 16:22:03 +08:00

37 Commits