TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
QI JUN	b5473f7eca	waive llama3.1 8B test cases with pipeline parallelism (#3433 ) * waive llama3.1 8B test cases with pipeline parallelism Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-10 11:07:58 +08:00
peaceh-nv	215fb20567	chore : split GptExecutor tests out of gpt tests to reduce single test time (#3412 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-10 09:08:15 +08:00
Yechan Kim	943218b54a	feat: Add Qwen2.5-VL and refactor Qwen2-VL (#3156 ) * feat: Add Qwen2.5-VL and refactor Qwen2-VL Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix yapf and codespell Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix test_e2e Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * generalize get_rope_index Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix qwen2.5-vl on README Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix image test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-10 04:09:03 +08:00
danielafrimi	47f5cf6c0d	lora_tests (#3201 ) LoRA tests and layers Signed-off-by: Ubuntu <dafrimi@nvidia.com> Co-authored-by: Ubuntu <dafrimi@nvidia.com>	2025-04-09 18:06:52 +03:00
WeiHaocheng	6eee15900e	feat: Enhance the integrated robustness of scaffolding with __init__.py #3305 (#3312 ) Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>	2025-04-09 21:13:47 +08:00
sugunav14	64abb01a36	Fix failing DSV3 unit tests (#3385 ) * Skipping DSV3 module patch unit tests Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> * update tested Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> * Fixed failing unit test Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com> --------- Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>	2025-04-09 11:57:05 +08:00
Iman Tabrizian	8401722245	test: Add single gpu disaggregated tests (#3295 ) * test: Add single gpu disaggregated tests Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Add deepseek with overlap tests Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Use updated prompt Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Move test to disaggregated folder Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-04-09 09:34:45 +08:00
Mike Iovine	5bdf997963	Add Llama 4 (#3302 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-04-09 03:35:21 +08:00
yuxianq	7225bd8b91	chore: Refine attention backend interface. (#3271 ) Refine attention backend interface. Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-09 02:34:53 +08:00
wili	54ad95eaa8	Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338 ) * feat/Variable-Beam-Width-Search-Part3, v1.0 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat/Variable-Beam-Width-Search-Part3, v1.1 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat/Variable-Beam-Width-Search-Part3, v1.2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>	2025-04-08 23:51:27 +08:00
sugunav14	84fc07b011	feat: [TRTLLM-3510] DeepseekV3 support in AutoDeploy (#3281 ) Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>	2025-04-08 21:47:57 +08:00
pcastonguay	02f446a9ff	chore: Adding DS V3-lite tests with overlap + cuda graph (#3342 ) * chore: Adding DS V3-lite tests with overlap + cuda graph Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing pre-commit Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-08 09:36:09 -04:00
yuxianq	7b03350527	Add thread leak check and fix thread/memory leak issues. (#3270 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-08 19:03:18 +08:00
liji-nv	dca6397d1e	feat: Introduce UB allocator for pytorch flow (#3257 ) * Instead of allocating UserBuffers at the beginning of runtime, UB buffers are now managed by a global allocator. The allocator dynamically assigns a free UB buffer or allocates a new buffer for a torch tensor, which makes userbuffers easier to use. * In the common use case, the userbuffers are allocated correctly during the warm-up stage, so there is no dynamic allocation during inference. * The UB fusion pattern is rewritten using the new UB allocator. It contains the following passes: 1. Fuse Quant with allreduce, replace it with the UB impl, and insert a copy_to_userbuffers. Currently the normal allreduce still does not support FP8 quant, so this needs to be done in the UB pass. 2. Convert all supported allreduce ops to UB and insert copy_to_userbuffers. 3. Fuse the op before the allreduce with the copy_to_userbuffers, so the op writes directly to the userbuffer. 4. Remove the userbuffers finalize if the output is connected to another UB allreduce. Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-04-08 18:39:49 +08:00
Chuang Zhu	cdb0906be4	disagg test single h100 (#3353 )	2025-04-08 17:45:35 +08:00
amirkl94	e04f6a1b9b	fix: Fix p-tuning test bug (#3326 ) * fix: Fix p-tuning test bug * A change in the vocab_size calculation for T5Tokenizer, introduced in transformers version 4.34, caused addition of incorrect vtokens for ptuning. In general, instead of adding tokens which are outside the vocabulary, tokens inside the vocabulary were added. Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-04-08 17:14:00 +08:00
Enwei Zhu	8ee019f8c4	test: Accuracy test improvement (Part 3.4): Move LLaMA tests (#3350 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-08 15:07:57 +08:00
Yukun He	c678774c99	feat: Apply the new torch-flow compatible AutoTuner to both Fused MoE and NVFP4 Linear operators. (#3151 ) * Several optimizations and fixes on the Autotuner. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Apply the new Python side Autotuner on current linear for nvFP4 data type. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Apply the new Python side Autotuner on MoE op * Remove routers from cache key to improve inference perf * Prevent unnecessary code profiling. Use do_preparation keyword to select which part should be executed before evaluating any tactic. * Remove try-catch inside moe profiling process. * Move default tactic -1 to 0 transforms in cpp runner. * Revise relevant tests. * Predefine the bucketizing strategy for fused_moe Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Add specific_profile support for AutoTuner to bypass the standard cache search process for perf optimization * Add specific_profile for moe * Add specific profile for linear Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Fixing and revising according to reviewer's suggestions. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Use lru_cache for inference perf optimization. * Revert gen_custom_cache_key feature Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Replace runner with runner id to achieve a serializable cache. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Code clean up and minor fixes. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Move all tunable runners and custom ops into torch_custom_ops. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * Treat min_latency_mode as an independent dynamic tensor. Modify get_valid_tactics to suit it. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> --------- Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-04-08 14:28:36 +08:00
Gabriel Wu	f1655afb0d	feat: enable DeepGEMM by default (#3341 ) Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>	2025-04-08 13:58:57 +08:00
Fanrong Li	62e0876e39	Waive unittest/trt/model/test_mamba.py::TestMamba::test_loaders_mamba_130m_hf_from_checkpoint. Will fix it later. (#3356 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-04-07 22:36:35 -07:00
MinaHuai	31422e7e46	add tp=2 ci test for vision encoder (#3319 ) Signed-off-by: mhuai <mhuai@nvidia.com>	2025-04-07 21:46:08 -07:00
Gabriel Wu	42c8574e93	fix: revert extra cmake var (#3351 ) Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-08 11:57:16 +08:00
pcastonguay	add5e5cd93	feat: Add option to run disaggregated serving without ctx servers,… (#3243 ) * feat: Add option to run disaggregated serving without ctx servers, to benchmark gen only Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing comment in sanity check Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-07 21:56:03 -04:00
Void	efe2ecfb37	fix: runtime error in test_deepseek_allreduce.py (#3226 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-04-08 09:19:47 +08:00
Enwei Zhu	ba019a43d6	test: Accuracy test improvement (Part 3.3): Move DeepSeek tests (#3260 ) add skip fix fix update update test list fixqa list move bf16 to postmerge Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-08 07:19:04 +08:00
Gabriel Wu	376731013d	feat: use NVRTC for DeepGEMM JIT compilation (#3239 ) * feat: use NVRTC for DeepGEMM JIT compilation Signed-off-by: Zihua Wu * fix: add license Signed-off-by: Zihua Wu * feat: store NVRTC JIT results in memory by default Signed-off-by: Zihua Wu * feat: refinement Signed-off-by: Zihua Wu * feat: refinement Signed-off-by: Zihua Wu * test: set timeout to 7200 Signed-off-by: Zihua Wu --------- Signed-off-by: Zihua Wu	2025-04-07 20:29:23 +08:00
YueWeng	aab6214801	test: fix conflicting test names (#3316 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-04-07 20:10:01 +08:00
pansicheng	ef1ba468a1	feat: support abort disconnected requests (#3214 ) Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>	2025-04-07 16:14:58 +08:00
QI JUN	a2fad51011	chore: waive a timeout multi-GPU test case (#3310 ) * debug CI timeout issue Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * waive timeout case Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-07 14:04:54 +08:00
brb-nv	017361c26c	test: Waive non-Llama Eagle tests (#3309 )	2025-04-07 09:25:41 +08:00
tburt-nv	7a659885e3	chore: remove usernames from comments (#3291 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-04-05 13:44:28 +08:00
Yan Chunwei	b21cfcfed1	chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025 ) * make LlmArgs Pydantic Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * amending doc fix api_stability fix tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * restore yaml groups refine StackTrace singleton clean tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix trtllm-bench fix pytorch Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix serve distagg Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-05 13:31:48 +08:00
qixiang-99	0d4d50a745	feat: no-cache attention in PyTorch workflow (#3085 ) * init trtllm attn no cache Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * fix: fix the seq_len issue and attn metadata prepare for qwen reward model test fix: fix minor bugs after rebase Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: remove unnecessary debug logs and clean up commented code refactor: update max_seq_len documentation and remove max_seq_len for decoder model constructor in PyTorchModelEngine Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: update calculate_ref_result function to accept tensor inputs and mask type, enhance test_attention_no_cache to support FULL and CAUSAL masks Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: remove unused BERT attention metadata conversion method and add type assertion for no cache attention in PyTorchModelEngine Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: remove use_kv_cache parameter from attention function and related classes, update documentation for KV cache handling Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: implement setAttentionMaskType method for better mask type handling and remove unused conversion function Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: streamline KV cache handling by replacing direct member access with useKVCache method and simplify token per block assignment remove Debug code. Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: Resolve comments for Python code Simplify no cache attention metadata preparation and streamline related attributes in TrtllmAttentionMetadata Removed the private method for converting to no cache attention metadata and integrated its logic into the prepare method. Updated the test for BERT sequence classification to reflect these changes and ensure proper handling of attention metadata. Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * docs: Add is_dummy_attention field to attention metadata for simulation operations Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * refactor: add KVCacheParams to attention backend interface and import relevant metadata classes Updated the attention backend interface to include KVCacheParams and imported TrtllmAttentionMetadata and VanillaAttentionMetadata in model_engine.py for enhanced functionality. Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * fix: fix rebase format issue Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * fix: extend attention mask type handling in MHARunnerFixedParams Added support for additional attention mask types (BIDIRECTIONAL, BIDIRECTIONALGLM, BLOCKSPARSE) in the MHARunnerFixedParams structure to fix the mapping issue between ContextAttentionMaskType and AttentionMaskType Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> * fix: enhance attention mask type handling in TllmGenFmhaRunnerParams Updated the setAttentionMaskType method to include a switch-case structure for better handling of attention mask types, ensuring proper mapping and error handling for invalid types. Signed-off-by: Qixiang Lin <qixiangl@nvidia.com> --------- Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>	2025-04-05 01:54:32 +08:00
QI JUN	059a34468c	fix deepseek multi gpu tests timeout (#3285 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-04 16:19:02 +08:00
yuanjings-nvda	5776b99b70	fix vila test (#3042 ) Signed-off-by: Yuanjing Shi <yuanjings@nvidia.com>	2025-04-04 14:30:06 +08:00
shaharmor98	ee4aab72ec	feat: Support PeftCacheManager in Torch (#3186 ) * Add PeftCacheManager implementation Signed-off-by: Shahar Mor <smor@nvidia.com>	2025-04-04 12:38:08 +08:00
Pengyun Lin	f25c7cefb4	doc: refactor trtllm-serve examples and doc (#3187 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-04 11:40:43 +08:00
Tracin	bb6c338730	AWQ support Modelopt ckpts. (#3258 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-04 08:10:35 +08:00
pcastonguay	b763051ba4	chore: Refactor disaggregated serving scripts (#3073 ) * chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Updating README, removing launch script Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing integration tests Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding scripts to populate urls section of disagg config based on SLURM env vars Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-03 14:55:05 -04:00
Yibin Li	32ae1564bd	update FP4 quantize layout (#3045 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-04-03 13:13:54 -04:00
Yukun He	d138795485	Fix minor issues in test_autotuner.py and loosen the cache check for test gemms. (#3261 ) This test can cause nondeterministic failures on CI with unexpected kernel profiling results. Neither a longer delay time nor clearing the cache solves the issue, so the test checks are loosened to avoid these false alarms. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-04-03 18:24:08 +08:00
xinhe-nv	2005e5aaaf	remove tests from qa test lists (#3256 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-03 16:06:39 +08:00
xiweny	174a5af779	doc: refine integration test guide (#3215 ) * doc: refine integration test guide Signed-off-by: Xiwen Yu <xiweny@nvidia.com>	2025-04-03 15:36:13 +08:00
Zhanrui Sun	7f03125098	test: [TRTLLM-3994] Support only run pytorch tests (#3013 ) * [TRTLLM-3994] Support only run pytorch tests Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Move perf test to TensorRT backend Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-03 13:46:09 +08:00
pcastonguay	b5b83009ff	chore: Reenabling get_stats_async test which seems to have been fixed by recent commit (#3246 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-04-02 20:57:31 -07:00
Jinyang Yuan	2fdfa39ea8	fix: Fix an error related to dummy request when MTP is used (#3146 )	2025-04-03 11:08:12 +08:00
Chuang Zhu	f5bf74bc7f	enable some disagg test (#3203 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-03 06:10:48 +08:00
Enwei Zhu	3cf7066350	test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) (#3219 ) * remove test_llm_models_multi_gpu.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * qwen 2.5 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * upgrade Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-02 17:29:57 +08:00
Zongfei Jing	8d48b96545	reduce test cases for deepseek (#3211 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-02 13:57:55 +08:00
wili	34e63d07e6	feat: Variable-Beam-Width-Search (VBWS) Part2 (#3133 ) * feat: Variable-Beam-Width-Search Part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part2, fix CPP tests Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part3, simplify CPP tests Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search Part4, move beam_width_array param Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search, fix CI error Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2 Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2, fix pre-commit Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> * feat: Variable-Beam-Width-Search part2, fix review Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>	2025-04-02 12:31:28 +08:00

171 Commits