TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-07 11:41:47 +08:00

Author	SHA1	Message	Date
Yan Chunwei	174c5188a2	fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428 ) * add test Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-20 20:16:14 +08:00
Yan Chunwei	5b1c88de8d	chore: cleanup perf_evaluator code (#3833 ) * chore: cleanup perf_evaluator code Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * up Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-19 13:21:36 +08:00
Pengyun Lin	039f7e3118	[https://nvbugspro.nvidia.com/bug/5243740 ][fix] deduce default max_tokens for trtllm-serve (#4265 ) * Deduce default max_tokens for trtllm-serve Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> * Improve executor_config.max_seq_len assignment in TRT workflow Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> * Enhance error message Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> * Add deduced max_tokens test Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> --------- Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-05-19 00:34:40 +08:00
shaharmor98	27afcb9928	add changes for fp8, nemotron-nas, API (#4180 ) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-05-18 23:27:25 +08:00
Yechan Kim	c6e2111f4e	feat: enhance trtllm serve multimodal (#3757 ) * feat: enhance trtllm serve multimodal 1. made the load_image and load_video asynchronous 2. add image_encoded input support to be compatible with genai-perf 3. support text-only on multimodal mdoels(currently, Qwen2-VL & Qwen2.5-VL) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix bandit Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * trimming uils Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * trimming for test Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * genai perf command fix Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * command fix Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * refactor chat_utils Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * stress test genai-perf command Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-05-15 16:16:31 -07:00
Kaiyu Xie	b4e5df0ee0	Breaking change: perf: Enable scheduling overlap by default (#4174 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-15 14:27:36 +08:00
pcastonguay	9643be5f20	[TRTLLM-5050][feat] Enable per-request stats with PyT backend (#4156 ) * feat: Add per-request stats support with PyT backend Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding unit test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing stats unit test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing test with overlap Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-12 21:35:15 -04:00
Yixin Dong	c90ebadd84	feat: Support the Structural Tag in guided decoding (#4066 ) * finish Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * exc overlap scheduler Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix api ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Ubospica <ubospica@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 17:24:50 +08:00
QI JUN	f021afa241	[CI] waive two multi-gpu test cases (#4206 ) waive two multi-gpu test cases Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-12 08:04:48 +08:00
pcastonguay	836c142e1b	[feat] Allow overriding cli args with yaml file in trtllm-serve (#4164 ) feat: Allow overriding cli args with yaml file in trtllm-serve Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-08 21:19:05 -04:00
shaharmor98	7d94c9561f	feat: support multi lora adapters and TP (#3885 ) * support multi lora, tp Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-05-08 23:45:45 +08:00
Enwei Zhu	19eeef1ddd	test: Waive test_llm cases (#4136 ) * waive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-08 11:53:16 +08:00
Yan Chunwei	0c26059703	chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732 ) * beam_width and max_new_token Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove beam_width Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove min_length Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * remove return_num_sequences Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-07 13:20:25 +08:00
Erin	cba1793cda	cleanup logprob params (#4039 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-05-07 00:50:16 +08:00
pansicheng	e84dc6b3c7	feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354 ) * add deepseek-r1 reasoning parser Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> * fix test Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> --------- Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-05-06 08:13:04 +08:00
Erin	8fe7bdeacf	feat: LogitsProcessor in PyTorch backend (#3145 ) * support lp in pytorch backend Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> * fix tp Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> --------- Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-05-01 14:15:30 -07:00
Erin	83f37614ef	feat: Support Top-K logprobs and prompt_logprobs in LLMAPI (#3388 ) * support return logprob in llmapi Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> update and add test Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> stability test Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> * revert removal of old flag Signed-off-by: Erin Ho <erinh@nvidia.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> --------- Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Erin Ho <erinh@nvidia.com>	2025-05-01 12:47:14 -04:00
Erin	941e82faa6	waive test_tinyllama_guided_decoding (#3997 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-04-30 12:46:32 -07:00
Dom Brown	8709fe8b53	chore: bump version to 0.19.0 (#3598 ) (#3841 ) test: add test cases for 0.19 release (#3608) * fix test name * add quickstart test for nemotron-ultra * add rcca multi-node test case for deepseek-v3 * add rcca info --------- squash (#3642) fix: nvbugs/5187237: fix deterministic mode crash (#3448) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. * revert ar fusion --------- update fp8 doc (#3647) tests: change qa perf test to trtllm-bench (#3619) fix: FP8 quantized lm_head (NvBug 5214229) (#3567) infra: Add PR approval protection for the release branch (#3634) fix: nvbugs/5231298: pytorch allreduce issue (#3673) Fix: nvbugs/5222698 variable not defined (#3630) * Fix: nvbugs/5222698 variable not defined * Tidy code --------- test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685) test:restore fp8 kv cache testing for L0 (#3671) doc: Update DeepSeek perf docs (#3693) * Update DeepSeek perf docs * update * Apply suggestions from code review --------- tests: waive test_llm_multi_node (#3664) fix: update test_user_buffers_mm_add_prologue atol (#3711) Fix: cherry-pick hmac encryption from main branch (#3635) * security fix cherry-pick changes from main * fix hmac in remote mpi session (#3649) --------- Un-waive DS-V3-Lite tests. (#3621) fix: FP8 kv accuracy (#3675) * fix FP8 kv accuracy * update doc --------- Fix script options for engines. (#3622) unwaive multi-node test (#3721) chore : Split more tests out of gpt tests (#3524) (#3674) doc:add torch examples link into torch backend documentation (#3749) test: Get Eagle tests working (#3593) (#3722) Waive L0 test (#3756) waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656) Update ds v3 parameters in stress test. (#3676) waive gemma on L20 (#3766) https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758) Include Qwen2VLDecoderLayer in the smooth_qwen2_model function. fix: PP4 fixes and cleanup (#3688) remove benchmark test list (#3643) skip disagg deepseek test if sm!=90 (#3720) test: skip failed cases on B200 (#3710) * add skip condition to tests * fix error --------- test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718) * skip_pre_ada for fp8 cases * update * update after rebase --------- add know issue to deepseek doc. (#3800) Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761) Waive L0 tests (#3826) fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793) * Reduce memory usage in fused moe op associated with AutoTuning. * Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens. * Add free_memory logic of workspace in min_latency_mode fused moe path. * Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression. --------- [doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797) Fix pre-commit Fix again Address some review comments for the MI Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-29 16:57:22 +08:00
Yan Chunwei	ad4226d946	fix: trtllm-bench build trt engine on slurm (#3825 ) * add submit_sync to RemoteMpiSessionClient Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> add barrier Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> fix comment Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> disable test Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-27 22:26:23 +08:00
Yiqing Yan	238fefc659	[infra] Waive L0 tests (#3853 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-25 17:32:21 +08:00
Kaiyu Xie	dfbcb543ce	doc: fix path after examples migration (#3814 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-24 02:36:45 +08:00
QI JUN	257abfbc51	move pytorch tests of LLM API into separate test files (#3745 ) * move pytorch tests of LLM API into separate test files Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * polish Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-22 14:36:59 -07:00
Yan Chunwei	231b39015c	unwaive multi_node test (#3715 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-21 21:26:07 +08:00
QI JUN	d51ae53940	move the reset models into `examples/models/core` directory (#3555 ) * move rest models to examples/models/core directory Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update multimodal readme Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix example path Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix cpp test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix tensorrt test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-19 20:48:59 -07:00
Yechan Kim	5460d18b10	feat: trtllm-serve multimodal support (#3590 ) * feat: trtllm-serve multimodal support Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable argument Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add and separate tests and move the doc Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove block_resue arg from serve.py Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-19 05:01:28 +08:00
Zheng Duan	bce7ea8c38	test: add kv cache event tests for disagg workers (#3602 )	2025-04-18 18:30:19 +08:00
Yan Chunwei	2a09826ec4	fix hmac in remote mpi session (#3649 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-04-18 17:47:51 +08:00
Erin	4fedf0be5c	unwaive test for nvbug_5150466 (#3552 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-04-18 15:15:58 +08:00
rakib-hasan	ff3b741045	feat: adding multimodal (only image for now) support in trtllm-bench (#3490 ) * feat: adding multimodal (only image for now) support in trtllm-bench Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * fix: add in load_dataset() calls to maintain the v2.19.2 behavior Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * re-adding prompt_token_ids and using that for prompt_len Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * updating the datasets version in examples as well Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * api changes are not needed Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * moving datasets requirement and removing a missed api change Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * addressing review comments Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * refactoring the quickstart example Signed-off-by: Rakib Hasan <rhasan@nvidia.com> --------- Signed-off-by: Rakib Hasan <rhasan@nvidia.com>	2025-04-18 07:06:16 +08:00
QI JUN	91660939fd	tests: waive test_llm_multi_node (#3664 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-18 01:59:16 +08:00
QI JUN	fac1a905e9	waive test_llm_multi_node_with_postproc (#3628 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-16 05:49:39 -07:00
Yan Chunwei	63f3fba679	waive test_llm_multi_node_pytorch (#3592 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-16 10:49:07 +08:00
Pengyun Lin	1899e71364	doc: add genai-perf benchmark & slurm multi-node for trtllm-serve doc (#3407 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-04-16 00:11:58 +08:00
xinhe-nv	5cfa927132	update waive list (#3503 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-15 16:53:53 +08:00
xinhe-nv	863d023fd0	test: fix memory leak of tests (#3392 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-10 14:31:40 +08:00
yuxianq	7b03350527	Add thread leak check and fix thread/memory leak issues. (#3270 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-08 19:03:18 +08:00
pansicheng	ef1ba468a1	feat: support abort disconnected requests (#3214 ) Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>	2025-04-07 16:14:58 +08:00
Yan Chunwei	b21cfcfed1	chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025 ) * make LlmArgs Pydantic Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * amending doc fix api_stability fix tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * restore yaml groups refine StackTrace singleton clean tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix trtllm-bench fix pytorch Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix serve distagg Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-05 13:31:48 +08:00
Pengyun Lin	f25c7cefb4	doc: refactor trtllm-serve examples and doc (#3187 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-04-04 11:40:43 +08:00
pcastonguay	b5b83009ff	chore: Reenabling get_stats_async test which seems to have been fixed by recent commit (#3246 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-04-02 20:57:31 -07:00
Enwei Zhu	3cf7066350	test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) (#3219 ) * remove test_llm_models_multi_gpu.py Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * qwen 2.5 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * upgrade Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-02 17:29:57 +08:00
bhsueh_NV	322ac565fc	chore: clean some ci of qa test (#3083 ) * move some models to examples/models/contrib Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * update the document Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * remove arctic, blip2, cogvlm, dbrx from qa test list Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * remove tests of dit, mmdit and stdit from qa test Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * remove grok, jais, sdxl, skywork, smaug from qa test list Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * re-organize the glm examples Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix issues after running pre-commit Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix some typo in glm_4_9b readme Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix bug Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-03-31 14:30:41 +08:00
xiweny	6979afa6f2	test: reorganize tests folder hierarchy (#2996 ) 1. move TRT path tests to 'trt' folder 2. optimize some import usage	2025-03-27 12:07:53 +08:00
Yuan Tong	53adb3cb4e	test: waive flaky test_kv_cache_event_async_api (#3062 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-03-25 18:41:30 +08:00
Yan Chunwei	531b98ed62	feat: Add several pure python configs to LlmArgs (#2997 ) * add SchedulerConfig * add PeftCacheConfig	2025-03-24 16:16:17 +08:00
Kaiyu Xie	2631f21089	Update (#2978 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-03-23 16:39:35 +08:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936 ) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00

48 Commits