TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Erin	4becf32360	fix: reshape token_ids for lp in torch backend (#4239 ) reshape token_ids Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-05-13 08:43:47 +08:00
Enwei Zhu	035d915fea	[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * normalize mtp_nextn Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update test_durations Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 07:41:51 +08:00
wili	eba3623a54	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979 ) * feat/vbws-part4-v1.8: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * feat/vbws-part4-v1.9: fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.1: remove useless variables Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.2:fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.3: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.4: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.5: remove API change Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-05-12 22:32:29 +02:00
yuxianq	a4c3359513	fix: Reset planned states to avoid memory leak in TrtllmAttentionWrapper (#4227 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-12 23:25:54 +08:00
Fridah-nv	3dbb087292	[TRTLLM-5188] fix: [AutoDeploy] update output shape of prepare_fused_mha_metadata_fake (#4199 ) * update output shape of fake kernel prepare_fused_mha_metadata_fake Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> * minor Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> --------- Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-05-12 11:11:40 -04:00
Robin Kobus	b1bee9c394	Revert "Add initial list of CODEOWNERS (#4105 )" (#4234 ) This reverts commit `aa7300e040`. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-12 16:53:49 +02:00
Yiteng Niu	31a2e2d08d	doc: update switcher.json config (#4220 )	2025-05-12 20:40:55 +08:00
Enwei Zhu	c31ca1688c	[https://nvbugs/5214229 ] [fix] Unwaive lm_head quantization case (#4222 ) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 20:23:06 +08:00
yuxianq	b35f9a67f9	refactor: Allow models to override apply_qk_norm. (#4078 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-12 19:38:24 +08:00
Zheng Duan	c9e2a963e0	feat: add kv cache aware router (#3831 ) * kv cache aware router Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * add tests Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * router config Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> add test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction detect in worker test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * move worker tests to single gpu Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * reduce memory fraction Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * fix partial block Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> --------- Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-05-12 07:23:57 -04:00
Yixin Dong	c90ebadd84	feat: Support the Structural Tag in guided decoding (#4066 ) * finish Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * exc overlap scheduler Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix api ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Ubospica <ubospica@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 17:24:50 +08:00
Yechan Kim	3e9bda3a09	[feat] Support HyperCLOVAX-SEED-Text language part (#3902 ) * feat: support HyperCLOVAX-SEED-Text language part Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add Pytorch flow and remove test file Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * revert summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove from pytorch example Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-05-12 16:05:14 +08:00
Martin Marciniszyn Mehringer	33977dbd42	infra: [TRTLLM-325] Prepare for NGC release - multiplatform build (#4191 ) * infra: [TRTLLM-325] Prepare for NGC release - prepare multiplatform build Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-05-12 00:38:45 -07:00
Perkz Zheng	3f29d2f006	Feat: support exporting softmax statistics and update the kernel-selection heuristic (#4155 ) * update cubins Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> * support exporting softmax statistics and update the kernel-selection heuristic Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> --------- Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-05-12 15:31:46 +08:00
Zhenhuan Chen	9212e9a740	[TRTLLM-4911] feat(scaffolding): make sampling_params only setable by controller (#4151 ) feat(scaffolding): make sampling_params only setable by controller Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-05-12 15:29:09 +08:00
Ivy Zhang	ee92edf2b4	[https://nvbugspro.nvidia.com/bug/5270564 ][test] skip per-hopper for llama4 (#4211 ) skip per-hopper for llama4 Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-12 15:27:15 +08:00
Robin Kobus	ba13b51a58	chore: Update CODEOWNERS (#4221 ) Remove @funatiq and @dcampora Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-11 23:55:20 -07:00
ruodil	9c03a7ab74	test: add llama_3.2_1B model and fix for test lora script issue (#4139 ) * test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add llama_3.2_1B model and fix for lora script issue Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-12 14:51:59 +08:00
xinhe-nv	849d9c343c	tests: https://nvbugs/5219534 remove failed tests from test list (#4113 ) remove unsupported tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-12 14:13:40 +08:00
xinhe-nv	186e2b8c38	[TRTQA-2802][fix]: add --host for mgmn serve examples script (#4175 ) remove prepare data Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-12 13:28:42 +08:00
Chuang Zhu	1333f4f5d5	remove cache_transceiver_prealloc_size (#4153 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-12 11:53:53 +08:00
Yiqing Yan	3c54e84e47	[Infra] Waive L0 test (#4212 ) Waive L0 test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-12 11:37:49 +08:00
nv-guomingz	420048205f	chore:update modelopt to 0.29 (#4150 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-05-12 10:32:19 +08:00
QI JUN	b050e70779	[CI] update pytorch only file list (#4210 ) update pytorch only file list Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-12 10:06:25 +08:00
QI JUN	f021afa241	[CI] waive two multi-gpu test cases (#4206 ) waive two multi-gpu test cases Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-12 08:04:48 +08:00
Enwei Zhu	7db368c72c	test: Remove CNN Dailymail tasks in favor of GSM8K (#4187 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-10 09:02:07 +08:00
mayani-nv	fe3a993234	chore: PR to fix the formatting errors (#4200 ) * updating the run_dtm_pld.py to handle logits correctly * following correct code formatting * Update run_dtm_pld.py to account for correct code formatting Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com> * correct formatting for the multimodal README PR --------- Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com> Co-authored-by: Ubuntu <Azureuser@mayani-nch100-vm1.42fmcfe2wyrepoque0x2bs4uue.jx.internal.cloudapp.net> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-09 16:31:36 -07:00
Kevin Chen	aa7300e040	Add initial list of CODEOWNERS (#4105 ) Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2025-05-09 16:16:48 -07:00
mayani-nv	5c1c69cf9c	fix: draft target README and assertion for logits-based acceptance (#4167 ) * updating the run_dtm_pld.py to handle logits correctly * following correct code formatting * Update run_dtm_pld.py to account for correct code formatting Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com> --------- Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com> Co-authored-by: Ubuntu <Azureuser@mayani-nch100-vm1.42fmcfe2wyrepoque0x2bs4uue.jx.internal.cloudapp.net> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-09 16:08:47 -07:00
mayani-nv	25533a7736	Updating the multimodal models README to add steps for running phi-4-multimodal instruct (#3932 ) * Update run.py for draft_target_model This change makes the draft target model works without mismatch in the vocab size Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com> * updating README with phi-4-multimodal-instruct steps * adding ENGINE_DIR, HF_DIR and CKPT_DIR as per review * addressing review comments on PR * updating readme --------- Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com> Co-authored-by: rakib-hasan <rhasan@nvidia.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-05-09 15:42:58 -07:00
Dom Brown	2d0f93a054	Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027 ) * Refactor: Restructure C++ tests for better modularisation of non-shared code Start cleanup of pytest code for C++ tests Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Clean up names and remove references to test_cpp.py Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Move multi-GPU code Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Update doc and try un-waiving Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Update multi GPU file check Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Address minor multi-GPU setup bug Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> --------- Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-09 19:16:51 +01:00
Frank	0dcf47f1c2	[TRTLLM-4717][perf] Set CUDA graph max batch size and padding in throughput benchmark. (#3875 ) * Set cuda graph max batch size. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> * Set padding. Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> --------- Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-05-09 23:20:52 +08:00
Mike Iovine	4b8ba7ad61	[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue (#4069 ) [fix] Fix llama 4 test lists Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-09 22:45:14 +08:00
Tracin	446f62bbab	chore: Deprecate evaltool (#4173 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-05-09 20:31:53 +08:00
zhhuang-nv	0a36db0aa4	[fix] trtllm-gen mla kernel warnings (#4119 ) fix trtllm-gen mla kernel warnings Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-05-09 20:21:28 +08:00
Martin Marciniszyn Mehringer	d0e672f96d	chore: [TRTLLM-325][infra] Prepare for NGC release - reduce size of the docker images (#3990 ) * chore: reduce size of the docker images Signed-off-by: Martin Marciniszyn Mehringer <11665257+martinmarciniszyn@users.noreply.github.com> * Finish the renaming script and run with new images. Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> * Fix installation of GCC toolset for Rocky Linux Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> * Upgrade to new docker images Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> --------- Signed-off-by: Martin Marciniszyn Mehringer <11665257+martinmarciniszyn@users.noreply.github.com> Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-05-09 19:31:29 +08:00
ruodil	bf5b2a2e0a	test: amend regex match for perf throughput (#4186 ) amend regex match for perf throughput Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 17:33:25 +08:00
chenfeiz0326	ffc13bd325	Cherry-pick: Use multi-threading to load MoE expert weights (#4137 ) * Use multi-threading to load MoE expert weights Signed-off-by: Po-Han Huang <pohanh@nvidia.com> * Update code formatting Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> * Update code formatting Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> --------- Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Co-authored-by: Po-Han Huang <pohanh@nvidia.com>	2025-05-09 17:29:24 +08:00
WeiHaocheng	0f01826dde	feat: support task collection for to collect information (#3328 ) (#3824 ) Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>	2025-05-09 17:09:01 +08:00
xinhe-nv	9082411a50	test: [CI] Add failed cases into waives.txt (#4165 ) wavie oom tests Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-05-09 16:56:30 +08:00
Fanrong Li	0cf0fce5d3	[fix] Fix add_dummy_requests for spec decoding cases (#4084 ) * fix add_dummy_requests. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * add max_seq_len to eagle3 test and fix add_dummy_requests. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix prompt_len in add_dummy_requests. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * add prepare_resource condition in add_dummy_requests. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * add some description of token_nums to add_dummy_requests and fix token_nums in torch compile warmup. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix available_tokens. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-05-09 16:52:51 +08:00
ruodil	5ce5b81281	test: amend default pytorch extra-llm-api-config.yml in perf test (#4176 ) * amend default pytorch extra-llm-api-config.yml Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> * add print info to separate cases in output log Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> --------- Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-09 16:46:48 +08:00
Shi Xiaowei	87f0f79554	fix: library path of nixl (#4184 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-05-09 16:31:55 +08:00
Zhanrui Sun	e30c76c530	infra: Fix pipeline step error in post merge (#3948 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-05-09 15:26:15 +08:00
xinhe-nv	1d26a3fd7c	test: skip tests on b200 (#3913 ) * skip tests on b200 Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * skip phi-3-128k Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-09 14:51:55 +08:00
Fanrong Li	77f8e43592	[fix] Fix relaxed acceptance to support enabling it in context phase (#4126 ) * fix relaxed acceptance to support enable this feature in context phase. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix sample_and_accept_draft_tokens unit test. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-05-09 14:11:14 +08:00
Bo Li	e3cf3fd15f	test: Add fp8kv to DS-v3-lite integration tests. (#3950 ) * Add fp8 kv cache tests to DSV3-Lite integration tests. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update gsm8k. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update CI list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update TestDeepSeekR1. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix test list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Need quant_config besides pytorch_config. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list (bug 5239087). Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Correct test name. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> --------- Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Bo Li <bobboli0202@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-09 13:35:04 +08:00
Ivy Zhang	c91d03fa0a	test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440 ) * add mistral-7b-v0.1 torch flow test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mistral Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mixtral case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove api function test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mistral nemo cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mixtral cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove awq llmapi test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix partial comments Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix path Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update thres Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove duplicate test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:32:02 +08:00
Ivy Zhang	c2d4c2adb6	[https://nvbugspro.nvidia.com/bug/5260676 ]test: skip fp8 quantization case for pre-ada (#4095 ) skip pre ada Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:30:16 +08:00
Yukun He	c9cac432dc	chore: Fix pipeline break caused by previous PR (#4081 ) rebase + pipeline reuse (#4169 ) Fix import break caused by rebase. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-09 12:51:02 +08:00

823 Commits