Commit Graph

2188 Commits

Each entry lists: Author, SHA1, Message, Date
Lizhi Zhou
33b0b945c7 [https://nvbugs/5582277][fix] rework DisaggPPTerminationHandler to fix hang issue (#8519)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
81fd9be87d [https://nvbugs/5575829][fix] Unwaive gpt-oss test (#8576)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Bo Deng
4ca6fe83d8 [https://nvbugs/5565549][fix] unwaive test_disaggregated_spec_dec_bat… (#8500)
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Guoming Zhang
af3900a195 [https://nvbugs/5504095][fix] Unwaive test_user_specify_workspace case. (#8316)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Simeng Liu
9286223288 [https://nvbugs/5515753][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. (#8440)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
ee6944bfa2 [https://nvbugs/5569713][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 (#8429)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
yufeiwu-nv
0e746fad45
[https://nvbugs/5667454][test] Fix Test Case as Chunked Attention not Supported on sm_120 (#9260)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-20 00:58:42 -08:00
Liao Lanyu
04ad9f96fa
[https://nvbugs/5667687][fix] Set correct lm_head_tp_size_upper_bound (#9300)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-20 00:41:00 -08:00
Emma Qiao
b018b2698d
[TRTLLM-9164][infra] Enable checking duplicate items in waives.txt in pre-commit (#9265)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-20 15:47:23 +08:00
mpikulski
a39e8c5567
[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema (#9305)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-20 08:32:23 +01:00
QI JUN
1bdd3ba173
[None][ci] waive test_disagg_server_restart (#9326)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-19 22:34:03 -08:00
Yechan Kim
d5622b2689
[None][fix] Multimodal InputProcessor dummy builder fix (#8916)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-19 22:32:21 -08:00
Chang Liu
79a6c9742b
[None][fix] Use fp32 for indexer weight_proj GEMM (#9243)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-19 21:52:38 -08:00
Chenghao Zhang
cd44f80abd
[#9316][feat] AutoDeploy: Add the accuracy test for Nemotron MOE models (#9317)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-19 21:48:50 -08:00
Bo Deng
2128f73d58
[TRTLLM-9247][infra] Upgrade NIXL to 0.7.1 (#9055)
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-20 11:01:02 +08:00
Yukun He
b6bced83c0
[TRTLLM-7963][feat] Use CUDAGraph to improve the tuning accuracy for AutoTuner. (#9089)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-20 08:54:29 +08:00
brb-nv
f6ec6e2222
[None][chore] Waive tests timing out on main (#9315)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-19 13:10:06 -08:00
NVShreyas
1eae941d77
[#9237][feat] enable iter stats in autodeploy (#9278)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-11-19 19:29:29 +01:00
Neta Zmora
7ab02ad7b5
[None][feature] AutoDeploy: tighter MoE UT thresholds (#9195)
Scale down the weights in the MoE test so that the output has a reasonable magnitude, allowing for tighter atol and rtol thresholds.

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-19 08:37:51 -08:00
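As a rough illustration of the idea behind that change, the sketch below compares a toy reference MoE step against a hypothetical fused variant; scaling the weights keeps the outputs near O(1) so tight atol/rtol bounds are meaningful. All names and shapes here are illustrative, not the actual unit test.

```python
import torch

torch.manual_seed(0)


def reference_moe(x, w):
    # Toy stand-in for the reference (unfused) expert computation.
    return torch.relu(x @ w)


def fused_moe(x, w):
    # Toy stand-in for the fused op: same math, slightly different accumulation.
    return torch.relu((x.double() @ w.double()).float())


x = torch.randn(8, 64)
w = torch.randn(64, 64) * 0.05  # scaled-down weights keep outputs around O(1)

# Because the outputs have reasonable magnitude, tight tolerances are meaningful
# rather than being dominated by large activation values.
torch.testing.assert_close(fused_moe(x, w), reference_moe(x, w), atol=1e-3, rtol=1e-3)
print("MoE outputs match within tight tolerances")
```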
Bo Li
d8b05894ee
[None][perf] Adjust select_alltoall_method_type. (#8950)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-19 07:43:55 -08:00
mpikulski
46dd9886bb
[https://nvbugs/5661877][fix] fix test regression in TestBatchedSampling::test_samples (#9215)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-19 01:44:44 -08:00
xinhe-nv
0f77fec932
[None][chore] Add failed cases into waives.txt (#9289)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-19 17:03:43 +08:00
CarstyYou
ee941ac779
[https://nvbugs/5456493][feat] add fp8 dense for sm120 (#9174)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-19 14:40:34 +08:00
nvxuanyuc
a79c0dfb43
[None][fix] Update GLM model accuracy test (#9286)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 21:59:01 -08:00
Emma Qiao
67d3eb26af
[None][infra] Waive failed cases for main branch on 11/17 (#9266)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-18 20:07:03 -08:00
ChristinaZ
941a54c66a
[None][feat] Update the indexer topK (#9255)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 11:49:00 +08:00
xinhe-nv
286ace22ed
[None][chore] Add failed cases into waives.txt (#9242)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 19:27:55 -08:00
Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error (#9172)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt (#9193)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Lizhi Zhou
c789000a62
[https://nvbugs/5649010][fix] increase status-checking interval to avoid instability (#9203)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00
Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example (#8434)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Ziyi Xiong
7c4344b92e
[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00
Kaiyu Xie
d076aa44d3
[None] [tests] Unwaive wide ep related tests (#9204)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check (#9088)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00
Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case (#9165)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM (#8256)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases (#9209)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue (#9171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 (#9192)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975)
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 (#9187)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge (#9189)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
Erin
fe69243157
[None][chore] Add placement test for ray executor (#9122)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-14 23:10:59 -08:00
Chang Liu
bed4e95e9f
[https://nvbugs/5629887][fix] Add missing device count guard for DSv32 multiGPU tests (#9159)
2025-11-14 07:52:23 -08:00
xinhe-nv
49b7e6301a
[None][chore] Add failed cases into waives.txt (#9156)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-14 06:28:22 -08:00
mpikulski
80bf840e69
[TRTLLM-9295][fix] unflake test_overlap_scheduler.py::test_overlap_scheduler_consis… (#9146)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-14 11:36:22 +01:00
yuanjingx87
d72321a32e
[None][ci] Waive unittest/_torch/sampler/test_torch_sampler.py::TestBatchedSampling (#9161)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 01:49:26 -08:00
Suyog Gupta
d12cb9436d
[None][feat] Autodeploy add triton configs and optimize mamba prefill (#9083)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-13 19:15:43 -08:00
QI JUN
3c950910a0
[None][ci] waive test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0] (#9162)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-13 18:56:37 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module (#8682)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks (#9065)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format (#8934)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server (#8765)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Neta Zmora
34dc6869f3
[#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011)
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.

Nemotron-6 requires the ReLU2 (i.e., squared ReLU) MoE activation function.
This PR adds ReLU2 support and, more generally, an API for setting the activation function.
The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954.

The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from
Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
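For reference, ReLU2 (squared ReLU) as described above can be sketched in a few lines of PyTorch; this is only a numerical reference for the activation, not the Cutlass kernel or the auto_deploy ops named in the commit. In a fused MoE path it is typically applied between the two expert GEMMs.

```python
import torch


def relu2(x: torch.Tensor) -> torch.Tensor:
    """Squared ReLU (ReLU2): relu(x) ** 2."""
    r = torch.relu(x)
    return r * r


x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu2(x))  # tensor([0.0000, 0.0000, 0.0000, 0.2500, 4.0000])
```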
dongxuy04
a370643b26
[None][fix] support topk autotuner input for expert slot per group larger than 32 (#9087)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-14 08:37:20 +08:00
Frida Hou
e96a3d294d
[None][autodeploy] minor refactor to rmsnorm transforms (#8657)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:13:58 -08:00
Ziyi Xiong
a7aaf50541
[TRTLLM-8084][feat] Enhance the overlap scheduler for two-model spec decoding (#8706)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-13 10:20:16 -05:00
William Zhang
121140cfec
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser (#8817)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-13 04:34:38 -08:00
Lizhi Zhou
48a27c7bef
[https://nvbugs/5633340][chore] unwaive test_auto_scaling.py::test_disagg_server_restart (#9131)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-13 01:45:36 -08:00
Emma Qiao
d0ea417ec8
[None][infra] Waive failed tests for main 11/13 (#9132)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-13 01:00:40 -08:00
xinhe-nv
548f5ce4bc
[None][fix] waive failed tests (#9090)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 23:40:00 -08:00
xinhe-nv
8fa3c55c76
[None][chore] Remove closed bugs (#9114)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 22:49:37 -08:00
ruodil
c86e36fe38
[None][test] add deepseek and qwen cases for rtx series (#8839)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-12 22:28:02 -08:00
HuiGao-NV
cde18c12da
[https://nvbugs/5640873][fix] Move thop tests to pre-merge (#9094)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-13 13:08:13 +08:00
Zhang Ge
49df731b96
[#6507][fix] Fix precision issue due to KV layout mismatch for split/concat kernels (#6917)
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Yan Chunwei
4fd93bdc2c
[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming (#9118)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-12 19:55:09 -08:00
Yan Chunwei
8a8883bc73
[None][chore] Waive test_llm_rpc_streaming (#9113)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-13 11:06:26 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
QI JUN
3416efbc29
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill (#9111)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:06:32 +08:00
dongxuy04
9241ccaf27
[None][feat] Enable EPLB for trtllm-gen and cutlass backend (#8886)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-12 12:30:27 -08:00
Chenghao Zhang
5f26c31954
[https://nvbugs/5636912][fix] AutoDeploy: Unwaive the test (#9018)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-12 12:26:38 -08:00
Patrice Castonguay
8a751a0e56
[None][chore] Remove is_disaggregated param in executor request queue (#9049)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-12 13:37:15 -05:00
Fanrong Li
780d4f9dc5
[None][feat] Add MTP>1 support for DS-v3.2 (#9045)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-12 09:56:12 -08:00
Iman Tabrizian
cdde15b275
[TRTLLM-8540][feat] Add support for disagg in DSv3.2 (#8735)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-11-12 08:21:11 -08:00
mpikulski
264d38e6c5
[TRTLLM-9175][test] ensure sampling is async (#9076)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-12 15:27:52 +01:00
yufeiwu-nv
b7a2574c60
[https://nvbugs/5568991][test] Remove Phi-3 models (#9066)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-12 03:16:36 -08:00
QI JUN
4003dc7574
[None][ci] waive some test cases of disaggregated serving (#9085)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-12 15:06:21 +08:00
Emma Qiao
bb6eb9510d
[None][infra] Waive a failed case of disaggregated/test_disaggregated.py (#9074)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 19:38:32 -08:00
QI JUN
fd703fbb7b
[None][ci] run speculative unit tests serially (#9080)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 19:06:44 -08:00
Lucas Liebenwein
aca56097cb
[None][fix] AutoDeploy: update nano3 accuracy test (#9061)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-11 12:26:31 -08:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
Chenghao Zhang
ec9cf715a2
[None][feat] AutoDeploy: Perf improvement for mamba layers (#8991)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-11 08:27:07 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling (#9012)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
HuiGao-NV
23c388c58b
[https://nvbugs/5616189][fix] Make more cases use local cached models (#8935)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-11 03:14:05 -08:00
QI JUN
0ce22ce928
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] (#9069)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 02:11:15 -08:00
Yiqing Yan
b7d51c5549
[None][chore] Remove duplicated waive test (#9067)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-11 16:49:49 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 (#9058)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt (#8998)
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py (#8938)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test (#9019)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Yechan Kim
0938a3ad2a
[https://nvbugs/5644187][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[https://nvbugs/5625972][fix] Add context manager to fix FakeTensorProp (#9047)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00
xiweny
50c486367a
[https://nvbugs/5619396][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-10 08:12:14 -08:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA (#8882)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
xinhe-nv
f848d844d9
[None][chore] Add failed cases into waives.txt (#9030)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-09 23:36:05 -08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 (#9006)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Bo Li
67af7c15a5
[https://nvbugs/5637037][fix] Update unwaive list. (#9001)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-10 08:53:07 +08:00
Emma Qiao
183778d58a
[None][infra] Waive failed tests for main 11/07 (#9008)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-08 08:51:35 -08:00
Emma Qiao
2af6a537ad
[TRTLLM-8999][infra] Reduce gb200 multi-node test stages (#8778)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-08 06:34:24 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend (#8951)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op (#8960)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Yuxian Qiu
7b82ba90da
[https://nvbugs/5629790][chore] unwaive test. (#8967)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-07 18:41:32 +08:00
Stefan Niebler
326a201473
[https://nvbugs/5508536][fix] Take Over (#8627): Reintroduce: Move stop_criteria to sample_async (#7041) (#8794)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Lizhi Zhou
b26e1617f2
[https://nvbugs/5633340][fix] kill processes properly after test (#8970)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-06 21:45:38 -08:00
xiweny
ee20e679a9
[https://nvbugs/5636986][fix] Fix DeepGemmMoe get_buffer calls (#8939)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-06 19:57:19 -08:00
Simeng Liu
9f8d93f89a
[https://nvbugs/5606136][ci] Remove tests for deprecating triton multimodal models. (#8926)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-11-06 17:58:42 -08:00
jthomson04
fcae852cef
[None][fix] Fix KV cache clearing with KV Connector API (#8750)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-06 14:28:27 -08:00
Chenghao Zhang
ddf2d010e2
[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear (#8820)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-06 11:00:10 -08:00
DylanChen-NV
b275635a9a
[https://nvbugs/5498478][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill (#8910)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-11-06 07:41:21 -08:00
shuyixiong
c73efe12e7
[None][chore] Use cached model in all ray tests (#8962)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-06 15:14:15 +01:00
Fanrong Li
d246f62868
[https://nvbugs/5630345] [chore] skip deepseek-v3.2 fp8 kv tests on pre-Blackwell architectures (#8973)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-06 03:41:37 -08:00
yunruis
51545560da
[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-11-06 17:39:57 +08:00
Yilin Fan
b7798bfab8
[None][feat] Add trtllm_ prefix for exposed metrics (#8845)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-11-06 15:27:18 +08:00
xinhe-nv
e822184cd7
[None][feat] add waive by sm version (#8928)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-05 19:20:43 -08:00
Lucas Liebenwein
7a552c450a
[https://nvbugs/5606166][fix] AutoDeploy: unwaive test for use tuples for cudagraph shape lookup (#8957)
Also updated the test waive for another nvbug.

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-05 16:27:00 -08:00
Lucas Liebenwein
b181568d6f
[TRTLLM-8201][feat] Nemotron H MoE Sharding (#8744)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-05 12:35:29 -08:00
Chang Liu
e57d83c5dc
[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt (#8771)
2025-11-05 07:57:09 -08:00
Fanrong Li
c2feed798a
[https://nvbugs/5630345][chore] unwaive DS-v32 nvfp4 and fp8 tests (#8887)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-05 03:49:23 -08:00
Chuang Zhu
595f78078c
[https://nvbugs/5624367][fix] Fix disagg GPT-OSS test (#8870)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-05 01:47:09 -08:00
Emma Qiao
31116825b3
[None][infra] Waive failed cases on main 11/05 (#8936)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-04 22:54:45 -08:00
xinhe-nv
cc4aa29523
[None][chore] Add failed cases into waives.txt (#8865)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-04 19:26:50 -08:00
Shiyu Li
eeb56c2848
[None][feat] MNNVLAllreduce Kernel Refactor (#8018)
Signed-off-by: Shiyu Li <timlee0212@outlook.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-11-05 08:49:47 +08:00
Yechan Kim
ed81173c55
[None][ci] Add test on waives (#8915)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-05 08:42:08 +08:00
Patrice Castonguay
782824533e
[https://nvbugs/5587574][fix] Increase server timeout to wait for weight loading (#8806)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-04 12:11:08 -08:00
Frida Hou
11ded113cd
[#8389][fix] Update group attention matching to first map to custom torch attention (#8638)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-04 12:00:43 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Yanchao Lu
e2b2675120
[None][fix] Remove duplicated test waives (#8914)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 23:04:33 +08:00
Bo Li
e4bf29bc66
[None][feat] Integrate MnnvlThroughput into TRTLLM MoE. (#8728)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-04 21:36:29 +08:00
Robin Kobus
7e4b87b17c
[None][ci] Remove outdated test entries (#8909)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-04 05:32:46 -08:00
Cao Dong
dddfcdd3bf
[None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-04 19:32:59 +08:00
xiweny
cae468cc8e
[https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases (#8904)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-04 03:00:00 -08:00
Zhanrui Sun
4de31bece2
[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 18:59:34 +08:00
CarstyYou
4296c9553d
[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-04 18:10:36 +08:00
Ivy Zhang
23717cdb3f [TRTLLM-8580][test] save runtime report periodically (#8312) (#8455)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
ce23e24123 [https://nvbugs/5565565] [fix] Remove waiver (#8450)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
6c8ba3be27 [None][chore] Remove duplicate log outputs in test_perf.py (#8418)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
ruodil
102e556863 [None][test] cherry-pick: add test-model-suites in integration conftest.py (#8388)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
2225745782 [TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870)
Because we encountered perf regressions from using a one-shot kernel instead of NCCL on A100/H100, it is beneficial to have solid benchmarking of the allreduce op and to analyze the data collected from it.

Implemented new AllreduceOp heuristic:
- Added Linear programming-based heuristic implementation.
- Added LUT-based heuristic implementation and corresponding code generation script.

AllreduceOp minor fixing:
- Fixed a minor issue in AllreduceOp where the strategy could not be overridden when ONESHOT or TWOSHOT is set.
- Fixed a minor TWOSHOT kernel perf issue.
- Cleaned up Dispatching code in AllReduceOp.

This PR will fix the perf gaps reported in:
https://nvbugspro.nvidia.com/bug/5517023

For Deepseek-R1, it shows a performance gain of about 3-4% at concurrency levels of 256 and 512.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
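As a purely hypothetical sketch of what a LUT-based strategy heuristic of this kind can look like (the table contents, bucketing, and names below are illustrative, not the tuned data or API from this PR):

```python
from enum import Enum
from typing import Optional


class AllReduceStrategy(Enum):
    NCCL = "nccl"
    ONESHOT = "oneshot"
    TWOSHOT = "twoshot"


# Illustrative lookup table keyed by (world size, message-size bucket).
# A real LUT would be generated offline from benchmark sweeps per GPU arch.
_LUT = {
    (8, "small"): AllReduceStrategy.ONESHOT,
    (8, "medium"): AllReduceStrategy.TWOSHOT,
    (8, "large"): AllReduceStrategy.NCCL,
}


def _bucket(num_bytes: int) -> str:
    if num_bytes <= 64 * 1024:
        return "small"
    if num_bytes <= 8 * 1024 * 1024:
        return "medium"
    return "large"


def select_strategy(world_size: int, num_bytes: int,
                    override: Optional[AllReduceStrategy] = None) -> AllReduceStrategy:
    # An explicit ONESHOT/TWOSHOT override takes precedence (the PR notes that
    # such overrides were previously not honored).
    if override is not None:
        return override
    return _LUT.get((world_size, _bucket(num_bytes)), AllReduceStrategy.NCCL)


print(select_strategy(8, 32 * 1024))                                 # ONESHOT
print(select_strategy(8, 32 * 1024 * 1024))                          # NCCL
print(select_strategy(8, 1024, override=AllReduceStrategy.TWOSHOT))  # TWOSHOT
```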
Patrice Castonguay
65c138108e [https://nvbugs/5552889][fix] fix: Prevent empty batch when using attention DP with disagg (#8372)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Ivy Zhang
9bcd2e6c0a [None][chore] Update nim test list (#8356)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Stanley Sun
def9c0004d [TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
fcac2022e2 [https://nvbugs/5565565] [fix] fp8 wideep support sm103 (#8228)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yueh-Ting (eop) Chen
bd1c9c0af4
[https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager (#8829)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-04 16:35:45 +08:00
Yechan Kim
67208f1512
[None][fix] InputProcessor config naming convention fix (#8705)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 22:29:21 -08:00
Emma Qiao
4fe47faf47
[None][infra] Waive failed tests for main branch (#8897)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-03 22:21:28 -08:00
Zhanrui Sun
9ec6a6b68f
[None][infra] waive failed test on main 11/4 (#8896)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-03 21:37:09 -08:00
Matthias Jouanneaux
d0f107e4dd
[TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-11-04 09:06:58 +08:00
Mike Iovine
5e6f1bcd24
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-03 10:12:10 -08:00
Yechan Kim
f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
Emma Qiao
14bc8571ae
[TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d (#8319)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-03 05:26:17 -08:00
Emma Qiao
d7176768cd
[None][infra] Waive the failed test for main on 11/3 (#8875)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-03 02:52:52 -08:00
Tailing Yuan
8303cfa477
[None][fix] Fix import issues in layer-wise benchmarks (#8827)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-03 02:32:48 -08:00
xinhe-nv
4873ca04cc
[https://nvbugs/5521799][fix] add harmony channel validation (#8837)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 02:31:54 -08:00
xinhe-nv
64540451e7
[None][chore] Add failed cases into waives.txt (#8872)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 01:19:04 -08:00
Fanrong Li
e9f78c687a
[https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests (#8853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-03 00:34:52 -08:00
Yechan Kim
00c0e6c440
[https://nvbugs/5523315][fix] Fix serve benchmark test (#8255)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 00:30:13 -08:00
chenfeiz0326
cc4ab8d9d1
[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-11-03 16:23:13 +08:00
yufeiwu-nv
b4d17d1a4c
[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-03 13:34:06 +08:00
Chang Liu
f57dc01e6f
[https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input (#8846)
2025-11-02 17:44:08 -08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Yan Chunwei
1551ed8e5f
[https://nvbugs/5437384][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests (#8567)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs (#8600)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests (#8720)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
dongfengy
0edba5a7e2
[https://nvbugs/5474119][fix] Re-enable test (#8809)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
Patrice Castonguay
afa75c9494
[https://nvbugs/5614506][chore] Adding e+p+d e2e test (#8801)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Anthony Chang
852e5060aa
[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests (#8823)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[https://nvbugs/5617275][fix] Extract py files from prebuilt wheel for editable installs (#8738)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main (#8826)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model (#8788)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yuxian Qiu
025d2926df
[https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool (#8465)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Mike Iovine
b87448b009
[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-30 15:47:04 -04:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint (#8799)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Emma Qiao
9112cffaf3
[None][infra] Waive failed case for main branch (#8797)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 07:57:35 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks (#8777)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
Anthony Chang
f666ad2f6b
[None][feat] Autotuner can iterate through all tactics for test purposes (#8663)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-30 13:11:25 +01:00
Emma Qiao
a5cc9fe0aa
[TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. (#6256)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-30 01:56:04 -07:00
ChristinaZ
13cfd70f57
[None][feat] Add unit tests and revision in block_level kernel for invalid input (#8718)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-30 16:42:18 +08:00
WeiHaocheng
cc286687c4
[None][feat] Refactor scaffolding streaming feature and fix openai wo… (#8622)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-10-30 16:02:40 +08:00
xinhe-nv
a4f75399b9
[https://nvbugs/5481206][fix] update waives (#8774)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-30 00:43:38 -07:00
Leslie Fang
2072185d76
[https://nvbugs/5608461][fix] exclude InductorSubproc from thread leak check (#8704)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-30 15:35:15 +08:00
Emma Qiao
7d3cebf34e
[None][infra] Unwaive the tests passed in latest CI and disable a perf stage (#8775)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 12:48:23 +08:00
Emma Qiao
db99a936b0
[TRTLLM-8971][infra] Update gpu key for B300/GB300 (#8724)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 20:36:44 -07:00
Yuxian Qiu
3176bd3815
[None][fix] Fix UnboundLocalError. (#8756)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-29 19:41:37 -07:00
HuiGao-NV
ae57738bae
[https://nvbugs/5547414][fix] Use cached models (#8755)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-29 19:10:10 -07:00
Iman Tabrizian
ae6875fe10
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-10-29 08:04:26 -07:00