TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-12 14:03:48 +08:00

Author	SHA1	Message	Date
JunyiXu-nv	eefe5f2093	[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-09-02 07:08:22 -04:00
HuiGao-NV	7279297717	[None][infra] waive test case failed on post-merge (#7471) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-02 06:20:08 -04:00
aalanwyr	c3c95736a1	[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-09-02 17:21:27 +08:00
Ivy Zhang	3799e5d460	[None][test] auto reuse torch empty cache on qa test (#7421) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-02 04:44:47 -04:00
Yan Chunwei	f90375f37c	[https://nvbugs/5476580][fix] unwaive test_nvfp4_4gpus (#7454) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-09-02 04:17:14 -04:00
Mike Iovine	b3c57a7042	[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-01 14:37:44 -04:00
Emma Qiao	01dfd3af1b	[None][infra] Waive failed case on main 0901 (#7447) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-01 23:27:24 +08:00
bhsueh_NV	16e9d1121c	[https://nvbugs/5481087][fix] fix bug of ci when we use mocker (#7332) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-01 16:22:45 +08:00
nvamyt	efaefca2c8	[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302) Signed-off-by: nvamyt <amyt@nvidia.com>	2025-09-01 12:59:21 +08:00
Yiqing Yan	21291f3d8e	[None][chore] Remove duplicate test waives (#6999) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Emma Qiao	09bca7ca82	[None][infra] Waive failed tests for release branch 0818 (#6993) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
peaceh-nv	f4dc1ed39c	[https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#6937) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	29cdcdb56a	[None][fix] update skip config (#6891) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	d5bc5cd4f2	[https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
William Zhang	d15dcdc4ae	[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909) This commit lowers the GPU memory allocated for KV cache in accuracy tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	ac07418968	[None][ci] unwaive test_ptp_star_attention_example (#6943) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	b4d41d6604	[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Yan Chunwei	612c26be22	[None][doc] add legacy section for tensorrt engine (#6724) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	cf0c47ca2d	[None][fix] Fix batching bug in Mistral3 model (#6841) Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
2ez4bz	2480aedb73	[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731) This commit adds some level of FP8 support to Mistral Small 3.1 by: * disabling quantization for the vision sub-model since `modelopt` does support quantizing it (yet). * extending existing accuracy tests to use a modelopt produced FP8 checkpoint. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Guoming Zhang	3e99744201	[https://nvbugs/5375594][fix] fix oom issue on structural_tag test case (#6838) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	deba2885c1	[None][fix] fix Llama3 eagle3 test case OOM (#6832) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
xinhe-nv	7841ea6255	[None][chore] waive GB300 known issues (#6812) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Ivy Zhang	c7147d25dc	[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
Yuan Tong	ccb800f909	[TRTLLM-7457][ci] Update unittest parallel config (#7297) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-29 09:28:04 +08:00
Emma Qiao	1e644fa28a	[None][infra] Waive failed tests on main branch 08/26 (#7346) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 00:24:08 +08:00
Neta Zmora	08f935681d	[https://nvbugs/5474453][fix] fix path to tested model (#7272) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-08-28 08:01:48 -04:00
Zongfei Jing	53163bf1df	[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-28 18:26:16 +08:00
QI JUN	ae89163368	[None][ci] skip TestGPTOSS (#7333) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-28 05:01:49 -04:00
William Zhang	4541655e5f	[https://nvbugs/5430124][ci] Unwaive Mistral 3.1 Small tests (#7274) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-28 00:03:32 -04:00
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727][fix] unwaive qwen3 CI tests (#7293) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Eran Geva	462169bfc9	[https://nvbugs/5458798][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-08-27 07:57:46 -07:00
QI JUN	d09add5ede	[None][ci] parallelize unit tests of auto deploy in B200 (#7291) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:32:11 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
bhsueh_NV	f167b1fd99	[https://nvbugs/5453727][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 15:26:10 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Yuan Tong	6c7813e821	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-08-27 00:45:58 -04:00
Zhenhuan Chen	d0d8903a7f	[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable config (#7089) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-26 20:58:33 -07:00
Shunkangz	ff4047414b	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-27 11:16:12 +08:00
Zhou Yuxin	ccb6aadea8	[https://nvbugs/5412456][fix] Remove from waives.txt (#7248) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-27 10:05:53 +08:00
Jin Li	028235404b	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-26 18:31:33 -04:00
Fridah-nv	0f947c64cb	[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-08-26 10:47:57 -07:00
Void	040f4c70d3	[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-08-27 00:13:13 +08:00
QI JUN	baef70e67e	[None][ci] move qwen3 tests from b200 to gb200 (#7257) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-26 11:50:53 -04:00
xinhe-nv	80043affb5	[None][chore] Add failed cases into waives.txt (#7251) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 17:13:44 +08:00
amitz-nv	23ed0c892d	[https://nvbugs/5477332][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking (#7215) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-26 10:48:58 +03:00
Zheng Duan	cf50ba2980	[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 15:34:44 +08:00
Zheng Duan	1a929a1490	[https://nvbugs/5457504][fix] fix kv cache event test in disaggregated worker tests (#7028) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 14:25:10 +08:00
nvamyt	d8bd8843fc	[None][test] Update qwen3 timeout to 60 minutes (#7200) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 14:18:42 +08:00
qixiang-99	b165f8bc97	fix/improve kvcache allocation in PyTorch runtime (#5933) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-08-26 12:40:22 +08:00
William Zhang	92576488d3	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013) * Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensor in the same checkpoint directory. In such cases, prefetching both to memory is a waste of time, and memory. * What? This commit skips over consolidated safetensors when they are not the only safetensor file present in the checkpoint directory Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-25 23:56:21 -04:00
Leslie Fang	20922b7d1f	[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-26 10:42:01 +08:00
ruodil	b845eb7a3a	[None][test] add kv cache size in bench metric and fix failed cases (#7160) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 10:10:02 +08:00
Grzegorz Kwasniewski	2101d46d68	[TRTLLM-6342][feat] TP Sharding read from the model config (#6972) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-25 15:41:27 -07:00
chenfeiz0326	6a44e5b9d1	[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#6967) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-08-25 22:09:30 +08:00
Emma Qiao	200db3b809	[None][infra] Waive failed tests on main branch (#7201) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-25 09:04:37 -04:00
QI JUN	bea5e07fb7	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-25 20:52:05 +08:00
amitz-nv	a1e03af0f4	[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-25 10:37:40 +03:00
Ivy Zhang	f61b74f796	[None][test] add l20 specific qa test list (#7067) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-25 12:44:08 +08:00
QI JUN	630e67b845	[None][ci] waive test_mamba2_chunk_scan_combined_prefill_chunking[seqlens1-8] (#7194) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-24 23:52:59 -04:00
Yukun He	9c5b464fe0	[None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger JIT ahead of time. (#7113) Because deep_gemm.gp8_gemm_nt will trigger many JIT processes during the inference phase, we need to sweep these shapes ahead of time. Apply the AutoTuner framework to achieve this and retain the potential capability to tune the swap_ab flag. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-25 10:48:31 +08:00
Bo Deng	c038fb3ef4	[None][chore] cherry-pick 6940 (#7097) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-25 10:28:45 +08:00
xinhe-nv	3ba9afcc7b	[None][feat] add gpt-osss tests to sanity list (#7158) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-25 10:22:07 +08:00
Bo Deng	6e131602b2	[TRTLLM-7096][infra] Testing cache transmission functionality in Python (#7025) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-25 09:47:39 +08:00
Yiqing Yan	486bc763c3	[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-24 21:09:04 -04:00
Robin Kobus	31979aefac	[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-24 20:53:17 +02:00
ajrasane	068056677f	[None][chore] Enable auto deploy accuracy test in CI (#7179) Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-24 08:42:30 -07:00
Yanchao Lu	ec35481b0a	[None][infra] Prepare for single GPU GB200 test pipeline (#7073) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-24 21:46:39 +08:00
dongxuy04	19a0ea363b	[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: Dongxu Yang <dongxuy@nvidia.com> Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-08-24 08:15:29 -04:00
Iman Tabrizian	96ff82e77a	[None][fix] Waive test (#7185) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-24 10:45:11 +08:00
Izzy Putterman	b36460d7b5	[None][feat] Deepseek: Start Eagle work (#6210) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Co-authored-by: Mike Iovine <miovine@nvidia.com>	2025-08-22 12:57:17 -04:00
tomeras91	c232ba8157	[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-08-22 12:15:20 -04:00
Suyog Gupta	e3de5758a3	[#7136][feat] trtllm-serve + autodeploy integration (#7141) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-22 08:30:53 -07:00
QI JUN	1388e84793	[None][ci] move all B200 TensorRT test cases to post merge (#7165) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-22 06:47:23 -04:00
xinhe-nv	b8b2bd4a0a	[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-22 17:17:27 +08:00
Linda	898f37faa0	[None][feat] Enable nanobind as the default binding library (#6608) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-08-22 09:48:41 +02:00
Daniel Cámpora	099f081e03	[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-22 08:09:30 +02:00
xinhe-nv	4017f7cd6b	[None][chore] Add failed cases into waives.txt (#7109) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-22 10:39:25 +08:00
Wanli Jiang	07c711eb1f	[TRTLLM-6825][fix] Update lora for phi4-mm (#6817) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-21 22:00:04 -04:00
dominicshanshan	6f245ec78b	[None][chore] Mass integration of release/1.0 (#6864) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-22 09:25:15 +08:00
Daniel Stokes	f7c597ec40	[None][perf] Make finalize fusion part of the tactic selection logic (#6915 ) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-08-21 14:08:03 -07:00
Fridah-nv	e18dacc931	[#4403 ][refactor] Move fusion, kvcache, and compile to modular inference optimizer (#7057 ) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>	2025-08-21 10:30:36 -07:00
Emma Qiao	344bc4575d	[None][infra] Waive failed case for main branch (#7129 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-22 00:08:55 +08:00
Dimitrios Bariamis	f49dafe0da	[https://nvbugs/5394409 ][feat] Support Mistral Small 3.1 multimodal in Triton Backend (#6714 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-21 18:08:38 +02:00
bhsueh_NV	ba0a86e0bb	[https://nvbugs/5437405 ][fix] qwen3 235b eagle3 ci (#7000 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 01:17:32 -04:00
xinhe-nv	21f4434404	[None][chore] waive failed cases on H100 (#7084 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-21 11:15:23 +08:00
Chang Liu	75b8a90816	[None][fix] Fix llama4 multimodal by skipping request validation (#6957 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-20 21:58:53 -04:00
Yechan Kim	0893afae3d	[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-21 08:54:12 +08:00
bhsueh_NV	73d2daa386	[https://nvbugs/5457489 ][fix] unwaive some tests (#6991 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 08:49:57 +08:00
QI JUN	a918de710a	[None][ci] move some tests of b200 to post merge (#7093 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-20 19:43:40 -04:00
Emma Qiao	f84dd64250	[None][infra] Waive failed tests on main branch 8/20 (#7092 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-20 06:33:44 -04:00
Robin Kobus	b95cab2a7c	[None][ci] move unittests to sub-directories (#6635 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-20 05:42:22 -04:00
Iman Tabrizian	e27088421e	[None][infra] "[TRTLLM-6960][fix] enable scaled_mm tests (#6936 )" (#7059 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-08-20 01:45:09 -04:00
xinhe-nv	9e71b4fda4	[TRTLLM-7205][feat] add llama4 tp4 tests (#6989 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-20 13:22:05 +08:00
Leslie Fang	3f6a9267f1	[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-20 13:14:34 +08:00
Chang Liu	ce53832610	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-19 21:42:50 -07:00
Ivy Zhang	fc85e3db1c	[None][fix] fix llmapi import error (#7030 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-19 22:58:13 -04:00
Bo Deng	30da5d3cc4	[None][chore] unwaive test_disaggregated_genbs1 (#6944 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-20 09:57:35 +08:00
Yanchao Lu	d26a5a93ad	[https://nvbugs/5451296 ][bug] Cherry-pick #7017 from release/1.0 branch (#7043 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-19 11:25:05 -04:00
pcastonguay	e07fcc3a22	[https://nvbugs/5444937 ][chore] Fixing KV events tests (#7004 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-19 11:18:04 -04:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
Emma Qiao	8f95f35503	[None][infra] Waive failed tests on main (#7037 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-19 09:31:07 -04:00
Yiqing Yan	07506bccbe	[None][chore] Remove duplicate test waives (#7044 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-19 21:04:31 +08:00
Fanrong Li	655d0f48d0	[https://nvbugs/5455140 ][fix] unwaive DSR1-fp4 throughput_tp8 (#7022 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 20:48:05 +08:00
tomeras91	f0bfb49219	[https://nvbugs/5458874 ][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6996 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-08-19 15:45:06 +03:00
xinhe-nv	2c86cee38c	[None][chore] Remove closed bugs (#6969 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-19 16:01:33 +08:00
Shunkangz	54ec2c1af1	[None][opt] Add batch wait timeout in fetching requests (#6923 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-19 03:50:08 -04:00
Eran Geva	636c622bb8	[https://nvbugs/5458798 ][fix] Relaxed test threshold, added documentation (#6997 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-19 00:24:03 -07:00
Ivy Zhang	bff5fdf6df	[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-19 13:59:14 +08:00
William Zhang	daa2a65d37	[https://nvbugs/5454875 ][ci] Unwaive Mistral Small 3.1 test (#7011 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-19 00:32:14 -04:00
fredricz-20070104	e90280a84d	[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-19 00:13:04 -04:00
Fanrong Li	816a120af6	[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 00:03:03 -04:00
Zhenhuan Chen	2bb90ba002	[TRTLLM-6960][fix] enable scaled_mm tests (#6936) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-19 10:18:04 +08:00
Yi Zhang	a15af879ec	[None][refactor] Refactor Torch Compile Backend, MoeLoadBalancer and warmup Logic (#6615) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-08-19 09:58:44 +08:00
Lizhi Zhou	71e28eab36	[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-19 09:58:22 +08:00
Wanli Jiang	dabebb2c7a	[https://nvbugs/5371480][fix] Enable test_phi3_small_8k (#6938) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-19 09:42:35 +08:00
Leslie Fang	e76e5c640f	[None][infra] Enable accuracy test for mtp and chunked prefill (#6314) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-19 07:42:52 +08:00
Yiqing Yan	1ce23545fc	[None][chore] Remove duplicate test waives (#6998) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-18 21:15:49 +08:00
Emma Qiao	69ff32f9b1	[None][infra] Waive failed tests on main 0818 (#6992) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-18 20:34:52 +08:00
Shi Xiaowei	5ec15b98f0	[TRTLLM-7030][fix] uppercase def value in pd-config (#6981) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-18 02:33:23 -04:00
Leslie Fang	ce0b13ea02	[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-18 09:18:17 +08:00
Naveassaf	d6322f70b7	[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters to prevent OOMs (#6970) Signed-off-by: Nave Assaf <nassaf@nvidia.com>	2025-08-17 13:38:36 -04:00
amitz-nv	3a49b47081	[https://nvbugs/5390853][fix] Fix _test_openai_lora.py - disable cuda graph (#6965) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-17 16:56:16 +03:00
Emma Qiao	cc6d763824	[None][infra]Waive failed cases in main branch (#6951) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-17 14:27:59 +03:00
bhsueh_NV	85cbd0263b	[None][feat] Support Yarn on Qwen3 (#6785) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-17 07:21:29 +08:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
brb-nv	9505727d31	[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6952) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-15 16:35:02 -07:00
Yuening Li	1f8ae2b2db	[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow (#6629) Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>	2025-08-15 17:15:49 -04:00
dongfengy	0ad0b967bb	[None][fix] Make TP working for Triton MOE (in additional to EP we are using) (#6722) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-08-15 16:58:42 -04:00
ajrasane	4162d2d746	[None][test] Add accuracy evaluation for AutoDeploy (#6764) Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-15 13:46:09 -04:00
yifeizhang-c	4127d77678	[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537) Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2025-08-15 09:52:06 -07:00
liji-nv	18ccd053d3	[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6858) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-15 11:14:20 -04:00
peaceh-nv	1c1d5d2495	[https://nvbugs/5451373][fix] : Fix the accuracy issue when using FP8 context MLA (#6881) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-08-15 16:53:56 +08:00
xinhe-nv	b23fdfc62f	[None][chore] Add failed cases into waives.txt (#6914) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-08-15 14:00:16 +08:00
Yanchao Lu	3a987891d8	[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-15 11:16:07 +08:00
Bo Deng	e54ba75dac	[None][fix] Update tests to use standardized uppercase backend identifiers (#6921) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-15 11:14:15 +08:00
Frank	2cc59aacb3	[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-08-14 22:46:20 -04:00
Aurelien Chartier	b13a5a99b2	[None][chore] Add tests for non-existent and completed request cancellation (#6840) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-08-14 15:57:01 -07:00
Raayan Dhar	8b237b943b	[https://nvbugs/5441714][chore] remove skip on disagg n-gram test (#6872) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-08-14 15:45:00 -07:00
Bo Li	26f413ad90	[https://nvbugs/5450262][fix] Fix unsupported alltoall use case (#6882) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-08-14 17:46:54 -04:00
Matthias Jouanneaux	69574ad730	[TRTLLM-5966][feat] Helix: extend mapping to support different CP types (#6816) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-08-14 09:00:02 -07:00
Emma Qiao	96339c69a9	[None][infra] Waive failed cases on main (#6902) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-14 23:59:44 +08:00
Pengbo Wang @ NVIDIA	ffc976ceaf	[https://nvbugs/5445466][fix] fix deepseek r1 hang by not enabling mnnvl by default (#6860) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-14 22:36:56 +08:00
Shi Xiaowei	1095dfd03c	[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#6323)	2025-08-14 03:48:57 -04:00
chenfeiz0326	5cd8c0f6cc	[None][test] Add perf-sweep scripts (#6738) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-14 14:04:47 +08:00
NVJiangShao	a700646132	[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE (#6784) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-08-14 13:35:37 +08:00
Yan Chunwei	0132c1db84	[https://nvbugs/5427043][fix] request length exceeds max_num_tokens (#6821) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-14 13:31:12 +08:00
Bo Deng	d8acca495b	[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-14 04:36:38 +00:00
jmydurant	4200fa46d1	[None][feat] Add support for Hopper MLA chunked prefill (#6655) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-14 10:39:26 +08:00
Izzy Putterman	ef53de8eef	[None][feat] Add test for speculative rejection sampler (2-model) (#6542) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-08-13 22:09:35 -04:00
Mike Iovine	7cba883932	[https://nvbugs/5410399][chore] Unwaive mtp llmapi test (#6833) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-13 17:38:45 -04:00
Emma Qiao	c7e6145409	[None][infra] Waive failed cases on main (#6863) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-13 09:50:14 -04:00
Anthony Chang	2198587b35	[https://nvbugs/5378031] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-08-13 21:24:40 +08:00
Yukun He	bc5f766e0e	[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic generalization. (#6545) * Generalize the definition of tactics so that users can implement more customizable tactic types, making the configurations clearer for each kernel run. * Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function. * Other code refactoring. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-13 16:25:22 +08:00
Mike Iovine	f68e03e646	[https://nvbugs/5452167][fix] Fix ngram padding issue (#6837) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-13 11:23:16 +08:00
Yechan Kim	12102e2d48	[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-12 19:34:02 -07:00
rakib-hasan	2923eb88a1	[None][fix] Refactoring input prep to allow out-of-tree models (#6497) Signed-off-by: Rakib Hasan <rhasan@nvidia.com>	2025-08-12 20:29:10 -04:00
xinhe-nv	e35fca4272	[TRTQA-2920][chore] improve hang tests (#6781) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-12 18:26:51 +08:00
Sergey Klevtsov	27fc35175e	[None][feat] CUTLASS MoE FC2+Finalize fusion (#3294) Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>	2025-08-12 15:56:48 +08:00
Fridah-nv	0dc4b4e699	[#4403][autodeploy] Refactor: Move more transformations to new inf optimizer, Add quantization_source to factory interface (#6760) Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>	2025-08-11 22:02:46 -07:00
Enwei Zhu	7c686ba8de	[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill (#6774) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-12 09:30:06 +08:00
Ziyi Xiong	b4fcd5f592	[https://nvbugs/5441438][fix] Set correct draft length for the cuda graph dummy request (#6701) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-08-12 09:28:47 +08:00
Jinyang Yuan	ead89a0e40	[None][perf] Improve the performance of online EPLB on Hopper by better overlapping (#6624) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-08-12 09:25:13 +08:00
Chang Liu	be9dd4713c	[https://nvbugs/5385987][fix] Fix Qwen2 quantization issue by pinning transformers version (#6673) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-11 17:16:49 -07:00
Aurelien Chartier	56bfc3a6d2	[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically (#6763) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-08-11 15:18:19 -07:00
rakib-hasan	7ab8112450	[None][fix] Refactoring to avoid circular import when importing torch models (#6720) Signed-off-by: Rakib Hasan <rhasan@nvidia.com>	2025-08-11 18:00:42 -04:00
Emma Qiao	5145e9d40e	[None][infra] Unwaive an updated case to test (#6791) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 06:47:33 -04:00
Emma Qiao	d6ad4a9d5b	[None][infra] Waive failed tests on main 0811 (#6778) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 03:16:25 -04:00
xinhe-nv	9c358c26e4	[None][chore] remove closed bugs (#6772) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-11 14:39:58 +08:00
Eran Geva	b3e8fa2960	[None][test] Test trtllm-bench AD vs, PT BEs on H100 single gpu (#6487) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2025-08-11 08:33:13 +03:00
Tracin	49bcaa4e95	Add gpt-oss GSM8K test. (#6732) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-08-10 22:45:43 -04:00
Chuang Zhu	c566a8d2a2	[None][fix] fix same pp disagg (#6730) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-08-10 22:45:15 -04:00
Bo Deng	767879ef85	[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6736) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-11 10:05:10 +08:00
Yechan Kim	60073a7ad9	[None][feat] Support SharedTensor on MultimodalParams (#6254) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-10 17:48:24 -07:00
pcastonguay	4142320e53	[https://nvbugs/5444937][fix] Fixing kv_cache_event unit test (#6753) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-10 16:45:38 -07:00
shaharmor98	14b36e07d7	[TRTLLM-6174][feat] Enable FP32 mamba ssm cache (#6574) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-08-10 16:27:51 -04:00
Gal Hubara-Agam	3c5aec19c2	[#5048][enhance] AutoDeploy: Optimize prepare_inputs (#6634) Optimize prepare_inputs routine in AutoDeploy, as part of the effort to reduce the performance gap compared to the default backend. This PR includes two major fixes, and some other minor tweaks: 1. Avoid back and forth data copies 2. Optimize position ids update by separating the implementation for generation mode and context mode. Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-10 13:55:04 +03:00
Emma Qiao	ee19ca5e58	[None][infra] Waive test main 0808 (#6751) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-09 23:54:07 -04:00
Ye Zhang	bcf5ec0c9a	[None][feat] Core Metrics Implementation (#5785) Signed-off-by: Ye Zhang <zhysishu@gmail.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-09 02:48:53 -04:00
Stefan Niebler	b8f036f264	[TRTLLM-6650][fix] Enhance CUDA graph + Beam search to correctly handle padding (#6665) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-08-08 14:00:33 +02:00
Leslie Fang	294e0d3dab	[https://nvbugs/5436461][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI (#6631) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-08 15:30:47 +08:00
Li Min	d913955952	[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell (#6616) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-08-08 15:03:48 +08:00
ruodil	b15d6fb145	[None][test] fix yml condition error under qa folder (#6734) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-08 15:59:01 +10:00
2ez4bz	064eb7a70f	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) This commit propagates the mapping to intermediate layers to enable tensor parallelism (amongst other things) in them. It also fixes issues with a unit test for TP for pixtral, and adds it to a test list. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-08 01:50:36 -04:00
Enwei Zhu	aee828d98a	[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-08 12:10:36 +08:00
ruodil	22f45a0e19	[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 22:57:04 -04:00
xinhe-nv	88ced50ca7	[TRTQA-2920][fix] Add failed cases into waives.txt (#6719) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-08-08 12:54:13 +10:00
Daniel Cámpora	efca359b66	[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-07 22:19:37 -04:00
Iman Tabrizian	82276167e6	[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-07 17:28:14 -07:00
Haohang Huang	980929e1a9	[https://nvbugs/5410687][fix] Hopper w4a8 groupwise MoE interleave (#6708) Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-08-07 15:30:16 -07:00
Yuan Tong	db8dc97b7b	[None][fix] Migrate to new cuda binding package name (#6700) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-07 16:29:55 -04:00
Raayan Dhar	4055b764db	[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489) Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>	2025-08-07 11:18:02 -04:00
pcastonguay	453a06e6ab	[TRTLLM-6881][feat] Include attention dp rank info with KV cache events (#6563) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-07 14:17:07 +02:00
Enwei Zhu	1b9781e8e7	[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-07 05:53:48 -04:00
peaceh-nv	8ec3b1de10	[None][feat] : Add FP8 context MLA support for SM120 (#6059) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-08-07 16:16:34 +08:00
xinhe-nv	0a467b00cc	[https://nvbugs/5409414][fix] fix Not registered specs (#6660) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 17:55:53 +10:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
ruodil	6c1f7d8b91	[None][test] correct test-db context for perf yaml file (#6686) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 02:47:10 -04:00
amitz-nv	85af62184b	[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6510) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-07 09:05:36 +03:00
YueWeng	157ea77549	[https://nvbugs/5375966][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-08-07 10:25:17 +08:00
ruodil	780d7507f9	[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:02:13 +10:00
ruodil	f30398470d	[None][chore] update readme for perf release test (#6664) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:00:45 +10:00
Yan Chunwei	5eae3184fa	[None][chore] add missing tests to test list (#6590) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-06 22:12:27 +08:00
Yechan Kim	1aed7511fe	[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-06 06:58:58 -07:00
Iman Tabrizian	13ecb4aced	[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-06 09:08:29 -04:00
Pengyun Lin	79fc2f48c0	[None][chore] Enhance trtllm-serve example test (#6604) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-06 20:30:35 +08:00
Zongfei Jing	0ff8df95b7	[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-06 16:44:21 +08:00
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
Iman Tabrizian	43bd861ce1	Update allreduce benchmark for torch (#6271) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-05 23:25:23 -07:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
yunruis	3ff4f503ad	[None][opt] ADP schedule balance optimization (#6061) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-06 09:38:02 +08:00
Yechan Kim	c17f4984e2	[None][feat] Refactor Llava-Next (#6478) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-05 17:53:53 -07:00
Aurelien Chartier	6da95f29a9	[None][feat] Add support for fused gate_up_proj scales for FP8 blockwise (#6496) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-08-05 11:22:32 -07:00
ixlmar	1ebceb790d	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-08-05 18:27:43 +01:00
liji-nv	dcbfa7e509	[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-05 10:31:29 -04:00
Venky	61da2daeb4	[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 07:14:24 -07:00
Emma Qiao	78a75c2990	[None][Infra] - Split gb200 stages for each test (#6594) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-05 07:10:00 -04:00
xinhe-nv	c32584125e	[TRTQA-2920][fix] Add failed cases into waives.txt (#6600) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA	c289880afb	[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-05 18:05:33 +08:00
Ivy Zhang	08ed9d7305	[None][doc] add introduction doc on qa test (#6535) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 17:02:17 +08:00
Ivy Zhang	d101a6cebc	[https://nvbugs/5410279][test] resubmit timeout refactor (#6337) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 16:39:25 +08:00
Haohang Huang	c9eebcb454	[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com> Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>	2025-08-05 07:47:41 +00:00
Leslie Fang	164acfa31e	[None][infra] Skip test_eagle3 test with device memory check (#6617) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-05 02:36:03 -04:00
ruodil	7625845365	test: add README_release_test.md for perf test (#6443) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-05 02:07:42 -04:00
xinhe-nv	a178cea324	[TRTLLM-6856][feat] add disaggregated serving tests to QA list (#6536) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 12:47:53 +10:00
xinhe-nv	fe3d607c4b	[TRTQA-2920][fix] Add failed cases into waives.txt (#6581) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-05 12:41:23 +10:00
brb-nv	6135f75f87	[None][chore] Update Gemma3 closeness check to mitigate flakiness (#6591) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-04 10:10:58 -04:00
Olya Kozlova	13cc1c4878	[TRTLLM-5271][feat] best_of/n for pytorch workflow (#5997) Signed-off-by: Olya Kozlova <okozlova@nvidia.com>	2025-08-04 14:08:06 +02:00
Ivy Zhang	f3651adea8	[None][test] update invalid test name (#6596) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 08:01:05 -04:00
Emma Qiao	5d8a5a0cb8	[None][Infra]Waive failed case in post-merge on main (#6602) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-04 19:39:44 +08:00
brb-nv	87e4e9f468	[None][chore] Add unit test for Gemma3 lora (#6560) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-04 04:56:57 -04:00
Pengyun Lin	a15e33351d	[None][fix] Revert commit `48ddc3d` & add test for disagg server with different max_num_tokens (#6259) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-04 15:09:51 +08:00
xinhe-nv	a54972e463	[None][fix] remove closed bugs (#6576) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:52:11 +10:00
Yuan Tong	a2f271c8e0	[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-04 13:51:01 +08:00
Leslie Fang	b9fe0fa7ec	[None][infra] Enable test of chunked prefill with logit post processor (#6483) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-04 01:46:07 -04:00
Leslie Fang	a60190836c	[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-04 01:45:24 -04:00
ruodil	6459725bf9	test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) (#6533) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 15:22:39 +10:00
Ivy Zhang	5eefdf2c75	tests: Add llama4 functional cases (#6392) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-04 11:19:58 +08:00
ruodil	8d82ccca63	test: modify max_lora_rank of phi4_multimodal to 320 (#6474) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-04 12:20:22 +10:00
Yechan Kim	ee6ab5be96	chore: add EXAONE4 accuracy test (#6397) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-04 10:14:16 +08:00
Ivy Zhang	7547a7d0a2	[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list (#6436) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-03 22:11:26 -04:00
Yiqing Yan	3f7abf87bc	[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-03 11:18:59 +08:00
Jhao-Ting Chen	4da5cfc511	[None][infra] add eagle3 one model accuracy tests (#6264) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-08-02 16:07:46 -07:00
Shunkangz	67a3fd858b	[None][feat] Add support of scheduling attention dp request (#6246) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-01 20:38:01 -04:00
Richard Huo	31802de0b0	[None][fix] Serialize the window_size in the kv event (#6526) Signed-off-by: richardhuo-nv <rihuo@nvidia.com>	2025-08-01 15:25:18 -07:00
Lizhi Zhou	6f34f3489b	[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-01 13:33:34 -04:00
xinhe-nv	263c6c0ad0	test: skip post blackwell (#6357) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-01 13:10:14 -04:00
Lucas Liebenwein	5247df6ae2	[AutoDeploy] merge feat/ad-2025-07-22 (#6520) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Signed-off-by: Gal Agam <ghubaraagam@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: haoguo <67671475+h-guo18@users.noreply.github.com> Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Co-authored-by: Gal Agam <ghubaraagam@cw-dfw-h100-004-328-012.cm.cluster> Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com> Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Co-authored-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-08-01 08:51:08 -07:00
Emma Qiao	16febefee0	[None][Infra] - Skip failed tests in post-merge (#6558) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-01 22:21:23 +08:00
brb-nv	7447d6ed85	[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-01 09:19:54 -04:00
liji-nv	1daa8c3232	[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… (#6355) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-01 07:38:06 -04:00
xinhe-nv	fca0d37798	[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 (#6552) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-01 20:27:22 +10:00
chenfeiz0326	ba5bdbb138	[None][chore] Disable add special tokens for Llama3.3 70B (#6482) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-08-01 17:03:27 +08:00
Yukun He	90856bf97d	[https://nvbugs/5419069][fix] Fix the mismatched layer name components. (#6417) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-01 16:32:39 +08:00
Yang Li	ac23f4a80d	[TRTLLM-4279] fix: Add a protection test for checking trtllm custom ops (#6515) Signed-off-by: Yang Li <56944310+yali-arch@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-08-01 15:59:09 +08:00
Ivy Zhang	71524a1a48	[https://nvbugs/5419066][fix] Use trt flow LLM (#6467) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-01 03:33:07 -04:00
Venky	ad5742b105	[fix] Update get_trtllm_bench_build_command to handle batch size and tokens (#6313) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-01 00:08:09 -04:00
Zongfei Jing	7bb0a78631	Deepseek R1 FP8 Support on Blackwell (#6486) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-08-01 10:26:28 +08:00
brb-nv	2eca0d5925	fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-31 17:18:23 -07:00
Simeng Liu	8cf3faa26a	[feat] Auto-enable ngram with concurrency <= 32. (#6232) Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> Signed-off-by: Mike Iovine <mike.iovine7@gmail.com> Co-authored-by: Mike Iovine <miovine@nvidia.com> Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>	2025-07-31 18:45:51 -04:00
Ziyi Xiong	8062e0fe7c	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-31 15:31:39 -04:00
tomeras91	6d5da9f7c2	[https://nvbugs/5404046][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6485) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-31 21:35:10 +03:00
shaharmor98	0c42f54a39	Bugfix/fix nemotron nas lora support (#6380) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-07-31 13:39:35 -04:00
amitz-nv	1ee7a08d2b	[5830][feat] Improve LoRA cache memory control (#6220) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-07-31 09:26:38 +03:00
Faraz	8e84df74b5	Fix e2e test failure for RTX6000 Pro (#6420) Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>	2025-07-30 23:32:44 -04:00
xinhe-nv	ca534e4798	test: add accuracy reference (#6479) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-31 12:27:29 +10:00
bhsueh_NV	ae3a5fc918	[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-31 09:37:23 +08:00
brb-nv	0e16d1f070	test: Add time logging for lora tests (#6466) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 14:02:43 -07:00
Anurag Mukkara	fac186e3b5	[nvbug/5409417] Unwaive llava test case (#6460) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-07-30 14:38:47 -04:00
brb-nv	f6287e4498	Unwaive Gemma2 LoRA test on H100 (#6461) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 12:56:12 -04:00
Bo Deng	24e7f4eece	[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-31 00:41:37 +08:00
Wanli Jiang	9632dba02e	feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-30 09:20:16 -07:00
pcastonguay	0f083b9daf	fix: Unwaive triton cpp test [nvbug 5401088] (#6412 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-30 11:25:18 -04:00
nv-guomingz	03e38c9087	chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 11:11:06 -04:00
Chang Liu	b4065d8ca6	[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-07-30 10:00:15 -04:00
pcastonguay	e7ae5e2824	feat: Add support for disaggregation with pp with pytorch backend (#6369 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-07-30 09:42:13 -04:00
tomeras91	a2514d93fc	[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-30 07:22:32 -04:00
Yechan Kim	22b29df38c	[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 17:29:55 +08:00
xinhe-nv	d9ab3fd35e	tests: add TestNemotronH cuda graph tests (#6390 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 18:45:58 +10:00
nv-guomingz	a5540acfce	chore: add trtllm-serve json schema example into doc. (#6418 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 04:33:08 -04:00
2ez4bz	d6eed1b624	[fix] Switch placement of image placeholder for mistral 3.1 (#6435 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-30 14:10:36 +08:00
xinhe-nv	c00d6763b2	test: [CI] Add failed cases into waives.txt (#6457 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-30 12:36:58 +10:00
Venky	ab40369053	[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 10:53:43 +10:00
Yechan Kim	d6eb8e2366	fix: support mixture of text & multimodal prompts (#6345 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 08:52:31 +08:00
Yan Chunwei	ad662ddcdd	chore: disallow arbitrary in llm_args.Configs (#6367 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-29 16:16:52 -04:00
Yan Chunwei	1a6930986a	chore: remove unused kv_cache_dtype in api reference (#6444 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-29 14:57:20 -04:00
Michal Guzek	7efe3cb0cd	[fix] Add detokenization-based stop word logic to LLM API (#5948 ) Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-07-29 10:16:59 -07:00
xinhe-nv	f1086e7d4f	test: [CI] remove closed bugs (#6381 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-29 19:01:23 +10:00
xinhe-nv	4fbb344caf	test: [CI] Add failed cases into waives.txt (#6423 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-29 19:00:30 +10:00
Yukun He	0eee2e2850	[5385981] fix: Update the usage of VisionAttention init API. (#6413 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-29 16:41:48 +08:00
ruodil	e11255e9d0	test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-29 15:52:45 +10:00
Michal Guzek	2573bb729d	feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-28 14:02:14 -07:00
Aurelien Chartier	738ab61593	[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs (#6339 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-28 12:36:44 -07:00
2ez4bz	cdca541148	[test] Unwaive mistral3.1 small E2E test (#6352 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 14:37:42 -04:00
2ez4bz	60e4d3a9d4	[test] Add accuracy regression test for Mistral3.1 (#6322 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 09:41:44 -07:00
ruodil	03632a679f	test: organize perf cases and add missing perflab cases in qa test list (#6283 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-28 20:33:32 +10:00
xinhe-nv	971be1fe86	test: waive failed cases (#6394 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-28 20:31:43 +10:00
Yan Chunwei	45d441e60c	[TRTLLM-5061] chore: add status tags to LLM API reference (#5707 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-28 15:57:07 +08:00
Ivy Zhang	2945817cae	[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-28 15:33:30 +08:00
Emma Qiao	b3ca159787	[Infa] - waive failed cases and fix a typo (#6384 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-28 02:06:57 -04:00
Chang Liu	dc757799e1	[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266 )	2025-07-27 23:29:21 -04:00
Yan Chunwei	908f49a4ad	[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-28 09:01:10 +08:00
Michal Guzek	08d57123f9	[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-25 18:10:40 -04:00
Iman Tabrizian	c35c78ff58	[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-25 12:47:01 -07:00
nv-guomingz	b8d4cb8beb	feat: Support JSON Schema in OpenAI-Compatible API (#6321 ) Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>	2025-07-25 12:55:56 -04:00
pcastonguay	3805976e90	fix: Fixing kv_cache_events unit tests [nvbug 5362412] (#6265 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-25 08:55:44 -04:00
xiaoqi	a0aecf0476	[feat]: support logit_bias (#5354 ) Signed-off-by: xq25478 <xq25478@qq.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-25 09:37:41 +00:00
xinhe-nv	470544cf17	test: [CI] Add failed cases into waives.txt (#6333 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-25 17:18:06 +10:00
xinhe-nv	6268a60ab3	tests: add test_chunked_prefill for llama4 (#5549 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 23:02:00 -04:00
xinhe-nv	2dcfa90e99	test: skip llama3.3 70b test on cg4 (#6293 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 19:29:56 -07:00
Mike Iovine	0f2f11f90b	[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-24 21:50:11 -04:00
Shiyu Li	375f74ecb2	[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237 ) Signed-off-by: Shiyu Li <shili@nvidia.com>	2025-07-25 08:01:40 +08:00
Stefan Niebler	0df758ec9f	[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-24 18:04:41 +02:00
bhsueh_NV	7b6aadc800	[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-24 21:47:37 +08:00
Emma Qiao	0cc1f8c03d	[Infra] - Wiave failed tests in post-merge (#6331 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-24 21:18:06 +08:00
Ivy Zhang	f290108cd8	tests: only get timeout value from pytest marker (#6287 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-24 20:51:02 +08:00
liji-nv	14d94a3856	feat: Add non UB AR + Residual + Norm + Quant fusion (#6320 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-24 05:51:43 -04:00
Iman Tabrizian	5fceaa6153	Revert "tests: add timeout_manager to tensorrt flow test cases (#5942 )" (#6309 )	2025-07-23 23:58:10 -04:00
Emma Qiao	82d03ca979	[Infra] - Increase unittest execution time since some test exceeds 1600 (#6277 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-24 10:02:28 +08:00
Iman Tabrizian	7740bfa31d	Waive tests (#6312 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 18:15:07 -07:00
Lucas Liebenwein	cf4f4e8d73	[AutoDeploy] disable flaky MoE nvfp4 test (#6302 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-07-23 13:13:01 -04:00
Emma Qiao	cb737a5fcd	[Infra] - Skip failed cases (#6299 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-23 21:26:31 +08:00
Stefan Niebler	2486eb778e	[TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLLM Sampler (#6223 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-23 12:30:50 +02:00
xinhe-nv	2b0fa24175	test: [CI] Add failed cases into waives.txt (#6289 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-23 19:04:21 +10:00
YueWeng	ed62a06eef	[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-07-23 14:53:37 +08:00
Yechan Kim	83c3ed128b	chore: set default device to cpu on Multimodal models (#5994 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-22 21:45:31 -07:00
Venky	9538c8d0e5	Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019 )	2025-07-22 19:42:45 -07:00
wili	8ecdeee300	[refactor] Simplification of Speculative decoding configs - Part 2 (#5936 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-23 09:20:27 +08:00
Iman Tabrizian	bc2fb29c5e	[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 05:27:16 +08:00
Lucas Liebenwein	41fb8aa8b1	[AutoDeploy] merge feat/ad-2025-07-07 (#6196 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com> Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-07-23 05:11:04 +08:00
2ez4bz	ab7434ac62	[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-22 11:06:41 -07:00
John Calderon	b7c8a672da	[Issue 6193] Fix gemma3vl weight loader (#6233 ) Signed-off-by: John Calderon <johncalesp@gmail.com>	2025-07-22 10:32:18 -07:00
Linda	60073731ca	fix: bindings unit tests for nanobind (#6221 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-22 14:51:43 +01:00
Stanley Sun	04f2d4b2eb	test: update test list for RTX6KD (#6213 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-07-22 18:55:24 +08:00
Pengyun Lin	48ddc3d4b9	[fix]: Revert commit `388b491` (#6143 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
pcastonguay	310bdd9830	fix: Fix triton backend build [nvbug 5396469] (#6098 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Yi Zhang	eb7d0f84b5	[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-22 12:48:00 +08:00

1675 Commits