Commit Graph

3447 Commits

Author SHA1 Message Date
yufeiwu-nv
b4d17d1a4c
[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-03 13:34:06 +08:00
Chang Liu
f57dc01e6f
[https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input (#8846)
2025-11-02 17:44:08 -08:00
qsang-nv
0f42a24f45
[None][feat] Fix attention sink load in xqa (#8836)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-11-03 09:39:45 +08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Eran Geva
f8778230e3
[#8781][fix] Cache the AllReduce wrapper to avoid re-allocating the workspace, which caused a hang (#8803)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-02 15:30:39 +02:00
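The fix in #8803 describes a caching pattern that is easy to illustrate generically. The sketch below is a minimal, hypothetical Python rendering of that idea: keep one wrapper (and its workspace buffer) per communication-group key instead of rebuilding it on every call. Class and function names are illustrative only, not TensorRT-LLM APIs.

```python
# Hedged sketch of the caching pattern described in #8803; names are
# hypothetical and the workspace is a stand-in buffer, not the real one.
class AllReduceWrapper:
    def __init__(self, group_key: tuple):
        self.group_key = group_key
        # Stand-in for the communication workspace whose repeated
        # allocation the commit reports as the cause of the hang.
        self.workspace = bytearray(1 << 20)

_ALLREDUCE_CACHE: dict = {}

def get_allreduce(group_key: tuple) -> AllReduceWrapper:
    # Reuse the cached wrapper so the workspace is allocated exactly once
    # per group key instead of on every call.
    wrapper = _ALLREDUCE_CACHE.get(group_key)
    if wrapper is None:
        wrapper = AllReduceWrapper(group_key)
        _ALLREDUCE_CACHE[group_key] = wrapper
    return wrapper
```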
Yanchao Lu
da73410d3b
[None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package (#8857)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-02 09:57:37 +08:00
Robin Kobus
1b3ad7259d
[None][feat] Use ruff for formatting and linting new files by default (#8629)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-01 16:11:40 +01:00
Yan Chunwei
1551ed8e5f
[https://nvbugs/5437384][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests (#8567)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
Bo Li
4c5a8f4ec6
[None][fix] Rename: slot_count -> invalid_expert_id (#8783)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-01 21:36:59 +08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs (#8600)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
brb-nv
d798d66976
[TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP (#8145)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-31 17:32:39 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests (#8720)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Matt Lefebvre
da2dca58aa
[TRTINFRA-7215][infra] Add support for enroot SLURM clusters (#8770)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 12:22:21 -07:00
dongfengy
0edba5a7e2
[https://nvbugs/5474119][fix] Re-enable test (#8809)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
dongfengy
6424f7e55f
[None][doc] Clarify the perf best practices and supported hardware for GPT-OSS (#8665)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-31 10:11:59 -07:00
Patrice Castonguay
afa75c9494
[https://nvbugs/5614506][chore] Adding e+p+d e2e test (#8801)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Suyog Gupta
3d0e38e074
[None][perf] AutoDeploy optimize _get_unique_value (#8822)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-31 04:57:10 -07:00
Anthony Chang
852e5060aa
[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests (#8823)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[https://nvbugs/5617275][fix] Extract py files from prebuilt wheel for editable installs (#8738)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main (#8826)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model (#8788)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yukun He
1d4a186ace
[https://nvbugs/5623960][fix] Compress the warning log of AutoTuner when encountering tactic failures. (#8793)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-31 11:09:14 +08:00
Zhanrui Sun
a6a3de8e35
[TRTLLM-9003][infra] Add Python OpenSearchDB query/push. (#8506)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 19:43:51 -07:00
Yuxian Qiu
025d2926df
[https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool (#8465)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Zhenhuan Chen
603ec03fb1
[https://nvbugs/5575687][fix] Fix moe_gemm's pre-exit position that caused an illegal memory access (#8786)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-10-31 09:08:23 +08:00
yuanjingx87
fe670af65f
[None][infra] Update allow list 20251030 (#8808)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-30 16:41:52 -07:00
Mike Iovine
b87448b009
[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-30 15:47:04 -04:00
Chenghao Zhang
71c5576a44
[TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE (#8737)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-30 12:33:08 -07:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint (#8799)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Emma Qiao
9112cffaf3
[None][infra] Waive failed case for main branch (#8797)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 07:57:35 -07:00
Zhanrui Sun
547d799111
[TRTLLM-8930][infra] Force Blossom perf test stages to use 'tensorrt/test_type: perf' in the K8S template (#8752)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 06:30:10 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks (#8777)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
Anthony Chang
f666ad2f6b
[None][feat] Autotuner can iterate through all tactics for test purposes (#8663)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-30 13:11:25 +01:00
Emma Qiao
a5cc9fe0aa
[TRTLLM-5453][infra] Check all steps for test names and verify that each test in waives.txt also exists in the l0 or QA test lists. (#6256)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-30 01:56:04 -07:00
ChristinaZ
13cfd70f57
[None][feat] Add unit tests and revise the block_level kernel for invalid input (#8718)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-30 16:42:18 +08:00
WeiHaocheng
cc286687c4
[None][feat] Refactor scaffolding streaming feature and fix openai wo… (#8622)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-10-30 16:02:40 +08:00
xinhe-nv
a4f75399b9
[https://nvbugs/5481206][fix] update waives (#8774)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-30 00:43:38 -07:00
Leslie Fang
2072185d76
[https://nvbugs/5608461][fix] exclude InductorSubproc from thread leak check (#8704)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-30 15:35:15 +08:00
Void
6b755fd9f8
[None][fix] Fix runtime error where bf16 input is not quantized to nvfp4 when using bf16 dispatch (#8507)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-10-30 15:06:54 +08:00
yuanjingx87
e689a73c83
[None][infra] fix slurm results path (#8751)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-30 13:09:46 +08:00
Emma Qiao
7d3cebf34e
[None][infra] Unwaive the tests passed in latest CI and disable a perf stage (#8775)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 12:48:23 +08:00
Yi Zhang
496b419791
[None][doc] Add doc for torch.compile & piecewise cuda graph (#8527)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-10-29 21:15:46 -07:00
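For readers landing on the new doc from #8527, the sketch below shows only the stock PyTorch torch.compile API with CUDA-graph capture via mode="reduce-overhead"; it is a minimal, generic example and does not reproduce TensorRT-LLM's piecewise CUDA graph configuration, which the referenced doc covers.

```python
import torch

# Generic torch.compile usage with CUDA-graph capture (mode="reduce-overhead");
# TensorRT-LLM's piecewise CUDA graph setup is configured through the library
# and is not shown here.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.GELU(),
).cuda().half()

compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 1024, device="cuda", dtype=torch.half)
with torch.no_grad():
    for _ in range(3):  # a few warm-up iterations let the graph be captured
        y = compiled(x)
print(y.shape)
```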
Emma Qiao
db99a936b0
[TRTLLM-8971][infra] Update gpu key for B300/GB300 (#8724)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 20:36:44 -07:00
Yuxian Qiu
3176bd3815
[None][fix] Fix UnboundLocalError. (#8756)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-29 19:41:37 -07:00
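The bug class behind #8756 is common enough to show generically: a name assigned only inside a branch raises UnboundLocalError when that branch is skipped. The snippet below is a hypothetical illustration of the pattern and the usual fix, not the actual code change.

```python
# Hypothetical illustration of the UnboundLocalError pattern; names here
# are not taken from the actual change in #8756.
def select_backend(use_fast_path: bool) -> str:
    backend = "fallback"  # fix: give the name a value before the branch
    if use_fast_path:
        backend = "fast"
    # Without the initialization above, calling select_backend(False)
    # would raise UnboundLocalError at this return.
    return backend

assert select_backend(False) == "fallback"
```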
HuiGao-NV
ae57738bae
[https://nvbugs/5547414][fix] Use cached models (#8755)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-29 19:10:10 -07:00
Sharan Chetlur
a2e964d9a8
[None][doc] Minor doc update to disagg-serving (#8768)
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-10-29 17:38:06 -07:00
Simeng Liu
834a780655
[https://nvbugs/5599086][fix] Fix FP8 Linear module for spark (#8707)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-29 13:58:19 -07:00