TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
ChristinaZ	c5fb692a7d	Refactor the rest routing part for the routing kernels in the MoE TRT-LLM backend (#5771 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-07-11 16:37:56 +08:00
Shi Xiaowei	37293e4dfd	blog: add qwen3 disagg perf metrics (#5822 )	2025-07-11 16:41:45 +09:00
William Tambellini	fbb4cc7379	[TRTLLM-4770][feat] Enhance cpp executor cmake to listen to ENABLE_MULTI_DEVICE (#5104 ) Signed-off-by: William Tambellini <wtambellini@sdl.com>	2025-07-11 10:59:44 +08:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
Void	854655f2f7	deepEP fp4 post quant all2all dispatch (#5881 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-07-11 08:18:54 +08:00
Frank	aa4eebe973	[enhance] Add the ability to write a request timeline. (#5258 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>	2025-07-10 17:15:30 -07:00
Zhihan Jiang	682acd40da	[nvbugs/5321981] Cherrypick fix: Fix the Llama3.1 405B hanging issue. (#5698 ) (#5925 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-11 07:51:43 +08:00
2ez4bz	c19840235d	[fix] Fix mistral unit tests due to transformers upgrade (#5904 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-10 10:45:27 -07:00
Iman Tabrizian	c32c9e2fad	doc: Add instructions for running gemma in disaggregated serving (#5922 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-10 10:21:19 -07:00
Linda	4d071eb2d1	feat: binding type build argument (pybind, nanobind) (#5802 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-11 00:48:50 +09:00
wili	2e3cf42e03	[refactor] Simplification of Speculative decoding configs (#5639 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-10 11:37:30 -04:00
Zhanrui Sun	67a39dbd63	infra: [TRTLLM-6054][TRTLLM-5804] Fix two known NSPECT high vulnerability issues and reduce image size (#5434 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-07-10 23:24:46 +09:00
narutolhy	41ef1ade19	feat:enable kvcache to be reused during request generation (#4028 ) Signed-off-by: narutolhy <582909902@qq.com>	2025-07-10 22:18:01 +09:00
Kaiyu Xie	7b09a415c1	fix: Make the bench serving script compatible with different usages (#5905 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-10 19:36:26 +08:00
Jinyang Yuan	8b9a030a5c	[fix] Fix MoE workspace info by storing Torch tensor itself instead of data_ptr (#5900 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-07-10 20:07:32 +09:00
Yiqing Yan	3aa53ec36c	[None] - Waive L0 tests (#5915 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-10 18:33:17 +08:00
Enwei Zhu	055c4a9fe6	[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-10 16:30:00 +08:00
CarstyYou	dc32f9ae73	[fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-07-10 15:16:18 +08:00
Anthony Chang	7d21b55b5a	[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-07-10 14:06:50 +08:00
Aurelien Chartier	3ec3ff1d82	chore: remove support for llmapi + TRT backend in Triton (#5856 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-09 21:30:34 -07:00
QI JUN	e289a98d5a	avoid nesting NCCL group in allgather and reduce scatter OPs (#5866 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-10 12:32:59 +09:00
Yan Chunwei	07f6da763d	[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-10 11:31:35 +08:00
Hanjun Cho	6490a27ad7	[feat] Add TensorRT-Engine Qwen3 (dense) model support (#5650 ) Signed-off-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal> Signed-off-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>	2025-07-10 10:26:06 +08:00
Venky	f57b3d6829	Waive unittest failures introduced by PR#5345 (removal of `ScaffoldingOutput` class) (#5886 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-10 09:53:31 +08:00
peaceh-nv	76c3a12bcb	[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-10 09:20:30 +08:00
brb-nv	3209b31665	feat: Custom masking utils for Gemma3 VLM (#5853 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 06:18:04 +09:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
Chang Liu	b61a717275	[1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396 )	2025-07-10 05:12:53 +09:00
Wanli Jiang	3f7cedec7c	Update transformers to 4.53.0 (#5747 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:32:24 -07:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00
peaceh-nv	52684d79f7	Fix : fix moe regression for sm120 (#5823 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-09 21:25:11 +08:00
tomeras91	5aa958a11a	[TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H (#5371 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-09 11:30:15 +03:00
ixlmar	10e686466e	fix: use current_image_tags.properties in rename_docker_images.py (#5846 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-09 17:07:52 +09:00
Omer Ullman Argov	a32f7083b4	[ci] parallelize torch unittests (#5714 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-09 11:05:57 +03:00
Dom Brown	3e3b1769ad	[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-07-09 08:21:58 +01:00
dongxuy04	dd3c736c7e	chore: some refactor on WideEP (#5727 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-07-09 14:26:57 +08:00
chenfeiz0326	64fd64fcf2	[TRTLLM-6262] Fix Llama4 Scout FP4 crash issue (#5834 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-07-09 14:23:21 +08:00
Chang Liu	4df5f96c8d	[Bugfix] LLama4: fix for llama4 multimodal support (#5809 )	2025-07-09 13:03:40 +09:00
Erin	e277766f0d	chores: merge examples for v1.0 doc (#5736 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-08 21:00:42 -07:00
Xianjie Qiao	5ab1cf5ae6	Remove unnecessary benchmarking results (#5852 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-07-09 11:19:06 +08:00
Lucas Liebenwein	d14dd2f597	[AutoDeploy] re-enable waive for flaky AD test (#5867 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-07-09 11:47:48 +09:00
Bo Li	9d894bc0cb	fix: [https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. (#5842 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-09 10:17:05 +08:00
brb-nv	2bd09ed2d4	fix: Skip rope scaling for local layers in Gemma3 VLM (#5857 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-09 10:10:33 +08:00
jiahanc	c24eb67054	Doc: fix link in llama4 Maverick example (#5864 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-07-09 11:09:58 +09:00
Wanli Jiang	e1fb1de4d9	feat: TRTLLM-6224 update xgrammar version to 0.1.19 (#5830 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:59:14 +08:00
Jhao-Ting Chen	e4c777df7d	Add is_fp8_output key to XQA kernel cubin hashing (solves Eagle3-one-engine Hopper fp8 bug) (#5813 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-07-09 09:26:27 +08:00
Venky	e27215ca03	test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 18:16:21 -07:00
jiahanc	607bf4c395	Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-07-09 10:10:02 +09:00
Omer Ullman Argov	d6d2ab2c99	[fix] Catch inference failures in `trtllm-bench` (#5841 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-09 03:53:03 +03:00
xavier-nvidia	b6013da198	Fix GEMM+AR fusion on blackwell (#5563 ) Signed-off-by: xsimmons <xsimmons@nvidia.com>	2025-07-09 08:48:47 +08:00


1764 Commits