TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Iman Tabrizian	c32c9e2fad	doc: Add instructions for running gemma in disaggregated serving (#5922 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-10 10:21:19 -07:00
Linda	4d071eb2d1	feat: binding type build argument (pybind, nanobind) (#5802 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-11 00:48:50 +09:00
wili	2e3cf42e03	[refactor] Simplification of Speculative decoding configs (#5639 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-10 11:37:30 -04:00
Zhanrui Sun	67a39dbd63	infra: [TRTLLM-6054][TRTLLM-5804] Fix two known NSPECT high vulnerability issues and reduce image size (#5434 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-07-10 23:24:46 +09:00
narutolhy	41ef1ade19	feat:enable kvcache to be reused during request generation (#4028 ) Signed-off-by: narutolhy <582909902@qq.com>	2025-07-10 22:18:01 +09:00
Kaiyu Xie	7b09a415c1	fix: Make the bench serving script compatible with different usages (#5905 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-10 19:36:26 +08:00
Jinyang Yuan	8b9a030a5c	[fix] Fix MoE workspace info by storing Torch tensor itself instead of data_ptr (#5900 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-07-10 20:07:32 +09:00
Yiqing Yan	3aa53ec36c	[None] - Waive L0 tests (#5915 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-10 18:33:17 +08:00
Enwei Zhu	055c4a9fe6	[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-10 16:30:00 +08:00
CarstyYou	dc32f9ae73	[fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-07-10 15:16:18 +08:00
Anthony Chang	7d21b55b5a	[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-07-10 14:06:50 +08:00
Aurelien Chartier	3ec3ff1d82	chore: remove support for llmapi + TRT backend in Triton (#5856 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-09 21:30:34 -07:00
QI JUN	e289a98d5a	avoid nesting NCCL group in allgather and reduce scatter OPs (#5866 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-10 12:32:59 +09:00
Yan Chunwei	07f6da763d	[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-10 11:31:35 +08:00
Hanjun Cho	6490a27ad7	[feat] Add TensorRT-Engine Qwen3 (dense) model support (#5650 ) Signed-off-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal> Signed-off-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>	2025-07-10 10:26:06 +08:00
Venky	f57b3d6829	Waive unittest failures introduced by PR#5345 (removal of `ScaffoldingOutput` class) (#5886 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-10 09:53:31 +08:00
peaceh-nv	76c3a12bcb	[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-10 09:20:30 +08:00
brb-nv	3209b31665	feat: Custom masking utils for Gemma3 VLM (#5853 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 06:18:04 +09:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
Chang Liu	b61a717275	[1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396 )	2025-07-10 05:12:53 +09:00
Wanli Jiang	3f7cedec7c	Update transformers to 4.53.0 (#5747 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:32:24 -07:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00
peaceh-nv	52684d79f7	Fix : fix moe regression for sm120 (#5823 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-09 21:25:11 +08:00
tomeras91	5aa958a11a	[TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H (#5371 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-09 11:30:15 +03:00
ixlmar	10e686466e	fix: use current_image_tags.properties in rename_docker_images.py (#5846 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-09 17:07:52 +09:00
Omer Ullman Argov	a32f7083b4	[ci] parallelize torch unittests (#5714 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-09 11:05:57 +03:00
Dom Brown	3e3b1769ad	[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-07-09 08:21:58 +01:00
dongxuy04	dd3c736c7e	chore: some refactor on WideEP (#5727 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-07-09 14:26:57 +08:00
chenfeiz0326	64fd64fcf2	[TRTLLM-6262] Fix Llama4 Scout FP4 crash issue (#5834 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-07-09 14:23:21 +08:00
Chang Liu	4df5f96c8d	[Bugfix] LLama4: fix for llama4 multimodal support (#5809 )	2025-07-09 13:03:40 +09:00
Erin	e277766f0d	chores: merge examples for v1.0 doc (#5736 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-08 21:00:42 -07:00
Xianjie Qiao	5ab1cf5ae6	Remove unnecessary benchmarking results (#5852 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-07-09 11:19:06 +08:00
Lucas Liebenwein	d14dd2f597	[AutoDeploy] re-enable waive for flaky AD test (#5867 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-07-09 11:47:48 +09:00
Bo Li	9d894bc0cb	fix: [https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. (#5842 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-09 10:17:05 +08:00
brb-nv	2bd09ed2d4	fix: Skip rope scaling for local layers in Gemma3 VLM (#5857 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-09 10:10:33 +08:00
jiahanc	c24eb67054	Doc: fix link in llama4 Maverick example (#5864 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-07-09 11:09:58 +09:00
Wanli Jiang	e1fb1de4d9	feat: TRTLLM-6224 update xgrammar version to 0.1.19 (#5830 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:59:14 +08:00
Jhao-Ting Chen	e4c777df7d	Add is_fp8_output key to XQA kernel cubin hashing (solves Eagle3-one-engine Hopper fp8 bug) (#5813 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-07-09 09:26:27 +08:00
Venky	e27215ca03	test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 18:16:21 -07:00
jiahanc	607bf4c395	Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-07-09 10:10:02 +09:00
Omer Ullman Argov	d6d2ab2c99	[fix] Catch inference failures in `trtllm-bench` (#5841 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-09 03:53:03 +03:00
xavier-nvidia	b6013da198	Fix GEMM+AR fusion on blackwell (#5563 ) Signed-off-by: xsimmons <xsimmons@nvidia.com>	2025-07-09 08:48:47 +08:00
Fridah-nv	a79b73f577	fix: [5376140] [AutoDeploy] Update unit tests: skip all_close assert for dropout in attention, increase tolerance for rope op test (#5855 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-07-09 09:13:31 +09:00
Iman Tabrizian	c508b994b6	Fix lost requests for disaggregated serving (#5815 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-09 08:42:45 +09:00
Yan Chunwei	e50d95c40d	chore [TRTLLM-6161]: add LLM speculative decoding example (#5706 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-09 07:33:11 +08:00
Pamela Peng	da8c7372d4	[TRTLLM-5366][feat]Add support for sm121 (#5524 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.	2025-07-08 14:27:00 -07:00
Chang Liu	08a3dfeb2b	[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava (#5814 )	2025-07-08 09:53:11 -07:00
Dom Brown	e3ccca06e1	test: reduce redundant test cases for TRTLLM Gen FP8 MoE (#5845 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-07-09 00:40:33 +09:00
Kaiyu Xie	bb5b16fcb9	feat: Return context response immediately when stream_interval > 1 (#5836 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-09 00:19:57 +09:00
Yiteng Niu	3079e8cf0c	[TRTLLM-5878] update nspect version (#5832 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-07-08 22:00:09 +08:00

1756 Commits