TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Omer Ullman Argov	d6d2ab2c99	[fix] Catch inference failures in `trtllm-bench` (#5841 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-09 03:53:03 +03:00
xavier-nvidia	b6013da198	Fix GEMM+AR fusion on blackwell (#5563 ) Signed-off-by: xsimmons <xsimmons@nvidia.com>	2025-07-09 08:48:47 +08:00
Fridah-nv	a79b73f577	fix: [5376140] [AutoDeploy] Update unit tests: skip all_close assert for dropout in attention, increase tolerance for rope op test (#5855 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-07-09 09:13:31 +09:00
Iman Tabrizian	c508b994b6	Fix lost requests for disaggregated serving (#5815 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-09 08:42:45 +09:00
Yan Chunwei	e50d95c40d	chore [TRTLLM-6161]: add LLM speculative decoding example (#5706 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-09 07:33:11 +08:00
Pamela Peng	da8c7372d4	[TRTLLM-5366][feat]Add support for sm121 (#5524 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.	2025-07-08 14:27:00 -07:00
Chang Liu	08a3dfeb2b	[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava (#5814 )	2025-07-08 09:53:11 -07:00
Dom Brown	e3ccca06e1	test: reduce redundant test cases for TRTLLM Gen FP8 MoE (#5845 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-07-09 00:40:33 +09:00
Kaiyu Xie	bb5b16fcb9	feat: Return context response immediately when stream_interval > 1 (#5836 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-09 00:19:57 +09:00
Yiteng Niu	3079e8cf0c	[TRTLLM-5878] update nspect version (#5832 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-07-08 22:00:09 +08:00
Raayan Dhar	e3268a4221	[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg (#5732 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-07-08 09:39:58 -04:00
Yukun He	e104f8bbb5	[5305318] fix: Fix the accuracy issue when reduce_fusion is enabled for GEMMA model. (#5801 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-08 19:51:05 +08:00
Yegor	b01d1c28f7	[feat] Detokenize option in /v1/completions request (#5382 ) Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com> Signed-off-by: Yegor Yershov <yegor6741@gmail.com>	2025-07-08 19:36:04 +08:00
Tailing Yuan	ba0aea1da6	Fix a quote error introduced in #5534 (#5816 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-07-08 18:48:32 +08:00
Yiteng Niu	541ab77189	update namelist in blossom-ci (#5838 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-07-08 18:07:07 +08:00
Yiqing Yan	ec0d7e64b9	[Infra] - Waive L0 test (#5837 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-08 17:54:06 +08:00
xinhe-nv	89bbb230cc	tests: waive failed cases on main (#5781 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-08 19:44:12 +10:00
Tailing Yuan	035155df7c	Fix: ignore nvshmem_src_*.txz from `confidentiality-scan` (#5831 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-07-08 17:17:29 +09:00
xiweny	eaf8bec88b	fix: Disaggregate serving with attention DP (#4993 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-07-08 16:15:03 +08:00
nv-guomingz	c8fa08da5c	doc: update cuda_graph_config usage part in DS R1 docs (#5796 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-08 16:54:46 +09:00
Yiqing Yan	5203a0f6df	chore: bump version to 1.0.0rc3 (#5819 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-08 16:04:40 +09:00
Enwei Zhu	55f86ce7ab	[NvBug 5362426] fix: Fix prompt adapter TP2 case (#5782 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-08 16:01:36 +09:00
Venky	9258187e98	Waive some `test_llama_eagle3` unittests (#5811 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-08 15:35:27 +09:00
Po-Wei (Vincent)	864de5b8b2	[None][infra] Set the label community action to only run on upstream TRTLLM (#5806 ) Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>	2025-07-08 15:33:42 +09:00
Zhenhuan Chen	dee6644ed9	feat(scaffolding): add streaming scaffolding_llm.generate_async support (#5345 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-08 15:08:40 +09:00
JieXin Liang	664bf95892	[fix] improve fp4_block_scale_moe_runner type check (#5681 ) Signed-off-by: JieXin Liang <Alcanderian@users.noreply.github.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>	2025-07-08 14:32:14 +09:00
liji-nv	95978e3044	[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-08 12:42:15 +08:00
nv-guomingz	0be41b6524	Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818 )	2025-07-08 13:15:30 +09:00
Yechan Kim	5bc3a15f10	feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-07 18:03:12 -07:00
nv-guomingz	5a8173c121	chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-08 08:52:36 +08:00
davidclark-nv	a1235ee978	[feat] Adds optional module cache for TRT-LLM Gen Gemm interfaces (#5743 ) Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com> Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-07-07 13:34:55 -07:00
Omer Ullman Argov	1191555cce	[ci] speedup fused moe tests (#5726 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-07 18:03:15 +03:00
Robin Kobus	30a19fcf7c	[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-07 16:30:43 +02:00
Tailing Yuan	85b4a6808d	Refactor: move DeepEP from Docker images to wheel building (#5534 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-07-07 22:57:03 +09:00
Daniel Cámpora	1260e2f33f	feat: Optimize TRTLLM Sampler perf single beam single step (#5550 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-07-07 15:44:47 +02:00
DylanChen-NV	5ca2b9bb15	[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-07 18:04:57 +08:00
Yi Zhang	ed1b3c884a	fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1 tests (#5774 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-07 18:38:54 +09:00
Yan Chunwei	dfce61f4b9	[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-07 17:05:14 +08:00
xinhe-nv	ded38ebdbd	test: [CI] remove closed bugs (#5770 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-07 18:06:07 +10:00
ChristinaZ	12d8c7d129	Refactor the topk parallelization part for the routing kernels (#5567 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-07-07 15:53:25 +08:00
Bo Li	9db2e9ee47	fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-07-07 14:58:32 +08:00
Zheng Duan	de10774c2e	chore: log stack trace on error in openai server (#5749 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-07 14:54:36 +08:00
Yanchao Lu	092e0eb86a	[Infra] - Fix a syntax issue in the image check (#5775 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-07 11:19:59 +09:00
bhsueh_NV	85e934a7fe	[Doc] update the document of qwen3 and cuda_graph usage (#5703 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-07 09:44:25 +08:00
Daniel Stokes	ec6c7dff1a	feat: Add support for MXFP8xMXFP4 in pytorch (#5535 ) Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>	2025-07-06 15:32:06 -07:00
Yiteng Niu	66f299a205	[TRTLLM-5878] add stage for image registration to nspect (#5699 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-06 23:52:54 +08:00
Yanchao Lu	2013034948	[Test] - Waive or fix few known test failures (#5769 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-06 21:14:16 +08:00
Robin Kobus	ae27261094	refactor: decoding inputs (#5679 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-06 08:21:02 +02:00
Yanchao Lu	d95ae1378b	[Infra] - Always use x86 image for the Jenkins agent and few clean-ups (#5753 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-07-06 10:25:57 +08:00
Julien Debache	6bddaf6df6	chore: Improve documentation of Kv_block_array (#5765 ) Signed-off-by: Julien Debache <julien.debache@hotmail.com>	2025-07-05 22:25:27 +02:00

1916 Commits