Commit Graph

1011 Commits

Yukun He
e07fa9ddc5
[https://nvbugs/5496960][fix] Fix Gemma model forward. (#7509)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-04 19:09:43 +08:00
Guoming Zhang
cabda243f1
[TRTLLM-5930][doc] 1.0 Documentation. (#6696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-04 05:29:43 -04:00
dongxuy04
9eecdf2ee9
[TRTLLM-7008][fix] Cherry-pick fix to 1.0: automatically delete shared memory if it already exists (#7433)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-02 11:23:53 +08:00
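
The pattern named in the title above, sketched with Python's standard `multiprocessing.shared_memory`; the segment name and size are illustrative, not taken from the commit:

```python
from multiprocessing import shared_memory

def create_shm(name: str, size: int) -> shared_memory.SharedMemory:
    """Create a named shared-memory segment, removing a stale one if it already exists."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        # A previous process likely crashed without cleaning up: unlink and retry.
        stale = shared_memory.SharedMemory(name=name)
        stale.close()
        stale.unlink()
        return shared_memory.SharedMemory(name=name, create=True, size=size)
```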
Yuxian Qiu
559762f185
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-01 16:00:43 +08:00
HuiGao-NV
860589aa0c
[https://nvbugs/5474169][fix] Adjust max seq len of KV cache for memory estimation (#7391)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-01 14:40:58 +08:00
Chang Liu
050db0e46f
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268) (#7379)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-30 17:44:24 +08:00
Bo Li
ef0f65b353
[https://nvbugs/5467548][fix] DeepSeek illegal memory access. (#7298)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-29 12:19:03 +08:00
amitz-nv
66f0657716
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7203)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-28 16:06:32 +08:00
Venky
6cc168a5d3
[https://nvbugs/5463720][fix] tp-split the inferred mlp_hidden_size for nemotron-nas (#7231)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-27 15:04:42 +03:00
Lizhi Zhou
0fa49c5e2b
[https://nvbugs/5448767][fix] fix mpi4py deadlocks in pp event-loop (#6976)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-27 02:01:48 -04:00
Jin Li
877e1f44d3
[https://nvbugs/5451426][fix] Avoid torch compile on full eagle3 worker (#7245)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-27 09:59:06 +08:00
William Zhang
34c1e9c341
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7225)
* Why?

Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both into memory wastes both time
and memory.

* What?

This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-26 09:40:17 -07:00
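
A minimal sketch of the selection rule described in the commit body above (the helper name is hypothetical; the actual prefetch logic lives elsewhere in the repo):

```python
from pathlib import Path

def safetensors_to_prefetch(checkpoint_dir: str) -> list[Path]:
    """Return the safetensors files worth prefetching.

    Consolidated files duplicate the sharded ones, so they are skipped
    unless they are the only safetensors present in the directory.
    """
    files = sorted(Path(checkpoint_dir).glob("*.safetensors"))
    sharded = [f for f in files if "consolidated" not in f.name]
    return sharded if sharded else files
```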
Jiagan Cheng
85b4ae26b7
[https://nvbugs/5451342][fix] Use runtime max_batch_size when cuda_graph_config.max_batch_size is not provided in trtllm-bench (#7031)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-26 08:10:35 -04:00
Yuxian Qiu
2fb16ad328
[None][fix] fix log_once usage (#7210)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-26 19:13:03 +08:00
Shi Xiaowei
d010b2043a
[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#7191)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-25 20:21:43 +08:00
Wanli Jiang
b76c987913
[https://nvbugs/5467232][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value (#7168)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-25 15:37:57 +08:00
peaceh-nv
030598a497
[https://nvbugs/5448426][fix] Fix illegal memory access in cuda graph (#7127)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-25 10:04:34 +08:00
Wanli Jiang
036c3dd0ea
[TRTLLM-6825][fix] Update lora for phi4-mm (#7149)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-23 20:57:00 +08:00
Dom Brown
3f2eb4d2e8
[https://nvbugs/5461712] [fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-08-23 05:26:12 -04:00
Shi Xiaowei
3ee8523829
[https://nvbugs/5450074][fix] Reduce the device memory requirements for testing (#6990)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-22 17:33:30 +08:00
HuiGao-NV
253af9f9af
[https://nvbugs/5410391][bug] Support sharing device buffers in attention meta (#6557)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-08-22 13:19:27 +08:00
Pamela Peng
1e5a6be55d
[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-08-21 13:34:07 -04:00
Venky
9eac744d72
[https://nvbugs/5464088] [fix] dequantize fp8 activation input to lora forward; update perf test config (#7014)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-21 08:28:54 -04:00
ChristinaZ
a875e50321
[https://nvbugs/5392414] [fix] Cherry-pick for release 1.0: add customized default routing method (#7068)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-21 20:06:50 +08:00
Yan Chunwei
caf73f5bab
[https://nvbugs/5383702][fix] test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus (#6889)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:56:42 +08:00
Yan Chunwei
e77ec061db
[https://nvbugs/5451296][fix] zmq nonblock bug with retry (#7019)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:34:46 +08:00
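
For context, the usual non-blocking-send-with-retry pattern in pyzmq looks roughly like this (a generic sketch, not the commit's code):

```python
import time
import zmq

def send_with_retry(socket: zmq.Socket, payload: bytes,
                    retries: int = 10, delay_s: float = 0.01) -> None:
    """Send on a socket in non-blocking mode, retrying while the send would block."""
    for _ in range(retries):
        try:
            socket.send(payload, flags=zmq.NOBLOCK)
            return
        except zmq.Again:
            # Peer not ready or high-water mark reached; back off briefly and retry.
            time.sleep(delay_s)
    raise RuntimeError("send still blocking after retries")
```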
yifeizhang-c
5959d72d74
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6975)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-20 16:32:27 +08:00
Jin Li
69846c6586
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-20 15:06:56 +08:00
Yan Chunwei
fae43e7b46
[None][doc] add status labels to LLM class's api reference (#6899)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-19 21:50:04 -04:00
QI JUN
cd1b809d6e
[https://nvbugs/5374016][fix] improve error message (#6893)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-19 10:29:08 +08:00
Aurelien Chartier
fef2f1f55d
[https://nvbugs/5449155][fix] Fix DeepSeek R1 weight loading for TP16 (#6913)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-19 10:25:43 +08:00
Liao Lanyu
d9b9b5d053
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights (#6927)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2025-08-18 10:20:09 +08:00
Venky
550faa9554
[https://nvbugs/5453667] [fix] Revert a breaking change: make trtllm-bench's enable_chunked_context default backend-dependent (#6956)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-16 00:29:02 -04:00
Mike Iovine
9e02f6b9f4
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (#6911)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-15 10:09:09 -04:00
Pengbo Wang @ NVIDIA
f26db3b934
[TRTLLM-6481][fix] Fix deepseek r1 accuracy issue (#6868)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-15 15:56:35 +08:00
brb-nv
a00ca11673
[None][chore] Add docs for Gemma3 VLMs (#6880)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-14 18:23:32 -07:00
Yukun He
d62b9c0ed7
[None][fix] Complete the last missing allreduce op in Llama3/4. (#6850)
In some circumstances, the allreduce op of the last decoder layer is missing for the Llama3 and Llama4 models.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-15 09:07:09 +08:00
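
For background, under tensor parallelism each decoder layer's row-parallel output must be summed across ranks before downstream layers consume it. The sketch below illustrates the kind of op that must not be skipped (generic PyTorch, assuming an initialized process group; not the repo's kernel):

```python
import torch
import torch.distributed as dist

def row_parallel_linear(x: torch.Tensor, weight_shard: torch.Tensor) -> torch.Tensor:
    """Row-parallel matmul: each rank computes a partial result that must be
    all-reduced, or downstream layers see wrong activations."""
    partial = x @ weight_shard                       # per-rank partial sum
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)   # the allreduce that was missing
    return partial
```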
Anurag Mukkara
a8618b2d14
[None][fix] Revert phi4-mm aggregate mode (#6907)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-08-14 15:45:45 -04:00
2ez4bz
7ebb770dce
[None][fix] Fix batching bug in Mistral3 model (#6841)
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.

This commit fixes the issue and adds unit tests that were verified to
fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-14 02:15:44 -04:00
Wanli Jiang
b4167cce68
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#6820)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-13 21:45:22 -07:00
2ez4bz
ccb62ef97e
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds partial FP8 support to Mistral Small 3.1 by:

* disabling quantization for the vision sub-model, since `modelopt` does not
  support quantizing it (yet).
* extending existing accuracy tests to use a modelopt-produced FP8
  checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-13 21:25:55 -04:00
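
A rough illustration of excluding a vision sub-model from quantization, as the commit body describes (module names and the `quantize_fn` callback are hypothetical; the actual change goes through `modelopt`'s config):

```python
import torch.nn as nn

def quantize_language_only(model: nn.Module, quantize_fn) -> nn.Module:
    """Apply quantization only to sub-modules outside the vision tower."""
    for name, module in model.named_children():
        if "vision" in name.lower():
            continue  # vision sub-model left in its original precision
        quantize_fn(module)
    return model
```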
Yan Chunwei
a32a2e4d82
[https://nvbugs/5383702][fix] error propagation in GenerationExecutor (#6793)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-12 12:28:06 +08:00
amitz-nv
64c878818b
[TRTLLM-6683][feat] Support reloading a LoRA adapter evicted from the CPU cache (#6786)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-11 14:31:39 -04:00
2ez4bz
efd0a51508
[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) (#6765)
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.

It also fixes issues with a pixtral TP unit test and adds it to a
test list.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-11 10:13:10 -07:00
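
A schematic of the fix's idea: the parallelism mapping must be threaded through intermediate layers rather than stopping at the top-level model (class and field names here are illustrative, not the repo's):

```python
from dataclasses import dataclass

@dataclass
class Mapping:
    tp_size: int = 1
    tp_rank: int = 0

class IntermediateLayer:
    def __init__(self, hidden_size: int, mapping: Mapping):
        # Without receiving the mapping, this layer would silently fall
        # back to tp_size == 1 and never shard its weights.
        self.shard_size = hidden_size // mapping.tp_size

class Model:
    def __init__(self, hidden_size: int, mapping: Mapping):
        # The fix: pass the mapping down instead of constructing
        # intermediate layers without it.
        self.layer = IntermediateLayer(hidden_size, mapping)
```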
dominicshanshan
864ddb3289
[https://nvbugs/5429689][fix] Fix mllama model structure issue caused by transformers update (#6699)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-08-11 10:48:35 +08:00
Yan Chunwei
21e4f51139
[TRTLLM-4721][test] Add qa test for llm-api (#6727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 08:03:16 +08:00
Bo Deng
d289d85bff
[TRTLLM-6675][infra] Nixl test completion (#6623)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-08 10:15:54 +08:00
brb-nv
4adde41632
[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-07 13:14:04 -04:00
Yiqing Yan
5664605277
[None][chore] Bump version to 1.0.0 (#6652)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-07 14:15:34 +08:00
Izzy Putterman
7e0158b583
Qwen3: Fix eagle hidden states (#6199)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-06 17:05:18 -04:00