TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-24 12:42:54 +08:00

Author	SHA1	Message	Date
Dom Brown	3f2eb4d2e8	[https://nvbugs/5461712] [fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-08-23 05:26:12 -04:00
Michal Guzek	7ea53ff516	[https://nvbugs/5433545][fix] TestPhi4MiniInstruct::test_auto_dtype - Use max_seq_len=4096 to fallback to the short RoPE factor (#6895) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-08-22 10:28:09 -07:00
Dimitrios Bariamis	4b6cca0662	[https://nvbugs/5474037][fix] Fix building tritonbuild/tritonrelease images (#7157) Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com> Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2025-08-22 16:47:38 +02:00
Shi Xiaowei	3ee8523829	[https://nvbugs/5450074][fix] Reduce the device memory requirements for testing (#6990) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-22 17:33:30 +08:00
milesial	c1eefa8735	[https://nvbugs/5467062][fix] pass logitsPostProcessorBatched by reference (#7110) Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>	2025-08-22 04:42:06 -04:00
HuiGao-NV	253af9f9af	[https://nvbugs/5410391][bug] Support to share device buffers in attention meta (#6557) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-08-22 13:19:27 +08:00
brb-nv	79f1e6c867	[https://nvbugs/5449032][fix] Add more llm-args to llm_mgmn_trtllm_bench.sh (#7144) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-22 12:24:35 +08:00
Pamela Peng	1e5a6be55d	[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-08-21 13:34:07 -04:00
Emma Qiao	441edf1eeb	[None][infra] Skip failed tests for release branch (#7130) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-22 00:09:51 +08:00
Venky	9eac744d72	[https://nvbugs/5464088] [fix] dequantize fp8 activation input to lora forward; update perf test config (#7014) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-21 08:28:54 -04:00
ChristinaZ	a875e50321	[https://nvbugs/5392414] [fix] For release 1.0 cherry pick. Add customized default routing method (#7068) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-08-21 20:06:50 +08:00
Yan Chunwei	caf73f5bab	[https://nvbugs/5383702][fix] test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus (#6889) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-21 08:56:42 +08:00
Yan Chunwei	e77ec061db	[https://nvbugs/5451296][fix] zmq nonblock bug with retry (#7019) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-21 08:34:46 +08:00
chenfeiz0326	5acf213a15	[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#7075) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-08-20 18:11:00 -04:00
Erin	9a8f9b338f	[None][doc] update v1.0 doc for trtllm-serve (#7056) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-08-20 10:57:21 -07:00
Dimitrios Bariamis	a343e8535e	[None][fix] Fix build of tritonbuild/tritonrelease image (#7003) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2025-08-20 16:57:39 +02:00
amitz-nv	3efe1d918a	[TRTLLM-7263][fix] Prevent recreation of cublas handles in lora_grouped_gemm every call (#7053) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-20 06:41:20 -04:00
yifeizhang-c	5959d72d74	[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6975) Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>	2025-08-20 16:32:27 +08:00
Jin Li	69846c6586	[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-20 15:06:56 +08:00
Bo Deng	df00c81aea	[https://nvbugs/5448437][fix] fix some nixl tests (#6940) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-20 14:19:48 +08:00
Yan Chunwei	fae43e7b46	[None][doc] add status labels to LLM class's api reference (#6899) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-19 21:50:04 -04:00
Emma Qiao	c4535e6c3a	[None][infra] Waive failed tests for release branch (#7036) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-19 20:42:47 +08:00
Iman Tabrizian	520117ece0	[https://nvbugs/5451296][bug] Fix a thread leak in test_llm_args.py (#7017) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-19 07:19:53 -04:00
brb-nv	da91256503	[None][chore] Waive E2E GB200 tests for Gemma3 27B (#6916) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-19 05:19:34 -04:00
Yechan Kim	d6c2a6a81f	[https://nvbugs/5448579][fix] EXAONE-4.0 accuracy test bugfix (#6888) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-19 09:29:32 +02:00
Nave Assaf	d4dd5b4f4d	[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters… (#6987) Signed-off-by: Nave Assaf <nassaf@nvidia.com>	2025-08-19 09:19:50 +02:00
Perkz Zheng	20f7df25ac	[https://nvbugs/5394685][fix] proper fix for the accuracy issue in 2CTA MLA kernels (release 1.0) (#6946) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-19 03:10:29 -04:00
QI JUN	cd1b809d6e	[https://nvbugs/5374016][fix] improve error message (#6893) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-19 10:29:08 +08:00
Aurelien Chartier	fef2f1f55d	[https://nvbugs/5449155][fix] Fix DeepSeek R1 weight loading for TP16 (#6913) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-08-19 10:25:43 +08:00
William Zhang	790a105563	[https://nvbugs/5462007][ci] Unwaive Mistral Small 3.1 FP8 test (#7008) The error was fixed by #6909. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-18 19:50:03 -04:00
Yanchao Lu	6fda8ddac9	[None][infra] Cherry-pick #6836 from main branch and improve SSH connection (#6971) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-19 01:11:11 +08:00
Yiqing Yan	28c30e1bf8	[None][chore] Remove duplicate test waives (#6999) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-18 22:04:43 +08:00
Emma Qiao	2992e9cd58	[None][infra] Waive failed tests for release branch 0818 (#6993) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-18 20:31:50 +08:00
peaceh-nv	28526fe2b1	[https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#6937) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-08-18 15:58:53 +08:00
Ivy Zhang	055fdd9e31	[None][fix] update skip config (#6891) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-18 13:50:46 +08:00
Guoming Zhang	96bda14fbd	[https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-17 23:22:01 -04:00
William Zhang	c16aff5e3f	[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909) This commit lowers the GPU memory allocated for KV cache in accuracy tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-18 11:17:37 +08:00
Liao Lanyu	d9b9b5d053	[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights (#6927) Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>	2025-08-18 10:20:09 +08:00
Yilin Fan	7f7a301f6e	[https://nvbugs/5412562][feat] Allocate MoE workspace only when necessary (release/1.0 retargeted) (#6955) Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>	2025-08-18 08:50:35 +08:00
Xianjie Qiao	33fce8ece5	[https://nvbugs/5405041][fix] Update wide ep doc (#6950) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-08-16 22:09:00 -04:00
Venky	550faa9554	[https://nvbugs/5453667] [fix] reverting a breaking change: make trtllm-bench `enable_chunked_context` defaults backend-dependent (#6956) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-16 00:29:02 -04:00
Venky	2c016f8369	[None][infra] update CODEOWNERS for release (#6905)	2025-08-15 12:34:29 -04:00
Mike Iovine	9e02f6b9f4	[https://nvbugs/5455836][fix] Fix llama 4 FP4 (#6911) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-15 10:09:09 -04:00
Yan Chunwei	6d65b63b8d	[None][ci] unwaive test_ptp_star_attention_example (#6943) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-15 05:33:25 -04:00
Pengbo Wang @ NVIDIA	f26db3b934	[TRTLLM-6481][fix] Fix deepseek r1 accuracy issue (#6868) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-15 15:56:35 +08:00
Iman Tabrizian	96be46f3f1	[https://nvbugs/5451434][fix] Fix triton docker build (#6898) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-15 02:08:39 -04:00
xinhe-nv	c03ea1ba2d	[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-08-15 14:01:05 +08:00
Yan Chunwei	54ffc6a250	[None][doc] add legacy section for tensorrt engine (#6724) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-15 11:08:38 +08:00
brb-nv	a00ca11673	[None][chore] Add docs for Gemma3 VLMs (#6880) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-14 18:23:32 -07:00
Yukun He	d62b9c0ed7	[None][fix] Complete the last missing allreduce op in Llama3/4. (#6850) The allreduce op of the last decoder layer is missing in some circumstances for the models Llama3 and Llama4. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-15 09:07:09 +08:00

2294 Commits