TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
nv-guomingz	b959618579	refactor [BREAKING CHANGE]:: remove the redundant use_kv_cache field from PytorchConfig (#5031 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-13 16:34:24 +08:00
yunruis	30c5b4183a	refactoring: port customized kernels with public cutlass version (#5027 ) Signed-off-by: yunruis Merge this to unblock others since the full CI has been run through	2025-06-13 16:19:31 +08:00
Yao Yao	12e075eb70	[nvbug 5333996 ][fix] Unload XQA cubins early to avoid static lifetime (#5133 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2025-06-13 15:53:29 +08:00
Matthias Jouanneaux	514baf1287	[fix] Fix comment to pass guardwords check (#5191 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-06-13 15:49:59 +08:00
Zheng Duan	4d0a5ad384	chore: gracefully exit disagg process in tests; better startup and logging (#5109 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-13 14:03:55 +08:00
Ivy Zhang	28cd536bd6	[test] Update timeout params in QA test list (#5124 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-06-13 13:40:03 +08:00
Iman Tabrizian	01bd4c00b4	Add two MTP disaggregated test (#4546 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-06-13 12:17:45 +08:00
Daniel Cámpora	dec326ba7d	[fix] Reenable test return logits (#5160 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-13 06:07:22 +02:00
Yibin Li	b79eb34bfe	[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-06-13 11:37:50 +08:00
xinhe-nv	d9be419f45	tests: update tests for b200 (#5180 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-13 11:25:33 +08:00
ruodil	fa582cbe9a	test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-06-13 11:09:15 +08:00
zhhuang-nv	a891013e3c	[feat] Optimize KV Cache Reuse for MLA (#4869 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-06-13 11:03:05 +08:00
Yuxian Qiu	4ae46b6714	fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor. (#4930 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-13 10:21:32 +08:00
Fanrong Li	38a907aaca	[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance (#5119 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-13 08:58:44 +08:00
Matthias Jouanneaux	a0b6c635b1	[feat] trtllmGen MoE routing: added support for top groups and top K bounds (#4063 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com> Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-06-13 06:00:02 +08:00
Xiaodong (Vincent) Huang	cc2a1344be	None: fix OOM because of unnecessary mha workspace (#5056 ) Signed-off-by: Vincent Huang <vincenth@nvidia.com>	2025-06-12 21:56:05 +02:00
pcastonguay	3a04c9fa7b	chore: Include prompt_token_ids only for context-only disagg requests (#5055 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-06-12 15:00:08 -04:00
Omer Ullman Argov	655bce0b19	[fix][test] report individual unittests results to jenkins (#5116 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-13 01:52:09 +08:00
Mike Iovine	690873ba1a	[nvbug/5334370][fix] Fix one model EAGLE3 (#5134 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-06-12 10:28:14 -04:00
HuiGao-NV	dfeeaf6746	Move allreduce_strategy from committed api to reference (#5147 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-12 21:00:20 +08:00
brb-nv	8cfb567182	fix: Updates to yarn implementation (#5105 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-06-12 20:45:34 +08:00
nv-guomingz	cf35a079f9	fix:https://nvbugs/5298661 (#5022 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-12 20:41:44 +08:00
nv-guomingz	58d4ca2385	fix:remove duplicated trust_remote_code knob from trtllm-serve (#5143 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-12 19:48:24 +08:00
Daniel Cámpora	22281cfc55	doc: Added documentation for enable_trtllm_sampler. (#4990 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>	2025-06-12 18:34:15 +08:00
Venky	59c9588e9a	enh(doc): Add `ci-overview` in `docs/source/reference/` (#5137 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-06-12 17:48:13 +08:00
Shi Xiaowei	88cba5f354	test: waive the NIXL related tests (#5153 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-06-12 17:02:27 +08:00
nv-guomingz	b563696dee	doc:fix invalid links for trtllm-serve doc (#5145 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-12 16:17:32 +08:00
Zhanrui Sun	a97f4581d2	infra: upload imageTag info to artifactory and add ngc_staging to save ngc image (#4764 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-06-12 15:38:47 +08:00
liji-nv	10ab9791ec	[fix] Do not reuse dummy request KVCache (#4804 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-12 15:24:50 +08:00
Fanrong Li	4d070d3862	chore: fix typo in tests (#5092 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-12 15:11:26 +08:00
Daniel Cámpora	e46267765f	Fix logprobs issues. (#5136 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-12 15:07:01 +08:00
Michal Guzek	53983ad273	[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-12 15:06:28 +08:00
ruodil	d021cc5126	test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-12 14:59:16 +08:00
tomeras91	06d9f1e2f6	[test] Use LLM API for Nemotron-H correctness test (#5097 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-06-12 09:54:46 +03:00
bhsueh_NV	505678a286	update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114 ) Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>	2025-06-12 14:40:57 +08:00
Michal Guzek	0daa70999a	Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-06-12 14:32:04 +08:00
Venky	c3b2eb6dab	test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ (#5066 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-06-12 14:19:15 +08:00
Lucas Liebenwein	49d7268acc	[nvbugs/5331013] fix AutoDeploy for PyTorch 25.05 dependency upgrade (#5106 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-06-12 13:07:27 +08:00
Netanel Haber	e692779ead	Solve underallocation in VSWA+/VGQA (#4667 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-06-12 12:12:46 +08:00
HuiGao-NV	43192379af	Use backend to replace macro to control enablement of MNNVL all reduce (#4635 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-06-12 11:22:49 +08:00
Zheng Duan	c592798f64	fix: limit process pool size when prefetching (#5088 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-12 10:52:52 +08:00
Zheng Duan	ee44fa00f8	chore: rename IOFormatter to BaseCacheFormatter (#5068 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-06-12 10:50:14 +08:00
Po-Wei (Vincent)	ad99a08fa2	[TRTLLM-5581][infra] Update Module Owners (#5052 ) Signed-off-by: Po-Wei Wang (Vincent)	2025-06-12 09:38:42 +08:00
tburt-nv	ddfe4fceb3	[chore] 2025-06-10 update allowlist (#5102 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-06-11 18:02:18 +08:00
xinhe-nv	11b94feff8	test: skip disaggregated tests on arm (#5070 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-06-11 17:00:10 +08:00
Yiqing Yan	a90dd571f8	[TRTLLM-5082] - Add a bot run option for detailed logs (#4390 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-06-11 15:44:57 +08:00
liji-nv	8282d6c1a7	[fix] Fix llama4 min latency (#5117 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-06-11 15:44:38 +08:00
ruodil	56abae0835	test: add more llama_v3.3_70b cases in perf test (#4979 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-06-11 15:44:22 +08:00
Zhanrui Sun	e2863a3159	chore: bump version to 0.21.0rc2 (#5112 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-06-11 15:08:14 +08:00
Daniel Cámpora	fdf1c47d1d	[TRTLLM-4995][feat] TRTLLM Sampler log probs support (#4836 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-06-11 08:18:13 +02:00

1329 Commits