TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yechan Kim	0893afae3d	[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-21 08:54:12 +08:00
bhsueh_NV	73d2daa386	[https://nvbugs/5457489 ][fix] unwaive some tests (#6991 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-21 08:49:57 +08:00
QI JUN	a918de710a	[None][ci] move some tests of b200 to post merge (#7093 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-20 19:43:40 -04:00
Jin Li	e5e417019b	[None][chore] Only check the bindings lib for current build (#7026 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-20 14:17:17 -04:00
Dom Brown	92daec1115	[TRTLLM-7348] [feat] Enable Cross-Attention to use XQA kernels for Whisper (#7035 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-08-20 10:11:25 -04:00
Yuhao Yao	8ac7dec623	[None][fix] Fix W4A8 MoE kernel issue (#7072 ) Signed-off-by: yuhyao <827623970@qq.com>	2025-08-20 06:52:47 -04:00
Emma Qiao	f84dd64250	[None][infra] Waive failed tests on main branch 8/20 (#7092 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-20 06:33:44 -04:00
Robin Kobus	b95cab2a7c	[None][ci] move unittests to sub-directories (#6635 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-20 05:42:22 -04:00
Kanghwan	983fb8e607	[None][chore] Update namelist in blossom-ci (#7015 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-08-20 09:29:01 +02:00
Zhenhuan Chen	20f54cb272	[None][fix] fix scaffolding dynasor test (#7070 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-20 15:20:46 +08:00
Yueh-Ting (eop) Chen	020fed97b6	[TRTLLM-6341][chore] Preliminary refactors on the kv cache manager before supporting swa kv cache reuse (#6767 ) This MR is a preliminary step toward implementing the SWA reuse mechanism for the KV cache manager. No functional change is intended in this merge request. The purpose of the clean-up is to decouple and remove existing functions so that the upcoming SWA KV cache reuse change is more natural and easier to review. Right now, (1) streamLLM and (2) beam search with SWA are broken. We do not want to complicate the code base by stacking more features on something that does not work. This MR prunes the related logic and adds assertions so that we can come back later, re-support the broken features, and remove the assertions. Since streamLLM (sink attention) is currently broken, an assertion is added in the `KVCacheManager` ctor to guard the values of `mSinkBlockTokenLength` and `mSinkBubbleLength`. The compute logic related to it is pruned. Beam search with SWA will still be broken when SWA KV cache reuse is introduced. We will revisit this problem in the future. On top of this, we should make an effort to update the [supporting matrix](https://github.com/NVIDIA/TensorRT-LLM/blob/feat/1.0_doc_dev/docs/source/1.0/features/feature-combination-matrix.md) of the KV cache manager after merging the support for SWA KV cache reuse. Changes are as follows: - Separate `KVCacheManager::updateToken` into `KVCacheManager::addToken` and `KVCacheManager::removeToken`, since the two functions should be decoupled. - Push the utilities `cacheSequenceBlockOffsets` and `cacheNewBlockOffset` from `KVCacheManager` down to `WindowBlockManager`. Functions exposed by `KVCacheManager` should be real utilities that users of the structure can leverage; implementation-detail function calls should not exist at this level. - Simplify the "is shared last context block" logic under `KVCacheManager::addSequence`. 
Since no functional change is intended in this merge request, no test case is added. Several comments are added as reminders for future test coverage. For `LlmRequestTest.ParamTest`, `streaming=True` is commented out because sink attention is now guarded by an assertion. In `capacitySchedulerTest`, the `addToken` action on `crossKVCacheManager` is removed because, in an encoder-decoder model, generation tokens are added only to the decoder and not to the encoder. Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-08-20 13:57:57 +08:00
Iman Tabrizian	e27088421e	[None][infra] "[TRTLLM-6960][fix] enable scaled_mm tests (#6936 )" (#7059 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-08-20 01:45:09 -04:00
xinhe-nv	9e71b4fda4	[TRTLLM-7205][feat] add llama4 tp4 tests (#6989 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-20 13:22:05 +08:00
Leslie Fang	3f6a9267f1	[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-20 13:14:34 +08:00
Chang Liu	ce53832610	[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-19 21:42:50 -07:00
Ivy Zhang	fc85e3db1c	[None][fix] fix llmapi import error (#7030 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-19 22:58:13 -04:00
Bo Deng	30da5d3cc4	[None][chore] unwaive test_disaggregated_genbs1 (#6944 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-20 09:57:35 +08:00
Fridah-nv	c02592d051	[None][autodeploy] Add group attention pattern for solar-pro-preview (#7054 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-08-19 18:57:09 -04:00
Jinyang Yuan	0e30fe4372	[None][fix] Fix assertion errors of quantization when using online EPLB (#6922 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-08-19 11:28:36 -07:00
Michal Guzek	7334f9390c	[None][fix] Accommodate Phi3/4 to work with ModelOpt's FP8 ckpts in Torch (#6761 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-08-19 09:22:46 -07:00
Yanchao Lu	d26a5a93ad	[https://nvbugs/5451296 ][bug] Cherry-pick #7017 from release/1.0 branch (#7043 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-08-19 11:25:05 -04:00
pcastonguay	e07fcc3a22	[https://nvbugs/5444937 ][chore] Fixing KV events tests (#7004 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-08-19 11:18:04 -04:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
Emma Qiao	8f95f35503	[None][infra] Waive failed tests on main (#7037 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-19 09:31:07 -04:00
Yiqing Yan	07506bccbe	[None][chore] Remove duplicate test waives (#7044 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-19 21:04:31 +08:00
Fanrong Li	655d0f48d0	[https://nvbugs/5455140 ][fix] unwaive DSR1-fp4 throughput_tp8 (#7022 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 20:48:05 +08:00
tomeras91	f0bfb49219	[https://nvbugs/5458874 ][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6996 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-08-19 15:45:06 +03:00
amitz-nv	a54c53652b	[TRTLLM-7263][fix] Prevent recreation of cublas handles in lora_grouped_gemm every call (#6968 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-19 15:39:56 +03:00
Xianjie Qiao	19667304b5	[None] [chore] Update wide-ep genonly scripts (#6995 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-19 07:44:07 -04:00
Kaiyu Xie	9a74ee9dae	[None] [doc] Add more documents for large scale EP (#7029 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-19 19:04:39 +08:00
Zero Zeng	953f4fd69e	[None][fix] acceptance rate calculation fix in benchmark_serving (#6746 ) Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>	2025-08-19 17:29:36 +08:00
xinhe-nv	2c86cee38c	[None][chore] Remove closed bugs (#6969 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-19 16:01:33 +08:00
Shunkangz	54ec2c1af1	[None][opt] Add batch wait timeout in fetching requests (#6923 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-19 03:50:08 -04:00
Eran Geva	636c622bb8	[https://nvbugs/5458798 ][fix] Relaxed test threshold, added documentation (#6997 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-19 00:24:03 -07:00
Ivy Zhang	bff5fdf6df	[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-19 13:59:14 +08:00
William Zhang	daa2a65d37	[https://nvbugs/5454875 ][ci] Unwaive Mistral Small 3.1 test (#7011 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-19 00:32:14 -04:00
fredricz-20070104	e90280a84d	[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-19 00:13:04 -04:00
Fanrong Li	816a120af6	[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-19 00:03:03 -04:00
Zhenhuan Chen	2bb90ba002	[TRTLLM-6960][fix] enable scaled_mm tests (#6936 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-19 10:18:04 +08:00
Venky	06911c0173	[None] [infra] stricter coderabbit pr title generation instructions (#6918 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-18 22:11:36 -04:00
Yi Zhang	a15af879ec	[None][refactor] Refactor Torch Compile Backend, MoeLoadBalancer and warmup Logic (#6615 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-08-19 09:58:44 +08:00
Lizhi Zhou	71e28eab36	[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-19 09:58:22 +08:00
Wanli Jiang	dabebb2c7a	[https://nvbugs/5371480 ][fix] Enable test_phi3_small_8k (#6938 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-19 09:42:35 +08:00
Fridah-nv	97ba0eb879	[None][autodeploy] Doc: fix link path in trtllm bench doc (#7007 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-08-19 08:43:28 +08:00
Leslie Fang	e76e5c640f	[None][infra] Enable accuracy test for mtp and chunked prefill (#6314 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-19 07:42:52 +08:00
Daniel Cámpora	d16af87d03	[TRTLLM-7158][feat] Introduce sampler options in trtllm bench (#6855 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-18 18:10:05 -04:00
Yanchao Lu	d1d17dbeba	[None][infra] Cherry-pick #6836 from main branch and improve SSH connection (#6971 ) (#7005 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-19 01:35:30 +08:00
Martin Marciniszyn Mehringer	425dad01fd	[None][fix] Clean up linking to CUDA stub libraries in build_wheel.py (#6823 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-08-18 11:20:51 -04:00
Yiqing Yan	1ce23545fc	[None][chore] Remove duplicate test waives (#6998 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-18 21:15:49 +08:00
Emma Qiao	69ff32f9b1	[None][infra] Waive failed tests on main 0818 (#6992 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-18 20:34:52 +08:00

2415 Commits