Chang Liu
be9dd4713c
[ https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version ( #6673 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
Aurelien Chartier
56bfc3a6d2
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically ( #6763 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-11 15:18:19 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
Venky
c9fe07ede6
[TRTLLM-6812][feat] Add standardized GitHub issue templates and disable blank issues ( #6494 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-11 13:08:48 -04:00
Zhenhua Wang
7e33ed6d61
[None][chore] always try-catch when clearing the build folder in build_wheel.py ( #6748 )
...
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-11 14:02:17 +02:00
Emma Qiao
5145e9d40e
[None][infra] Unwaive an updated case to test ( #6791 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 06:47:33 -04:00
Liao Lanyu
a2e9153cb0
[None][doc] Add K2 tool calling examples ( #6667 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-11 16:25:41 +08:00
bhsueh_NV
83dbc6c75d
[TRTLLM-5532][feat] store the block of context request into kv cache ( #6683 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-11 16:14:52 +08:00
Martin Marciniszyn Mehringer
9a8195ef88
fix: Ensure that Python stub generation works against libnvidia-ml stubs ( #6188 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-08-11 09:18:17 +02:00
Emma Qiao
d6ad4a9d5b
[None][infra] Waive failed tests on main 0811 ( #6778 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:16:25 -04:00
xinhe-nv
9c358c26e4
[None][chore] remove closed bugs ( #6772 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-11 14:39:58 +08:00
Yiqing Yan
62d6c98d68
[TRTLLM-5633][infra] Force set changed file diff to empty string for post-merge CI ( #6777 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-11 02:38:05 -04:00
Eran Geva
b3e8fa2960
[None][test] Test trtllm-bench AD vs. PT BEs on H100 single gpu ( #6487 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-08-11 08:33:13 +03:00
Tracin
49bcaa4e95
Add gpt-oss GSM8K test. ( #6732 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-08-10 22:45:43 -04:00
Chuang Zhu
c566a8d2a2
[None][fix] fix same pp disagg ( #6730 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-10 22:45:15 -04:00
Bo Deng
767879ef85
[ https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper ( #6736 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 10:05:10 +08:00
Zero Zeng
4b4b91ab51
[None][feat] improve dataloading for benchmark_dataset by using batch… ( #6548 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-08-11 09:50:41 +08:00
Yechan Kim
60073a7ad9
[None][feat] Support SharedTensor on MultimodalParams ( #6254 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-10 17:48:24 -07:00
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs ( #6592 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
pcastonguay
4142320e53
[ https://nvbugs/5444937 ][fix] Fixing kv_cache_event unit test ( #6753 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-10 16:45:38 -07:00
shaharmor98
14b36e07d7
[TRTLLM-6174][feat] Enable FP32 mamba ssm cache ( #6574 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 16:27:51 -04:00
Yueh-Ting (eop) Chen
199f306984
[None][chore][kv cache manager] Dead code elimination, we no longer record/fetch through WindowBlockManager::mContextBlocksByHash ( #6249 )
...
No functional change is intended in this MR.
`WindowBlockManager::mCachedBlocksRoot` is now responsible for the
bookkeeping of each `KVCacheBlock`, and `mNextBlocks` is now the
actual hash map used to fetch a block.
The `mEnableHashKey` knob and the related hashing are removed.
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-08-10 09:10:10 -04:00
Gal Hubara-Agam
3c5aec19c2
[ #5048 ][enhance] AutoDeploy: Optimize prepare_inputs ( #6634 )
...
Optimize the prepare_inputs routine in AutoDeploy, as part of the effort to reduce the performance gap compared to the default backend.
This PR includes two major fixes and some other minor tweaks:
1. Avoid back-and-forth data copies.
2. Optimize the position ids update by separating the implementations for generation mode and context mode.
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-10 13:55:04 +03:00
Emma Qiao
ee19ca5e58
[None][infra] Waive test main 0808 ( #6751 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-09 23:54:07 -04:00
Ziyi Xiong
de472828b9
[TRTLLM-6637][feat] Resolve KV cache divergence issue ( #6628 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-09 23:15:04 +08:00
Yilin Fan
d643aef73c
[Perf] Improve Llama4 performance for small max_seqlen cases ( #6306 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-08-09 02:58:31 -04:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation ( #5785 )
...
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Yibin Li
97787883c3
[TRTLLM-6420][feat] add support for Eclairv2 model - cherry-pick changes and minor fix ( #6493 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-08 21:40:48 -04:00
dongfengy
d06675071e
[None][fix] WAR GPT OSS on H20 with Triton MOE ( #6721 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-08 19:47:09 -04:00
Fridah-nv
cc0f4c87d4
[None][doc] Move AutoDeploy README.md to torch docs ( #6528 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-08 19:11:45 -04:00
Venky
efcb8f7f16
[TRTLLM-7025] [infra] Reorganize CODEOWNERS to rectify examples mapping ( #6762 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-08 15:33:08 -07:00
Mike Iovine
90145cf557
[None][feat] Optimize CUDA graph memory usage for spec decode cases ( #6718 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-08 13:56:53 -04:00
Wanli Jiang
d45236b253
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm ( #6184 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-08 20:09:26 +08:00
Stefan Niebler
b8f036f264
[TRTLLM-6650][fix] Enhance CUDA graph + Beam search to correctly handle padding ( #6665 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-08-08 14:00:33 +02:00
Chuang Zhu
e251f7c00b
[None][fix] revert kvcache transfer ( #6709 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-08 07:18:53 -04:00
Zheng Duan
ebdc43e69d
[None][feat] move kv cache measure into transfer session ( #6633 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-08 17:49:22 +08:00
Liao Lanyu
32ad7f3c12
[None][fix] Remove lock related typo in py_executor ( #6653 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-08 17:48:57 +08:00
JunyiXu-nv
5f45227a93
[ https://nvbugs/5437106 ][fix] Fix llama4 scout TRTLLM attn_backend ( #6690 )
...
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-08 17:48:23 +08:00
Yuxian Qiu
9ff4e75f14
[None][refactor] Combine resmooth_to_fp8_e8m0 and transform_sf_into_required_layout ( #6654 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-08 17:11:41 +08:00
Leslie Fang
294e0d3dab
[ https://nvbugs/5436461 ][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI ( #6631 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-08 15:30:47 +08:00
Li Min
d913955952
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell ( #6616 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-08-08 15:03:48 +08:00
Chang Liu
9687bb42b5
[None][doc] Add doc for multimodal feature support matrix ( #6619 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-08-08 02:20:29 -04:00
ruodil
b15d6fb145
[None][test] fix yml condition error under qa folder ( #6734 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-08 15:59:01 +10:00
2ez4bz
064eb7a70f
[TRTLLM-5252][fix] Propagate mapping to intermediate layers ( #6611 )
...
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.
It also fixes issues with a unit test for TP for pixtral, and adds it to a
test list.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-08 01:50:36 -04:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving ( #6704 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
zhanghaotong
1cf669496a
[None][fix] Fix unnecessary GPU synchronization in torch sampler caused by incorrect tensor reference ( #6626 )
...
Signed-off-by: 皓聪 <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: 皓聪 <zhanghaotong.zht@alibaba-inc.com>
2025-08-07 23:44:47 -04:00
NVJiangShao
2f2f5cc72c
[TRTLLM-6744][feat] Remove input_sf swizzle for module WideEPMoE ( #6231 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-08 11:13:42 +08:00
ruodil
22f45a0e19
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test ( #6685 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 22:57:04 -04:00
xinhe-nv
88ced50ca7
[TRTQA-2920][fix] Add failed cases into waives.txt ( #6719 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-08 12:54:13 +10:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00