Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Emma Qiao
1cc49494fe
[Infra] - Add wiave list for pytest when using slurm ( #6130 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-17 16:53:15 +08:00
Zhenhuan Chen
8c1c9ef7aa
fix: convert venv_prefix to str before comparison with base_prefix ( #6121 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-17 15:04:54 +08:00
QI JUN
e821c68611
CI: update multi gpu test trigger file list ( #6131 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-17 14:48:23 +08:00
Yanchao Lu
48daa18de3
[None][infra] Set up the initial config for CodeRabbit ( #6128 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-17 14:29:57 +08:00
Iman Tabrizian
d4d21a106e
[fix] Release slots with spec decode + disagg ( #5975 ) ( #6032 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-17 12:58:18 +08:00
ChristinaZ
7e033c392e
Feat: Add vectorized loading for finalize kernel in MoE Trtllm backend ( #5919 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-07-17 12:38:29 +08:00
Zhanrui Sun
4c364b9a73
infra: fix SBSA test stage ( #6113 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-17 11:56:03 +08:00
Shiyu Li
6e1aee6fd6
[fix] Performance Optimization for MNNVL TwoShot Kernel ( #5934 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-07-17 10:49:51 +08:00
chenfeiz0326
fe070a0168
test: Update Llama4 Scout FP4 & FP8 accuracy tests ( #5901 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-07-17 09:41:18 +08:00
Frank
28385f6571
[TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. ( #5734 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-17 09:15:06 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support ( #5644 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
qixiang-99
e09e409dfb
Fix: Enhance ModelConfig for kv cache size calculations ( #5868 )
...
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-16 14:41:31 -07:00
Mike Iovine
fa34cb7234
[refactor] Clean up drafter/resource manager creation logic ( #5805 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-16 12:45:46 -07:00
shaharmor98
e0836f9ca9
[TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats ( #5372 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-17 00:50:30 +08:00
Wanli Jiang
9354114f68
fix: Update trtllm args issues with extra nested config ( #5996 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 12:41:45 -04:00
Iman Tabrizian
301b78bb77
Add documentation for eagle3+disagg+dynamo ( #6072 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-16 08:39:29 -07:00
Emma Qiao
e30d7bec38
[Infra] - Waive failed cases in post-merge on main ( #6096 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-16 22:41:18 +08:00
Zhanrui Sun
e42f5a9581
infra: [TRTLLM-5879] Spilt single GPU test and multi GPU test into 2 pipelines ( #5199 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-16 18:04:04 +08:00
Bo Li
fc2347eaf5
chore: Cleanup disable_fp4_allgather. ( #6006 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-16 17:54:36 +08:00
qsang-nv
8ef8e73002
update spec_dec ( #6079 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-07-16 17:50:43 +08:00
Tomer Shmilovich
0552a02943
BlockManager copy constructor fix ( #5982 )
...
Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com>
2025-07-16 17:33:17 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Martin Marciniszyn Mehringer
10349b54df
fix: Add $HOME/.local/bin to PATH when running docker in local user mode ( #6062 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-16 10:35:27 +02:00
Ivy Zhang
dda91b5117
tests: add QA test cases ( #5959 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:14:25 +08:00
Yan Chunwei
7568deb2f1
[nvbug/5387226] chore: add propogation for trust_remote_code to AutoConfig ( #6001 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:05:38 +08:00
Ivy Zhang
763012a88a
[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill ( #6051 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:04:08 +08:00
peaceh-nv
f5f31beee1
feat: Add deepseek-lite tests for RTX pro 6000 ( #5903 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-16 15:51:45 +08:00
Bo Deng
ec3ebae43e
[TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 ( #5991 )
...
Signed-off-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-16 13:54:42 +08:00
Zheng Duan
38db4bc7fb
feat: use session abstraction in data transceiver and cache formatter ( #5611 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-16 13:52:44 +08:00
Zheng Duan
385af53a4d
[nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test ( #6041 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-16 13:52:13 +08:00
nv-guomingz
509dc7c831
chroe: upgrade modelopt to 0.33 ( #6058 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-16 13:10:48 +08:00
Yiqing Yan
e51c541617
chore: Bump version to 1.0.0rc4 ( #6086 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-16 13:02:23 +08:00
Wanli Jiang
8679a058a3
fix: Unable to load phi4-model with tp_size>1 ( #5962 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 11:39:41 +08:00
Iman Tabrizian
665b4469b3
[fix] Fix Triton build ( #6076 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-16 11:17:22 +08:00
Aurelien Chartier
6a47cac981
feat: Add support for Triton request cancellation ( #5898 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-15 20:52:43 -04:00
danielafrimi
edab7532dd
feat/add latency support for trtllm bench ( #3730 )
...
Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Daniel Afrimi <dafrimi@nvidia.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
2025-07-15 13:13:49 -07:00
brb-nv
9214ac662a
test: Add regression tests for Gemma3 VLM ( #6033 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-15 11:37:56 -07:00
Fanrong Li
7a1af1c738
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 ( #5989 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-07-16 01:33:12 +09:00
Xiaodong (Vincent) Huang
0523f77b36
support TRTLLM_DEEP_EP_TOKEN_LIMIT to allow run deep-ep on memory-con… ( #5684 )
...
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
2025-07-15 18:34:21 +03:00
Jinyang Yuan
e761231c0b
[fix] Move NCCL group in all-gather and reduce-scatter OPs outside the outer loop ( #6053 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-07-16 00:25:32 +09:00
Tailing Yuan
4a26bd6500
Fix: pad DeepEP fp4 recv tensors if empty ( #6048 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-15 23:14:01 +09:00
MinaHuai
9ebc3ab9c4
[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision ( #5998 )
...
Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>
2025-07-15 10:01:35 -04:00
Jaedeok Kim
ab1c54709d
fix: adjust window sizes of VSWA at torch backend ( #5880 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2025-07-15 17:41:54 +08:00
Yiteng Niu
9e871ca582
[infra] add more log on reuse-uploading ( #6036 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-15 17:18:38 +08:00
ruodil
2a147c4d01
test: add llama_v3.3_70b_cases in perf test ( #6035 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:59 +10:00
ruodil
2504aa552e
test: add recursive updating pytorch config and change MOE backend format in perf test ( #6046 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:15 +10:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Zhanrui Sun
d811843a08
infra: [TRTLLM-6313] Fix the package sanity stage 'Host Node Name' in… ( #5945 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-15 15:39:31 +09:00
Lucas Liebenwein
e499f6c44a
[Fix] check for ImportError or ModuleNotFoundError for deep_ep_utils ( #6026 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-07-15 14:31:35 +09:00