Erin
9522cde464
fix: NVBug 5385576 py_batch_idx issue ( #6153 )
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-18 22:36:43 +08:00
Leslie Fang
44040edbf0
update broken link of PyTorchModelEngine in arch_overview ( #6171 )
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-07-18 19:53:38 +08:00
Robin Kobus
ec2b953e7e
refactor: Enhanced handling of decoder requests and logits within the batch manager ( #6055 )
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-18 12:12:08 +02:00
Emma Qiao
77acb4f753
[Infra] - Waive failed tests in post-merge ( #6176 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-18 17:34:34 +08:00
QI JUN
a95f31e72a
chore: add more log in FmhaDispatcher ( #6170 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-18 16:53:02 +08:00
Yiteng Niu
519a2116b5
[None][infra] Update the allow list of CI trigger ( #6168 )
Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
Co-authored-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
2025-07-18 15:38:38 +08:00
Yiqing Yan
f32169269a
[TRTLLM-5179] - Update bot help messages ( #5277 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-18 15:25:05 +08:00
Chuang Zhu
c0e416535e
fix single_disagg_test ( #6166 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-18 13:18:37 +08:00
Aurelien Chartier
812243bdd6
feat: add support for Modelopt fp8_pb_wo quantization scheme ( #6106 )
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-07-18 10:35:12 +08:00
Zhenhuan Chen
992b273045
[ https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e ( #6140 )
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-18 10:34:37 +08:00
xavier-nvidia
200ea9ee81
fix TMA error with GEMM+AR on TP=2 ( #6075 )
Signed-off-by: Xavier Simmons <xsimmons@nvidia.com>
2025-07-18 10:26:08 +08:00
yifeizhang-c
0155e7a3a1
[TRTLLM-6368] Update deepep dispatch API ( #6037 )
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-07-18 10:13:31 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
Daniel Stokes
ae28b3a664
feat: Add support for benchmarking individual gemms in MOE benchmark ( #6080 )
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-07-18 09:00:12 +12:00
qixiang-99
2c90203c36
Refactor KVCacheManager: Simplify token availability calculation and … ( #6134 )
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-17 13:33:33 -07:00
Frank
161490f039
[fix] Fixes KV Cache overrides in trtllm-bench ( #6103 )
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-07-18 03:44:44 +08:00
2ez4bz
8480c120b1
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge ( #6105 )
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-17 11:04:17 -07:00
Iman Tabrizian
10dbf4f0f4
[fix] Remove duplicated KVCache transmission check ( #6022 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-17 12:02:19 -04:00
ixlmar
d71c6fe526
[fix] Update jenkins container images ( #6094 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-17 16:22:25 +01:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Ziyi Xiong
58d22a72f1
[TRTLLM-6352][feat] Migrate EAGLE3 and draft/target speculation to Drafter ( #6007 )
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
2025-07-17 21:15:01 +08:00
Stanley Sun
9518e14f69
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout ( #6115 )
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-17 20:55:04 +10:00
Yi Zhang
a718486900
fix: Fix DeepSeek R1 CI ( #6129 )
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-17 18:24:49 +08:00
nv-guomingz
9b45499caa
test: update max_beam_width to 1 due to torchsampler changes. ( #6101 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-17 18:05:45 +08:00
Erin
de60ae47e3
chores: unwaive a few tests for v1.0 ( #6107 )
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-17 17:59:51 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler ( #6000 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Emma Qiao
1cc49494fe
[Infra] - Add waive list for pytest when using slurm ( #6130 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-17 16:53:15 +08:00
Zhenhuan Chen
8c1c9ef7aa
fix: convert venv_prefix to str before comparison with base_prefix ( #6121 )
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-17 15:04:54 +08:00
QI JUN
e821c68611
CI: update multi gpu test trigger file list ( #6131 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-17 14:48:23 +08:00
Yanchao Lu
48daa18de3
[None][infra] Set up the initial config for CodeRabbit ( #6128 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-17 14:29:57 +08:00
Iman Tabrizian
d4d21a106e
[fix] Release slots with spec decode + disagg ( #5975 ) ( #6032 )
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-17 12:58:18 +08:00
ChristinaZ
7e033c392e
Feat: Add vectorized loading for finalize kernel in MoE Trtllm backend ( #5919 )
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-07-17 12:38:29 +08:00
Zhanrui Sun
4c364b9a73
infra: fix SBSA test stage ( #6113 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-17 11:56:03 +08:00
Shiyu Li
6e1aee6fd6
[fix] Performance Optimization for MNNVL TwoShot Kernel ( #5934 )
Signed-off-by: Shiyu Li <shili@nvidia.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-07-17 10:49:51 +08:00
chenfeiz0326
fe070a0168
test: Update Llama4 Scout FP4 & FP8 accuracy tests ( #5901 )
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-07-17 09:41:18 +08:00
Frank
28385f6571
[TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. ( #5734 )
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-17 09:15:06 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support ( #5644 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
qixiang-99
e09e409dfb
Fix: Enhance ModelConfig for kv cache size calculations ( #5868 )
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-16 14:41:31 -07:00
Mike Iovine
fa34cb7234
[refactor] Clean up drafter/resource manager creation logic ( #5805 )
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-16 12:45:46 -07:00
shaharmor98
e0836f9ca9
[TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats ( #5372 )
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-17 00:50:30 +08:00
Wanli Jiang
9354114f68
fix: Update trtllm args issues with extra nested config ( #5996 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 12:41:45 -04:00
Iman Tabrizian
301b78bb77
Add documentation for eagle3+disagg+dynamo ( #6072 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-16 08:39:29 -07:00
Emma Qiao
e30d7bec38
[Infra] - Waive failed cases in post-merge on main ( #6096 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-16 22:41:18 +08:00
Zhanrui Sun
e42f5a9581
infra: [TRTLLM-5879] Split single GPU test and multi GPU test into 2 pipelines ( #5199 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-16 18:04:04 +08:00
Bo Li
fc2347eaf5
chore: Cleanup disable_fp4_allgather. ( #6006 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-16 17:54:36 +08:00
qsang-nv
8ef8e73002
update spec_dec ( #6079 )
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-07-16 17:50:43 +08:00
Tomer Shmilovich
0552a02943
BlockManager copy constructor fix ( #5982 )
Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com>
2025-07-16 17:33:17 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Martin Marciniszyn Mehringer
10349b54df
fix: Add $HOME/.local/bin to PATH when running docker in local user mode ( #6062 )
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-16 10:35:27 +02:00