Commit Graph

3212 Commits

Author SHA1 Message Date
Jin Li
d594c2d0ff [https://nvbugs/5537348][fix] Use device tensor index for MTP (#8062)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yiqing Yan
05dd437084 [https://nvbugs/5565541][fix] Add timeout threshold for H100 FHMA test (#8354)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
bhsueh_NV
69325e1aa3 [https://nvbugs/5574556][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI (#8351)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Lizhi Zhou
982d4b65e8 [https://nvbugs/5550671][fix] fix disagg-serving multinodes test failure (#8307)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Chuang Zhu
18a534d2b4 [https://nvbugs/5465642][fix] Increase server timeout to wait weight loading (#8297)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Jin Li
47e6eea3fa [https://nvbugs/5543770][fix] Update to Cutlass v4.2.1 (#8055)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Patrice Castonguay
b7602f7bd4 [https://nvbugs/5534837][fix] Fix KV cache split on long context (#8247)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Enwei Zhu
526cad37d7 [https://nvbugs/5568951][fix] Fix guided decoding disagg tests (#8311)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Zhanrui Sun
19241626d0 [https://nvbugs/5563653][infra] reduce docker image layers (#8250)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yechan Kim
4230639370 [https://nvbugs/5550722][fix] Fix image load (#8093)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yechan Kim
9587f099ac [https://nvbugs/5547434][fix] Fix Qwen2.5-VL device_path error (#8057)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
1b559ba91d [None][chore] Update test configs for release (#8224)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
4789c1e588 [TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases in to QA test list (#8212)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
be2ab98233 [None][chore] Update constaintfor release (#8211)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yan Chunwei
4e51148088 [https://nvbugs/5532023][fix] unwaive GenerationExecutor tests (#8251)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yukun He
179c7dc501 [https://nvbugs/5536131][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Enwei Zhu
57a4ef870a [None][fix] Fix chunked prefill state of draft request (#8067)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
sunnyqgg
dd61454d5f [https://nvbugs/5461761][fix] Unwaive eagle3 test (#8363)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-16 09:51:48 -04:00
Wangjue Yao
9865d3d770 [None][feat] Support cached tokens for Openai server (#7637)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
xinhe-nv
f70eff30b3 [TRTLLM-8638][fix] waive llam4 tests on H20 (#8416)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-16 03:14:56 -07:00
xiweny
89d03d7668 [https://nvbugs/5532789] [doc] Add documents about CUDA 12.9 (#8411)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-16 00:05:17 -07:00
HuiGao-NV
4e6a492aa3 [None][chore] Isolate several intermittent cases (#8408)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-15 23:48:31 -07:00
Yan Chunwei
42ab473bb0 [https://nvbugs/5583261][ci] waive test_fetch_responses_streaming_sync (#8407)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-15 23:19:31 -07:00
chinamaoge
ee588a73ac [None][fix] Fix the error where checkpoint_dir is assigned as NONE wh… (#8401)
Signed-off-by: maoge <maoge23@qq.com>
Co-authored-by: maoge <maoge23@qq.com>
2025-10-16 13:37:43 +08:00
Min Yu
0a0159fdd8 [https://nvbugs/5378031] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286)
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
2025-10-16 11:07:48 +08:00
Cao Dong
e75b4f9f65 [None][feat] Dev DeepConf (#8362)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-16 11:01:31 +08:00
xiweny
4143887370 [https://nvbugs/5541494] [fix] Remove waivers (#8353)
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-15 19:10:35 -07:00
Wanli Jiang
ebf0e51206 [TRTLLM-8579][feat] Support quantized model for nano-v2-vlm (#8304)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-16 09:44:11 +08:00
ChristinaZ
db1c271bc6 [None][feat] Revise the calculation related to TileN in routing of MOE TRTLLM backend (#8148)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-16 09:15:46 +08:00
Yan Chunwei
206cf31705 [https://nvbugs/5560921][fix] GenerationExecutor RPC (#8209)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
Chuang Zhu
40d129a415 [None][fix] Fix cache buffer size for window (#8320)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
HuiGao-NV
e265eb5fe9 [None][feat] reuse cudagraph memory pool in normal forward flow (#8095)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-16 07:08:44 +08:00
dongfengy
7a0aa64973 [None][fix] Refactor triton paddings (#6980)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-10-15 12:59:01 -07:00
QI JUN
65ec01b257 [TRTLLM-8532][chore] clean warmup method of ModelEngine (#8264)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 08:40:58 -07:00
Venky
7efaa5216f [None] [chore] Add OSS compliance to CODEOWNERS (#8375)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-10-15 06:22:32 -07:00
Yukun He
56c20665a9 [TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. (#6924)
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-15 21:18:11 +08:00
mpikulski
0510b34588 [TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
QI JUN
1a1c9a29ab [None][ci] move all llama4 test cases to post merge (#8387)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 16:36:37 +08:00
mpikulski
93a4b7f1b6 [None][chore] update torch_dtype -> dtype in 'transformers' (#8263)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 17:09:30 +09:00
QI JUN
616d1df7a0 [None][chore] set the default value of max_num_tokens explicitly (#8208)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
sychen52
6a6124dcb5 [OMNIML-2336][feat] w4a8 nvfp4 fp8 exports scale factor properly (#8180)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
Co-authored-by: Shiyang Chen <shiychen@omniml-a6.nvidia.com>
2025-10-15 13:41:27 +08:00
Erin
f4e7738f65 [None][doc] Ray orchestrator initial doc (#8373)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-14 21:17:57 -07:00
Kaiyu Xie
c822c117ce [None] [docs] Update TPOT/ITL docs (#8378)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-10-14 20:50:54 -07:00
Jin Li
206a9930df [https://nvbugs/5547435][fix] Fix a merge conflict (#8365)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-15 10:43:10 +08:00
Emma Qiao
493da020c1 [TRTLLM-7351][infra] Add isolate marker for L0 (#7497)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-14 16:58:14 -07:00
dongfengy
9d855f47ad [None][fix] Remove outdated test waives for GPTOSS (#8183)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-14 16:20:38 -07:00
Lizhi Zhou
22471ecc67 [TRTLLM-7846][feat] implement etcd storage for disagg cluster (#8210)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 16:48:41 -04:00
Tailing Yuan
8444a50d3a [None][fix] Fix is_post_quant_all2all_supported for MNNVL (#8355)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-14 11:49:21 -07:00
Lucas Liebenwein
43c46a09db [None][chore] AutoDeplopy: Update expert section on yaml configuration in README (#8370)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-14 12:39:28 -04:00
Michal Guzek
1cdb0b62c3 [https://nvbugs/5563469][fix] Temporarily disable test_nemotron_nano_8b_lora_torch in L0 due to Torch non-determinism (#8206)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-14 17:55:28 +02:00