Chenghao Zhang
18fbda5cdb
[None][feat] AutoDeploy: Add A_log fusion for Mamba layers ( #9422 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-26 14:39:20 -08:00
Chenghao Zhang
bc7b60e016
[None][feat] AutoDeploy: Remove redundant copies in mamba layers ( #9461 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-26 14:38:33 -08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model ( #9376 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
JunyiXu-nv
b7308a4000
[ https://nvbugs/5580099 ][fix] Cherry pick IMA issue fix from release/1.1 ( #9032 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-26 13:09:06 +08:00
Wanli Jiang
d100599ea7
[TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm ( #9246 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-26 11:12:35 +08:00
shuyixiong
d8acea1db3
[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights ( #9224 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-26 10:59:06 +08:00
QI JUN
5972119e1c
[None][ci] move some slow test cases of DGX-B200 to post merge ( #9467 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-26 10:48:53 +08:00
fredricz-20070104
6a64cb4c71
[TRTLLM-8936][test] Add disagg and wideep multi-node multi-gpu test cases ( #9356 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-26 10:34:49 +08:00
Chuang Zhu
0e9c7f8c07
[ https://nvbugs/5685143 ][fix] avoid cudaFree overlap with cuda graph ( #9438 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-25 16:20:29 -08:00
Suyog Gupta
e484bec82f
[None][chore] AutoDeploy add multi stream moe pass to default.yaml ( #9430 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-25 14:16:13 -08:00
Robin Kobus
32f53910ef
[TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode ( #9308 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-25 22:11:51 +01:00
Eran Geva
afc52d7b93
[ https://nvbugs/5647400 ] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. ( #9145 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 10:56:07 -08:00
mpikulski
899fda9e47
[TRTLLM-9490][feat] use FlashInfer's top_k_sampling_from_probs ( #9457 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:53:53 +01:00
mpikulski
c5f52ab304
[TRTLLM-8376][feat] top-p optimization (removes redundant softmax) ( #9411 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:46:48 +01:00
Fanrong Li
8da59103d6
[ https://nvbugs/5680905 ][fix] Relax the MMLU accuracy requirement for DS-v3.2 ( #9439 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-26 00:32:20 +08:00
Yan Chunwei
1f43dc8174
[None][ci] waive a test ( #9458 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-25 07:04:20 -08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Pengyun Lin
fa61825c74
[None][feat] Support custom chat template for tool calling ( #9297 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-11-25 22:07:04 +08:00
Tailing Yuan
51ef0379d2
[None][feat] Add a parser to layer-wise benchmarks ( #9440 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-25 05:45:16 -08:00
Shi Xiaowei
60786574db
[None][fix] Mitigate test timeout issues ( #9445 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-25 20:17:54 +08:00
Chao Ni
a2d9e6250a
[ https://nvbugs/5667922 ][fix] Update long context evaluation config ( #9426 )
...
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
2025-11-25 19:33:38 +08:00
Yanchao Lu
ff02e0f05c
[None][ci] Move more test stages to use OCI machines ( #9395 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Matt Lefebvre <matthewelefebvre@gmail.com>
2025-11-25 15:59:13 +08:00
Eran Geva
6af01dc664
[ #8391 ][chore] test_perf.py to lock clocks read from gpu_configs.yml instead of max freq ( #9409 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 09:20:33 +02:00
Emma Qiao
15616e3ee5
[None][infra] Waive failed cases for main branch on 11/25 ( #9429 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-24 23:18:15 -08:00
William Zhang
a4049fc557
[ #9413 ][fix] Minor fixes to nemotron H and custom models in AD ( #9416 )
...
* Why?
There were a couple of issues with the recently merged custom model
injection for AutoDeploy + the reference implementation of nemotron
H:
- `d_mlp` was left in despite being mathematically always null (could
lead to runtime issues during sharding).
- the custom model mapping was inherited by children factories.
* What?
This commit fixes these issues, and refactors the key of the custom
implementation to be based on the name of the configuration class as
well.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-24 20:17:33 -08:00
Suyog Gupta
efd503751f
[ #9271 ][perf] Enable multi-stream MOE optimization in AutoDeploy ( #9322 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-24 19:50:10 -08:00
kris1025
d1c724958d
[None][chore] unwaive ampere kernels test ( #9389 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-11-25 11:28:43 +08:00
xinhe-nv
0a9ae2e3e6
[None][chore] Remove closed bugs ( #9381 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-24 18:49:57 -08:00
Fanrong Li
5a99c9734d
[TRTLLM-8777][feat] Update DeepGEMM to the latest commit to include optimizations for DeepSeek-v3.2 ( #9380 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-25 08:58:08 +08:00
QI JUN
786d308b88
[ https://nvbugs/5685428 ][fix] fix test_openai_chat_multimodal.py ( #9406 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 16:56:33 -08:00
bhsueh_NV
1a93583438
[None][feat] Support Yarn on QwQ-32B model ( #9059 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>
2025-11-25 07:27:28 +08:00
Yibin Li
1ce483c999
[TRTLLM-7967][feat] Adding Starcoder2 PyTorch Backend Support ( #8923 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-11-24 11:23:22 -08:00
Emma Qiao
2c869f2bda
[None][infra] Waive failed cases for main ( #9400 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-24 17:42:19 +08:00
Emma Qiao
af72d93fa9
[None][infra] Waive failed cases on main branch ( #9384 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-23 22:53:02 -08:00
brb-nv
c045e359a7
[ https://nvbugs/5637012 ][fix] Fix helix unit tests ( #9369 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-23 19:34:22 -08:00
QI JUN
34a6d2d28f
[TRTLLM-9302][chore] Move build config from BaseLlmArgs to TrtLlmArgs ( #9249 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 10:54:41 +08:00
Chenghao Zhang
e1c9aa7d6a
[None][chore] AutoDeploy: Add the Nemotron MOE to CI ( #9328 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-23 12:12:12 -08:00
William Zhang
11a0b276fb
[ #9230 ][feat] Slimmed down implementation of nemotron H ( #9235 )
...
* Why?
The reference nemotron H code on HuggingFace is out of date,
and therefore bugged, and has several untested code paths.
This makes an already hairy patching system even hairier.
The proposal is to do away with those patches, and replace the
original implementation with one that is heavily slimmed down.
* What?
This PR sets the basis for an alternative path with such a
slimmed down implementation that:
- fixes bugs in the current HF implementation
- adds no new dependencies to TensorRT-LLM
- does away with unnecessary features for TensorRT-LLM/
AutoDeploy:
- no training related code (dropout, gradient checkpointing, etc.)
- no caching logic (we want to replace it with our own anyway)
- no attention masking where possible
- reuses existing AD custom ops for mamba SSM update /
causal conv1d / attention
In order for the above to be usable in the AD apparatus,
`AutoModelForCausalLMFactory` is extended to allow registrations
of custom model implementations.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-23 03:13:32 -08:00
Yan Chunwei
1ef69ecbb1
[None][ci] waive two ray tests ( #9375 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-23 15:39:01 +08:00
dongfengy
268ea9bb8a
[None][test] Add one-model and overlap-scheduling to eagle tests for GPTOSS ( #9312 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-21 22:52:53 -08:00
Enwei Zhu
13fbd4366a
[TRTLLM-9370][feat] Integration of CuteDSL NVFP4 grouped GEMM (Part 2: SwiGLU Fusion and Finalize Fusion) ( #9288 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-21 14:03:38 -08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve ( #9269 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
mpikulski
095b6864a8
[TRTLLM-8650][fix] beam search request validation ( #8433 ) ( #9228 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:08:45 -08:00
Emma Qiao
041564188c
[None][infra] Waive failed cases in main post-merge on 11/21 ( #9360 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-21 18:01:53 +08:00
QI JUN
b6483ef3e7
[None][ci] waive a test case of test_ad_build_small_multi.py ( #9355 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-21 16:25:04 +08:00
Ivy Zhang
28e9bf6167
[None][chore] add periodic junit xml path in conftest ( #9337 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-20 22:46:25 -08:00
QI JUN
e2a372a3b1
[None][ci] waive test_llm_context_only_timed_out_kv_cache_exhausted ( #9351 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-20 20:20:57 -08:00
Barry Kang
a3433dd54e
[ https://nvbugs/5325296 ][fix] Enable relaxed acceptance test on Blackwell ( #8709 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Zhanrui Sun
62e20a5441
[None][infra] Remove invaild waived tests which not in release branch ( #8841 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jin Li
6185225501
[ https://nvbugs/5488118 ][fix] Unwaive passed tests ( #8758 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Dom Brown
0c8de1f45d
[ https://nvbugs/5575841 ] [test] Move test_moe.py to serial tests to improve stability + unwaive FP4 MoE torch unit tests ( #8422 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
xiweny
05aabfbc1e
[ https://nvbugs/5601203 ] [fix]Restrict fp8 blockscale moe case ( #8583 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Eran Geva
3d66e56adb
[ https://nvbugs/5572320 ][fix] Ported test_ad_trtllm_bench.py from main ( #8671 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yukun He
9a79f32f7a
[ https://nvbugs/5608489 ][fix] Fix output unpack issues for Llama3/4 NVFP4 models. ( #8679 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Ivy Zhang
25c0624750
[None][test] Clean cache for certain easily hang cases ( #8619 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jie Li
36e244f35e
[ https://nvbugs/5587456 ][fix] Remove multimodal test cases using TRT backend ( #8611 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
348668e3ae
[ https://nvbugs/5575902 ][fix] set max_batch_size=1 to stabilize accuracy test result ( #8609 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
33b0b945c7
[ https://nvbugs/5582277 ][fix] rework DisaggPPTerminationHandler to fix hang issue ( #8519 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
81fd9be87d
[ https://nvbugs/5575829 ][fix] Unwaive gpt-oss test ( #8576 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Bo Deng
4ca6fe83d8
[ https://nvbugs/5565549 ][fix] unwaive test_disaggregated_spec_dec_bat… ( #8500 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Guoming Zhang
af3900a195
[ https://nvbugs/5504095 ][fix] Unwaive test_user_specify_workspace case. ( #8316 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Simeng Liu
9286223288
[ https://nvbugs/5515753 ][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. ( #8440 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
ee6944bfa2
[ https://nvbugs/5569713 ][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 ( #8429 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
yufeiwu-nv
0e746fad45
[ https://nvbugs/5667454 ][test] Fix Test Case as Chunked Attention not Supported on sm_120 ( #9260 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-20 00:58:42 -08:00
Liao Lanyu
04ad9f96fa
[ https://nvbugs/5667687 ][fix] Set correct lm_head_tp_size_upper_bound ( #9300 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-20 00:41:00 -08:00
Emma Qiao
b018b2698d
[TRTLLM-9164][infra] Enable checking duplicate items in waives.txt in pre-commit ( #9265 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-20 15:47:23 +08:00
mpikulski
a39e8c5567
[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema ( #9305 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-20 08:32:23 +01:00
QI JUN
1bdd3ba173
[None][ci] waive test_disagg_server_restart ( #9326 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-19 22:34:03 -08:00
Yechan Kim
d5622b2689
[None][fix] Multimodal InputProcessor dummy builder fix ( #8916 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-19 22:32:21 -08:00
Chang Liu
79a6c9742b
[None][fix] Use fp32 for indexer weight_proj GEMM ( #9243 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-19 21:52:38 -08:00
Chenghao Zhang
cd44f80abd
[ #9316 ][feat] AutoDeploy: Add the accuracy test for Nemotron MOE models ( #9317 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-19 21:48:50 -08:00
Bo Deng
2128f73d58
[TRTLLM-9247][infra] Upgrade NIXL to 0.7.1 ( #9055 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-20 11:01:02 +08:00
Yukun He
b6bced83c0
[TRTLLM-7963][feat] Use CUDAGraph to improve the tuning accuracy for AutoTuner. ( #9089 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-20 08:54:29 +08:00
brb-nv
f6ec6e2222
[None][chore] Waive tests timing out on main ( #9315 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-19 13:10:06 -08:00
NVShreyas
1eae941d77
[ #9237 ][feat] enable iter stats in autodeploy ( #9278 )
...
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-11-19 19:29:29 +01:00
Neta Zmora
7ab02ad7b5
[None][feature] AutoDeploy: tighter MoE UT thresholds ( #9195 )
...
Scale down the weights in the MoE test so that the output has reasonable magnitude, allowing for tighter atol and rtol
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-19 08:37:51 -08:00
Bo Li
d8b05894ee
[None][perf] Adjust select_alltoall_method_type. ( #8950 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-19 07:43:55 -08:00
mpikulski
46dd9886bb
[ https://nvbugs/5661877 ][fix] fix test regression in TestBatchedSampling::test_samples ( #9215 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-19 01:44:44 -08:00
xinhe-nv
0f77fec932
[None][chore] Add failed cases into waives.txt ( #9289 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-19 17:03:43 +08:00
CarstyYou
ee941ac779
[ https://nvbugs/5456493 ][feat] add fp8 dense for sm120 ( #9174 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-19 14:40:34 +08:00
nvxuanyuc
a79c0dfb43
[None][fix] Update GLM model accuracy test ( #9286 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 21:59:01 -08:00
Emma Qiao
67d3eb26af
[None][infra] Waive failed cases for main branch on 11/17 ( #9266 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-18 20:07:03 -08:00
ChristinaZ
941a54c66a
[None][feat] Update the indexer topK ( #9255 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 11:49:00 +08:00
xinhe-nv
286ace22ed
[None][chore] Add failed cases into waives.txt ( #9242 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 19:27:55 -08:00
Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error ( #9172 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted ( #9155 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt ( #9193 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM ( #8880 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Lizhi Zhou
c789000a62
[ https://nvbugs/5649010 ][fix] increase status-checking interval to avoid instability ( #9203 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00
Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests ( #9247 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example ( #8434 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Ziyi Xiong
7c4344b92e
[ https://nvbugs/5590408 ][fix] Exclude num of draft tokens from mMaxSeqLenKv ( #9210 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding ( #9178 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00
Kaiyu Xie
d076aa44d3
[None] [tests] Unwaive wide ep related tests ( #9204 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). ( #8194 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check ( #9088 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00
Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case ( #9165 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM ( #8256 )
...
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases ( #9209 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler ( #8587 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue ( #9171 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound ( #9025 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 ( #9192 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch ( #8975 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 ( #9187 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge ( #9189 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
Erin
fe69243157
[None][chore] Add placement test for ray executor ( #9122 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-14 23:10:59 -08:00
Chang Liu
bed4e95e9f
[ https://nvbugs/5629887 ][fix] Add missing device count guard for DSv32 multiGPU tests ( #9159 )
2025-11-14 07:52:23 -08:00
xinhe-nv
49b7e6301a
[None][chore] Add failed cases into waives.txt ( #9156 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-14 06:28:22 -08:00
mpikulski
80bf840e69
[TRTLLM-9295][fix] unflake test_overlap_scheduler.py::test_overlap_scheduler_consis… ( #9146 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-14 11:36:22 +01:00
yuanjingx87
d72321a32e
[None][ci] Waive unittest/_torch/sampler/test_torch_sampler.py::TestBatchedSampling ( #9161 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 01:49:26 -08:00
Suyog Gupta
d12cb9436d
[None][feat] Autodeploy add triton configs and optimize mamba prefill ( #9083 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-13 19:15:43 -08:00
QI JUN
3c950910a0
[None][ci] waive test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0] ( #9162 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-13 18:56:37 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module ( #8682 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks ( #9065 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format ( #8934 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server ( #8765 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Neta Zmora
34dc6869f3
[ #8732 ][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 ( #9011 )
...
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.
Nemotron-6 requires ReLU2 (i.e. squared ReLU) MoE activation function.
The PR adds this and adds an API to set the activation function, in general.
The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954 .
The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from
Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
dongxuy04
a370643b26
[None][fix] support topk autotuner input for expert slot per group larger than 32 ( #9087 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-14 08:37:20 +08:00
Frida Hou
e96a3d294d
[None][autodeploy] minor refactor to rmsnorm transforms ( #8657 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:13:58 -08:00
Ziyi Xiong
a7aaf50541
[TRTLLM-8084][feat] Enhance the overlap shceduler for two-model spec decoding ( #8706 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-13 10:20:16 -05:00
William Zhang
121140cfec
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser ( #8817 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-13 04:34:38 -08:00
Lizhi Zhou
48a27c7bef
[ https://nvbugs/5633340 ][chore] unwaive test_auto_scaling.py::test_disagg_server_restart ( #9131 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-13 01:45:36 -08:00
Emma Qiao
d0ea417ec8
[None][infra] Waive failed tests for main 11/13 ( #9132 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-13 01:00:40 -08:00
xinhe-nv
548f5ce4bc
[None][fix] waive failed tests ( #9090 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 23:40:00 -08:00
xinhe-nv
8fa3c55c76
[None][chore] Remove closed bugs ( #9114 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 22:49:37 -08:00
ruodil
c86e36fe38
[None][test] add deepseek and qwen cases for rtx series ( #8839 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-12 22:28:02 -08:00
HuiGao-NV
cde18c12da
[ https://nvbugs/5640873 ][fix] Move thop tests to pre-merge ( #9094 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-13 13:08:13 +08:00
Zhang Ge
49df731b96
[ #6507 ][fix] Fix precision issue due to KV layout mismatch for split/concat kernels ( #6917 )
...
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Yan Chunwei
4fd93bdc2c
[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming ( #9118 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-12 19:55:09 -08:00
Yan Chunwei
8a8883bc73
[None][chore] Waive test_llm_rpc_streaming ( #9113 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-13 11:06:26 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number ( #9003 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
QI JUN
3416efbc29
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill ( #9111 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:06:32 +08:00
dongxuy04
9241ccaf27
[None][feat] Enable EPLB for trtllm-gen and cutlass backend ( #8886 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-12 12:30:27 -08:00
Chenghao Zhang
5f26c31954
[ https://nvbugs/5636912 ][fix] AutoDeploy: Unwaive the test ( #9018 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-12 12:26:38 -08:00
Patrice Castonguay
8a751a0e56
[None][chore] Remove is_disaggregated param in executor request queue ( #9049 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-12 13:37:15 -05:00
Fanrong Li
780d4f9dc5
[None][feat] Add MTP>1 support for DS-v3.2 ( #9045 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-12 09:56:12 -08:00
Iman Tabrizian
cdde15b275
[TRTLLM-8540][feat] Add support for disagg in DSv3.2 ( #8735 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-11-12 08:21:11 -08:00
mpikulski
264d38e6c5
[TRTLLM-9175][test] ensure sampling is async ( #9076 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-12 15:27:52 +01:00
yufeiwu-nv
b7a2574c60
[ https://nvbugs/5568991 ][test] Remove Phi-3 models ( #9066 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-12 03:16:36 -08:00
QI JUN
4003dc7574
[None][ci] waive some test cases of disaggregated serving ( #9085 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-12 15:06:21 +08:00
Emma Qiao
bb6eb9510d
[None][infra] Waive a failed case of disaggregated/test_disaggregated.py ( #9074 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 19:38:32 -08:00
QI JUN
fd703fbb7b
[None][ci] run speculative unit tests serially ( #9080 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 19:06:44 -08:00
Lucas Liebenwein
aca56097cb
[None][fix] AutoDeploy: update nano3 accuracy test ( #9061 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-11 12:26:31 -08:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner ( #7572 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
Chenghao Zhang
ec9cf715a2
[None][feat] AutoDeploy: Perf improvement for mamba layers ( #8991 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-11 08:27:07 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm ( #8840 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling ( #9012 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
HuiGao-NV
23c388c58b
[ https://nvbugs/5616189 ][fix] Make more cases use local cached models ( #8935 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-11 03:14:05 -08:00
QI JUN
0ce22ce928
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] ( #9069 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 02:11:15 -08:00
Yiqing Yan
b7d51c5549
[None][chore] Remove duplicated waive test ( #9067 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-11 16:49:49 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 ( #9058 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt ( #8998 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling ( #8988 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py ( #8938 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test ( #9019 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Yechan Kim
0938a3ad2a
[ https://nvbugs/5644187 ][fix] Llava-Next MMMU bugfix and Phi4 test bugfix ( #9034 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[ https://nvbugs/5625972 ][fix] Add context manager to fix FakeTensorProp ( #9047 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00
xiweny
50c486367a
[ https://nvbugs/5619396 ][fix] Add sm103 to CutlassFP8RowwiseGemm ( #9042 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-10 08:12:14 -08:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA ( #8882 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
xinhe-nv
f848d844d9
[None][chore] Add failed cases into waives.txt ( #9030 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-09 23:36:05 -08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 ( #8943 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 ( #9006 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Bo Li
67af7c15a5
[ https://nvbugs/5637037 ][fix] Update unwaive list. ( #9001 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-10 08:53:07 +08:00
Emma Qiao
183778d58a
[None][infra] Waive failed tests for main 11/07 ( #9008 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-08 08:51:35 -08:00
Emma Qiao
2af6a537ad
[TRTLLM-8999][infra] Reduce gb200 multi-node test stages ( #8778 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-08 06:34:24 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend ( #8951 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op ( #8960 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout ( #8892 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Yuxian Qiu
7b82ba90da
[ https://nvbugs/5629790 ][chore] unwaive test. ( #8967 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-07 18:41:32 +08:00
Stefan Niebler
326a201473
[ https://nvbugs/5508536 ][fix] Take Over ( #8627 ): Reintroduce: Move stop_criteria to sample_async ( #7041 ) ( #8794 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely ( #8856 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Lizhi Zhou
b26e1617f2
[ https://nvbugs/5633340 ][fix] kill processes properly after test ( #8970 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-06 21:45:38 -08:00
xiweny
ee20e679a9
[ https://nvbugs/5636986 ][fix] Fix DeepGemmMoe get_buffer calls ( #8939 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-06 19:57:19 -08:00
Simeng Liu
9f8d93f89a
[ https://nvbugs/5606136 ][ci] Remove tests for deprecating triton multimodal models. ( #8926 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-11-06 17:58:42 -08:00
jthomson04
fcae852cef
[None][fix] Fix KV cache clearing with KV Connector API ( #8750 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-06 14:28:27 -08:00
Chenghao Zhang
ddf2d010e2
[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear ( #8820 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-06 11:00:10 -08:00
DylanChen-NV
b275635a9a
[ https://nvbugs/5498478 ][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill ( #8910 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-11-06 07:41:21 -08:00
shuyixiong
c73efe12e7
[None][chore] Use cached model in all ray tests ( #8962 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-06 15:14:15 +01:00
Fanrong Li
d246f62868
[ https://nvbugs/5630345 ] [chore] skip deepseek-v3.2 fp8 kv tests on pre-Blackwell architectures ( #8973 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-06 03:41:37 -08:00
yunruis
51545560da
[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation ( #8495 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-11-06 17:39:57 +08:00
Yilin Fan
b7798bfab8
[None][feat] Add trtllm_ prefix for exposed metrics ( #8845 )
...
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-11-06 15:27:18 +08:00
xinhe-nv
e822184cd7
[None][feat] add waive by sm version ( #8928 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-05 19:20:43 -08:00
Lucas Liebenwein
7a552c450a
[ https://nvbugs/5606166 ][fix] AutoDeploy: unwaive test for use tuples for cudagraph shape lookup ( #8957 )
...
also updated test waive for another nvbug
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-05 16:27:00 -08:00
Lucas Liebenwein
b181568d6f
[TRTLLM-8201][feat] Nemotron H MoE Sharding ( #8744 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-05 12:35:29 -08:00
Chang Liu
e57d83c5dc
[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt ( #8771 )
2025-11-05 07:57:09 -08:00
Fanrong Li
c2feed798a
[ https://nvbugs/5630345 ][chore] unwaive DS-v32 nvfp4 and fp8 tests ( #8887 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-05 03:49:23 -08:00
Chuang Zhu
595f78078c
[ https://nvbugs/5624367 ][fix] Fix disagg GPT-OSS test ( #8870 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-05 01:47:09 -08:00
Emma Qiao
31116825b3
[None][infra] Waive failed cases on main 11/05 ( #8936 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-04 22:54:45 -08:00
xinhe-nv
cc4aa29523
[None][chore] Add failed cases into waives.txt ( #8865 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-04 19:26:50 -08:00
Shiyu Li
eeb56c2848
[None][feat] MNNVLAllreduce Kernel Refactor ( #8018 )
...
Signed-off-by: Shiyu Li <timlee0212@outlook.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-11-05 08:49:47 +08:00
Yechan Kim
ed81173c55
[None][ci] Add test on waives ( #8915 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-05 08:42:08 +08:00
Patrice Castonguay
782824533e
[ https://nvbugs/5587574 ][fix] Increase server timeout to wait for weight loading ( #8806 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-04 12:11:08 -08:00
Frida Hou
11ded113cd
[ #8389 ][fix] Update group attention matching to first map to custom torch attention ( #8638 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-04 12:00:43 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration ( #8302 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Yanchao Lu
e2b2675120
[None][fix] Remove duplicated test waives ( #8914 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 23:04:33 +08:00
Bo Li
e4bf29bc66
[None][feat] Integrate MnnvlThroughput into TRTLLM MoE. ( #8728 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-04 21:36:29 +08:00
Robin Kobus
7e4b87b17c
[None][ci] Remove outdated test entries ( #8909 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-04 05:32:46 -08:00
Cao Dong
dddfcdd3bf
[None][fix] Fix bug of undefined py_topk_logprobs_vals ( #8789 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-04 19:32:59 +08:00
xiweny
cae468cc8e
[ https://nvbugs/5596343 ] [test] Waive flaky GPT-OSS cases ( #8904 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-04 03:00:00 -08:00
Zhanrui Sun
4de31bece2
[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 ( #8838 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 18:59:34 +08:00
CarstyYou
4296c9553d
[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 ( #8844 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-04 18:10:36 +08:00
Ivy Zhang
23717cdb3f
[TRTLLM-8580][test] save runtime report periodically ( #8312 ) ( #8455 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
ce23e24123
[ https://nvbugs/5565565 ] [fix] Remove waiver ( #8450 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
6c8ba3be27
[None][chore] Remove duplicate log outputs in test_perf.py ( #8418 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
ruodil
102e556863
[None][test] cherry-pick: add test-model-suites in integration conftest.py ( #8388 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
2225745782
[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising ( #7870 )
...
Because we have encountered some perf regression due to using a one-shot kernel instead of NCCL on A100/H100, it will be beneficial if we can have a solid benchmarking of allreduce Op and analyze the data collected from it.
Implemented new AllreduceOp heuristic:
- Added Linear programming-based heuristic implementation.
- Added LUT-based heuristic implementation and corresponding code generation script.
AllreduceOp minor fixing:
- Fixed a minor issue in AllreduceOp, that the strategy can not be overridden when ONESHOT or TWOSHOT is set.
- Fixed a minor TWOSHOT kernel perf issue.
- Cleaned up Dispatching code in AllReduceOp.
This PR will fix the perf gaps reported in:
https://nvbugspro.nvidia.com/bug/5517023
For Deepseek-R1, it shows a performance gain of about 3-4% in concurrency levels of 256 and 512.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Patrice Castonguay
65c138108e
[ https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg ( #8372 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Ivy Zhang
9bcd2e6c0a
[None][chore] Update nim test list ( #8356 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Stanley Sun
def9c0004d
[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled ( #8357 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
fcac2022e2
[ https://nvbugs/5565565 ] [fix] fp8 wideep support sm103 ( #8228 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yueh-Ting (eop) Chen
bd1c9c0af4
[ https://nvbugs/5625990 ][chore] Add test coverage for current incapability of the KV cache manager ( #8829 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-04 16:35:45 +08:00
Yechan Kim
67208f1512
[None][fix] InputProcessor config naming convention fix ( #8705 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 22:29:21 -08:00
Emma Qiao
4fe47faf47
[None][infra] Waive failed tests for main branch ( #8897 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-03 22:21:28 -08:00
Zhanrui Sun
9ec6a6b68f
[None][infra] waive failed test on main 11/4 ( #8896 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-03 21:37:09 -08:00
Matthias Jouanneaux
d0f107e4dd
[TRTLLM-5966][feat] Helix: add full MLA support for Helix ( #8104 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-11-04 09:06:58 +08:00
Mike Iovine
5e6f1bcd24
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage ( #8767 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-03 10:12:10 -08:00
Yechan Kim
f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest ( #8453 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
Emma Qiao
14bc8571ae
[TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d ( #8319 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-03 05:26:17 -08:00
Emma Qiao
d7176768cd
[None][infra] Waive the failed test for main on 11/3 ( #8875 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-03 02:52:52 -08:00
Tailing Yuan
8303cfa477
[None][fix] Fix import issues in layer-wise benchmarks ( #8827 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-03 02:32:48 -08:00
xinhe-nv
4873ca04cc
[ https://nvbugs/5521799 ][fix] add harmony channel validation ( #8837 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 02:31:54 -08:00
xinhe-nv
64540451e7
[None][chore] Add failed cases into waives.txt ( #8872 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 01:19:04 -08:00
Fanrong Li
e9f78c687a
[ https://nvbugs/5625962 ][chore] unwaive DS-v32-fp4 tests ( #8853 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-03 00:34:52 -08:00
Yechan Kim
00c0e6c440
[ https://nvbugs/5523315 ][fix] Fix serve benchmark test ( #8255 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 00:30:13 -08:00
chenfeiz0326
cc4ab8d9d1
[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database ( #8653 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-11-03 16:23:13 +08:00
yufeiwu-nv
b4d17d1a4c
[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config ( #8753 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-03 13:34:06 +08:00
Chang Liu
f57dc01e6f
[ https://nvbugs/5625380 ][chore] Remove multimodal related fields from decoder llm input ( #8846 )
2025-11-02 17:44:08 -08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test ( #8661 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Yan Chunwei
1551ed8e5f
[ https://nvbugs/5437384 ][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests ( #8567 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs ( #8600 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests ( #8720 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache ( #8692 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
dongfengy
0edba5a7e2
[ https://nvbugs/5474119 ][fix] Re-enable test ( #8809 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
Patrice Castonguay
afa75c9494
[ https://nvbugs/5614506 ][chore] Adding e+p+d e2e test ( #8801 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Anthony Chang
852e5060aa
[ https://nvbugs/5558117 ][fix] Allow per-layer quant config from hf_quant_config.json ( #8617 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests ( #8823 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[ https://nvbugs/5617275 ][fix] Extract py files from prebuilt wheel for editable installs ( #8738 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main ( #8826 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model ( #8788 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yuxian Qiu
025d2926df
[ https://nvbugs/5599515 ][fix] Fix PP bubbles. ( #8687 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool ( #8465 )
...
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Mike Iovine
b87448b009
[TRTLLM-8978][test] Remove llama 4 spec dec tests ( #8766 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-30 15:47:04 -04:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint ( #8799 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Emma Qiao
9112cffaf3
[None][infra] Waive failed case for main branch ( #8797 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 07:57:35 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks ( #8777 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
Anthony Chang
f666ad2f6b
[None][feat] Autotuner can iterate through all tactics for test purposes ( #8663 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-30 13:11:25 +01:00
Emma Qiao
a5cc9fe0aa
[TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. ( #6256 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-30 01:56:04 -07:00
ChristinaZ
13cfd70f57
[None][feat] Add unit tests and revision in block_level kernel for invalid input ( #8718 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-30 16:42:18 +08:00
WeiHaocheng
cc286687c4
[None][feat] Refactor scaffolding streaming feature and fix openai wo… ( #8622 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-10-30 16:02:40 +08:00
xinhe-nv
a4f75399b9
[ https://nvbugs/5481206 ][fix] update waives ( #8774 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-30 00:43:38 -07:00
Leslie Fang
2072185d76
[ https://nvbugs/5608461 ][fix] exclude InductorSubproc from thread leak check ( #8704 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-30 15:35:15 +08:00
Emma Qiao
7d3cebf34e
[None][infra] Unwaive the tests passed in latest CI and disable a perf stage ( #8775 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 12:48:23 +08:00
Emma Qiao
db99a936b0
[TRTLLM-8971][infra] Update gpu key for B300/GB300 ( #8724 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 20:36:44 -07:00
Yuxian Qiu
3176bd3815
[None][fix] Fix UnboundLocalError. ( #8756 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-29 19:41:37 -07:00
HuiGao-NV
ae57738bae
[ https://nvbugs/5547414 ][fix] Use cached models ( #8755 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-29 19:10:10 -07:00
Iman Tabrizian
ae6875fe10
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager ( #8699 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-10-29 08:04:26 -07:00
Emma Qiao
579e1067bf
[None][infra] Waive failed tests on main ( #8759 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 21:32:23 +08:00
Yan Chunwei
fc3b6f5331
[None][ci] waive test_rpc.py ( #8745 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-29 05:17:40 -07:00
kris1025
e2c5a38879
[ https://nvbugs/5534574 ][fix] disable spec decoding forever once the request spec decoding is disabled ( #8446 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-29 19:28:43 +08:00
Chang Liu
81eb861df0
[None][chore] Enable GPQA in CI for DeepSeek V3.2 ( #8712 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-29 04:22:22 -07:00
Zheng Duan
d626d13d37
[ https://nvbugs/5607238 ][test] fix working dir in disagg worker test ( #8648 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-29 16:13:52 +08:00
Pengyun Lin
2aade46d18
[TRTLLM-8214][feat] Support Qwen3 tool parser ( #8216 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-29 15:48:29 +08:00
Stefan Niebler
19ca7b15c7
[ https://nvbugs/5593199 ][test] Enhance beam search tests deterministic dummy model ( #8625 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-10-29 06:12:22 +01:00
Chang Liu
5f737b8dbe
[None][perf] Use fp8 quant kernel in DS3.2 indexer module ( #8701 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-29 12:45:09 +08:00
Cheng Hang
15c293a90b
[None][feat] Enable nvfp4 cuda core for sm120 ( #8620 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
2025-10-29 12:39:03 +08:00
xinhe-nv
7ba98a6b20
[None][chore] Add failed cases into waives.txt ( #8684 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-28 20:30:01 -07:00
Yan Chunwei
f2faf2809f
[None][ci] waive test_rpc.py temporarily ( #8743 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-28 19:20:27 -07:00
Zheng Duan
fea5bfbda7
[None][feat] add detailed KV cache transfer time breakdown ( #8521 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-29 10:11:09 +08:00
ruodil
f444fe2deb
[None][test] fix a typo in perf test sampler config ( #8726 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-29 09:53:53 +08:00
Yechan Kim
cf8a1d2ef9
[ https://nvbugs/5596377 ][fix] Fix mm dummy calculation ( #8498 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-29 09:45:21 +09:00
Lizhi Zhou
24167d00eb
[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests ( #8602 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-28 17:04:53 -07:00
Kaiyu Xie
227c288441
[TRTLLM-8827] [feat] Enable low precision alltoall for Cutlass and TRTLLMGen backends ( #8675 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-10-29 07:56:48 +08:00
Mike Iovine
00161b315f
[ https://nvbugs/5549111 ][fix] Fix 2-model overlap scheduler accuracy on very long prompts ( #8076 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Michael Iovine <miovine@nvidia.com>
2025-10-28 14:55:34 -07:00
dongfengy
083f3637f1
[ https://nvbugs/5596343 ][test] Update test waive to get back some coverage ( #8702 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-28 14:05:48 -07:00
Lucas Liebenwein
0ee71d95ec
[ https://nvbugs/5606166 ][fix] AutoDeploy: use tuples for cudagraph shape lookup ( #8658 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 10:52:43 -07:00
Anish Shanbhag
a09b38a862
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum ( #8330 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-28 09:17:26 -07:00
dongfengy
5a01f382c1
[ https://nvbugs/5575913 ][fix] Use separate thresholds for 120b/20b gptoss ( #8664 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-28 10:35:07 -04:00
Robin Kobus
e8e2b0697a
[None][chore] Revert "[TRTLLM-7835][test] add default sample config for perf test ( #8523 ) ( #8725 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-28 14:23:38 +01:00
ruodil
6b9b73ee27
[ https://nvbugs/5564465 ][test] ensure deepseek_v3_lite isl + osl < max_seq_len ( #8565 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-28 15:25:52 +08:00
ruodil
bf72eb045e
[TRTLLM-7835][test] add default sample config for perf test ( #8523 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-28 02:22:47 -04:00
yufeiwu-nv
0e36484fba
[None][test] Add gpt_oss_20b Model to Sanity Perf Test ( #8265 )
2025-10-28 13:36:28 +08:00
Aurelien Chartier
0a02f5f25d
[None][chore] Use a cached model path for Ray integration test ( #8660 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-27 19:16:06 -07:00
HuiGao-NV
49974eed75
[None][chore] ISOLATE some cases ( #8690 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-27 22:10:44 -04:00
chenfeiz0326
f5265a087b
[None][infra] Minor Update on Perf Sanity Testdb Files ( #8607 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-28 09:54:48 +08:00
gramnarayan
88b0fbc8ff
[ #8245 ][feat] Autodeploy: Guided Decoding Support ( #8551 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
Yechan Kim
a6017f6266
[ https://nvbugs/5608723 ][fix] Use local data on multimodal tests and unwaive tests ( #8673 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-28 09:20:02 +09:00
Emma Qiao
73a5479b26
[None][infra] Skip failed tests for main 10/27 ( #8686 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-28 08:04:30 +08:00
Aurelien Chartier
1401a3c09c
[None][feat] Add FP8 rowwise GEMMs for B200 ( #8332 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-27 16:33:14 -04:00
Bo Li
9c4432f8a4
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. ( #7499 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-27 13:23:06 -04:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter ( #8127 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
Chenghao Zhang
b9b2802599
[None][feat] Autodeploy: Update the ssm to use slice ( #8667 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-10-27 09:45:20 -07:00
mpikulski
7c8ba71b49
[TRTLLM-8832][feat] fully async _select_generated_logits with tests ( #8628 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-27 16:15:32 +01:00
QI JUN
4fd58137a1
[TRTLLM-8933][chore] remove unused update_executor_config function ( #8678 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-27 10:00:47 -04:00
Kaiyu Xie
c9b08790c2
[None] [test] Add MNNVL AlltoAll tests to pre-merge ( #8601 )
2025-10-27 21:39:44 +08:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing ( #5897 )
...
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Jie Li
ce0d76135d
[ https://nvbugs/5546507 ][fix] skip TRT-Flow test case due to CMake Error in building ( #8677 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-10-27 05:11:47 -04:00
xinhe-nv
8090c9641c
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8672 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 03:20:46 -04:00
xinhe-nv
0ac5cbcac4
[None][chore] Add failed cases into waives.txt ( #8669 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 02:36:28 -04:00
QI JUN
cc5b8b6d28
[None][ci] move some time-consuming benchmark test cases to post merge ( #8641 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-26 22:47:17 -04:00
Emma Qiao
e0728ba8a7
[None][infra] Waive failed case on main 10/26 ( #8668 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-26 22:02:32 +08:00
Chenghao Zhang
a6d20f6f9b
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron ( #8599 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-25 15:26:45 -04:00
Simeng Liu
2b27810198
[ https://nvbugs/5494718 ][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark ( #8514 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-24 19:09:07 -07:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension ( #8482 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
jthomson04
02081e2390
[None][feat] Support KV Connector with Disagg Prefill Worker ( #8246 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-24 11:09:06 -07:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache ( #8405 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Yechan Kim
2d86d6be40
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve ( #8528 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-24 12:53:40 -04:00
Chuang Zhu
2420918e5b
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA ( #7952 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-24 08:58:16 -04:00
Emma Qiao
35e35db422
[None][infra] Waive tests on main and remove lines which missed in MI ( #8639 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-24 02:49:23 -04:00
xinhe-nv
2aaedd08cd
[TRTLLM-8638][fix] fix test issues ( #8557 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:16:55 -04:00
xinhe-nv
9a9d647292
[None][chore] Add failed cases into waives.txt ( #8630 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:11:03 -04:00
ruodil
07a957e5cb
[None][test] remove redunctant runtime backend in perf test ( #8358 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-24 02:01:34 -04:00
Stanley Sun
6b793d5c3d
[TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests ( #8580 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-24 13:23:47 +08:00
xinhe-nv
59375e8bed
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8590 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 00:02:42 -04:00
xinhe-nv
95d39e6e76
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8588 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 23:08:52 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly ( #8561 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
QI JUN
cc81028547
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig ( #8558 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-23 10:32:09 -04:00
Shijie
928247a3f9
[ https://nvbugs/5451205 ][feat] Add cuBLASLt NVFP4 GEMM backend support ( #7943 )
...
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
2025-10-23 15:55:10 +08:00
xinhe-nv
04e2b2752a
[None][feat] add Nemotron-Ultra multi nodes eval tests ( #8577 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 02:44:26 -04:00
Suyog Gupta
2956978da3
[None][feat] Enable rms norm fusion for Nemotron MOE ( #8563 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-23 00:09:42 -04:00
Lucas Liebenwein
77fa5dfee9
[ https://nvbugs/5604136 ][fix] AutoDeploy: correct import for mxfp4_moe unit test ( #8593 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-22 22:11:18 -04:00
sunnyqgg
ea3e0eea51
[TRTLLM-7954][feat] Target model KV cache rellocation ( #8421 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-23 09:36:50 +08:00
Anthony Chang
8a3b870e09
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN ( #8156 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-23 09:14:18 +08:00
Anish Shanbhag
15de45d782
[TRTLLM-8682][chore] Remove auto_parallel module ( #8329 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-22 20:53:08 -04:00
Leslie Fang
e5865de518
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args ( #8493 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 20:03:18 -04:00
brb-nv
00c2b81037
[None][chore] Skip failing import of mxfp4_moe ( #8591 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-22 16:19:22 -04:00
Patrice Castonguay
879039f6d5
[ https://nvbugs/5429636 ][feat] Kv transfer timeout ( #8459 )
...
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
2025-10-22 09:29:02 -04:00
xinhe-nv
b8b2c9efb4
[None][chore] add precommit hook to remove redundant tab and white space ( #8534 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 09:21:54 -04:00
Eran Geva
910e6b9684
[None][fix] fixed cached model path in test ( #8549 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-22 07:47:41 -04:00
Eran Geva
d4b3bae5af
[ #8391 ][fix] check perf by device subtype ( #8428 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-22 12:38:05 +03:00
Yan Chunwei
3f9dbc76c0
[None][fix] fix rpc unique addr related issue ( #8419 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-22 04:47:18 -04:00
Ivy Zhang
912cf4f603
[TRTLLM-8785][fix] fix conflicts between periodic-junit and store-durations ( #8518 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-22 04:36:47 -04:00
Emma Qiao
92e99b6545
[None][infra] Waive failed cases for main branch 10/22 ( #8573 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-22 04:21:56 -04:00
Shi Xiaowei
77940635bb
[ https://nvbugs/5451272 ][fix] unwaive the test ( #8537 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 14:28:42 +08:00
xinhe-nv
187cf12d8f
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8554 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-22 01:26:15 -04:00
Emma Qiao
2b4e812aea
[None][infra] Let CI continue running other isolation tests when an isolation test get hanging ( #8471 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-22 00:07:35 -04:00
chenfeiz0326
6cf1c3fba4
[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 ( #7985 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-22 10:17:22 +08:00
sunnyqgg
90080e0e09
[ https://nvbugs/5556020 ][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch ( #8517 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-22 09:58:22 +08:00
Leslie Fang
50d4e5bc06
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor ( #8451 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 08:33:48 +08:00
Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy ( #8469 )
2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling ( #8215 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Lucas Liebenwein
9b54b3bfaf
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype ( #8510 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-21 17:07:06 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens ( #8366 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Emma Qiao
653aa6b6dc
[None][infra] Waive failed tests for main 10/21 ( #8524 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-21 06:24:15 -04:00
Yan Chunwei
9ba5959e8e
[None][fix] the api_stability unify default values of None and inspect._empty ( #8496 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-21 16:57:40 +08:00
xinhe-nv
c566890624
[TRTLLM-8638][fix] Remove closed bugs ( #8478 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 03:48:58 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser ( #8000 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
xinhe-nv
3264d605fb
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8486 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-21 01:20:29 -04:00
ruodil
ab4b9966b2
[TRTLLM-7287][test] add multimodal chunked_prefill cases ( #8011 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-10-20 22:43:47 -04:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token ( #8502 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
Suyog Gupta
7050b1ea49
[ #8272 ][feat] Enable chunked prefill for SSMs in AutoDeploy ( #8477 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-20 15:31:52 -07:00
Venky
3e681e2a80
[None] [chore] Add architecture-specific ATTRIBUTIONS files ( #8468 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-10-20 16:29:15 -04:00
Lucas Liebenwein
55c468b218
[ #8461 ][feat] AutoDeploy: trtllm-serve bug fix + unit test ( #8462 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-20 16:06:39 -04:00
dongfengy
9b289d5230
[ https://nvbugs/5568676 ][fix] Remove test waive ( #8437 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-20 12:03:50 -07:00
HuiGao-NV
d0663e16e0
[ https://nvbugs/5492250 ][fix] Remove isolated cases and unwaive cases ( #8492 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-20 07:40:07 -04:00
Pamela Peng
b818a912d7
[ https://nvbugs/5540752 ][fix] Support quantized Phi4 MM models ( #8190 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-10-20 06:36:09 -04:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements ( #8398 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
QI JUN
d05079ba4b
[None][ci] move some test cases from H100 to A10 ( #8449 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-20 01:58:34 -04:00
Yi Zhang
3c2b3bd4d4
[TRTLLM-7255][feat] Add iteration log parser script for benchmark log ( #6942 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-10-20 01:34:52 -04:00
ChristinaZ
c8b9998acb
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) ( #7761 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-20 10:08:31 +08:00
xiweny
f7722e2b65
[TRTLLM-4866] [test] Support waiving unit tests by waives.txt ( #8359 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-20 09:52:51 +08:00
xinhe-nv
9aa086d3bb
[None][chore] update test duration ( #8377 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-19 20:45:51 -04:00
Emma Qiao
796891ba2a
[None][infra] Skip a failed case in pre-merge for main on 10/19 ( #8479 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 22:19:00 +08:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend ( #7926 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
Emma Qiao
e185173240
[None][infra] Waive test for main branch on 10/18 ( #8472 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-19 04:36:42 -04:00
brb-nv
7cc65a6296
[None][chore] Waive failing transceiver test ( #8473 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-18 17:22:10 -04:00
Lucas Liebenwein
41169fb20c
[None][feat] AutoDeploy: chunked prefill support ( #8158 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-18 00:47:35 -07:00
Kyle McGill
136e0e6882
[None][feat] Enable CUDA graph support for KvConnectorWorker API ( #8275 )
...
Signed-off-by: Kyle McGill <kmcgill@nvidia.com>
Signed-off-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
2025-10-17 18:09:03 -04:00
Anish Shanbhag
5ff4f88be6
[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic ( #8277 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-17 16:13:22 -04:00
h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs ( #8039 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
xinhe-nv
bc833d3de3
[TRTLLM-8638][fix] add waives tests ( #8445 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-17 03:37:53 -07:00
zhhuang-nv
7a2bab93f0
[None][test] Add post merge test for Seed-OSS-36B-Instruct ( #8321 )
...
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-10-17 02:30:33 -07:00
yufeiwu-nv
1e1f430163
[None][test] Filter out all fp8 test case for A100. ( #8420 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-10-16 20:42:50 -07:00
Ivy Zhang
70a0f5beb6
[TRTLLM-8580][test] save runtime report periodically ( #8312 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-17 10:47:26 +08:00
John Calderon
46ee7acb33
[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling ( #7539 )
...
Signed-off-by: John Calderon <johncalesp@gmail.com>
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Signed-off-by: john calderon <jcalderon@nvidia.com>
Signed-off-by: John Calderon <jcalderon@nvidia>
2025-10-16 17:49:22 +02:00
Yiqing Yan
05dd437084
[ https://nvbugs/5565541 ][fix] Add timeout threshold for H100 FHMA test ( #8354 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
bhsueh_NV
69325e1aa3
[ https://nvbugs/5574556 ][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI ( #8351 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Lizhi Zhou
982d4b65e8
[ https://nvbugs/5550671 ][fix] fix disagg-serving multinodes test failure ( #8307 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Chuang Zhu
18a534d2b4
[ https://nvbugs/5465642 ][fix] Increase server timeout to wait weight loading ( #8297 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Enwei Zhu
526cad37d7
[ https://nvbugs/5568951 ][fix] Fix guided decoding disagg tests ( #8311 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yechan Kim
4230639370
[ https://nvbugs/5550722 ][fix] Fix image load ( #8093 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
1b559ba91d
[None][chore] Update test configs for release ( #8224 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
4789c1e588
[TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases in to QA test list ( #8212 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Ivy Zhang
be2ab98233
[None][chore] Update constaintfor release ( #8211 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yan Chunwei
4e51148088
[ https://nvbugs/5532023 ][fix] unwaive GenerationExecutor tests ( #8251 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yukun He
179c7dc501
[ https://nvbugs/5536131 ][fix] Fix illegal access issue when scale is not provided in Llama3/4. ( #7960 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
sunnyqgg
dd61454d5f
[ https://nvbugs/5461761 ][fix] Unwaive eagle3 test ( #8363 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-16 09:51:48 -04:00
Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server ( #7637 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
xinhe-nv
f70eff30b3
[TRTLLM-8638][fix] waive llam4 tests on H20 ( #8416 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-16 03:14:56 -07:00
HuiGao-NV
4e6a492aa3
[None][chore] Isolate several intermittent cases ( #8408 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-15 23:48:31 -07:00
Yan Chunwei
42ab473bb0
[ https://nvbugs/5583261 ][ci] waive test_fetch_responses_streaming_sync ( #8407 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-15 23:19:31 -07:00
Min Yu
0a0159fdd8
[ https://nvbugs/5378031 ] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend ( #7286 )
...
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
2025-10-16 11:07:48 +08:00
xiweny
4143887370
[ https://nvbugs/5541494 ] [fix] Remove waivers ( #8353 )
...
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-15 19:10:35 -07:00
Yan Chunwei
206cf31705
[ https://nvbugs/5560921 ][fix] GenerationExecutor RPC ( #8209 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
Chuang Zhu
40d129a415
[None][fix] Fix cache buffer size for window ( #8320 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
dongfengy
7a0aa64973
[None][fix] Refactor triton paddings ( #6980 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-10-15 12:59:01 -07:00
Yukun He
56c20665a9
[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. ( #6924 )
...
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-15 21:18:11 +08:00
mpikulski
0510b34588
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py ( #8317 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
QI JUN
1a1c9a29ab
[None][ci] move all llama4 test cases to post merge ( #8387 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 16:36:37 +08:00
mpikulski
93a4b7f1b6
[None][chore] update torch_dtype -> dtype in 'transformers' ( #8263 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 17:09:30 +09:00
QI JUN
616d1df7a0
[None][chore] set the default value of max_num_tokens explicitly ( #8208 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
sychen52
6a6124dcb5
[OMNIML-2336][feat] w4a8 nvfp4 fp8 exports scale factor properly ( #8180 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
Co-authored-by: Shiyang Chen <shiychen@omniml-a6.nvidia.com>
2025-10-15 13:41:27 +08:00
Jin Li
206a9930df
[ https://nvbugs/5547435 ][fix] Fix a merge conflict ( #8365 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-15 10:43:10 +08:00
Emma Qiao
493da020c1
[TRTLLM-7351][infra] Add isolate marker for L0 ( #7497 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-14 16:58:14 -07:00
dongfengy
9d855f47ad
[None][fix] Remove outdated test waives for GPTOSS ( #8183 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-14 16:20:38 -07:00
Lizhi Zhou
22471ecc67
[TRTLLM-7846][feat] implement etcd storage for disagg cluster ( #8210 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 16:48:41 -04:00
Michal Guzek
1cdb0b62c3
[ https://nvbugs/5563469 ][fix] Temporarily disable test_nemotron_nano_8b_lora_torch in L0 due to Torch non-determinism ( #8206 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-14 17:55:28 +02:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test ( #8175 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support ( #8086 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
Yan Chunwei
86be06bda4
[None][ci] waive several rpc tests ( #8349 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-14 03:12:49 -07:00
William Zhang
72d65d079a
[ https://nvbugs/5542878 ][fix] Unwaive test ( #8027 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-10-14 07:58:07 +02:00
xinhe-nv
371fcb0338
[TRTLLM-8366][feat] add kimi multi nodes case ( #8025 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-13 21:36:03 -07:00
Yuxian Qiu
3450fe9944
[None][fix] Fix dummy load format for key models. ( #7993 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-14 11:18:39 +08:00
Lucas Liebenwein
22aa4ac08c
[None][feat] AutoDeploy: VLMs with subgraphs + cudagraph/compile ( #8203 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-13 17:34:09 -07:00
Zheyu Fu
bac665e650
[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. ( #7283 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-10-13 15:51:14 -07:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs ( #7206 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
amitz-nv
bbae7a05f0
[ https://nvbugs/5521949 ][fix] Replace test_codellama_fp8_with_bf16_lora with test_llama_3_1_8b_fp8_with_bf16_lora ( #8199 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-13 06:01:55 -07:00
Po-Han Huang (NVIDIA)
6fc6f70a68
[ https://nvbugs/5441729 ][test] Fix test_modeling_llama_min_latency.py failures ( #7478 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-10-13 15:35:02 +08:00
xinhe-nv
9fe63dd8db
[None][chore] Add failed cases into waives.txt ( #8290 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-13 00:07:00 -07:00
Leslie Fang
8d1b068b1a
[TRTLLM-8477][chore] Replace KvCacheConfigCpp with KvCacheConfig inside PyExecutor ( #8259 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-13 14:55:36 +08:00
xinhe-nv
72fcff1044
[None][fix] add timeout for llama4 ( #8254 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-12 21:04:20 -07:00
Guoming Zhang
989c25fcba
[None][doc] Add qwen3-next doc into deployment guid and test case into L0. ( #8288 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Faradawn Yang <faradawny@gmail.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 10:25:45 +08:00
amitz-nv
fac47e2826
[ https://nvbugs/5510879 ][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 ( #8063 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-12 12:29:52 -07:00
Eran Geva
a1ed03fe8a
[None][fix] AD test_trtllm_bench to use small model config and skip loading weights ( #8149 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-12 18:30:20 +03:00
Emma Qiao
fdbeea51d3
[None][infra] Skip failed cases for main branch ( #8293 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-12 08:04:09 -07:00
kris1025
a7ea544dbe
[TRTLLM-7384][feat] enable rejection sampling for CDL ( #7731 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-12 20:38:48 +08:00
brb-nv
56a539cd37
[None][chore] Waive failing pre-merge test on main ( #8282 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-10 23:52:05 -07:00
Yilin Fan
2695d70d42
[None][feat] Add request timing breakdown option in benchmark_serving ( #8128 )
...
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-10 09:24:54 -07:00
xinhe-nv
2655995a09
[None][fix] add gc for test fixture ( #8220 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-10 02:50:25 -07:00
bhsueh_NV
d3059dbd8a
[ https://nvbugs/5547416 ][fix] unwaive no_cache test ( #8213 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-10 01:50:13 -07:00
xinhe-nv
b555f1ff98
[None][chore] Add failed cases into waives.txt ( #8229 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-09 23:45:28 -07:00
xinhe-nv
e8c9bae37e
[None][chore] Remove closed bugs ( #8151 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-10 16:39:40 +11:00
Pengbo Wang
7da4b05289
[ https://nvbugs/5501820 ][fix] Add requirements for numba-cuda version to WAR mem corruption ( #7992 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-10-10 10:18:27 +08:00
Emma Qiao
ccd949ea5b
[None][infra] Waive failed tests on main 10/09 ( #8230 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-09 22:46:07 +08:00
amitz-nv
d560054e1b
[None][chore] Restore asserts in pytorch flow LoRA tests ( #8227 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-09 17:10:38 +03:00
bhsueh_NV
27677a36f5
[ https://nvbugs/5516666 ][fix] unwaive some Qwen3 CI tests ( #8130 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-09 09:44:58 +08:00
Lizhi Zhou
fdf29ab8fa
[TRTLLM-7846][feat] Http disagg-cluster management implemention ( #7869 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-09 09:44:01 +08:00
QI JUN
6884d06aed
[None][ci] move some llama4 test cases to pre merge ( #8189 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-08 18:34:08 -07:00
Liao Lanyu
ed8e00ad4a
[ https://nvbugs/5522746 ][fix] unwaive tests caused by node issues after rebooting ( #8193 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-09 08:45:56 +08:00
Mike Iovine
c88913dc03
[ https://nvbugs/5541545 ][fix] Remove test_llama4 ( #8031 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-08 15:20:15 -07:00
brb-nv
80517b7812
[None][chore] Waive some tests failing on main post merge ( #8186 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-08 06:52:30 -07:00
mpikulski
8298e93bd8
[TRTLLM-8414][chore] BREAKING CHANGE: refine sampling strategy selection ( #8132 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-08 15:46:50 +02:00
xxi
e98616512f
[ https://nvbugs/5550283 ][fix] update test case to the latest MoE API ( #8165 )
2025-10-07 22:54:34 -07:00
Liao Lanyu
d57b8f0951
[ https://nvbugs/5455140 ][fix] unwaive tests related to GB200 OOM ( #8159 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-08 13:14:12 +08:00
ruodil
971610e3ff
[None][test] add test-model-suites option in integration conftest.py ( #8016 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-10-08 10:38:31 +08:00
Mike Iovine
7facac077b
[None][fix] Fix MTP illegal memory access ( #8161 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-07 14:02:55 -04:00
Emma Qiao
ca9da1f1c2
[None][infra] Skip failed cases for main ( #8176 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-07 06:37:51 -07:00
xiweny
9298f1bdcc
[None] [test] Add B300 cases to CI ( #8056 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-06 19:23:31 -07:00
Faraz
27a5091fcb
[None][feat] GPT-OSS Sm120/Sm121 Support ( #7937 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Vincent Huang <vincenth@nvidia.com>
2025-10-06 16:59:06 -04:00
Izzy Putterman
f2657c1ae9
[None][fix] Eagle: Attention DP ( #7939 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-06 16:52:35 -04:00
Lucas Liebenwein
3492391feb
[None][chore] AutoDeploy: clean up accuracy test configs ( #8134 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-06 12:51:01 -07:00
Yan Chunwei
54ab9767b5
[None][chore] fix llmargs conflict ( #8152 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-06 02:34:27 -07:00
amitz-nv
8060aad239
[ https://nvbugs/5521949 ][fix] Re-enable test_bielik_11b_v2_2_instruct_multi_lora, fix its API use with pytorch flow LoRA ( #8146 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-05 04:28:20 -07:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) ( #5543 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Frida Hou
f6654f26a4
[ #5255 ][autodeploy] Update FuseAllreduceResidualRMSNorm to use pattern matcher utility; remove fuse_collective ( #7545 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-05 01:15:46 -07:00
Frida Hou
744246d316
[None][autodeploy] small refactors on attention matching ( #8079 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-03 22:00:27 -07:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Lucas Liebenwein
9d098e3142
[None][feat] AutoDeploy: graph/module inputs with kwargs instead of args ( #8137 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 16:53:42 -07:00
Lucas Liebenwein
2c454e8003
[None][feat] AutoDeploy: Nemotron-H accuracy test ( #8133 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 15:39:03 -07:00
Michal Guzek
38da871db3
[TRTLLM-6496][feat] Add LoRa Torch tests for the latest NIM model list ( #6806 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-03 12:10:48 -07:00
Mike Iovine
ca8291133a
[None][fix] Fix MTP 2-model ( #8115 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-10-03 10:13:50 -07:00
Lucas Liebenwein
aaf2c3c2e5
[None][feat] AutoDeploy: compiler backends based on nn.Module ( #8126 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 12:14:21 -04:00
Ziyi Xiong
7bc2d9e993
[ https://nvbugs/5537878 ][fix] Reserve an extra slot for padded batch ( #7998 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-10-03 08:42:52 -07:00
Lucas Liebenwein
5faa5e9dd8
[None][feat] AutoDeploy: dive deeper into token generation bugs + enable_block_reuse ( #8108 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 04:57:26 -07:00
Erin
ba3dbb6c94
[ https://nvbugs/5548098 ][fix] Fix flakey unit test for dynamic spec d… ( #8129 )
2025-10-02 22:58:37 -07:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement ( #8005 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Eran Geva
4136942436
[ #7588 ][fix] fixed the kv cache size parsing in test_perf.py AD backend ( #8092 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 15:55:31 -04:00
Patrice Castonguay
fefa7d8fa3
[None][feat] Support for cancelling requests with disaggregation ( #8114 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-02 11:04:26 -07:00
dongfengy
6568e565db
[TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss ( #7916 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-02 10:47:04 -07:00
Erin
293637e0a1
[ https://nvbugs/5556020 ][chore] waive test_eagle3 ( #8119 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-02 05:33:21 -04:00
mpikulski
fc7f78c400
[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling ( #8110 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-02 10:20:32 +02:00
Eran Geva
32c7f8c36f
[ #7588 ][feat] lock gpu clocks in test_perf.py to reliably detect perf regressions ( #8099 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 11:18:10 +03:00
brb-nv
bd3d0ad233
[TRTLLM-7733][feat] Executor changes to support helix parallelism ( #7972 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-01 22:13:03 -04:00
Izzy Putterman
1ad7bc4c78
[None][feat] Draft: Save state first pass ( #7012 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-01 18:40:55 -04:00
Frida Hou
de99e23696
[ #5860 ][feat] Add ModelOPT INT4 awq fake quant support in AutoDeploy ( #7770 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-01 13:13:45 -07:00
Yibin Li
d7581bb551
[TRTLLM-8031][feat] Add chunked return_generation_logits logic ( #7831 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-10-01 12:47:07 -04:00
sychen52
ba8abeab10
[OMNIML-2336][feat] add W4A8 NVFP4 FP8 fused moe ( #7968 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-10-01 02:39:33 -04:00
Patrice Castonguay
b77f19f4ff
[ https://nvbugs/5434320 ][fix] fix: Unwaiving disagg pp tests ( #8069 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-01 00:33:59 -04:00
Emma Qiao
b1e3fef8aa
[None][infra] Skip failed tests in post-merge for main ( #8102 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-01 10:12:10 +08:00
brb-nv
84aa3c981e
[None][chore] Waive failing MNNVL alltoall multi-gpu test ( #8106 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-09-30 20:05:42 -04:00
mpikulski
ee5ae49337
[TRTLLM-8269][fix] Revert "do not explicitly pass temperature=0 to select greedy sampling" ( #8103 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-30 16:53:49 -04:00
Iman Tabrizian
c510b67fa0
[ https://nvbugs/5547414 ][fix] avoid downloading Tiny llama from HF ( #8071 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-30 13:47:59 -04:00
xinhe-nv
1dba9fa89e
[TRTLLM-6239][feat] add test cases into QA test list ( #8081 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-30 00:23:45 -04:00
Kaiyu Xie
b0cb9ca50e
[None] [test] Add MNNVL AlltoAll tests to pre-merge ( #7466 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-29 23:12:24 -04:00
Lucas Liebenwein
dcfd3ef81c
[ #4593 ][feat] AutoDeploy: Linear Attention Support (SSM + causal_conv + Bamba + Nemotron-H) ( #8068 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-09-29 22:41:06 -04:00
Cao Dong
62010c0ab7
[None][feat] Return topk logprobs in torch backend ( #7976 )
...
Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>
2025-09-30 09:32:37 +08:00
Cheng Hang
cdce68c3e0
[TRTLLM-6741][fix] Add heuristics for lm head tp size when enable_lm_head_tp_in_adp=True ( #7891 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-30 09:24:35 +08:00
Patrice Castonguay
6396cb9208
[ https://nvbugs/5538098 ][fix] Checking connection to etcd server in unit test ( #8006 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-29 20:53:32 -04:00
Chang Liu
334e2cab0d
[ https://nvbugs/5542867 ][fix] Fix the non-determinism issue in the mm_encoder test ( #8033 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-29 09:45:16 -07:00
amitz-nv
e5f9b6aaa0
[None][fix] Fix TRT-python multi LoRA TP=2 test arguments ( #8059 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-29 12:20:04 -04:00
mpikulski
31a1a5ff80
[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling ( #7909 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-29 14:52:18 +01:00
xiweny
48e779ae8c
[ https://nvbugs/5541494 ] [fix] add back missing sm100f bmm kernels ( #8051 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-29 05:35:44 -04:00
yufeiwu-nv
3ba6727a68
[None][test] Update get_sysinfo.py to avoid UnboundLocalError ( #7982 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-09-29 05:14:38 -04:00
Gal Hubara-Agam
b2095aa074
[ #4674 ][bugfix] AutoDeploy Fix memory leak in fuse_moe ( #7844 )
...
Delete the unstacked weights immediately to save GPU memory, cleanup occurs automatically after the transformation, but for large models we'll run out of memory during the transformation itself.
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-09-29 11:01:07 +03:00
xinhe-nv
20e6cd39f1
[None][chore] Add failed cases into waives.txt ( #8043 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-29 03:37:39 -04:00
Emma Qiao
ce381d6813
[None][infra] Waive failed cases for main on 0929 ( #8053 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-29 02:46:02 -04:00
HuiGao-NV
7ac932d45e
[ https://nvbugs/5532087 ][CI] Enable test case ( #8029 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-29 01:46:28 -04:00
Ivy Zhang
1e2e851db8
[None][chore] update test case constraint ( #8020 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 13:25:09 +08:00
Eran Geva
9cea6bfb30
[ #7288 ][feat] Added AutoDeploy backend support to test_perf.py ( #7588 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-28 21:21:27 -07:00
Ivy Zhang
0ecafd84da
[None][chore] Update chunked prefill test case configs ( #7868 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 10:37:34 +08:00