e055af1bc9  2025-06-15 01:28:26 +08:00  ixlmar
    chore: improve disagg test failure detection (#4738)
    Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

1389f5a4d3  2025-06-14 06:37:48 -07:00  Aurelien Chartier
    feat: Add support for fp8 rowwise quantization (#4876)
    Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
    Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>

0b60da2c45  2025-06-14 19:12:38 +08:00  Tailing Yuan
    feat: large-scale EP(part 7: DeepEP integration) (#4792)
    Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
    Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

b99c5ce8c1  2025-06-14 17:36:22 +08:00  yunruis
    Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560)
    Signed-off-by: yunruis <yunruis@nvidia.com>
    Signed-off-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
    Signed-off-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
    Co-authored-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>

3b7b5a5ad5  2025-06-14 14:23:13 +08:00  nv-guomingz
    refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

5f2785fb90  2025-06-13 23:33:23 +08:00  Enwei Zhu
    fix: Fix waive list (#5205)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

25aa3881d7  2025-06-13 11:06:36 -04:00  Mike Iovine
    [nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len (#4879)
    Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

952f33dcad  2025-06-13 20:48:48 +08:00  QI JUN
    CI: move all test cases of TensorRT backend into post merge (#5186)
    Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

30d9d0fa71  2025-06-13 16:38:51 +08:00  xinhe-nv
    test: [CI] Add failed cases into waives.txt (#5178)
    Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
    Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

4d0a5ad384  2025-06-13 14:03:55 +08:00  Zheng Duan
    chore: gracefully exit disagg process in tests; better startup and logging (#5109)
    Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

28cd536bd6  2025-06-13 13:40:03 +08:00  Ivy Zhang
    [test] Update timeout params in QA test list (#5124)
    Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

01bd4c00b4  2025-06-13 12:17:45 +08:00  Iman Tabrizian
    Add two MTP disaggregated test (#4546)
    Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

dec326ba7d  2025-06-13 06:07:22 +02:00  Daniel Cámpora
    [fix] Reenable test return logits (#5160)
    Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

b79eb34bfe  2025-06-13 11:37:50 +08:00  Yibin Li
    [fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
    Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

d9be419f45  2025-06-13 11:25:33 +08:00  xinhe-nv
    tests: update tests for b200 (#5180)
    Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
    Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

fa582cbe9a  2025-06-13 11:09:15 +08:00  ruodil
    test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083)
    Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>

4ae46b6714  2025-06-13 10:21:32 +08:00  Yuxian Qiu
    fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedMoE refactor. (#4930)
    Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

38a907aaca  2025-06-13 08:58:44 +08:00  Fanrong Li
    [TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance (#5119)
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

a0b6c635b1  2025-06-13 06:00:02 +08:00  Matthias Jouanneaux
    [feat] trtllmGen MoE routing: added support for top groups and top K bounds (#4063)
    Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
    Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
    Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>

655bce0b19  2025-06-13 01:52:09 +08:00  Omer Ullman Argov
    [fix][test] report individual unittests results to jenkins (#5116)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

dfeeaf6746  2025-06-12 21:00:20 +08:00  HuiGao-NV
    Move allreduce_strategy from committed api to reference (#5147)
    Signed-off-by: Hui Gao <huig@nvidia.com>

cf35a079f9  2025-06-12 20:41:44 +08:00  nv-guomingz
    fix:https://nvbugs/5298661 (#5022)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

88cba5f354  2025-06-12 17:02:27 +08:00  Shi Xiaowei
    test: waive the NIXL related tests (#5153)
    Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

4d070d3862  2025-06-12 15:11:26 +08:00  Fanrong Li
    chore: fix typo in tests (#5092)
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

53983ad273  2025-06-12 15:06:28 +08:00  Michal Guzek
    [TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933)
    Signed-off-by: moraxu <mguzek@nvidia.com>

d021cc5126  2025-06-12 14:59:16 +08:00  ruodil
    test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149)
    Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
    Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

06d9f1e2f6  2025-06-12 09:54:46 +03:00  tomeras91
    [test] Use LLM API for Nemotron-H correctness test (#5097)
    Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

505678a286  2025-06-12 14:40:57 +08:00  bhsueh_NV
    update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114)
    Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com>
    Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>

0daa70999a  2025-06-12 14:32:04 +08:00  Michal Guzek
    Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961)
    Signed-off-by: moraxu <mguzek@nvidia.com>

c3b2eb6dab  2025-06-12 14:19:15 +08:00  Venky
    test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ (#5066)
    Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
    Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

49d7268acc  2025-06-12 13:07:27 +08:00  Lucas Liebenwein
    [nvbugs/5331013] fix AutoDeploy for PyTorch 25.05 dependency upgrade (#5106)
    Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

e692779ead  2025-06-12 12:12:46 +08:00  Netanel Haber
    Solve underallocation in VSWA+/VGQA (#4667)
    Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

43192379af  2025-06-12 11:22:49 +08:00  HuiGao-NV
    Use backend to replace macro to control enablement of MNNVL all reduce (#4635)
    Signed-off-by: Hui Gao <huig@nvidia.com>

11b94feff8  2025-06-11 17:00:10 +08:00  xinhe-nv
    test: skip disaggregated tests on arm (#5070)
    Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

56abae0835  2025-06-11 15:44:22 +08:00  ruodil
    test: add more llama_v3.3_70b cases in perf test (#4979)
    Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
    Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

fdf1c47d1d  2025-06-11 08:18:13 +02:00  Daniel Cámpora
    [TRTLLM-4995][feat] TRTLLM Sampler log probs support (#4836)
    Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

0a9f105931  2025-06-11 11:53:15 +08:00  Yiqing Yan
    Waive L0 tests (#5111)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

273c6b9355  2025-06-11 09:44:35 +08:00  ChristinaZ
    [https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the routing unit test (#5065)
    Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>

580a92521e  2025-06-11 09:44:29 +08:00  Zheng Duan
    test: conditional disagg and cache aware balancing for deepseek v3 (#4522)
    Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

1b79041f5d  2025-06-11 09:38:10 +08:00  Bo Li
    fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlock. (#4264)
    Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

fcd71921f1  2025-06-10 18:11:07 -04:00  Mike Iovine
    [fix] Unwaive test_llama_eagle3 (#5042)
    Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

194a708d83  2025-06-10 14:20:11 -07:00  Jinyang Yuan
    [fix] Fix test_attention_mla (#5084)
    Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

7b210ae9c3  2025-06-10 12:10:26 -07:00  nvpohanh
    test: add unit tests for Llama4 min_latency code (#4980)
    Signed-off-by: Po-Han Huang <pohanh@nvidia.com>

7ddc4d6282  2025-06-11 00:20:43 +08:00  Lucas Liebenwein
    [AutoDeploy] Merge Feature Branch Week 3 (#5054)
    Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

6c91f1c7ac  2025-06-10 22:01:37 +08:00  Tracin
    Mxfp8xmxfp4 quant mode(#4978)
    Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
    Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>

f6a49a9343  2025-06-10 20:40:44 +08:00  liji-nv
    [CI] waive failing L0 test (#5089)
    Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

6d1f2d0fd7  2025-06-10 19:55:16 +08:00  Zongfei Jing
    [TRTLLM-3927] [feat] Finalize + Allreduce + add + rmsnorm fusion (#4756)
    Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

8ec8e4559d  2025-06-10 16:23:49 +08:00  Yiqing Yan
    Waive L0 test (#5077)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

f121f13ddf  2025-06-10 11:09:37 +03:00  tomeras91
    [nvbug 5325284][fix] Increase Nemotron-H warmup request robustness (#4954)
    Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

fdfc711261  2025-06-10 15:40:57 +08:00  Yiqing Yan
    Waive L0 test (#5067)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>