Yi Zhang
|
5ac92bb8ff
|
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-04 23:23:41 +09:00 |
|
Yiqing Yan
|
3e44db11c9
|
[Infra][nvbugs/5370968] - Unwaive l0 test (#5750)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-07-04 15:27:53 +08:00 |
|
Yukun He
|
b0354ef43c
|
[5321981] fix: Fix the Llama3.1 405B hanging issue. (#5698)
Correct the output shape of the fusedLayerNormPlugin.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-07-04 12:29:19 +08:00 |
|
Yi Zhang
|
53394e0030
|
test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-04 13:26:53 +09:00 |
|
brb-nv
|
2b66fe8fbd
|
[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-04 10:55:34 +08:00 |
|
Dom Brown
|
2aacdba1e4
|
[TRTLLM-6100] fix: Nvbug 5356427: autotuned TRTLLM Gen fp8 block scale MoE illegal memory access (#5676)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
2025-07-04 10:38:08 +08:00 |
|
Faraz
|
8a8d2e9901
|
[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5651)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
|
2025-07-03 22:08:15 +09:00 |
|
Linda
|
14f938e510
|
Doc: Update invalid hugging face URLs (#5683)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-03 09:37:01 +02:00 |
|
Emma Qiao
|
2f9d0619c3
|
[Infra] - Waive failed cases on release/0.21 (#5674)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-02 22:23:54 -04:00 |
|
brb-nv
|
a3c0cf02ce
|
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-03 09:55:25 +08:00 |
|
Frank
|
92d3a2d0e0
|
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation. (#5625)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-07-02 17:48:02 +08:00 |
|
bhsueh_NV
|
d5606b062a
|
fix: [https://nvbugs/5355219] Fix bug of Qwen3 235B CI on dgx_gb200 (#5602)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-02 10:07:01 +08:00 |
|
Kaiyu Xie
|
682b164b9b
|
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-01 04:58:50 -04:00 |
|
Yi Zhang
|
aa0b9278d2
|
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-01 01:06:47 -04:00 |
|
Zheng Duan
|
1824c44004
|
[nvbug 5300551] test: increase block count in eviction test (#5465)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-01 10:48:25 +08:00 |
|
nv-guomingz
|
9fe1dd6be1
|
fix:https://nvbugs/5362398 (#5609)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 13:29:40 -04:00 |
|
Yan Chunwei
|
d6c81bad97
|
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-01 00:48:59 +08:00 |
|
Emma Qiao
|
647e070ed6
|
[Infra][release/0.21]Update nccl to 2.27.5 (#5539)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-29 20:50:15 +08:00 |
|
Venky
|
4fc0666daa
|
[cherry-pick] [CI] Waive test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] (#5553)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
|
2025-06-28 01:15:04 +08:00 |
|
ixlmar
|
abb7357f25
|
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-27 07:09:41 -07:00 |
|
Yan Chunwei
|
b78ad754c8
|
ci: unwaive llmapi launch test (#5281)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-27 14:10:45 +08:00 |
|
Emma Qiao
|
e2054bb2aa
|
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-27 13:58:13 +08:00 |
|
ixlmar
|
312fd47f84
|
fix: constrain grepping in docker/Makefile (#5493)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-26 13:44:40 +02:00 |
|
Kaiyu Xie
|
30a2a8b81c
|
doc: Fix benchmark cmd in disagg scripts (#5516)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-26 17:23:24 +08:00 |
|
ixlmar
|
a811077f90
|
fix: fix regression in LOCAL_USER (#5517)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-26 11:10:55 +02:00 |
|
Anurag Mukkara
|
c2799d0465
|
[nvbug/5354825] Fix nougat test image url (#5496)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
|
2025-06-26 10:10:18 +08:00 |
|
Yan Chunwei
|
87ead4ecbe
|
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-26 07:35:35 +08:00 |
|
Martin Marciniszyn Mehringer
|
fc64f139e4
|
Fix permission for local user issues in NGC docker container. (#5373)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-06-25 14:10:20 +02:00 |
|
Emma Qiao
|
b6d23d58c4
|
[Infra] - Waive failed tests on release/0.21 (#5477)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-25 19:01:55 +08:00 |
|
HuiGao-NV
|
5cd87bee41
|
tests: Set kv cache free memory fraction in test case (#5462)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-25 16:27:46 +08:00 |
|
ruodil
|
5e50fcc51b
|
test: set enable_attention_dp=True in default deepseek settings (#5461)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-25 14:21:14 +08:00 |
|
Wanli Jiang
|
af5839303d
|
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-06-25 14:10:50 +08:00 |
|
brb-nv
|
32f50ded17
|
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-06-25 11:45:14 +08:00 |
|
Ivy Zhang
|
9e110b2d11
|
tests: fix typos in qa test (#5421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-06-25 10:42:34 +08:00 |
|
Kaiyu Xie
|
2b56957fb5
|
Fix: missing clientId when serialize and deserialize response (cherry-pick #5231) (#5378)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-24 10:00:37 +08:00 |
|
Yi Zhang
|
2d5e202484
|
fix: Fix skip by mpi size fixture (#5355)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-06-22 02:51:01 +08:00 |
|
Martin Marciniszyn Mehringer
|
ebc6dbcb0b
|
doc: cherry pick #5334 (#5368)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-06-19 20:03:59 +08:00 |
|
Emma Qiao
|
8686805a3b
|
[Infra]cherry pick sanity check yml change for 5080 and 5090 from main (#5363)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-19 15:33:57 +08:00 |
|
ruodil
|
e87cf62c12
|
tests: cherry-pick from main branch, add qwen3 test cases and amend test name in perf test (#5357)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-06-19 14:34:05 +08:00 |
|
Yiqing Yan
|
decfe2fdb3
|
chore: bump version to 0.21.0 (#5325)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-19 12:58:44 +08:00 |
|
Yiqing Yan
|
da576bcafa
|
Waive L0 test (#5349)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-19 12:01:11 +08:00 |
|
Fanrong Li
|
6c3210a8be
|
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-19 09:48:22 +08:00 |
|
nv-guomingz
|
6a388b105a
|
chore: remove torch_compile prefix for TorchCompileConfig field members (#5261)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-19 09:21:51 +08:00 |
|
Zongfei Jing
|
2b23cd56ce
|
[feat] Fusion finalize and allreduce for qwenmoe model (#5223)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
|
2025-06-19 08:03:58 +08:00 |
|
Robin Kobus
|
1a7c6e7974
|
ci: Split long running jobs into multiple jobs (#5268)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-19 06:24:29 +08:00 |
|
Yan Chunwei
|
3946e798db
|
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-19 06:13:53 +08:00 |
|
Omer Ullman Argov
|
0b6d005ef6
|
[fix][test] clear cuda cache before unittests automatically (#5121)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-19 00:36:53 +03:00 |
|
Aurelien Chartier
|
d25f93c07f
|
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#5293)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-06-18 11:13:12 -07:00 |
|
Omer Ullman Argov
|
5010f8719d
|
[fix][test] remove duplicate test runs (#5241)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-19 01:59:54 +08:00 |
|
Omer Ullman Argov
|
a28a152001
|
[fix][test] remove some cpp test cases from h100 (#5335)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-18 20:40:26 +03:00 |
|