Commit Graph

1314 Commits

Author SHA1 Message Date
Xiaodong (Vincent) Huang
cc2a1344be
None: fix OOM because of unnecessary mha workspace (#5056)
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
2025-06-12 21:56:05 +02:00
pcastonguay
3a04c9fa7b
chore: Include prompt_token_ids only for context-only disagg requests (#5055)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-12 15:00:08 -04:00
Omer Ullman Argov
655bce0b19
[fix][test] report individual unittests results to jenkins (#5116)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-13 01:52:09 +08:00
Mike Iovine
690873ba1a
[nvbug/5334370][fix] Fix one model EAGLE3 (#5134)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-12 10:28:14 -04:00
HuiGao-NV
dfeeaf6746
Move allreduce_strategy from committed api to reference (#5147)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 21:00:20 +08:00
brb-nv
8cfb567182
fix: Updates to yarn implementation (#5105)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-06-12 20:45:34 +08:00
nv-guomingz
cf35a079f9
fix:https://nvbugs/5298661 (#5022)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 20:41:44 +08:00
nv-guomingz
58d4ca2385
fix:remove duplicated trust_remote_code knob from trtllm-serve (#5143)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 19:48:24 +08:00
Daniel Cámpora
22281cfc55
doc: Added documentation for enable_trtllm_sampler. (#4990)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
2025-06-12 18:34:15 +08:00
Venky
59c9588e9a
enh(doc): Add ci-overview in docs/source/reference/ (#5137)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-06-12 17:48:13 +08:00
Shi Xiaowei
88cba5f354
test: waive the NIXL related tests (#5153)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-12 17:02:27 +08:00
nv-guomingz
b563696dee
doc:fix invalid links for trtllm-serve doc (#5145)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 16:17:32 +08:00
Zhanrui Sun
a97f4581d2
infra: upload imageTag info to artifactory and add ngc_staging to save ngc image (#4764)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-12 15:38:47 +08:00
liji-nv
10ab9791ec
[fix] Do not reuse dummy request KVCache (#4804)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-12 15:24:50 +08:00
Fanrong Li
4d070d3862
chore: fix typo in tests (#5092)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-12 15:11:26 +08:00
Daniel Cámpora
e46267765f
Fix logprobs issues. (#5136)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-12 15:07:01 +08:00
Michal Guzek
53983ad273
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 15:06:28 +08:00
ruodil
d021cc5126
test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-12 14:59:16 +08:00
tomeras91
06d9f1e2f6
[test] Use LLM API for Nemotron-H correctness test (#5097)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-12 09:54:46 +03:00
bhsueh_NV
505678a286
update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114)
Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>
2025-06-12 14:40:57 +08:00
Michal Guzek
0daa70999a
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961)
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 14:32:04 +08:00
Venky
c3b2eb6dab
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras (#5066)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-12 14:19:15 +08:00
Lucas Liebenwein
49d7268acc
[nvbugs/5331013] fix AutoDeploy for PyTorch 25.05 dependency upgrade (#5106)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-12 13:07:27 +08:00
Netanel Haber
e692779ead
Solve underallocation in VSWA+/VGQA (#4667)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-06-12 12:12:46 +08:00
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce (#4635)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
Zheng Duan
c592798f64
fix: limit process pool size when prefetching (#5088)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-12 10:52:52 +08:00
Zheng Duan
ee44fa00f8
chore: rename IOFormatter to BaseCacheFormatter (#5068)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-12 10:50:14 +08:00
Po-Wei (Vincent)
ad99a08fa2
[TRTLLM-5581][infra] Update Module Owners (#5052)
Signed-off-by: Po-Wei Wang (Vincent)
2025-06-12 09:38:42 +08:00
tburt-nv
ddfe4fceb3
[chore] 2025-06-10 update allowlist (#5102)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-06-11 18:02:18 +08:00
xinhe-nv
11b94feff8
test: skip disaggregated tests on arm (#5070)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-11 17:00:10 +08:00
Yiqing Yan
a90dd571f8
[TRTLLM-5082] - Add a bot run option for detailed logs (#4390)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-11 15:44:57 +08:00
liji-nv
8282d6c1a7
[fix] Fix llama4 min latency (#5117)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-11 15:44:38 +08:00
ruodil
56abae0835
test: add more llama_v3.3_70b cases in perf test (#4979)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-11 15:44:22 +08:00
Zhanrui Sun
e2863a3159
chore: bump version to 0.21.0rc2 (#5112)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-11 15:08:14 +08:00
Daniel Cámpora
fdf1c47d1d
[TRTLLM-4995][feat] TRTLLM Sampler log probs support (#4836)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-11 08:18:13 +02:00
Enwei Zhu
00991d1520
chore: Merge remaining changes from feat/large-ep branch to main (#5039)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-11 13:47:43 +08:00
Yiqing Yan
0a9f105931
Waive L0 tests (#5111)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-11 11:53:15 +08:00
Zhanrui Sun
035b048a65
infra: Add timeout and retry for wget in docker image build (#5035)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-11 10:37:13 +08:00
ChristinaZ
273c6b9355
[https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the routing unit test (#5065)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-06-11 09:44:35 +08:00
Zheng Duan
580a92521e
test: conditional disagg and cache aware balancing for deepseek v3 (#4522)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-11 09:44:29 +08:00
Bo Li
1b79041f5d
fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlock. (#4264)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-11 09:38:10 +08:00
Mike Iovine
fcd71921f1
[fix] Unwaive test_llama_eagle3 (#5042)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-10 18:11:07 -04:00
Izzy Putterman
6cb2b7d370
CI: Allow run (#5101)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-06-11 06:03:38 +08:00
Jinyang Yuan
194a708d83
[fix] Fix test_attention_mla (#5084)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-06-10 14:20:11 -07:00
Linda
50f576172b
doc: add info about stop words appearing in output (#4956)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-06-10 22:38:33 +02:00
nvpohanh
7b210ae9c3
test: add unit tests for Llama4 min_latency code (#4980)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-06-10 12:10:26 -07:00
Lucas Liebenwein
7ddc4d6282
[AutoDeploy] Merge Feature Branch Week 3 (#5054)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-11 00:20:43 +08:00
Tracin
6c91f1c7ac
Mxfp8xmxfp4 quant mode(#4978)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-10 22:01:37 +08:00
liji-nv
f6a49a9343
[CI] waive failing L0 test (#5089)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-10 20:40:44 +08:00
Zongfei Jing
6d1f2d0fd7
[TRTLLM-3927] [feat] Finalize + Allreduce + add + rmsnorm fusion (#4756)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-06-10 19:55:16 +08:00