1456 Commits

Author SHA1 Message Date
Kaiyu Xie
7246fd75d1
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-19 21:57:10 +08:00
Shi Xiaowei
1e35be5840
doc: subsequent modifications of blog 5 (#5366)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-19 18:23:13 +08:00
Fanrong Li
c7af650d5a
Fix: fix the deterministic issue in the MTP Eagle path (#5285)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-19 18:08:40 +08:00
Shi Xiaowei
9a53e58a58
blog: Disaggregated Serving in TensorRT-LLM (#5353)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-19 18:02:15 +08:00
Frank
68687a9f56
[WAR][nvbug/5321947] Add an async sleep to unblock event loop. (#5342)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-06-19 17:25:18 +08:00
Enwei Zhu
bca758fce1
fix: Fix DS-R1 nvfp4 test case naming (#5361)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-19 15:50:43 +08:00
Emma Qiao
493f268b1c
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 15:05:57 +08:00
hlu1
b558232ce1
Refactor CutlassFusedMoE (#5344)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-06-19 00:04:07 -07:00
ruodil
e22e884b02
test: amend test case name in perf cluster test (#5356)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-19 14:50:12 +08:00
ruodil
21ce9b6749
test: add qwen3 cases (#5302)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-19 14:38:36 +08:00
amitz-nv
1753202b61
[TRTLLM-5825][fix] Fix torch LoRA TP (#5338)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-06-19 09:12:00 +03:00
Emma Qiao
7f68de3e3f
Refactor test timeout for individual long case (#4757)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 13:52:11 +08:00
yunruis
b3e886074e
Fix CI build time increase (#5337)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-06-19 13:49:42 +08:00
bhsueh_NV
dce8620013
chore: enable moe_backend on Qwen3 test (#5230)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-06-19 13:40:45 +08:00
xinhe-nv
e5400eeae0
tests: add ds r1 tp4 test (#5197)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-19 12:48:33 +08:00
Yiqing Yan
dedce8ab0e
chore: bump version to 1.0.0rc0 (#5326)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-19 12:02:28 +08:00
Yiqing Yan
da576bcafa
Waive L0 test (#5349)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-19 12:01:11 +08:00
Fanrong Li
6c3210a8be
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-19 09:48:22 +08:00
nv-guomingz
6a388b105a
chore: remove torch_compile prefix for TorchCompileConfig field members (#5261)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-19 09:21:51 +08:00
Zongfei Jing
2b23cd56ce
[feat] Fusion finalize and allreduce for qwenmoe model (#5223)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-06-19 08:03:58 +08:00
Robin Kobus
1a7c6e7974
ci: Split long running jobs into multiple jobs (#5268)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-19 06:24:29 +08:00
Yan Chunwei
3946e798db
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-19 06:13:53 +08:00
Omer Ullman Argov
0b6d005ef6
[fix][test] clear cuda cache before unittests automatically (#5121)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 00:36:53 +03:00
Aurelien Chartier
d25f93c07f
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#5293)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-18 11:13:12 -07:00
Omer Ullman Argov
5010f8719d
[fix][test] remove duplicate test runs (#5241)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 01:59:54 +08:00
Omer Ullman Argov
a28a152001
[fix][test] remove some cpp test cases from h100 (#5335)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 20:40:26 +03:00
yuanjingx87
a1c5704055
[feat] Multi-node CI testing support via Slurm (#4771)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-19 01:11:12 +08:00
Iman Tabrizian
e5ee5c5352
Unwaive disaggregated serving accuracy tests (#5095)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-06-19 00:41:15 +08:00
Xianjie Qiao
857108aeca
Add disagg slurm scripts (#5243)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-06-18 23:17:55 +08:00
HuiGao-NV
d13d2f460d
Remove duplicated test cases (#5323)
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 21:20:20 +08:00
juney-nvidia
00bdd39b96
chore: Update README.md to expose meet-up info (#5329)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-18 20:04:28 +08:00
Emma Qiao
b29ac5b561
[Infra] Update 5080 and 5090 case condition due to the driver update (#5317)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-18 20:01:36 +08:00
jellysnack
0623ffe3bc
feat: Add LLGuidance Support for PyTorch Backend (#5214)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-18 19:33:34 +08:00
xinhe-nv
610a49f117
tests: add multi nodes tests (#5196)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-18 18:08:04 +08:00
Yi Zhang
375dd0b971
Waive L0 (#5311)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-18 16:40:41 +08:00
Yiqing Yan
a3a48410f3
Fix rerun step (#5319)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-18 16:38:45 +08:00
Yuan Tong
f599ee63c1
test: correct unittest rerun behavior (#5273)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-06-18 16:37:19 +08:00
Zhanrui Sun
516bd4dc05
chore: bump version to 0.21.0rc3 (#5309)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-18 15:59:53 +08:00
Robin Kobus
38547b92f3
refactor: Introduce ResourceManagerType enum for resource management (#5246)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-18 09:55:59 +02:00
Bo Li
d76bda7f2c
chore: Refine printed info of CHECK_TYPE. (#5295)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-18 15:35:41 +08:00
Wanli Jiang
3a02489e86
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-18 15:12:49 +08:00
QI JUN
9ea7bb67a4
CI: fix TensorRT H200 tests (#5301)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 14:40:57 +08:00
Yukun He
6711ad9cf3
[TRTLLM-5589] feat: Minor optimizations for tunable FP8 batched GEMM op. (#5139)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-06-18 14:33:46 +08:00
ruodil
3b5d916250
test: cherry-pick deepseek rcca cases in main branch (#5307)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-18 14:26:26 +08:00
nv-guomingz
ee26965054
doc:update contributing md for internal developers (#5250)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-18 14:20:30 +08:00
Yao Yao
908463a5f5
[feat]: improve performance of XQA-MLA for sm120 (#5087)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-18 14:19:22 +08:00
Yan Chunwei
724e495254
chore: partition LLM class into TorchLLM and TrtLLM (#4900)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-18 14:01:25 +08:00
Yi Zhang
e44f7687af
feat: Add no_kv_cache_reuse option and streaming support for trtllm serve bench (#4971)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-18 13:37:31 +08:00
Yiqing Yan
8f67e3604d
Waive L0 tests (#5308)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-18 12:43:45 +08:00
Omer Ullman Argov
f501ce57b1
[fix][test] move deepseek single gpu tests to post merge (#5280)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 06:59:39 +03:00