Omer Ullman Argov
0b6d005ef6
[fix][test] clear cuda cache before unittests automatically ( #5121 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 00:36:53 +03:00
Aurelien Chartier
d25f93c07f
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head ( #5293 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-18 11:13:12 -07:00
Omer Ullman Argov
5010f8719d
[fix][test] remove duplicate test runs ( #5241 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 01:59:54 +08:00
Omer Ullman Argov
a28a152001
[fix][test] remove some cpp test cases from h100 ( #5335 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 20:40:26 +03:00
yuanjingx87
a1c5704055
[feat] Multi-node CI testing support via Slurm ( #4771 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-19 01:11:12 +08:00
Iman Tabrizian
e5ee5c5352
Unwaive disaggregated serving accuracy tests ( #5095 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-06-19 00:41:15 +08:00
Xianjie Qiao
857108aeca
Add disagg slurm scripts ( #5243 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-06-18 23:17:55 +08:00
HuiGao-NV
d13d2f460d
Remove duplicated test cases ( #5323 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 21:20:20 +08:00
juney-nvidia
00bdd39b96
chore: Update README.md to expose meet-up info ( #5329 )
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-18 20:04:28 +08:00
Emma Qiao
b29ac5b561
[Infra] Update 5080 and 5090 case condition due to the driver update ( #5317 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-18 20:01:36 +08:00
jellysnack
0623ffe3bc
feat: Add LLGuidance Support for PyTorch Backend ( #5214 )
...
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-18 19:33:34 +08:00
xinhe-nv
610a49f117
tests: add multi nodes tests ( #5196 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-18 18:08:04 +08:00
Yi Zhang
375dd0b971
Waive L0 ( #5311 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-18 16:40:41 +08:00
Yiqing Yan
a3a48410f3
Fix rerun step ( #5319 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-18 16:38:45 +08:00
Yuan Tong
f599ee63c1
test: correct unittest rerun behavior ( #5273 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-06-18 16:37:19 +08:00
Zhanrui Sun
516bd4dc05
chore: bump version to 0.21.0rc3 ( #5309 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-18 15:59:53 +08:00
Robin Kobus
38547b92f3
refactor: Introduce ResourceManagerType enum for resource management ( #5246 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-18 09:55:59 +02:00
Bo Li
d76bda7f2c
chore: Refine printed info of CHECK_TYPE. ( #5295 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-18 15:35:41 +08:00
Wanli Jiang
3a02489e86
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support ( #5159 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-18 15:12:49 +08:00
QI JUN
9ea7bb67a4
CI: fix TensorRT H200 tests ( #5301 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 14:40:57 +08:00
Yukun He
6711ad9cf3
[TRTLLM-5589] feat: Minor optimizations for tunable FP8 batched GEMM op. ( #5139 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-06-18 14:33:46 +08:00
ruodil
3b5d916250
test: cherry-pick deepseek rcca cases in main branch ( #5307 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-18 14:26:26 +08:00
nv-guomingz
ee26965054
doc:update contributing md for internal developers ( #5250 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-18 14:20:30 +08:00
Yao Yao
908463a5f5
[feat]: improve performance of XQA-MLA for sm120 ( #5087 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-06-18 14:19:22 +08:00
Yan Chunwei
724e495254
chore: partition LLM class into TorchLLM and TrtLLM ( #4900 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-18 14:01:25 +08:00
Yi Zhang
e44f7687af
feat: Add no_kv_cache_reuse option and streaming support for trtllm serve bench ( #4971 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-18 13:37:31 +08:00
Yiqing Yan
8f67e3604d
Waive L0 tests ( #5308 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-18 12:43:45 +08:00
Omer Ullman Argov
f501ce57b1
[fix][test] move deepseek single gpu tests to post merge ( #5280 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 06:59:39 +03:00
dominicshanshan
3c0fecbf42
CI: extend model weights load time for dsv3 in stress test. ( #5275 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-06-18 11:51:48 +08:00
Ivy Zhang
41cfcaa964
test: update qa test list ( #5305 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-18 11:29:11 +08:00
QI JUN
855036d8ee
update LlmRequest.is_dummy property ( #5283 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-18 10:52:13 +08:00
Aurelien Chartier
e1e5f725fc
fix: only set _mpi_session if world_size is > 1 ( #5253 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-17 19:21:41 -07:00
Robin Kobus
627062c265
refactor: Update decoder buffer and logits management ( #4450 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-18 08:10:32 +08:00
tburt-nv
7d55c381fa
Revert "[infra] Report CI authorization errors to PR" ( #5298 )
2025-06-17 17:28:33 -04:00
tburt-nv
2df9f875cf
[infra] Report CI authorization errors to PR ( #5175 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-06-17 17:26:49 -04:00
Mike Iovine
9bf69c9fdb
[chore] Remove BaseDraftTokenManager ( #5251 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-17 11:57:52 -04:00
Emma Qiao
ff32caf4d7
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 ( #4885 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 23:48:34 +08:00
Yiteng Niu
dcf18c4bcf
infra[TRTLLM-5635] remove package stage in CI build ( #5075 )
...
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-06-17 23:44:47 +08:00
qsang-nv
5236bb9084
delete cubins ( #5274 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-17 22:10:49 +08:00
QI JUN
f899c4d294
Re-implement LlmResponse in Python to reduce host overhead of pybind ( #5224 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 21:28:09 +08:00
Yanchao Lu
f4cdbfcdf0
None - Some clean-ups for the automation pipeline ( #5245 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 21:08:24 +08:00
Dom Brown
44fb3c1673
[TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Pytorch workflow kernel autotuner ( #5207 )
...
- Adds a new Python custom op (fp8_block_scale_moe_runner) and a FP8BlockScaleMoERunner class for autotuning.
- Updates C++ MoE and batched GEMM kernels to accept a configIndex for workspace sizing and execution.
- Extends the unit test to run both autotuned and non-autotuned code paths.
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-06-17 21:01:56 +08:00
amirkl94
8451a87742
chore: Mass integration of release/0.20 ( #5082 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 14:32:02 +03:00
liji-nv
13eef642e6
[feat] Piecewise cuda graph support for MLA ( #4467 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-17 18:58:38 +08:00
Robin Kobus
dc3861b4aa
refactor: Unify decoder test with e2e workflow ( #5239 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-17 12:04:58 +02:00
QI JUN
ccd9adbe33
CI: move multi-gpu test cases of tensorrt backend to h200 ( #5272 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:37:37 +08:00
Ivy Zhang
2ad8758ecc
[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520][test] Add QA test cases ( #5073 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 17:14:01 +08:00
Yilin Fan
498fadceb4
[feat] Add EAGLE3 support for Qwen3 ( #5206 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-06-17 17:07:06 +08:00
QI JUN
517c1ecf72
move some test cases of TensorRT backend back ( #5232 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:03:11 +08:00
qsang-nv
faca19c2f0
update setup.py for special cases ( #5227 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-17 16:41:07 +08:00