benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Guoming Zhang
c1b0b7350f
[None][test] Unwaive qwen3 next test case. (#9877)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 20:42:31 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
mpikulski
50c78179dd
[TRTLLM-8425][doc] document Torch Sampler details (#10606)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-13 12:01:20 +01:00
Erin
55580f8ec1
[NVBUG-5670458][chore] Unwaive lp tests (#10524)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin <14718778+hchings@users.noreply.github.com>
2026-01-13 04:31:27 -05:00
Void
7d16f3a28b
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
JunyiXu-nv
e291a834db
[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} (#9937)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-13 03:57:14 -05:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. (#10480)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
Yiteng Niu
50c22b80d7
[None][infra] Update allowlist 2026.01.08 (#10535)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2026-01-13 15:28:53 +08:00
tburt-nv
7d41475954
[None][infra] try removing shared cache dir mount (#10609)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-01-13 15:07:12 +08:00
JennyLiu
2967d299fb
[TRTLLM-10271][test] Add Spark QA functional and performance cases (#10564)
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-01-13 13:20:15 +08:00
TensorRT LLM
ba1cb6831d
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-13 03:08:08 +00:00
fredricz-20070104
bbe535fddf
[None][chore] Fix disagg assert (#10596)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-12 21:39:57 -05:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] support to parse the keyword modules_to_not_convert of the HF model config (#10527)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Gal Hubara-Agam
18a33764b5
[None][chore] Print correct backend name in benchmark report (#10597)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-12 14:46:00 -05:00
Anish Shanbhag
dacc881993
[https://nvbugs/5761391][fix] Use correct model names for config database regression tests (#10192)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-12 10:55:07 -08:00
Suyog Gupta
a1385243e1
[#10580][fix] re-enable NemotronH MOE MMLU test (#10594)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-01-12 09:26:07 -08:00
Emma Qiao
9f044b9dd9
[None][infra] Waive failed tests for main 01/12 (#10604)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-12 10:24:54 -05:00
mpikulski
bf7998f1b8
[TRTLLM-9522][test] cover LLM API multi_modal_embeddings (#9963)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-12 11:38:22 +01:00
Wanli Jiang
11da7e3605
[None][fix] Solve pillow version conflict (#10537)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-12 04:05:54 -05:00
Zhenhuan Chen
3bd319dc8e
[https://nvbugs/5794796][chore] waive test blocking premerge (#10593)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-12 15:39:07 +08:00
yufeiwu-nv
8e806abac3
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10572)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-12 15:34:55 +08:00
yingguo-trt
c5914f9085
[None][chore] update deepseekv3.2 test parameter (#10595)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-12 01:43:22 -05:00
chenfeiz0326
54459377d2
[TRTLLM-10248][feat] Support Bot to Send Perf Regression Msg to Slack Channel (#10489)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-12 14:23:23 +08:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Jie Li
5e0dbba0c9
[None][chore]: update waive list (#10577)
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-11 22:18:04 -05:00
TensorRT LLM
2de22f1a70
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-12 03:09:53 +00:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Eran Geva
c5d5af9e7f
[#8391][chore] removed llama and added deepseek to AutoDeploy's L0 perf test (#10585)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-11 16:31:24 -05:00
Ivy Zhang
7f018c89e9
[None][test] update core test list (#10538)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-11 14:08:20 -05:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yanchao Lu
80649a8b78
[None][ci] Workaround OCI-NRT slowdown issue (#10587)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-11 22:08:19 +08:00
Guoming Zhang
0371cbfd88
[None][doc] Update Qwen3-Next doc by adding known issues section (#10582)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-11 14:47:47 +08:00
TensorRT LLM
b2e2538fcd
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-11 03:07:48 +00:00
HuiGao-NV
3c65ec3c55
[None][chore] waive test case (#10581)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-10 18:53:36 -05:00
fredricz-20070104
f6045fac09
[None][chore] Fix Gitlab CI termination issues (#10576)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2026-01-10 07:51:18 -05:00
tcherckez-nvidia
f6c4dd885f
[None][chore] Update AutoDeploy model list (#10505)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-10 08:47:37 +02:00
TensorRT LLM
6ab996d635
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-10 03:09:09 +00:00
William Zhang
ff7eb93f31
[https://nvbugs/5669097][tests] Add MMMU test for mistral small (#10530)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-09 16:09:28 -08:00
Chenghao Zhang
38f249b479
[https://nvbugs/5548861][fix] AutoDeploy: Fix the test (#10521)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-09 13:30:24 -08:00
Linda
82dfef2e56
[https://nvbugs/5628848][fix] Fix nanobind stub generation (#10516)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-09 11:32:21 -08:00
Faraz
fdbdbba540
[https://nvbugs/5752687][fix] Choose register model config over root config for VLM (#10553)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-09 12:10:52 -05:00
yingguo-trt
d80f01d205
[None][feat] Add support for DeepSeek v3.2 tests (#10561)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-09 10:20:29 -05:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Iman Tabrizian
ced88424ef
[https://nvbugs/5756008][fix] unwaive test (#10523)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-09 09:40:07 -05:00
Jie Li
627d306df9
[None][chore] remove some model support; add device constraint (#10563)
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-09 09:36:23 -05:00
ruodil
2b72d33fdc
[TRTLLM-9932][test] add kimi_k2 single node perf test (#10436)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-09 05:36:50 -05:00
Fanrong Li
4632a8642d
[None][doc] blog: Optimizing DeepSeek-V3.2 on NVIDIA Blackwell GPUs (#10565)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-09 05:16:00 -05:00
Yuxian Qiu
80f261ea36
[https://nvbugs/5622938][feat] Run sample_async on extra stream. (#10215)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-09 18:15:18 +08:00
Chang Liu
78bb245554
[https://nvbugs/5787453][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552)
2026-01-09 00:49:39 -08:00
bhsueh_NV
4a09acd012
[https://nvbugs/5785206][infra] unwaive the accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B (#10560)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-09 03:13:29 -05:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Yukun He
c5331e6dbb
[None][fix] Setup dist for AutoTuner in Layerwise benchmarking. (#10534)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-09 14:16:39 +08:00
Jie Li
6fcd4e7099
[None][chore] Add failed cases into waives.txt (#10541)
Signed-off-by: Jie Li <lijie@nvidia.com>
2026-01-09 01:03:47 -05:00
TensorRT LLM
5df03b2ea7
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-09 03:43:08 +00:00
ruodil
d707286ca8
[None][test] restrict max_num_tokens in disagg mtp config (#10442)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2026-01-08 21:53:24 -05:00
Yuxian Qiu
afa55c12b6
[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445. (#10547)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 21:50:04 -05:00
Balaram Buddharaju
56e779d09f
[None][chore] Waive tests blocking premerge 01/08 (#10555)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-08 20:22:28 -05:00
Mike Iovine
4092a87b6f
[https://nvbugs/5740075][fix] Fix sm120 speculation (#10049)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
Eran Geva
489dd60312
[#10513][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor (#10512)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 14:49:40 -05:00
mpikulski
e0331297a6
[TRTLLM-9522][fix] broken cast (#9975)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-08 06:47:39 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL (#10470)
* Why?
We would like to support EPD disaggregated serving for Qwen3 VL.
* What?
This commit adds such support, and extends existing unit tests for
correctness checks.
Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
Eran Geva
6511dbaea0
[#10417][fix] AutoDeploy - Reverted to direct computation of minusA (#10509)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 13:43:41 +02:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yiqing Yan
dc6b743fb6
[None][chore] Bump version to 1.2.0rc8 (#10542)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-08 04:51:44 -05:00
Emma Qiao
43839c7d9b
[TRTLLM-9642][infra] Increase pytest verbosity for failed tests (#9657)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2026-01-08 02:33:48 -05:00
dongfengy
8d4b09dac6
[None][doc] Update GPTOSS Doc (#10536)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-08 02:30:53 -05:00
HuiGao-NV
22c81cb5fa
[None][chore] Enable seg fault cases since one race condition is fixed (#10398)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-08 02:15:30 -05:00
Barry Kang
f57aab5255
[https://nvbugs/5775402][fix] Fix concurrency list in Wide-EP perf tests (#10529)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2026-01-08 01:58:55 -05:00
Lucas Liebenwein
30f8455d29
[https://nvbugs/5747878][fix] unwaive llama4 scout tests (#10468)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-07 23:33:45 -05:00
TensorRT LLM
342a47bf47
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-08 03:12:45 +00:00
yingguo-trt
f8b2a8fd30
[None][chore] Support multiple job submission at the same time (#10492)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Co-authored-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2026-01-07 21:51:36 -05:00
Yuxian Qiu
b85c447ceb
[https://nvbugs/5784543][fix] Setup dist before using autotuner. (#10491)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 10:32:50 +08:00
Yukun He
09d9878385
[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. (#10339)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-08 10:21:02 +08:00
xxi
81f878c279
[https://nvbugs/5707392][fix] unwaive test_fused_moe_fp8_blockwise_wide_ep[NotEnabled] (#10428)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-08 09:17:59 +08:00
Lucas Liebenwein
d736c7f290
[https://nvbugs/5761665][fix] AutoDeploy: handle bugs for 25.12 dlfw upgrade (#10511)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-07 20:16:53 -05:00
Ziyi Xiong
7187afe7b9
[https://nvbugs/5781589][fix] Skip spec dec for non-last rank (#10445)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-01-07 13:55:45 -05:00
Patrice Castonguay
e8cceb06b2
[None][doc] Adding parallelism types in feature combination matrix (#9849)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2026-01-07 12:52:05 -05:00
yufeiwu-nv
b130d58c88
[None][test] Remove most TRT-backend test cases in llm_perf_nim.yml (#10487)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 17:18:43 +08:00
tcherckez-nvidia
7e88212d24
[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct (#10455)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-07 10:30:24 +02:00
xinhe-nv
872210468b
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10474)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-07 03:23:43 -05:00
Kanghwan
dc32bac9fc
[#4745][fix] Pass lora_params through Qwen2/3 model forward (#10174)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2026-01-07 15:30:17 +08:00
yingguo-trt
cbf8357e5f
[https://nvbugs/5726086][fix] update kimi-k2-1k1k dataset (#10473)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-07 01:24:08 -05:00
xinhe-nv
be5579633e
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10457)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-07 00:57:03 -05:00
Fanrong Li
a34aa63685
[https://nvbugs/5767223][feat] add pp support for DeepSeek-v3.2 (#10449)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
TensorRT LLM
3fec7e411c
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-07 03:10:22 +00:00
xinhe-nv
1fbadd2dde
[None][chore] Add failed cases into waives.txt (#10365)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-01-06 22:08:06 -05:00
Ivy Zhang
4a1b2e23b3
[https://nvbugs/5698434][test] add qwen3-4b accuracy test case (#10382)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 21:56:34 -05:00
Lucas Liebenwein
6095c80e56
[https://nvbugs/5721907][fix] AutoDeploy: improve numerical stability of flashinfer attention test (#10467)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 21:11:06 -05:00
Zongfei Jing
bb2f883296
[None] [feat] Add test script and raster M for gather fc1 kernel (#10429)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-07 09:31:49 +08:00
Lucas Liebenwein
bb6a3973aa
[https://nvbugs/5732942][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 19:55:49 -05:00
Lucas Liebenwein
00355b24b7
[None][feat] precompiled installation from local src dir with fnmatch only (#10430)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 15:31:59 -05:00
Mike Iovine
77be1b7572
[https://nvbugs/5749988][fix] Remove redundant qwen3 spec dec test (#10387)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-06 11:46:34 -05:00
Enwei Zhu
037753f65b
[https://nvbugs/5748600][ci] Unwaive disagg guided decoding test (#10409)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-06 11:38:12 -05:00
Lizhi Zhou
6a4bebcd01
[None][chore] remove redundant retries while binding to arbitrary port (#10452)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-06 10:39:15 -05:00
JunyiXu-nv
7d62773c6c
[https://nvbugs/5760726][fix] Use random port in container port section (#10432)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-06 23:25:46 +08:00
xinhe-nv
704f58dfbe
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10427)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-06 04:47:54 -05:00
Emma Qiao
6507087c3f
[None][infra] Waive failed cases on 1/6 (#10440)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-06 16:54:54 +08:00
Bo Li
df0b976b99
[https://nvbugs/5785206][infra] Waive TestQwen3_30B_A3B::test_fp8[latency-torch_compile=False]. (#10441)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-06 03:32:19 -05:00
William Zhang
ab58d7cac1
[https://nvbugs/5772361][ci] Unwaive tests that have been fixed (#10424)
These tests were all failing due to the same issue, and were fixed
in #10394.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-05 23:49:54 -08:00
Kaiyu Xie
2eaabd7461
[None] [fix] Fix undefined tokens_per_block (#10438)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 02:42:37 -05:00
Ivy Zhang
1e828587e5
[TRTLLM-9896][test] add vswa test cases coverage (#10146)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 02:02:29 -05:00
Yiqing Yan
5108a69fc0
[TRTLLM-9622][infra] Enable DGX_B300 multi-gpu testing in pre-merge pipeline (#9699)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-06 14:39:55 +08:00
xinhe-nv
998527724c
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10367)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-06 01:09:21 -05:00
Kaiyu Xie
810249c304
[https://nvbugs/5769926] [fix] Add no container mount home WAR (#10431)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 13:09:25 +08:00
Ivy Zhang
22a1d31a27
[None][test] update test case constraint (#10381)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2026-01-06 12:28:59 +08:00
xinhe-nv
1b1058279c
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10384)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 23:02:27 -05:00
kris1025
3e98265682
[None][chore] unwaive qwen3 30b test (#10115)
Signed-off-by: linquanh <linquanh@nvidia.com>
2026-01-06 11:17:08 +08:00
TensorRT LLM
596d4f16fb
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-06 03:16:01 +00:00
Karthik
617f728903
[#8460][feat] Revive and simplify Model Explorer visualization integration (#10150)
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-05 22:15:25 -05:00
Venky
aa1fe931de
[None][docs] Add --config preference over --extra_llm_api_options in CODING_GUIDELINES.md (#10426)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-05 22:05:47 -05:00
Xiao Xuan
46f035befe
[#2511][fix] eagle: qwen2 capture hidden states (#10091)
Signed-off-by: SpicyNoodle <522169030@qq.com>
2026-01-05 21:46:41 -05:00
Min Yu
9cae7277ea
[https://nvbugs/5726962][feat] Apply fusion for W4AFP8_AWQ MoE (#9838)
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-06 10:16:41 +08:00
alel
6b8ae6fa81
[None][feat] CuteDSL MOE FC1 Enhancement (#10088)
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2026-01-06 09:30:43 +08:00
Mike Iovine
77712ed4ab
[None][chore] Update SWA + spec dec support matrix (#10421)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-05 20:26:23 -05:00
JadoTu
82aaf98070
[None][feat] add the eos tokens in generation config to stop words in the sampler (#10389)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2026-01-06 09:24:03 +08:00
chenfeiz0326
8a04c05079
[None][fix] Only Use Throughput Metrics to Check Regression (#10404)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-06 09:21:15 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) (#10113)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00
Lucas Liebenwein
846e54aa09
[None][feat] precompiled installation from local src dir (#10419)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-05 19:16:38 -05:00
Simeng Liu
3b56548fcf
[https://nvbugs/5777044][chore] Remove solved bugs from waives.txt (#10422)
Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
2026-01-05 16:56:58 -05:00
Karthik
4e50cb5708
[#10170][fix] Add export patch for GraniteMoe MoE models to enable torch.export compatibility (#10169)
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-05 16:13:45 -05:00
Mike Iovine
91ff46d418
[https://nvbugs/5745152][fix] Unwaive gpt oss spec decode test (#10370)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:06:58 -05:00
Mike Iovine
7a2dab8e85
[https://nvbugs/5695984][fix] Unwaive llama3 eagle test (#10092)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:03:35 -05:00
Yan Chunwei
6b71b03947
[TRTLLM-9551][infra] Partition test_llm_pytorch.py for parallel execution (#10400)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-05 13:58:03 -05:00
Grzegorz Kwasniewski
ea380ff45c
[TRTLLM-9767][feat] Fixed recursive node traversals (#10379)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-05 18:42:06 +02:00
Mike Iovine
db2614ef10
[https://nvbugs/5772414][fix] Fix draft token tree depth=1 corner case (#10385)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:20:14 +01:00
Mike Iovine
bedfff4f00
[https://nvbugs/5772521][fix] Fix draft token tree chain crash (#10386)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:18:44 +01:00
Gal Hubara-Agam
e98c27ee4f
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-05 18:17:27 +02:00
Anthony Chang
225d3a9001
[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-05 17:16:12 +01:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
Eran Geva
3749a2ce1c
[#10374][fix] fixed race condition in AutoDeploy's mp tests port acquisition (#10366)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-05 13:33:01 +02:00
xinhe-nv
b1733d56f6
[TRTLLM-9381][test] add disag-serving kimi k2 thinking tests (#10357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 05:15:52 -05:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00
HuiGao-NV
2f768b76f8
[https://nvbugs/5715568][fix] Force release torch memory when LLM is destroyed (#10314)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-05 15:30:18 +08:00
Emma Qiao
c63fad7d96
[None][infra] Waive failed cases again on 1/5 (#10403)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-05 02:12:16 -05:00
Yihan Wang
e7a4486294
[https://nvbugs/5752521][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py (#10227)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-01-05 14:37:05 +08:00
Pengyun Lin
c04cf4334e
[TRTLLM-8242][feat] Add stability tags for serve subcommand (#10012)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-05 14:16:15 +08:00
Yukun He
0937df2c68
[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 13:44:09 +08:00
Emma Qiao
5a8bfcbb50
[None][infra]Waive failed cases in post-merge on 1/5 (#10399)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-05 12:30:10 +08:00
Tailing Yuan
a7fe043b13
[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-05 11:23:04 +08:00
TensorRT LLM
aaf80be0f3
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-05 03:19:07 +00:00
Yuxian Qiu
5773a4d775
[https://nvbugs/5701425][chore] Unwaive tests. (#10269)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-05 09:54:26 +08:00
Cheng Hang
656c705ff1
[None][feat] sm100 weight-only kernel (#10190)
Signed-off-by: Cheng Hang <chang@nvidia.com>
2026-01-05 09:44:36 +08:00
Fanrong Li
b5a1e10bc0
[https://nvbugs/5779534][fix] fix buffer reuse for CUDA graph attention metadata (#10393)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 09:43:44 +08:00
Wanli Jiang
da0830670a
[TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus (#10234)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-05 09:41:49 +08:00
Lizhi Zhou
82c1ba84a7
[https://nvbugs/5649010][fix] use 0 port as arbitrary port when disagg service discovery is enabled (#10383)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-05 09:40:40 +08:00
bhsueh_NV
0517b62789
[https://nvbugs/5772363][fix] fix bug of Mistral-Small-3.1-24B-Instruct-2503 (#10394)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-05 09:04:13 +08:00
Faraz
8e2065b4d9
[https://nvbugs/5670469][fix] Filter 0s and choose min of kv_head for Nemotron model (#10206)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-05 08:42:53 +08:00
Eran Geva
e2f5455533
[#8391][chore] added deepseek_r1_distill_qwen_32b AutoDeploy perf test to L0 (#10377)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-04 20:35:52 +02:00
chenfeiz0326
a65b0d4efa
[None][fix] Decrease Pre Merge Perf Tests (#10390)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 12:21:34 -05:00
Yanchao Lu
c4f27fa4c0
[None][ci] Some tweaks for the CI pipeline (#10359)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 11:10:47 -05:00
dongfengy
afc533193d
[None][feat] Support nvfp4 for gptoss (#8956)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-04 08:57:44 -05:00
Jaedeok Kim
a4dcc6a711
[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager (#10330)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-04 06:07:30 -05:00
Yuxian Qiu
6ba04eba06
[ https://nvbugs/5748683 ][fix] Use get_free_port_in_ci to avoid port conflict. ( #10392 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-04 19:04:58 +08:00
TensorRT LLM
71b4a8aa60
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-04 03:08:01 +00:00
yuanjingx87
5bd37ce41e
[None][infra] add retry logic to get slurm sbatch job log when ssh dropped ( #9167 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-04 10:11:37 +08:00
Grzegorz Kwasniewski
0d1f5ad7a2
[TRTLLM-10358][feat] Added proper rescaling of FP4 weights ( #10378 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-03 16:26:16 -05:00
Yanchao Lu
c0b3c2b919
[None][ci] Remove an invalid test waive
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-03 23:34:13 +08:00
Ludwig Schneider
59045a0e41
[None][fix] [fix] Make NCCL resource manager destructor exception-safe ( #10166 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-03 10:25:05 -05:00
Emma Qiao
865992b86b
[None][infra] Waive failed cases on 1/3 ( #10391 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-03 05:54:09 -05:00
Bo Deng
9e7b50aefb
[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues ( #10285 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-03 14:34:55 +08:00
TensorRT LLM
45ffbf1f21
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-03 03:07:50 +00:00
Lucas Liebenwein
937f8f78a1
[None][doc] promote AutoDeploy to beta feature in docs ( #10372 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-02 18:46:31 -05:00
Izzy Putterman
bdf6953ddc
[None][feat] Eagle: MLA Based Eagle ( #9677 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2026-01-02 13:45:07 -05:00
Gal Hubara-Agam
f3dd6da080
[ #10056 ][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test ( #10308 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-02 11:20:19 +02:00
chenfeiz0326
5e0e48144f
[None][fix] Minor updates on Perf Test System ( #10375 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-02 17:17:42 +08:00
TensorRT LLM
098251648d
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-02 03:11:08 +00:00
fredricz-20070104
f631b25c85
[None][test] Unified slurm extra args management and session collection logic ( #10332 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Co-authored-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-01 21:10:51 -05:00
Balaram Buddharaju
4a1b742aa0
[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism ( #10312 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 13:42:53 -05:00
Gal Hubara-Agam
5845951538
[ #10056 ][fix] AutoDeploy: Handle deletion of nested params in sharding ( #10376 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-01 08:11:11 -05:00
tcherckez-nvidia
4868772ad7
[None][feat] Add export data to build and run script for AD ( #10299 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-01 04:54:47 -05:00
Balaram Buddharaju
9f5b750a93
[None][chore] Waive tests blocking pre-merge 12/31 ( #10373 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 03:00:24 -05:00
Balaram Buddharaju
0b75340223
[ https://nvbugs/5744427 ][fix] Make Gemma3 multimodal test fp8 ( #10368 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 01:11:34 -05:00
TensorRT LLM
edbcff0257
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-01 03:08:31 +00:00
Yuxian Qiu
ff836d4f41
[ https://nvbugs/5740359 ][chore] Unwaive tests. ( #10260 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-01 09:53:34 +08:00
Lucas Liebenwein
1bbe71b3ed
[ #10244 ][feat] AutoDeploy: separate prefill/decode in flashinfer ( #10252 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-31 17:01:24 -05:00
Mike Iovine
9085021aa4
[None][feat] Implement sampling for MTP 1-model ( #10019 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-31 13:48:34 -05:00
Simeng Liu
84d107b2f0
[ https://nvbugs/5717993 ][fix] Add execution_stream across PyExecutor, KVCacheManager, PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. ( #10060 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-31 09:22:54 -08:00
xinhe-nv
0d2e2718ce
[None][chore] Add failed cases into waives.txt ( #10354 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 09:30:22 -05:00
chenfeiz0326
a23c6f1092
[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression ( #10282 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-31 21:44:59 +08:00
tcherckez-nvidia
464847c6be
[ #9717 ][chore] Standardize MoE weights interface ( #10295 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-31 07:37:18 -05:00
Jin Li
ef1d4a40b5
[ https://nvbugs/5727475 ][fix] Avoid use property with setter in nn.Mo… ( #10212 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-31 06:21:36 -05:00
Emma Qiao
d944430f96
[None][infra] Waive failed cases on 12/31 ( #10353 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-31 17:39:49 +08:00
Necofish
73870ae4ad
[None][feat] support Qwen3-VL dense model in pytorch backend ( #9060 )
...
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-12-31 17:54:26 +09:00
xinhe-nv
827d12caaf
[ https://nvbugs/5558516 ][test] add disaggregated stress test ( #9354 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 16:47:36 +08:00
Yuxian Qiu
910a633066
[ https://nvbugs/5774869 ][chore] waive tests. ( #10356 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-31 03:00:52 -05:00
Yiqing Yan
fdc03684cc
[TRTLLM-10016][infra] Use SlurmPartition attribute time as timeout threshold ( #10254 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-31 15:02:24 +08:00
Pengyun Lin
fad000589d
[None][chore] Unify DS tool parser names ( #10239 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-31 14:40:07 +08:00
xinhe-nv
1e9c153b4c
[None][fix] disable thread leak check for kimi ( #10337 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 01:31:37 -05:00
xinhe-nv
6c1abf2d45
[None][chore] Add failed cases into waives.txt ( #10344 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-31 00:11:54 -05:00
TensorRT LLM
ed3a3097a4
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-31 03:11:56 +00:00
Jin Li
34c2fd50a9
[ https://nvbugs/5707359 ][fix] Unwaive OOM case that should be fixed by #9446 ( #10334 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-31 10:41:39 +08:00
Yuxian Qiu
1f3afb8e6f
[None][feat] Implement send_object for TorchDist. ( #10213 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-31 10:40:52 +08:00
Yuxian Qiu
ec8a388c25
[ https://nvbugs/5769890 ][fix] Import get_free_port. ( #10341 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-31 09:47:27 +08:00
Eran Geva
74832a1895
[ https://nvbugs/5766986 ][fix] fixed the shard_all_unprocessed default value to align with the default.yml ( #10271 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-30 08:54:13 -05:00
Bo Li
1f0365da36
[None][infra] Add LongBenchV1 to trtllm-eval. ( #10265 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-30 21:39:34 +08:00
Emma Qiao
6732c76414
[None][infra] Waive failed cases for main on 12/30 ( #10338 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-30 05:17:43 -05:00
Emma Qiao
fb05cd769a
[None][infra] Enable single-gpu CI on spark ( #9304 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-30 17:22:14 +08:00
Emma Qiao
cce7247815
[ https://nvbugs/5594703 ][infra] Unwaive the failed case to test ( #10275 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-30 16:38:54 +08:00
xinhe-nv
6accdbc6a6
[None][chore] Add failed cases into waives.txt ( #10302 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 03:11:52 -05:00
ruodil
0f4ed90560
[TRTLLM-9965][test] add long-context disagg test for GB300/GB200 and remove config_index in yaml ( #10225 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-30 02:39:50 -05:00
binghanc
692d8f2023
[TRTLLM-9455][feat] support for new checkpoint ( #10082 )
...
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-12-30 14:46:39 +08:00
xinhe-nv
3e0344a53d
[None][chore] Add failed cases into waives.txt ( #10301 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 14:04:28 +08:00
xinhe-nv
48fee8d0f6
[None][chore] Add failed cases into waives.txt ( #10321 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-30 00:11:49 -05:00
Emma Qiao
f396ad83b0
[None][infra] Remove duplicates in waives.txt ( #10333 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-29 22:32:52 -05:00
TensorRT LLM
fa4c7997c5
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-30 03:07:48 +00:00
Balaram Buddharaju
4944192eae
[None][chore] Waive tests failing in pre-merge 12/28 ( #10311 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-29 20:53:49 -05:00
Neta Zmora
966231d29c
[ #9626 ][feat] Add an auto-deploy transform for using cutlass FP4 MoE kernels ( #10304 )
...
Add a transform to replace torch.ops.auto_deploy.torch_quant_nvfp4_moe
with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused.
The fused op currently generates wrong results when the number of rows in the MoE FC1 weights is not divisible by 128,
so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e., the transform is disabled).
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-29 23:18:15 +02:00
Yanchao Lu
965578ca21
[None][infra] Some improvements for Slurm execution path in the CI ( #10316 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-29 06:49:44 -05:00
Yueh-Ting (eop) Chen
9cee32ab39
[ https://nvbugs/5625990 ][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager ( #10183 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-12-29 14:29:14 +08:00
Yanchao Lu
2f8d6d25a8
[None][ci] Waive an intermittent test hang case ( #10324 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-29 13:04:31 +08:00
TensorRT LLM
223411e988
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-29 03:08:32 +00:00
Yanchao Lu
270be801aa
[None][ci] Move remaining DGX-B200 tests to LBD ( #9876 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-28 13:55:39 +08:00
Ziyi Xiong
c59aa8bec5
[TRTLLM-9962][feat] Some optimizations for two-model spec dec ( #10208 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-28 12:52:04 +08:00
TensorRT LLM
ae6d5766ed
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-28 03:11:53 +00:00
JunyiXu-nv
55bc6a5ff8
[ https://nvbugs/5753250 ][fix] Fix undefined local variable in responses utils ( #10154 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-28 06:59:32 +08:00
shivghai
ee07a7c55e
[None][fix] [Gemma3] Fix RoPE for local attention for Gemma3 ( #9961 )
...
Signed-off-by: Shiv Ghai <8965168+shivghai@users.noreply.github.com>
2025-12-27 11:50:59 -08:00
Guoming Zhang
1865020b6f
[TRTLLM-8577][feat] Clean the Qwen3-next code by removing Qwen3NextCo… ( #10228 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:49:55 +08:00
Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… ( #10229 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
TensorRT LLM
27976fce9c
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-27 03:08:04 +00:00
Olya Kozlova
55f3cda66d
[None][fix] Fix request_id for best_of/n case ( #8368 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-12-26 22:20:24 +01:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
chenfeiz0326
d70aeddc7f
[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI ( #9138 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-26 22:50:53 +08:00
Pengyun Lin
684b37df02
[ https://nvbugs/5747938 ][fix] Use local tokenizer ( #10230 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 22:08:10 +08:00
Pengyun Lin
c5b0f9e436
[ https://nvbugs/5633700 ][fix] Cache tiktoken vocab for gpt-oss ( #10219 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 18:39:03 +08:00
dongfengy
bfc591994c
[ https://nvbugs/5745152 ][fix] Fix some GPTOSS test setups ( #10085 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-12-26 17:52:40 +08:00
Jatin Gangani
4a5ef84dc2
[None] [doc] Document perfect MoE router feature for perf analysis ( #10303 )
...
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
2025-12-26 04:27:40 -05:00
Wanli Jiang
14554ab3f3
[None][feat] Support multi-gpu running for nemotron-v3-nano and super ( #10118 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-26 11:23:14 +08:00
TensorRT LLM
819d03fa88
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-26 03:08:15 +00:00
Enwei Zhu
13ffe52ad0
[None][fix] Allow YAML config overwriting CLI args for trtllm-eval ( #10296 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 15:08:15 -05:00
Neta Zmora
f3f02315df
[None][chore]: small refactoring to auto-deploy MoE operator ( #10300 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-25 12:27:11 -05:00
bhsueh_NV
db3430f589
[None][feat] Support VLM part for Mistral Large 3 ( #10188 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-25 11:20:58 -05:00
Jin Li
7e4cef9def
[None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 ( #9446 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-25 10:23:04 -05:00
Ziyi Xiong
d8b5aeb061
[ https://nvbugs/5652062 ][fix] Rewind kv_cache and reset draft tokens ( #10160 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-25 09:13:51 -05:00
ZhichenJiang
46e4af5688
[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations ( #10201 )
...
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 09:04:20 -05:00
Lizhi Zhou
fe12faef81
[ https://nvbugs/5752516 ][chore] unwaive test; fix port conflicts in CI ( #10152 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-25 08:16:09 -05:00
Iman Tabrizian
cd5cd60ee4
[None][infra] Move install_boost from install_triton.sh to install_base.sh ( #10055 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-25 08:09:55 -05:00
Zhenhuan Chen
8462cf6c96
[TRTLLM-9578][feat] make PDL enabled by default ( #9695 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-25 07:15:24 -05:00
Jatin Gangani
97b38ac403
[None] [doc] Update IFB performance guide & GPTOSS deployment guide ( #10283 )
...
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
2025-12-25 05:52:04 -05:00
Emma Qiao
0ecdb69b93
[None][infra] Waive failed tests for main on 12/25 ( #10298 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-25 05:22:39 -05:00
Xianjie Qiao
53b81783b1
[None][fix] Fix pageable H2D memcopy issue on GB200 ( #10289 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-12-25 18:15:57 +08:00
Jie Li
83e02ee335
[None][chore] Remove NIM TRT-Backend Test Lists ( #10232 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-12-25 04:01:51 -05:00
Enwei Zhu
182b3eb633
[None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout ( #10293 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 02:35:18 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
xinhe-nv
4ae6f6a46c
[None][chore] Add failed cases into waives.txt ( #10249 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-25 01:26:21 -05:00
heyuhhh
7395ca93b6
[None][doc] Add Sparse Attention feature doc ( #9648 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-25 00:26:18 -05:00
Venky
c059e6caa1
[TRTC-121] [feat] Add recipe selector UI to complement the recipe database ( #10125 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-24 23:56:54 -05:00
gramnarayan
a9eb5afc9f
[ #9241 ][feat] AutoDeploy: Support Eagle3 Speculative Decoding ( #9869 )
...
Supports the two-model flow without the overlap scheduler or chain drafter. The draft model runs in the PyTorch backend.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-24 23:30:42 -05:00
TensorRT LLM
1f8ed71d5f
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-25 03:08:38 +00:00
Emma Qiao
16fd781e42
[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge ( #9897 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-24 21:45:33 -05:00
Ziyi Xiong
43178590d1
[TRTLLM-10143][feat] Reuse previous draft requests if possible ( #10263 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-24 17:48:38 -08:00
Neta Zmora
c4b36d31ff
[ #10137 ][feat] AutoDeploy FP8 MoE refactor ( #10138 )
...
The trtllm (cutlass) FP8 MoE operator performs the W3+W1 fusion (concat) during inference; we want to move this fusion to model-optimization time.
The Cutlass MoE kernel is used through a trtllm torch operator.
Its implementation uses two FC operations (fc1 and fc2), while the canonical MoE API defines three GEMM operations and their associated weights (W1, W2, W3). Therefore, when we switch from the torch.moe op to the trtllm.moe op, we also change the terminology from w1, w2, w3 to fc1, fc2.
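A minimal NumPy sketch of why the W3+W1 concat can be done once ahead of time. It assumes a SwiGLU-style expert (silu on the W1 branch, multiplied by the W3 branch) and a [W3; W1] row order for fc1; both are assumptions for illustration, not taken from the trtllm operator, and the function names are hypothetical.

```python
import numpy as np


def fuse_w3_w1(w1, w3):
    # Hypothetical helper: stack the up (w3) and gate (w1) projections into one
    # fc1 weight, w3 rows first (assumed order, following the "W3+W1" name).
    return np.concatenate([w3, w1], axis=0)


def silu(x):
    return x / (1.0 + np.exp(-x))


def expert_unfused(x, w1, w2, w3):
    # Canonical three-GEMM gated MLP: (silu(x W1^T) * (x W3^T)) W2^T
    return (silu(x @ w1.T) * (x @ w3.T)) @ w2.T


def expert_fused(x, fc1, fc2):
    # One GEMM against the pre-fused fc1, then split its output into halves.
    h = x @ fc1.T
    inter = fc1.shape[0] // 2
    up, gate = h[:, :inter], h[:, inter:]
    return (silu(gate) * up) @ fc2.T  # fc2 plays the role of W2
```

Because `fuse_w3_w1` depends only on the weights, it can run at model-optimization time, leaving a single fc1 GEMM per expert at inference.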
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-24 18:58:10 +02:00
Necofish
8614cd3439
[None][fix] fix: resolve GPU memory imbalance in concurrent weight loading ( #6472 )
...
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-24 09:43:09 -05:00
Suyog Gupta
e2891a6c77
[ #10052 ][feat] AutoDeploy enable cudagraphs for flashinfer BatchDecode ( #10193 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-24 05:55:09 -08:00
Stanley Sun
ddac4d7379
[None][test] Add disag-serving auto scaling qa test ( #10262 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-12-24 08:43:47 -05:00
Yiqing Yan
69152c4e7c
[None][infra] Check GB200 coherent GPU mapping ( #10253 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-24 17:12:36 +08:00
tcherckez-nvidia
56ef97e06e
[ #10246 ][feature] Move AD dashboard to use cudagraph compile backend ( #10267 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-24 11:09:59 +02:00
Jonas Li
ecea71ca7a
[None][chore] Update tinygemm kernel name ( #10248 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-12-24 02:33:25 -05:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests ( #9939 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
xinhe-nv
534700ecd9
[None][chore] Add failed cases into waives.txt ( #10240 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-24 02:21:50 -05:00
Yukun He
595daa5089
[TRTLLM-9615][feat] Support synchronization through PP ranks in the distributed tuning system ( #10011 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-24 15:03:10 +08:00
Fanrong Li
156f6453dc
[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 ( #10226 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-24 14:39:12 +08:00
zackyoray
f6c3bc16b9
[None][docs] Add NIXL-Libfabric Usage to Documentation ( #10205 )
...
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-23 23:05:40 -05:00
Emma Qiao
7b84e48e0f
[None][infra] Waive failed cases on 12/24 ( #10257 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 22:49:57 -05:00
TensorRT LLM
68cf5c7924
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-24 03:08:16 +00:00
xinhe-nv
fc1f77eafc
[None][chore] Add failed cases into waives.txt ( #10204 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2025-12-24 10:37:23 +08:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Jhao-Ting Chen
92d90fa29a
[None][feat] Expose enable_trt_overlap in Triton_backend brings 1.05x OTPS ( #10018 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski
0027a01ad5
[ https://nvbugs/5680312 ][fix] Updated test waiving ( #9630 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 09:38:12 -08:00
Grzegorz Kwasniewski
06900a7f19
[TRTLLM-9565][fix] Fix deepseek sharding ( #9984 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 10:28:14 -05:00
Emma Qiao
984c20e0b2
[None][infra] Waive failed cases on 12/23 ( #10236 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 08:48:54 -05:00
dongfengy
e284d0bf80
[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests ( #10209 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 07:43:13 -05:00
tcherckez-nvidia
64bb1a5155
[None][chore] Update AD coverage to use torch-cudagraph ( #10233 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-23 07:20:32 -05:00
Roey Azran
8408c40d8b
[ https://nvbugs/5702786 ][fix] Fix race conditions in KV cache communication during unexpected termination ( #10076 )
...
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-12-23 14:09:51 +02:00
Xianjie Qiao
871c6b435c
[None] [feat] skip batch_tokenize_prompts in CustomDataset ( #10214 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-12-23 17:40:57 +08:00
Yukun He
522f1d2bc3
[ https://nvbugs/5764627 ][chore] waive the time-out test ( #10222 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-23 16:36:06 +08:00
Balaram Buddharaju
f2e00a75de
[None][chore] Remove helix test from rtx test list ( #10224 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 03:07:37 -05:00
Shiyu Li
3ddc9d2b48
[ https://nvbugs/5729697 ][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. ( #10062 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
chenfeiz0326
48c875f8ea
[None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test ( #9990 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-23 16:02:38 +08:00
Bo Li
cc1323be24
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. ( #10197 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Yiqing Yan
59b05dc0a8
[None][chore] Bump version to 1.2.0rc7 ( #10216 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-23 15:07:47 +08:00
Chuang Zhu
53db3b2612
[ https://nvbugs/5741884 ][fix] unwaive disagg sampler ( #10189 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-23 14:38:07 +08:00
xinhe-nv
77b591f73b
[None][chore] Add failed cases into waives.txt ( #10177 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-12-23 13:43:50 +08:00
Harshini Komali
d691371eaf
[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf ( #9310 )
...
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-23 13:25:55 +08:00
Pamela Peng
5bc7ffe379
[None][test] Add qa tests for RTX 6K ( #10210 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-12-22 22:47:09 -05:00
TensorRT LLM
18f8b22956
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-23 03:10:39 +00:00
fredricz-20070104
621156ad44
[None][chore] Fix GB300 support issues ( #10196 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-23 10:42:41 +08:00
Li Min
1e82ff7a0c
[TRTLLM-9989][fix] Fix tvm_ffi aarch64 issue. ( #10199 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-23 10:20:40 +08:00
Yuxian Qiu
696f754ef4
[None][fix] avoid implicit cudaStreamSynchronize in sample_async. ( #10120 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-23 10:15:40 +08:00
Tailing Yuan
648196f8ae
[TRTLLM-9432][feat] Reduce synchronization and recompilation for qwen3-next ( #9691 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-23 10:14:29 +08:00
Faraz
f05af48bca
[ https://nvbugs/5747674 ][fix] Add contiguous() before view() in load_expert_w3_w1_weight and load ( #10136 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-12-22 21:03:34 -05:00
Fanrong Li
0d2500c631
[TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser ( #10126 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-23 08:46:47 +08:00
Grzegorz Kwasniewski
ccc64da287
[TRTLLM-9847][fix] WAR fix hanging fused allreduce. ( #10087 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 00:03:32 +01:00
tcherckez-nvidia
12e1cb8d7e
[ #9717 ][chore] Refactor MoE code to use enums ( #9910 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-22 15:14:56 -05:00
JunyiXu-nv
aaa87abf41
[TRTLLM-7906][feat] Support multiple post process for Responses API ( #9908 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-22 11:33:34 -05:00
Emma Qiao
ba14a9308e
[None][infra] Waive failed cases on 12/22 ( #10200 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 00:05:45 +08:00
Pengyun Lin
0f308e95f9
[None][chore] Remove logprobs constraint on trtllm-serve pytorch backend ( #9911 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-22 21:37:22 +08:00
William Zhang
a6a88985cf
[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg ( #9758 )
...
* Why?
Certain VLMs like the Qwen family need more than just the multimodal
embeddings in the language model, and need MRoPE position IDs and
deltas. Prior to this commit, only the embeddings could be communicated
from the encoder worker to the prefill worker.
* What?
This commit extends the `DisaggregatedParams` to include the MRoPE
information. It also adjusts several pieces of code required to
communicate that between E, P and D workers.
Closes TRTLLM-9409.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-22 06:32:49 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Yan Chunwei
ea6cd76c55
[None][refactor] simplify get_stats and get_kvcache_events with rpc (#9980)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-22 18:23:43 +08:00
Perkz Zheng
c87f1a6b39
[https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
shuyixiong
9e9523c3cc
[https://nvbugs/5762016][chore] Skip a ray test (#10194)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-22 17:06:19 +08:00
JadoTu
7421224d69
[None][fix] NVFP4 linear method's weight and weight_scale padding (#10148)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-22 15:00:31 +08:00
xinhe-nv
d30ee8101e
[None][chore] Remove closed bugs (#10182)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-22 01:58:17 -05:00
Yuxian Qiu
237fd0eae4
[https://nvbugs/5666821][chore] unwaive tests. (#9958)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-22 11:39:45 +08:00
TensorRT LLM
f8501f3cc8
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-22 03:08:12 +00:00
Fanrong Li
f0bd60a395
[https://nvbugs/5684820][fix] fix the detokenizer issue for DeepSeek-v3.2 (#10106)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-22 10:56:33 +08:00
Jin Li
066b653940
[TRTLLM-9880][feat] Include torch compile tests in QA test list (#10149)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-22 10:37:09 +08:00
Yuxian Qiu
2f139ee07e
[https://nvbugs/5701445][chore] unwaive test. (#9949)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-22 10:12:54 +08:00
Chuang Zhu
914dd39127
[None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test (#9735)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-22 09:29:24 +08:00
dominicshanshan
d274a4c5d3
[https://nvbugs/5701457][fix] Unwaive ray test. (#10175)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-22 09:25:58 +08:00
Enwei Zhu
5549067966
[None][ci] Waive GPTOSS test case (#10155)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-22 08:50:44 +08:00
Balaram Buddharaju
5266475014
[None][feat] Cudagraph updates for helix parallelism (#10141)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 15:21:52 -05:00
shuyixiong
4fc6036276
[https://nvbugs/5702793][fix] Fix view operation on uncontiguous tensor (#10147)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-21 11:47:20 -05:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 (#9971)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00
Kaiyu Xie
5a611cb8f5
[None] [feat] Enhancements to slurm scripts (#10112)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-21 10:24:56 -05:00
Emma Qiao
aa5dbb7ca5
[None][infra] Waive failed tests for main branch on 12/21 (#10184)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-21 22:23:46 +08:00
xxi
5ae154022a
[TRTLLM-9872][fix] clear the failed test at CI when enalbe_configurab… (#10067)
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-21 08:14:50 -05:00
Eran Geva
b15f987972
[None][chore] removed duplicated test from l0_b200.yml (#10090)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-21 11:34:01 +02:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Balaram Buddharaju
dcd3f7b5ea
[https://nvbugs/5744427][fix] Fix accuracy test OOM (#10173)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 02:03:38 -05:00
TensorRT LLM
6c76148b56
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-21 03:08:20 +00:00
Bo Li
77e37d9dd0
[https://nvbugs/5753250][infra] Further waive all tests in _test_openai_responses.py (#10176)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-20 10:25:14 -05:00
Enwei Zhu
2ce785f39a
[https://nvbugs/5643631][fix] Fix hostfunc seg fault (#10028)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 07:58:43 -05:00
Enwei Zhu
21a93fbf9d
[TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset (#10043)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 03:12:41 -05:00
TensorRT LLM
3f25db9d3e
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-20 03:07:30 +00:00
Yuxian Qiu
3b3069b390
[https://nvbugs/5747930][fix] Use offline tokenizer for whisper models. (#10121)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:42:07 +08:00
Yuxian Qiu
e75331480f
[None][fix] fix draft_lengths for CUDA graph capture. (#10004)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:04:48 +08:00
Anish Shanbhag
7c82605327
[None][fix] enable KV cache reuse for config database (#10094)
2025-12-19 15:16:56 -08:00
Balaram Buddharaju
bee9051484
[None][chore] Waive timing out pre-merge test (#10167)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-19 17:56:33 -05:00
Gal Hubara-Agam
20b69a982a
[#10056][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 (#10131)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-19 13:28:42 -08:00
Chang Liu
5489d188a4
[None][fix] Revert the change and remove device count guard for DSv32 (#9631)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 15:00:55 -05:00
longcheng-nv
b882393d69
[https://nvbugs/5720357][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027)
Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 14:58:01 -05:00
Venky
dfa11d810e
[TRTC-102][docs] --extra_llm_api_options->--config in docs/examples/tests (#10005)
2025-12-19 13:48:43 -05:00
JunyiXu-nv
7b71ff6b8a
[https://nvbugs/5722653][fix] Unwaive fixed test (#10157)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-19 11:19:20 -05:00
xxi
27e49e2904
[None][fix] waive the failed test test_service_discovery[etcd-load_ba… (#10161)
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-19 06:14:26 -08:00
tcherckez-nvidia
9f6abaf59f
[#9640][feat] Migrate model registry to v2.0 format with composable configs (#9836)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-19 05:30:02 -08:00
xinhe-nv
7b51e3cedb
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10129)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-19 17:55:17 +08:00
Emma Qiao
dd8ce68c94
[None][infra] Update waive and waive failed tests for main branch on 12/19 (#10151)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-19 01:20:42 -08:00
Pengyun Lin
ac03915dc3
[TRTLLM-9604][feat] DS R1 & V3.1 tool parser (#10010)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-19 17:20:03 +08:00
Chang Liu
31bc14b350
[TRTLLM-9654][feat] Support DeepSeek-V32 chat template (#9814)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 17:05:38 +08:00
yufeiwu-nv
52cee573ad
[TRTLLM-8830][test] Overlap scheduler enhancement perf test: Add qwen3_0,8b and llama3.1 test cases (#10114)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-19 17:01:52 +08:00
xinhe-nv
cb0444b1b5
[TRTLLM-8638][fix] Add failed cases into waives.txt (#10132)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-12-19 16:07:56 +08:00
JunyiXu-nv
356ad4fe3a
[https://nvbugs/5722653][fix] Address port conflict by assigning different port section in the same node. (#10035)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-19 15:34:04 +08:00
Ziyi Xiong
70b4d282c6
[TRTLLM-7736][feat] Incrementally update the inputs of target and draft models (#9708)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-19 15:11:25 +08:00
Larry Xu
48dbc61129
[None][chore] Update CODEOWNERS for test cases and test list (#10119)
Signed-off-by: LarryXFly <197874197+LarryXFly@users.noreply.github.com>
2025-12-19 13:38:21 +08:00
William Zhang
478b6b20a1
[#9230][refactor] Replace nemotron patches with custom model implementation (#9751)
* Why?
Patching for Nemotron-H models had grown out of hand and made certain
optimizations more complex than necessary.
* What?
This commit removes the patches and replaces them with the custom model
implementation in `modeling_nemotron_h.py`.
Closes #9230
Closes NvBug 5747867
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-18 19:36:27 -08:00
Balaram Buddharaju
72c5480dfb
[None][chore] Waive test blocking pre-merge 12/18 (#10145)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-18 19:12:05 -08:00
TensorRT LLM
00f70c30a6
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-19 03:11:26 +00:00
Ivy Zhang
9aa40871c2
[TRTLLM-9840][test] switch ucx backend to default backend (#10101)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-12-18 18:54:15 -08:00
TensorRT LLM
a7ac5a6bca
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-19 02:14:37 +00:00
Wangjue Yao
9f283f330b
[None][feat] Support Mooncake transfer engine as a cache transceiver backend (#8309)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-19 10:09:51 +08:00
Chuang Zhu
e0b2a94309
[None][fix] Fix ready signal in NIXL backend (#10000)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-19 09:43:40 +08:00
yuanjingx87
2e88c86f10
[None][infra] Fix issue that lock file generation will skip dependency with comment (#10144)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-18 17:41:23 -08:00
Yukun He
bd5b3c2ac0
[https://nvbugs/5721912][chore] Unwaive the test (#10108)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-19 09:12:25 +08:00
Anish Shanbhag
91a9ae42d2
[TRTC-71][feat] Add regression testing for config database (#9832)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-12-18 16:15:38 -08:00
Balaram Buddharaju
799a2ae311
[https://nvbugs/5741331][fix] Fix helix accuracy test (#10021)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-18 15:27:53 -08:00
Chang Liu
a97e411b44
[https://nvbugs/5747911][fix] Use offline data path for the unit test of mmencoder server (#10135)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-18 15:19:23 -08:00
Lizhi Zhou
f02782a6f2
[https://nvbugs/5726066][fix] fix auto-scaling related failures (#9845)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
2025-12-18 16:37:48 -05:00
Enwei Zhu
6fe89ea00f
[TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output (#9840)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-18 10:36:38 -08:00
CarstyYou
0b279f4ad4
[https://nvbugs/5456493][feat] Add fp8 bmm on sm120 (#9687)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-12-18 22:57:20 +08:00
ZhichenJiang
4e55b83101
[None][perf] Add more optimization options for MOE CuteDSL finalized kernel (#10042)
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
2025-12-18 22:49:28 +08:00
Nikita Korobov
3b4f26e4d1
[None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 (#9734)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-18 11:58:23 +01:00
yuanjingx87
df15be3fad
[None][infra] Fix slurm job does not catch cancelled jobs (#9722)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-18 00:32:43 -08:00
Bo Li
9d7e038bcb
[https://nvbugs/5753250][infra] Waive _test_openai_responses. (#10110)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-18 00:15:06 -08:00
Emma Qiao
33a90f2dd2
[None][infra] Waive failed cases for main branch on 12/18 (#10105)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-17 21:35:45 -08:00
Yuxian Qiu
bec864a78c
[None][fix] avoid ID conversion for non enable_configurable_moe cases. (#10003)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-18 13:29:52 +08:00
yuanjingx87
897a38978d
[None][infra] Update allowlist 2025.12.17 (#10097)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-17 21:11:35 -08:00
Wanli Jiang
601c29ca73
[https://nvbugs/5721644][fix] Update tests for nemotron_h (#9993)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-18 12:38:02 +08:00
Lucas Liebenwein
76ec820465
[#7532][feat] AutoDeploy: gather logits before lm head (#9962)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-17 19:50:13 -08:00
TensorRT LLM
cfe53e7425
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-18 03:23:35 +00:00
xinhe-nv
4a98f190a8
[None][chore] Add failed cases into waives.txt (#10025)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-17 19:13:52 -08:00
xinhe-nv
c1cfb61b1b
[TRTLLM-9381][feat] Add kimi k2 fp4 tests (#9906)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-17 18:15:27 -08:00
TensorRT LLM
50c2b82f24
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-17 23:45:35 +00:00
tburt-nv
27064f95c7
[None][chore] Clarify copyright header guidance (#9882)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-18 06:38:10 +08:00
tburt-nv
5da7879b38
[None][fix] Revert GHA upgrade for blossom-ci workflow (#10095)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-17 15:57:04 -05:00
Chenghao Zhang
22c6e8a424
[None][fix] Autodeploy: fix some legacy flashinfer attention test errors (#9928)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-17 12:27:22 -08:00
Salman Chishti
cb5cd4376e
[None][chore] Upgrade GitHub Actions for Node 24 compatibility (#10045)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2025-12-17 09:44:09 -08:00
Yuan Tong
f7e245668b
[TRTLLM-9680][perf] Optimize TRTLLMSampler log_probs performance (Core fix has been merged via #9353) (#9655)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-12-17 17:56:01 +08:00
Yukun He
00c0564334
[None][chore] Remove unnecessary warning log for tuning. (#10077)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-17 01:51:17 -08:00
Yukun He
18b335d584
[TRTLLM-9989][fix] Disable tvm_ffi for CuteDSL nvFP4 dense GEMM. (#10040)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-17 00:41:26 -08:00
Yukun He
2fd1a23e4c
[TRTLLM-9998][fix] Change trtllm-gen MoE distributed tuning strategy back to INDEPENDENT (#10036)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-17 00:35:22 -08:00
yufeiwu-nv
5d71f662c3
[https://nvbugs/5698434][test] Add Qwen3-4B-Eagle3 One-model perf test (#10041)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-17 13:37:25 +08:00
Void
47404196fa
[None][fix] Enabled simultaneous support for low-precision combine and MTP. (#9091)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-17 13:37:08 +08:00
Emma Qiao
0dbf3948cc
[None][infra] Waive failed tests due to llm model files (#10068)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-16 20:12:57 -08:00
Kaiyu Xie
02fd13448b
[None] [feat] Enhancements to slurm scripts (#10031)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-16 19:31:27 -08:00
JunyiXu-nv
6649c3743c
[https://nvbugs/5635153][chore] Remove responses tests from waive list (#10026)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-17 11:22:02 +08:00
shuyixiong
26fb063076
[https://nvbugs/5741060][fix] Fix pg op test (#9989)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-17 09:44:25 +08:00
Aurelien Chartier
7175d89b48
[None][fix] Fix iteration stats for spec-dec (#9855)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-16 14:11:38 -08:00
QI JUN
dba9036072
[None][doc] remove nano-vl-v2 model support in release notes (#9887)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
QI JUN
3daca4fea3
[https://nvbugs/5729847][doc] fix broken links to modelopt (#9868)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
QI JUN
e6ab864066
[None][doc] Update release notes (#9739)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Laikh Tewari <laikhtewari1@gmail.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Zac Patel
1ffa2c8937
[IB-1920][doc] Update Perf_Overview.md with Benchmarking Results for Release 1.1 (#9723)
Signed-off-by: Zachary Patel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
xiweny
2756a0da60
[TRTLLM-4629][doc] Add B300 & GB300 in documents (#9663)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
ruodil
07f307d131
[https://nvbugs/5652552][fix] cherry-pick add printing for llm args (#9206)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Iman Tabrizian
1fc8bd3cd8
[TRTLLM-9082][doc] Address Dynamo Example feedback (#9619)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Kaiyu Xie
e41b060fe6
[TRTLLM-9090] [doc] Update online benchmarking docs (#9611)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-16 13:33:20 -05:00
Lizhi Zhou
bd13957e70
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-16 05:16:32 -08:00
Enwei Zhu
609d1d0383
[None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM (#10008)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-16 04:06:49 -08:00
Enwei Zhu
6a238ca8ad
[None][doc] Update CONTRIBUTING.md (#10023)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-16 18:58:43 +08:00
Emma Qiao
12727ebd7f
[None][infra] Waive failed test for main branch on 12/16 (#10029)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-16 02:54:32 -08:00
Perkz Zheng
064b67e40c
[https://nvbugs/5727952][fix] a pdl bug in trtllm-gen fmha kernels (#9913)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-16 00:34:37 -08:00
yuanjingx87
0a4c59136a
[None][infra] Fixing credential loading in lockfile generation pipeline (#10020)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-16 15:38:29 +08:00
William Zhang
28b02b4f5a
[None][docs] Add README for Nemotron Nano v3 (#10017)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 22:17:24 -08:00
Yihan Wang
6b5ebaae3e
[None][chore] Update internal_cutlass_kernels artifacts (#9992)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-15 21:15:25 -08:00
Wanli Jiang
8af51211c1
[FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass (#9358)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-16 12:41:17 +08:00
Eran Geva
ce7a42f4cf
[https://nvbugs/5731717][fix] fixed flashinfer build race condition during test (#9983)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-15 20:30:24 -08:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
QI JUN
4ce35eacf1
[TRTLLM-9794][ci] move more test cases to gb200 (#9994)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-15 19:50:41 -08:00
xinhe-nv
cdf56c278f
[TRTLLM-8638][fix] Add failed cases into waives.txt New activity. (#9979)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-15 18:59:13 -08:00
Zhanrui Sun
b757ea73ba
[TRTLLM-9641][infra] Use public triton 3.5.0 in SBSA (#9652)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-15 18:58:59 -08:00
Michal Guzek
e6187d8109
[https://nvbugs/5708810][fix] Fix TRTLLMSampler (#9710)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-15 23:26:52 +01:00
Patrice Castonguay
9ba14263db
[https://nvbugs/5673559][fix] Unwaiving disagg test for nvbug 5673559 (#9957)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-15 12:32:15 -05:00
Emma Qiao
d5d15c06df
[None][infra] Waive failed tests for main branch on 12/15 (#10001)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-16 01:29:43 +08:00
Faraz
0c31502fbc
[None][feat] disable fused gemm for sm121 (#9916)
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
2025-12-15 12:07:06 -05:00
Kaiyu Xie
44b0f8c3ed
[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002)
2025-12-15 08:52:52 -08:00
zackyoray
63e7a2fa70
[None][infra] Update ucx to 1.20.x (#9977)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-16 00:31:48 +08:00
arekay-nv
4f75a31a45
[https://nvbugs/5540979][fix] Potential fix for 5540979 (#9716)
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
2025-12-15 10:49:31 -05:00
Wanli Jiang
3230fbe79a
[None][feat] Update reasoning parser for nano-v3 (#9944)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 05:39:37 -08:00
Yukun He
9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios: BROADCAST, INDEPENDENT, MERGE, and PARALLEL.
* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning-time overhead are assigned the PARALLEL strategy, which allows the ranks within the same tensor parallelism (TP) group to tune tactics concurrently. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
Kaiyu Xie
ef4ea955b2
[None] [fix] Fix slurm scripts (#10007)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-15 04:20:53 -08:00
Anthony Chang
ad12b795c9
[https://nvbugs/5661741][fix] Fix accuracy issue in TRTLLM MoE introduced in #9377 (#9999)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 03:31:56 -08:00
Bo Li
9eb5a229dd
[None][infra] Fully waive test_worker_restart test_disagg_server_restart. (#9988)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-15 01:26:18 -08:00
Grzegorz Kwasniewski
83885c69e7
[TRTLLM-9136][feat] 2D parallel EP TP support (#9459)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-15 09:52:29 +01:00
dominicshanshan
825025b137
[None][infra] Add multi gpu Ray tests into L0 merge change request list. (#9996)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 15:55:54 +08:00
xinhe-nv
3c98b25005
[None][chore] Add failed cases into waives.txt (#9941)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-14 23:14:24 -08:00
Kaiyu Xie
504ede707e
[None] [fix] Fix nsys_on argument for slurm scripts (#9995)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-14 22:41:30 -08:00
Void
dda7658306
[https://nvbugs/5655885][fix] fix invalid instruction error in 2shot ar kernel on Ampere (#9394)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-15 14:22:56 +08:00
Yuxian Qiu
7588029763
[None][feat] Async pp send for PPCommTorch. (#9976)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-15 14:03:46 +08:00
JunyiXu-nv
af899d2fe7
[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-14 21:46:13 -08:00
Ziyi Xiong
f2aee0db03
[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-15 13:28:54 +08:00
shuyixiong
25db9e7b3e
[https://nvbugs/5741060][chore] Waive all pg operator tests (#9991)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-14 21:24:43 -08:00
Balaram Buddharaju
dfc8799352
[https://nvbugs/5669114][fix] Switch to MMMU benchmark for Gemma3 27B (#9966)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 21:23:59 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
Kaiyu Xie
0788635d6c
[TRTLLM-9762] [doc] Update documents for GB300 NVL72 (#9987)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-14 19:30:28 -08:00
QI JUN
b57650f1e6
[TRTLLM-9794][ci] move test cases of gpt-oss to gb200 (#9934)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-14 19:21:54 -08:00
xxi
f5696df285
[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858)
2025-12-15 10:47:15 +08:00
Yan Chunwei
355e06d66d
[None][doc] update readme for rpc (#9972)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-15 10:16:50 +08:00
dominicshanshan
4bf42f8fa8
[https://nvbugs/5580297][fix] Skip capture request error test from Ray stage (#9947)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 10:03:16 +08:00
Anthony Chang
3be5f3abcf
[None][fix] Fix regex pattern for cubin filtering (#9914)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 10:02:48 +08:00
Zongfei Jing
bf923a1074
[None] [chore] Comments cleanup (#9978)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-12-15 09:46:37 +08:00
Simeng Liu
f21e2b3329
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-15 08:42:30 +08:00
Balaram Buddharaju
9a1750c8f9
[TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 11:29:30 -08:00
Emma Qiao
e0a4b72279
[None][infra] Waive failed tests for main branch on 12/14 (#9982)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-14 22:48:34 +08:00
Matt Lefebvre
1375910f1b
[None][infra] Delete container before attempting import ( #9967 )
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-14 00:09:33 -08:00
Mike Iovine
96d654029d
[ https://nvbugs/5666816 ][fix] Unwaive llama3 eagle3 test ( #9964 )
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-12-14 15:07:35 +08:00
Yuxian Qiu
fcda1a1442
[None][fix] disable async pp send for ray cases. ( #9959 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-13 20:22:36 -08:00
TensorRT LLM
f6b0ddd61d
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-14 03:29:59 +00:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe ( #9852 )
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Faraz
64d7796234
[None][chore] Add namespace to header to fix tot failure ( #9973 )
2025-12-13 12:18:10 -05:00
Mike Iovine
383b13e0e5
[None][feat] Implement sampling on 1-model EAGLE3 ( #9885 )
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-13 07:38:22 -08:00
jellysnack
079ef8ae77
[None][feat] Graceful Error Handling for Guided Decoder ( #9078 )
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-13 19:57:59 +08:00
Yan Chunwei
85406f9dda
[ https://nvbugs/5720482 ][fix] Fix test rpc streaming ( #9902 )
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-13 01:14:43 -08:00
shuyixiong
8cbf2d958c
[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy ( #9793 )
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-13 01:02:11 -08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism ( #9720 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
shuyixiong
7fc720a397
[TRTLLM-9784][fix] Resolve port conflicts ( #9780 )
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-12 22:10:01 -08:00
bhsueh_NV
e49c70f6df
[None][feat] Support Mistral Large3 LLM part ( #9820 )
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-13 11:44:27 +08:00
Faraz
98d72c7648
[None][feat] spark cublas LUT table for llama-8b-bf16 perf ( #9811 )
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-12-12 22:37:56 -05:00
TensorRT LLM
e4e09867d1
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-13 03:26:42 +00:00
Balaram Buddharaju
461446045e
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 ( #9924 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 16:49:25 -08:00
tburt-nv
6147452158
[ https://nvbugs/4141427 ][chore] Add more details to LICENSE file ( #9881 )
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
yuanjingx87
246a877571
[None][infra] Remove generate lockfile schedule for 1.2.0rc4.post1 branch ( #9945 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-12 09:10:32 -08:00
Yuxian Qiu
cd4e639536
[None][feat] Async pp send. ( #9952 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-13 00:52:30 +08:00
Chuang Zhu
4cc4cbe926
[ https://nvbugs/5716787 ][fix] terminate nixl running when exiting ( #9785 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-12 11:15:02 -05:00
Chuang Zhu
9c59c9f920
[ https://nvbugs/5643787 ][fix] remove the war path for notify to itself ( #9834 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:10:05 -05:00
JunyiXu-nv
2fec53dfa5
[TRTLLM-9637][feat] Support tool parser for Kimi K2 ( #9830 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 23:32:39 +08:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision ( #9541 )
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Balaram Buddharaju
af315d8ef1
[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism ( #9757 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:05 +08:00
zackyoray
d5b9ad91c9
[None][feat] Upgrade NIXL to v0.8.0 ( #9707 )
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray
Signed-off-by: Bo Deng
Co-authored-by: Bo Deng
2025-12-12 20:21:10 +08:00
Lucas Liebenwein
e767fc649a
[None][feat] AutoDeploy: prepare_metadata revisited ( #9764 )
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-12 20:14:14 +08:00
Yukun He
a6263a127f
[None][chore] Degrade log level in cublas fp4 runner when using default configs ( #9951 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-12 18:53:54 +08:00
ruodil
9b3e5e90ee
[None][test] fix a typo in model name in script ( #9867 )
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-12 17:35:55 +08:00
chenfeiz0326
61745f034a
[ https://nvbugs/5727481 ][ci] Fix Port Conflict in Perf-Sanity CI Test ( #9896 )
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-12 17:16:50 +08:00
kris1025
2fc94e5dd7
[None][chore] unwaive qwen3 accuracy test ( #9895 )
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-12-12 16:30:09 +08:00
yufeiwu-nv
fd3d3a553d
[None][chore] Modify python ipc_util to align with C++ path ( #9894 )
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-12-12 15:55:22 +08:00
Yihan Wang
711016c799
[ https://nvbugs/5736923 ][infra] Waive timeout disaggregated/test_auto_scaling[http-round_robin] test ( #9942 )
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 15:15:13 +08:00
yuanjingx87
eeb03f314a
[None][infra] Replace the deprecated github token ( #9915 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-11 22:46:14 -08:00
Yifei Wang
9d1f2a9925
[ #6425 ][fix] address CUDA stream sync issue in ModelRunnerCPP ( #6426 )
Signed-off-by: yifei.w <yifei.w@bytedance.com>
2025-12-12 13:33:22 +08:00
Ivy Zhang
fded6c393d
[TRTLLM-9262][test] add groupgemm ada case for rcca ( #9833 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-12-12 13:23:33 +08:00
Kaiyu Xie
110820bb15
[TRTLLM-9792] [feat] Support multiple instances on single node for slurm scripts ( #9900 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-12 12:12:08 +08:00
Chuang Zhu
bd441e9822
[None][infra] revert ucx to 1.19 ( #9936 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:37:19 +08:00
Yiteng Niu
3e39afea9a
[None][infra] update nspect version for api change ( #9899 )
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-12-12 11:27:42 +08:00
dominicshanshan
093465ed29
[ https://nvbugs/5599176 ][fix] Unwaive fixed test for Ray ( #9861 )
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-12 11:24:05 +08:00
TensorRT LLM
0132769c22
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-12 03:20:43 +00:00
Yiqing Yan
5065b60cd1
[None][infra] Fix mergeWaiveList stage ( #9892 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-12 11:19:42 +08:00
xinhe-nv
e8efeb765d
[TRTLLM-9717][fix] fix multi nodes tests cases ( #9736 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-12 10:14:23 +08:00
Chuang Zhu
4670e0c297
[None][infra] update ucx to 1.20 ( #9786 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 09:49:46 +08:00
JunyiXu-nv
710c592d7c
[ https://nvbugs/5727517 ][fix] Preserve ip:port for disagg ( #9859 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 09:45:34 +08:00
Kanghwan
98c68c195b
[None][infra] Ignore comments from bots and CI accounts ( #9929 )
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-12-12 09:20:51 +08:00
jthomson04
4f6d4da035
[None][perf] Fix TPOT when min_tokens set ( #9862 )
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-12-11 13:55:31 -08:00
Kanghwan
95d928f071
[None][infra] Add workflow to auto-label 'waiting for feedback' on team comments ( #9886 )
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-12-12 05:43:30 +08:00
Venky
fd1270b9ab
[TRTC-43] [feat] Add config db and docs ( #9420 )
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-12-12 04:00:03 +08:00
Simeng Liu
24f92721f2
[ https://nvbugs/5597647 ][ci] Unwaive fixed tests. ( #9812 )
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-12 02:29:30 +08:00
Erin
89dabf5aa1
[TRTLLM-9736][feat] AsyncLLM and verl integ ( #9353 )
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-11 09:33:25 -08:00
JadoTu
02edb19f43
[None] [feat] add eos_token_id in generation_config to sampling params ( #9514 )
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-12 00:52:03 +08:00
xxi
488d38f88d
[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS ( #9772 )
2025-12-12 00:22:13 +08:00
Fanrong Li
af2849cc7a
[None][doc] Add DeepSeek-V3.2 to the supported models ( #9893 )
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-11 18:04:48 +08:00
Yan Chunwei
04a39a4e2b
[None][chore] enable test_ipc.py ( #9865 )
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-11 17:47:14 +08:00
Zongfei Jing
c76b428e2e
[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL ( #9618 )
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-12-11 16:21:32 +08:00
ChristinaZ
b8a5159fad
[None][feat] Enable PDL for indexer topK ( #9843 )
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-11 14:31:23 +08:00
Kanghwan
d147ad053e
[ #2730 ][fix] Fix circular import bug in medusa/weight.py ( #9866 )
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-12-11 13:51:08 +08:00
JunyiXu-nv
454e7e59e5
[ https://nvbugs/5718004 ][fix] Add warmup for cancellation test ( #9860 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-11 12:20:33 +08:00
Ziyi Xiong
81222c3670
[None] Fix warning when capturing CUDA graph ( #9746 )
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-10 19:22:38 -08:00
Bo Deng
c1d53ee43d
[ https://nvbugs/5582258 ][fix] unwaive ( #9650 )
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-12-10 19:18:30 -08:00
fredricz-20070104
341cb1a12c
[None][chore] Add GB300 support since it does not support segment ( #9731 )
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-10 18:36:55 -08:00
Patrice Castonguay
2c0293c612
[ https://nvbugs/5601682 ][fix] Unwaiving disagg test ( #9627 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-10 13:42:26 -05:00
Tian Zheng
ece3a8748f
[None][doc] Update doc for NVFP4 KV cache ( #9475 )
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-10 06:20:12 -08:00
cheshirekow
2f030312a8
[TRTLLM-9228][infra] Verify thirdparty C++ process ( #9367 )
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-12-10 21:01:19 +08:00
Yiqing Yan
1c11cae54d
[None][chore] bump version to 1.2.0rc6 ( #9874 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-10 04:53:26 -08:00
Yukun He
072f236002
[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache ( #9835 )
Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-10 20:41:04 +08:00
Matt Lefebvre
df1adfbb50
[TRTINFRA-7328][infra] - Move half B200 tests to lbd ( #9853 )
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-10 04:24:30 -08:00
Brian K. Ryu
8cec2da375
[None][feat] Port fp4 quantization kernel optimization from FlashInfer ( #9854 )
Signed-off-by: Brian Ryu <bryu@nvidia.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-10 13:13:48 +01:00
Matt Lefebvre
8fefa2c9d1
[None][infra] Fail fast if SLURM entrypoint fails ( #9744 )
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-10 02:31:29 -08:00
Perkz Zheng
e34302986d
[ https://nvbugs/5727952 ][fix] PDL bugs with trtllm-gen fmha kernels ( #9863 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-10 01:47:03 -08:00
Guoming Zhang
12693a526b
[None][chore] Enable L0 multi-gpus testing for Qwen3-next ( #9789 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-10 17:11:32 +08:00
Zhanrui Sun
49fe089470
[TRTLLM-9811][infra] Update urllib3 version >= 2.6.0 to fix high vulnerability issue ( #9823 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-10 00:18:11 -08:00
dominicshanshan
0e78a4b244
[ https://nvbugs/5702791 ][fix] Unwaive fixed test ( #9844 )
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-10 14:01:44 +08:00
Yukun He
979f37e443
[None][fix] Fix nvfp4 gemm allowed backends arg passing ( #9837 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-09 20:09:53 -08:00
QI JUN
2c46126a93
[TRTLLM-9794][ci] move some deepseek test cases to gb200 ( #9841 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-09 19:54:51 -08:00
Bo Li
9d3c675a0b
[None][chore] Support larger topK for NVLinkOneSided AlltoAll. ( #9816 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-10 11:10:55 +08:00
TensorRT LLM
6a39bb983c
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-10 03:07:34 +00:00
zhanghaotong
36c9e7cfe6
[None][chore] Add unittest for otlp tracing ( #8716 )
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-09 18:34:08 -08:00
dhansen-nvidia
2d33ae94d5
[ https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… ( #8463 )
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-12-09 18:51:31 -05:00
Patrice Castonguay
414448bb37
[ https://nvbugs/5719561 ][chore] Unwaive tests for nvbug 5719561 ( #9801 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-09 18:21:50 -05:00
Patrice Castonguay
ff0ef19ee9
[ https://nvbugs/5688388 ][chore] Unwaiving fixed disagg test ( #9800 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-09 16:51:46 -05:00
Matt Lefebvre
5de4e3f621
[TRTINFRA-7328][infra] Consume SlurmCluster scratchPath and cleanup mounts ( #9600 )
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-09 13:34:09 -08:00
Eran Geva
4da3121363
[ #8921 ][chore] AutoDeploy NanoV3 to use SYMM_MEM allreduce strategy ( #9797 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-09 13:05:38 -08:00
Patrice Castonguay
7d7d05d8db
[None][chore] Adding flaky auto scaling test to waives ( #9851 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-09 15:05:19 -05:00
Mike Iovine
07c76a5fac
[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) ( #8810 )
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-12-09 11:06:31 -05:00
Dom Brown
3156f2e852
[ https://nvbugs/5575841 ] [fix] Nvbug 5575841: Remove additional test waivers for TestMoEFP4 ( #9788 )
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-12-09 13:37:55 +00:00
Emma Qiao
75bc386b65
[None][infra] Waive failed cases for main branch on 12/09 ( #9839 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-09 19:39:29 +08:00
QI JUN
58c29957d9
[TRTLLM-9794][ci] move qwen3-next test cases to gb200 ( #9827 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-09 01:58:25 -08:00
Stefan Niebler
d600b9f851
[TRTLLM-6756][feat] Update BeamSearch for TorchSampler ( #9660 )
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-12-09 10:44:01 +01:00
Robin Kobus
76f49c903b
[None][fix] Additional model outputs for pipeline parallelism ( #9794 )
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-09 10:41:22 +01:00
Yiqing Yan
2ddcb45b2a
[None][chore] Generate lock file for release/1.2.0rc4.post1 branch automatically ( #9829 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-09 16:34:17 +08:00
yufeiwu-nv
fbcf03040f
[None][test] Refactor qa/llm_perf_nim.yml test list ( #9700 )
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-08 22:00:43 -08:00
QI JUN
252769c930
[TRTLLM-9794][ci] remove duplicated test cases in DGX B200 ( #9817 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-08 21:51:30 -08:00
Zhanrui Sun
309f92ec09
[None][infra] Use artifactory pypi mirror for Cython install ( #9774 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-09 13:49:41 +08:00
Shi Xiaowei
b050804b63
[TRTLLM-6537][infra] extend multi-gpu tests related file list ( #9614 )
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-09 12:54:53 +08:00
JunyiXu-nv
90890785eb
[ https://nvbugs/5722653 ][fix] Fix config file used by disagg_client ( #9783 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-08 20:34:55 -08:00
Balaram Buddharaju
bafb60c1bc
[None][chore] Fix tests failing on pre-merge 12/08 ( #9819 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-08 20:08:52 -08:00
Bo Li
f2006a1f74
[ https://nvbugs/5726066 ][infra] Waive timeout disaggregated/test_auto_scaling tests. ( #9815 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-08 19:51:43 -08:00
TensorRT LLM
c7a2568872
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-09 03:19:48 +00:00
JunyiXu-nv
f521f6d910
[None][fix] Fix unterminated process issue for RemoteOpenAIServer ( #9490 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-09 11:15:40 +08:00
Jiagan Cheng
4a3a66b124
[ https://nvbugs/5677746 ][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang ( #9659 )
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-12-08 18:43:52 -08:00
bhsueh_NV
d6f961d3fe
[None][feat] Add llama4 scaling ( #9771 )
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-09 10:27:39 +08:00
Tri Dao
1c4dacb19a
[None][fix] Fix PDL in TRTLLM MOE for dsv3 ( #9799 )
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
2025-12-09 10:16:29 +08:00
yuanjingx87
390391ebf1
[None][infra] Correct the waived test names due to a merge conflict ( #9803 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-09 09:48:21 +08:00
Chenghao Zhang
75f5446d67
[ #9753 ][feat] AutoDeploy: Implement add rms_norm fusion ( #9754 )
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-08 14:24:27 -08:00
Jhao-Ting Chen
da074be037
[None][fix] Fix #8383 introduced TRTLLM backend python error ( #9804 )
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 13:31:37 -08:00
Eran Geva
23cf72b0f8
[ #8921 ][feat] Added symetric memory AllReduce strategy ( #8919 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-08 13:12:56 -08:00
Thor Johnsen
f9380581c5
[ https://nvbugs/5508267 ][fix] Proper handling of inactive canceled requests ( #9280 )
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-12-08 13:11:44 -08:00
Yibin Li
faabc1a387
[TRTLLM-7967][chore] Add more tests ( #9415 )
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-12-08 11:57:32 -08:00
Jhao-Ting Chen
0a09465089
[ https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model ( #8383 )
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 11:16:05 -08:00
Frank
f6df9eb2a6
[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench ( #9250 )
2025-12-08 10:37:40 -08:00
sunnyqgg
1c7b7cdd47
[TRTLLM-9506][fix] Fix AR for DeepSeek-R1 2 model path ( #9661 )
Signed-off-by: qgai <qgai@nvidia.com>
2025-12-08 10:12:32 -05:00
Eran Geva
98db262a67
[None][fix] Switch AutoDeploy's default allreduce strategy to NCCL ( #9666 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-08 03:26:21 -08:00
Lizhi Zhou
52f78e4000
[ http://nvbugs/5649010 ][fix] fix test_auto_scaling.py::test_worker_restart timeout ( #9775 )
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-08 03:26:01 -08:00
fredricz-20070104
96d9b67d65
[ https://nvbugs/5527655 ][test] Add test case for RCCA 5527655 ( #9511 )
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-08 01:27:13 -08:00
fredricz-20070104
ededeecb0f
[None][test] Add Kimi k2 WIDEEP perf and accuracy cases ( #9686 )
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-08 01:25:07 -08:00
Zheng Duan
e7395c6607
[None][infra] update mooncake in docker images ( #9584 )
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-12-08 16:56:40 +08:00
xinhe-nv
3f55c07223
[None][chore] Remove closed bugs ( #9770 )
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-07 22:51:55 -08:00
Guoming Zhang
448bb1a44f
[TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… ( #9696 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-08 13:39:12 +08:00
Li Min
a422d70be6
[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. ( #9690 )
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-08 13:28:11 +08:00
Fanrong Li
2f526583fb
[None][chore] Move the rocketkv e2e test to post-merge ( #9768 )
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-08 13:22:16 +08:00
Emma Qiao
137713a869
[None][infra] Waive failed cases for main on 12/08 ( #9773 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-07 20:18:29 -08:00
ruodil
d232709568
[ https://nvbugs/5666804 ][test] only adding sampler config for limited models ( #9512 )
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-12-07 19:40:29 -08:00
Kaiyu Xie
069b05cf3d
[TRTLLM-9706] [doc] Update wide EP documents ( #9724 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-08 11:21:11 +08:00
TensorRT LLM
03f89d7aa4
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-08 03:07:49 +00:00
Yukun He
8b9ab9a701
[None][fix] Fix two tuning cache miss issues. ( #9743 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-08 10:47:21 +08:00
fredricz-20070104
9bfb6179ec
[ https://nvbugs/5422621 ][test] Add GB 200 WIDEEP test case for RCCA 5422621 ( #9506 )
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-08 10:41:40 +08:00
xxi
8e27ce7084
[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI ( #9645 )
2025-12-08 10:19:40 +08:00
Zheng Duan
4da0e1473c
[None][test] add ntp tolerance in time metrics verification ( #9741 )
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-12-08 09:51:10 +08:00
chenfeiz0326
383178c00a
[TRTLLM-9000][feat] Add multi-node Perf Tests into CI ( #8800 )
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-08 09:00:44 +08:00
Ludwig Schneider
41ce14ab04
[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce ( #9314 )
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2025-12-07 09:43:26 -08:00
Chenjie Luo
d252101a76
[OMNIML-3036][doc] Re-branding TensorRT-Model-Optimizer as Nvidia Model-Optimizer ( #9679 )
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
2025-12-07 07:14:05 -08:00
Yanchao Lu
f59d64e6c7
[None][fix] Several minor fixes to CI setting ( #9765 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-07 23:07:59 +08:00
Emma Qiao
7c6c493993
[None][infra] Waive failed cases for main branch on 12/07 ( #9769 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-07 06:26:47 -08:00
JunyiXu-nv
b210f22c7e
[ https://nvbugs/5703953 ][fix] Preserving ip:port for trtllm-serve before initializing llm ( #9646 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-06 20:13:48 -08:00
TensorRT LLM
6dc8877416
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-07 03:08:38 +00:00
Yan Chunwei
e4c707845f
[None][fix] enable hmac in RPC ( #9745 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-07 08:24:46 +08:00
Jonas Li
2645a78f34
[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature ( #9682 )
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-06 02:24:51 -08:00
mpikulski
8d2178d321
[TRTLLM-9522][chore] implement default attach_multimodal_embeddings ( #9664 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-05 22:12:16 -08:00
Enwei Zhu
7cd5a67e25
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP ( #9592 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-05 22:08:52 -08:00
xxi
c2f2add6df
[None][fix] fix a bug: deepseek_fp8_block_scales in TRTLLMGEN-MoE use 2D x_sf instead of 1D ( #9658 )
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-05 21:01:39 -08:00
shuyixiong
df5b32966d
[None][fix] Fix triton moe load_weight ( #9649 )
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-12-06 11:17:04 +08:00
TensorRT LLM
74ed9f0468
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-06 03:10:18 +00:00
QI JUN
d4f68195c3
[TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide ( #9571 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
QI JUN
0406949f32
[TRTLLM-9093][doc] update hyper links in overview ( #9568 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Yan Chunwei
b7a255d67e
[TRTLLM-9075][doc] refine the slurm examples ( #9548 )
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Yiqing Yan
6ebdf1c304
[None][infra] Updated Linux installation guide ( #9485 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Enwei Zhu
b46e78e263
[TRTLLM-9157][doc] Guided decoding doc improvement ( #9359 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
QI JUN
0915c4e3a1
[TRTLLM-9086][doc] Clean up TODOs in documentation ( #9292 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Pengyun Lin
c6dc68a28e
[None][doc] VDR 1.0 trtllm-serve doc enhancement ( #9443 )
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Yan Chunwei
3e442922a3
[TRTLLM-9160][doc] add doc to llm_runtime.py ( #9482 )
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
jthomson04
6332bf27e6
[TRTLLM-9199][docs] KV Connector Docs ( #9325 )
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Iman Tabrizian
9425f7fe3a
[ https://nvbugs/5601682 ][fix] Fix cacheTransceiver hang ( #9311 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Mike Iovine
31ab367576
[None][chore] Waive flakey disagg tests ( #9749 )
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 13:07:05 -08:00
Chenghao Zhang
d6f95a4363
[None][feat] AutoDeploy: Perf optimization for Attention and rmsnorm ( #9719 )
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-05 12:59:04 -08:00
yuanjingx87
c7b5e3ea8f
[None][infra] Update allowed list 20251204 ( #9718 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-05 11:55:56 -08:00
jthomson04
299601aebf
[ https://nvbugs/5670672 ][fix] Fix flaky KV connector tests ( #9676 )
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-12-05 10:04:54 -08:00
Robin Kobus
eb0b426e5d
[None][refactor] Improve request processing function in sampler ( #9671 )
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-05 16:41:49 +01:00
Robin Kobus
faf682b8bc
[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders ( #9583 )
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-05 16:07:20 +01:00
yufeiwu-nv
68253d9d29
[ https://nvbugs/5518713 ][test] Refactor core test lists by merging with llm_perf_cluster.yml ( #9714 )
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-05 01:15:37 -08:00
Kaiyu Xie
e06c582648
[None] [tests] Unwaive EPLB tests ( #9625 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-05 00:13:24 -08:00
TensorRT LLM
a736226abd
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-05 03:26:00 +00:00
gramnarayan
74df9b180b
[ #9602 ][feat] AutoDeploy: Support TRTLLM Sampler ( #9641 )
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 19:24:11 -08:00
Kaiyu Xie
cb87c44912
[TRTLLM-9562] [doc] Add Deployment Guide for Kimi K2 Thinking on TensorRT LLM - Blackwell ( #9711 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-04 19:20:06 -08:00
Lizhi Zhou
dc766fc126
[ https://nvbugs/5633340 ][fix] start disagg workers and servers on free ports ( #9694 )
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:51:29 +08:00
Lizhi Zhou
0d0a16fff4
[TRTLLM-8920][feat] decouple disagg service from fastapi ( #8714 )
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:44:16 +08:00
Thor Johnsen
33224560b8
[None][doc] Added line about partial reuse ( #7846 )
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-12-04 18:19:32 -08:00
Yiqing Yan
e834f04238
[TRTLLM-9579][infra] Set mergeWaiveList stage UNSTABLE when there is any issue ( #9692 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-05 10:18:31 +08:00
brb-nv
5d6edc3944
[None][doc] Add feature docs for helix parallelism ( #9684 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-04 18:08:40 -08:00
Yiqing Yan
731b2eb4ef
[TRTLLM-5312][infra] Add triton trigger rules ( #6440 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-05 07:35:04 +08:00
pdrake-nv
cee7071e27
[None][infra] Add container notices and documentation ( #9185 )
Signed-off-by: Parker Drake <pdrake@nvidia.com>
2025-12-04 10:08:55 -08:00
Aurelien Chartier
041bb32151
[None][fix] Fix TLLM_SPEC_DECODE_FORCE_NUM_ACCEPTED_TOKENS for MTP/EAGLE ( #9608 )
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-04 08:23:57 -08:00
xinhe-nv
530af1a98e
[None][chore] Add failed cases into waives.txt ( #9662 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-04 22:33:22 +08:00
Anthony Chang
60cdca3740
[None][fix] Recover TRTLLM MoE Perf for DEP ( #9562 )
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-04 22:10:25 +08:00
Jin Li
e5d4305c04
[ https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … ( #9617 )
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 18:17:24 +08:00
ruodil
8a392af28f
[None][test] rename wide ep and disagg metric name in perf test ( #9704 )
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-04 18:16:06 +08:00
zackyoray
398d24232d
[None][feat] Add NIXL-LIBFABRIC support ( #9225 )
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray <yorayz@nvidia.com>
2025-12-04 15:38:06 +08:00
Yan Chunwei
05058f5e2a
[None][ci] unwaive tests ( #9651 )
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-04 15:06:07 +08:00
tcherckez-nvidia
f9aa86dbdd
[ #8733 ][feat] Add Llama4 MoE handling to AutoDeploy ( #9556 )
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
2025-12-04 08:03:33 +02:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint ( #9057 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks ( #9667 )
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Jin Li
87e0c8a749
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 ( #7838 )
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 13:32:11 +08:00
Necofish
323a82f4d5
[None][fix] fix error when processing batches containing both text and mm data ( #8381 )
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-12-04 14:28:24 +09:00
Yiqing Yan
47f650ca13
[TRTLLM-5093][infra] Write env variables to a file in the interactive debug session ( #6792 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-04 11:41:27 +08:00
mpikulski
744f0eff1b
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve ( #9669 )
2025-12-03 19:27:11 -08:00
TensorRT LLM
94924634e0
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-04 03:19:23 +00:00
Yiqing Yan
e31142202e
[TRTLLM-7181][infra] Generate test results when pytest timeout happens ( #9396 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-04 10:05:38 +08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters ( #9540 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
gramnarayan
098b9ff226
[ #9147 ][feat] AutoDeploy: Draft Target Speculative Decoding ( #9275 )
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Lucas Liebenwein
a1964bcbbc
[ #9643 ][fix] AutoDeploy: fix nano sharding config ( #9668 )
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-04 03:10:25 +08:00
Wei-Ming Chen
d9fba85396
[OMNIML-2932] [feat] nvfp4 awq support ( #8698 )
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
2025-12-03 19:47:13 +02:00
Gal Hubara-Agam
d7bd62b1a0
[ https://nvbugs/5693853 ][fix] Fix error handling when querying machin… ( #9483 )
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-12-03 19:44:51 +02:00
Guoming Zhang
b5e2b9b51f
[ https://nvbugs/5702795 ][fix] Remove the warning message for aten.log. ( #9665 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-04 00:02:15 +08:00
Iman Tabrizian
09beaa5933
[None][fix] Fix wide ep MoE error ( #9642 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-03 23:11:06 +08:00
Michal Guzek
4e5b10da48
[ https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch ( #8253 )
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-03 15:42:15 +01:00
Patrice Castonguay
ae8d8a266a
[ https://nvbugs/5705197 ][chore] Unwaive timeout disagg tests ( #9637 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-03 22:18:36 +08:00
Guoming Zhang
e2f82085f1
[None][doc] Replace the tensorrt icon with torch icon on overview.md ( #9644 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-03 21:52:46 +08:00
Perkz Zheng
992781dc7b
[None][feat] update trtllm-gen nvfp4 kernels with better performance ( #9510 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-03 21:35:49 +08:00
Guoming Zhang
79e872de31
[None][test] Update Qwen3-next accuracy testing by setting the cuda … ( #9613 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-03 20:52:53 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose ( #9392 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
xinhe-nv
3a748b166b
[None][chore] Add failed cases into waives.txt ( #9593 )
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-03 16:26:06 +08:00
Pengyun Lin
1d4fb89235
[TRTLLM-8241][feat] Aliasing to comply to LlmArgs ( #9586 )
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-03 15:28:45 +08:00
fredricz-20070104
80ff9015ce
[ https://nvbugs/5561153 ][test] Fix log error for perf test ( #9622 )
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-03 15:27:13 +08:00
brb-nv
43f6ad7813
[ https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism ( #9647 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 15:13:59 +08:00
Bo Li
8b5ededc83
[TRTLLM-9391][chore] Automatically estimate required workspace. ( #9535 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-03 12:49:38 +08:00
Suyog Gupta
93871d52b2
[None][chore] AutoDeploy update cuda stream manager for multi-device ( #9575 )
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-12-02 20:43:14 -08:00
JunyiXu-nv
beffbd6002
[TRTLLM-9242][doc] Add examples showcasing openai compatible APIs ( #9520 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 11:47:02 +08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 ( #9572 )
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
TensorRT LLM
097ac32b28
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-03 03:19:14 +00:00
yufeiwu-nv
21f2ba74e8
[None][test] Remove duplicate test cases ( #9623 )
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-03 10:35:26 +08:00
Yiqing Yan
8c88454fa5
[TRTLLM-7101][infra] Reuse passed tests ( #6894 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-03 10:07:23 +08:00
Anurag Mukkara
642dfae73a
[ https://nvbugs/5698434 ][fix] Use separate weight mapper for draft ( #9607 )
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-12-02 16:00:22 -08:00
Enwei Zhu
a3455f55c7
[None][chore] Fix trtllm-eval and move GroupedGemmInputsHelper ( #9612 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-03 07:55:03 +08:00
Chang Liu
3916d032ec
[None][chore] Remove traceback dump for multimodal input processor ( #9634 )
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-03 07:41:03 +08:00
brb-nv
55c7023c92
[None][chore] Waive test failing on pre-merge ( #9638 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 07:31:10 +08:00
Yu Chi Li
7ca38a6c0b
[ #9632 ][feat] Support EXTRA_WHEEL_BUILD_ARGS during wheel build ( #9633 )
Signed-off-by: Yu Chi Li <yuchil@nvidia.com>
2025-12-02 14:54:58 -08:00
Grzegorz Kwasniewski
0a7a88e74e
[TRTLLM-8946][feat] Improved heuristics to detect shardable regions ( #9200 )
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 22:08:19 +01:00
Patrice Castonguay
3991aa9c72
[ https://nvbugs/5688388 ][fix] fix: Reducing num request in disagg test to speed up ( #9598 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-02 12:48:53 -05:00
Neta Zmora
a560ba5546
[ #9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels ( #9551 )
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-03 01:39:38 +08:00
Shi Xiaowei
227d42e492
[ https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity ( #9549 )
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
Lucas Liebenwein
e72ce98c0f
[ #9150 ][feat] AutoDeploy: reviewer comments for #9150 ( #9527 )
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 12:09:10 -05:00
William Zhang
2dd3ebf037
[ #9150 ][feat] Add code for nano v3 to custom implementation in AD ( #9465 )
* Why?
We would like to show an alternative to monkey-patching in AutoDeploy.
* What?
This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.
Part of #9150 .
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-02 08:56:44 -08:00
Mike Iovine
d5b7f0c8ad
[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch ( #8889 )
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-02 10:32:02 -05:00
Thor Johnsen
95049eea86
[ https://nvbugs/5627710 ][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks ( #9056 )
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-02 09:10:21 -06:00
Yan Chunwei
b86256eb54
[TRTLLM-9144][fix] enhance RPC robustness ( #8711 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-12-02 21:37:59 +08:00
Jin Li
21e3dc11d8
[ https://nvbugs/5667774 ][fix] Refine Piecewise Cuda Graph Condition for DP ( #9393 )
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-02 21:09:15 +08:00
Chang Liu
73a543d78f
[None][fix] Extract GPU count from single-node stage names ( #9599 )
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-02 20:58:16 +08:00
brb-nv
be48cdf1d1
[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite ( #9597 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-02 20:10:07 +08:00
Eran Geva
1a46bb0d18
Lock the gpu clocks in L0 perf tests ( #9585 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-02 18:13:45 +08:00
Emma Qiao
4a8766c11d
[None][infra] Remove an invalid test name in waives.txt ( #9620 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 18:05:17 +08:00
yuanjingx87
f9524bcc07
[None][infra] Update allowlist 2025/12/01 ( #9616 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-02 16:55:14 +08:00
mpikulski
84a1531594
[TRTLLM-9488][feat] use FlashInfer.sampling by default ( #9545 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-02 16:29:55 +08:00
Emma Qiao
3e4f2388a9
[None][infra] Waive failed cases for main branch ( #9615 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 15:48:27 +08:00
shuyixiong
1a2118b8fe
[ https://nvbugs/5702793 ][fix] Fix uncontiguous tensor view ( #9576 )
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-12-02 15:41:32 +08:00
xinhe-nv
ad46d19027
[None][chore] Add failed cases into waives.txt ( #9588 )
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-02 14:24:11 +08:00
ruodil
4586b5f42f
[ https://nvbugs/5582091 ][test] increase warmup times in testing for multi-gpu cases ( #9578 )
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-02 14:22:49 +08:00
Wanli Jiang
5657a00ec0
[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend ( #9261 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-02 13:40:20 +08:00
xinhe-nv
3911d0496e
[None][fix] Waive gb200 ( #9580 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-02 12:09:21 +08:00
JunyiXu-nv
9a6df980cd
[ https://nvbugs/5703953 ][fix] Use random port for disagg tests ( #9582 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-02 11:40:14 +08:00
Guoming Zhang
6fbe87c8b5
[None][chroe] Polish qwen3-next modeling code. ( #8902 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-02 11:28:35 +08:00
TensorRT LLM
96a0e14522
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-02 03:17:38 +00:00
Iman Tabrizian
356a52edf5
[None][feat] Add support for KVCache reuse for DSv32 ( #9383 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-02 11:14:30 +08:00
Shijie
dcf5c86720
[None][feat] Unify nvfp4 gemm backend ( #8963 )
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Shijie <jaywan@nvidia.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-02 11:03:51 +08:00
QI JUN
d11acee22d
[TRTLLM-9085][doc] fix math formula rendering issues in github ( #9605 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-02 10:18:16 +08:00
Yuening Li
09c840184c
[None][fix] Prevent YAML partial kv_cache_config from incorrectly overriding the complete kv_cache_config ( #9262 )
Signed-off-by: Yuening Li <62227368+Yuening-wa@users.noreply.github.com>
2025-12-02 10:10:08 +08:00
Eran Geva
c9771ebb99
[ #9198 ][feat] Refactor dist ops in AutoDeploy ( #9301 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-02 02:36:32 +08:00
Chenghao Zhang
0a2104dce9
[None][feat] AutoDeploy: Use the router gemm op for nemotron MOE ( #9500 )
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-01 10:24:31 -08:00
Venky
639c939a4f
[TRTC-1943][feat] Env vars override support in LLM API ( #9104 )
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-01 10:04:49 -08:00
brb-nv
f61067cbb5
[None][chore] Defer exposing context parallel configs ( #9552 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-01 09:50:02 -08:00
Stefan Niebler
f155812eb0
[TRTLLM-6756][feat] Add Beam Search to TorchSampler ( #8509 )
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-12-01 18:48:04 +01:00
Emma Qiao
b024040df0
[None][infra] Update the pytest options after MI ( #9579 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-02 00:11:30 +08:00
Yiqing Yan
c72919980a
[TRTLLM-6768][infra] Fix params for not updating github status ( #6747 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-01 23:51:21 +08:00
Yanchao Lu
078d3a576e
[None][ci] Minor change for Slurm scripts ( #9561 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-01 22:52:08 +08:00
Yanchao Lu
7127c4407a
[None][test] [None][test] Waive main branch test failures 12/1 ( #9566 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-01 21:54:53 +08:00
Enwei Zhu
90345ad3f3
[None][fix] Skip Allreduce init for Attention DP ( #9542 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 21:24:40 +08:00
Shi Xiaowei
48b1d31895
[ https://nvbugs/5651854 ][infra] Enable perf metrics during accuracy testing ( #9140 )
2025-12-01 20:15:32 +08:00
Martin Marciniszyn Mehringer
974ad56515
[None][chore] reduce the layers of the devel docker image ( #9077 )
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-12-01 03:56:30 -08:00
alel
4107254c82
[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm ( #9428 )
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2025-12-01 18:10:45 +08:00
Zhenhuan Chen
24004535fe
[None][chore] refactor disaggregated scripts to use named arguments ( #9581 )
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-01 17:33:47 +08:00
Yukun He
730eb3d859
[None][fix] Replace hash method with unique_id for cutedsl MoE runners. ( #9569 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-01 17:02:33 +08:00
Neta Zmora
bc25fff039
[ #9496 ][fix] AutoDeploy: remove auto-tuner from nvfp4_gemm forward ( #9497 )
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-01 10:04:39 +02:00
Fanrong Li
d69bf9f92a
[None][feat] add chat template kwargs support to longbench-v2 ( #9544 )
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-01 15:59:13 +08:00
Gaoji Liu
9d2df04a72
[None][doc] fix mtp.py typo ( #9307 )
Signed-off-by: liugaoji <757394026@qq.com>
2025-11-30 21:55:13 -08:00
JadoTu
a92af27411
[None][chore] remove qwen3-next accuracy tests ( #9534 )
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-01 11:49:37 +08:00
Pengbo Wang
aa3310f64f
[ https://nvbugs/5503479 ][fix] Temporarily lower reference accuracy to stabilize CI ( #9398 )
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-12-01 11:49:14 +08:00
Enwei Zhu
2e3ac3c48f
[ https://nvbugs/5684703 ][fix] Unwaive disagg guided decoding test ( #9466 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 11:39:40 +08:00
TensorRT LLM
0b10214f55
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-01 03:08:12 +00:00
Yuan Tong
becd44f9bc
[None][fix] Correct virtual memory allocation alignment ( #9491 )
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-12-01 10:59:19 +08:00
Li Min
1797e91dfd
[TRTLLM-6222][feat] Extend cute_dsl_nvfp4_gemm to sm103. ( #9543 )
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-01 10:19:36 +08:00
Enwei Zhu
34e2fa5c96
[ https://nvbugs/5690172 ][fix] Fix Qwen3-235B ATP accuracy issue with PDL ( #9530 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 09:10:21 +08:00
heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV ( #9333 )
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
xxi
c12e67bb66
[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend ( #9486 )
2025-12-01 08:37:07 +08:00
Yanchao Lu
694b60d92d
[None][ci] Split H100_PCIe-PyTorch-Post-Merge test stage ( #9559 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-30 21:14:18 +08:00
Yanchao Lu
0398875d55
[None][ci] Split H100_PCIe-PyTorch-Post-Merge test stage ( #9558 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-30 20:27:13 +08:00
JunyiXu-nv
3f588198dc
[None][fix] Fix port conflict in disagg tests ( #9474 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-30 17:33:22 +08:00
Emma Qiao
c927ccf510
[None][infra] Wiave failed tests for main branch on 11/30 ( #9555 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-30 16:13:20 +08:00
Yanchao Lu
f03641808b
[None][infra] - Request idle time exemption for OCI jobs ( #9528 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-30 13:34:09 +08:00
TensorRT LLM
bde69dd1df
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-30 03:07:46 +00:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism ( #9342 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
dominicshanshan
6345074686
[None][chore] Weekly mass integration of release/1.1 -- rebase ( #9522 )
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-11-29 21:48:48 +08:00
TensorRT LLM
ae0124ef84
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-29 03:07:19 +00:00
Grzegorz Kwasniewski
cff54fcae3
[ #8948 ][feat] Support custom sharding config ( #9143 )
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-29 05:28:05 +08:00
mpikulski
bc355eadf5
[TRTLLM-9488][fix] llmapi references ( #9547 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 08:54:05 -08:00
binghanc
db5b876124
[None][feat] support for more accurate AR calculation ( #9323 )
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
2025-11-29 00:34:21 +08:00
Matthias Jouanneaux
f8dd494536
[None][perf] Helix: improve all-to-all perf for large CP size ( #9494 )
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Co-authored-by: Zheyu Fu <zheyuf@nvidia.com>
2025-11-28 07:24:55 -08:00
dominicshanshan
70efa3ac43
[None][infra] Waive failed case in pre-merge on 11/28 ( #9537 )
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-11-28 20:53:45 +08:00
mpikulski
e5f39ec7cf
[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option ( #9454 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 13:00:39 +01:00
Zhanrui Sun
930cdad054
[TRTLLM-9541][infra] Use artifactory mirror for download.pytorch.org ( #9477 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-28 18:31:50 +08:00
Robin Kobus
5eae3650c3
[None][fix] Pass checkpoint_format to create_input_processor ( #9521 )
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-28 10:32:29 +01:00
Emma Qiao
2d7421b314
[None][infra] Waive failed cases for main branch on 11/28 ( #9539 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-28 17:19:55 +08:00
Zhenhuan Chen
7c3bb8534d
[None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… ( #9538 )
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-28 16:45:23 +08:00
Kaiyu Xie
0d3c0c2156
[None] [chore] Enhancements and clean up to slurm scripts ( #9493 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-28 16:41:41 +08:00
Chang Liu
389b73c349
[None][fix] Remove FP8 K/V buffer from TRTLLM sparse MLA attention kernel ( #9529 )
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-28 15:26:52 +08:00
Liao Lanyu
bf84d9cea1
[None][chore] add spec_decoding configs in perf benchmark scripts and fix typos ( #9533 )
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-28 14:52:05 +08:00
yufeiwu-nv
08755a809d
[ https://nvbugs/5689658 ][test] Fix gpu lock issue running on cluster ( #9441 )
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-28 13:59:22 +08:00
Yukun He
60c43a200a
[None][fix] Fix on-disk cache and revise logger/statistics for AutoTuner. ( #9211 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-28 13:32:21 +08:00
JunyiXu-nv
c87e81c1d8
[ https://nvbugs/5685015 ][fix] Update invalid max_token test ( #9435 )
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-28 11:41:16 +08:00
Emma Qiao
658d9fc0c5
[TRTLLM-8970][infra] Fix report generation when there are isolation test results ( #8861 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-28 11:26:06 +08:00
TensorRT LLM
5e52dff6c6
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-28 03:18:41 +00:00
Bo Li
19f3f4e520
[ https://nvbugs/5637037 ][chore] Update waive lists. ( #9386 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-28 10:45:22 +08:00
Kaiyu Xie
85b4c92d60
[None] [chore] Update to cutlass 4.3 ( #8637 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-28 08:54:34 +08:00
Lucas Liebenwein
2f8bd6fb36
[ #9150 ][feat] AutoDeploy Nemotron-Flash support ( #9504 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-27 18:03:57 +01:00
Enwei Zhu
c2562fc800
[ https://nvbugs/5687820 ][fix] Remove self.abort() in DetokenizedGenerationResult ( #9449 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-27 22:54:40 +08:00
Yiqing Yan
1c9158fde3
[TRTLLM-7288][infra] Download merged waive list in slurm script ( #8999 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-27 21:48:40 +08:00
Yueh-Ting (eop) Chen
4cbfc10b28
[ https://nvbugs/5674665 ][chore] Add test coverage for https://nvbugspro.nvidia.com/bug/5674665 ( #9518 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-27 21:40:34 +08:00
Bo Li
62b771877c
[TRTLLM-9389][chore] Refactor AlltoallMethodType. ( #9388 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-27 21:09:29 +08:00
Fanrong Li
2d5eadf65f
[None][fix] fix TP support for DeepSeek-V3.2 on hopper ( #9484 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-27 21:02:25 +08:00
JadoTu
51bf7164d3
[None][feat] add qwen3-next CI test of accuracy on BF16 and NVFP4 ( #9330 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-27 18:05:00 +08:00
Zhenhuan Chen
e47927e847
[None][fix] change allreduce workspace dtype to torch.int64 to avoid overflow ( #9479 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-27 17:08:41 +08:00
yuanjingx87
3ada0bfc65
[None][infra] Fix Slurm job script ( #9508 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-27 16:41:01 +08:00
xxi
f1ed057b4c
[cherry-pick][ https://nvbugs/5670793 ][fix] Solve trtllm-serve launch_disaggregated issue ( #9346 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-11-27 16:13:58 +08:00
Emma Qiao
a21be43677
[TRTLLM-9279][infra] Use flexcache for gh200 nodes since they are located in Austin ( #9405 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-27 15:42:38 +08:00
Lizhi Zhou
8104a78931
[None][chore] revert batch_size=1 to prevent timeout and lower accuracy reference by 0.12% as a WAR ( #9447 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-27 14:25:44 +08:00
Liao Lanyu
5425d96757
[TRTLLM-9513][docs] Qwen3 deployment guide ( #9488 )
...
Signed-off-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
Co-authored-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
2025-11-27 14:12:35 +08:00
Emma Qiao
0442510304
[None][infra] Waive failed case in pre-merge on 11/27 ( #9507 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-27 13:53:33 +08:00
Ziyi Xiong
1dd55d8507
[ https://nvbugs/5698581 ][fix] Init draft tokens for CUDA graph dummy request ( #9505 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-27 13:05:37 +08:00
Jiagan Cheng
14762e0287
[None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning ( #9294 )
...
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-11-27 12:22:01 +08:00
HuiGao-NV
03331bc43d
[ https://nvbugs/5547414 ][fix] enable case after using local cache model ( #9473 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-27 12:18:20 +08:00
Patrice Castonguay
1b2da426cd
[ https://nvbugs/5680310 ][fix] Fix ctx only timed out test ( #9410 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-27 11:21:21 +08:00
TensorRT LLM
89701a594b
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-27 03:19:47 +00:00
QI JUN
a67d94963e
[None][chore] update comments in llm_args.py ( #9472 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 11:06:34 +08:00
QI JUN
c6fa042332
[TRTLLM-9085][doc] fix math formula rendering issues ( #9481 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 10:09:12 +08:00
Aurelien Chartier
f2f197360d
[ #9463 ][feat] Add revision option to trtllm commands ( #9498 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-11-27 09:30:01 +08:00
Shi Xiaowei
e76e149861
[ https://nvbugs/5608930 ][fix] Fix a typo ( #9487 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-27 09:05:17 +08:00
Zheyu Fu
dbbed1f85a
[None][ci] Waive blackwell test on spec gate. ( #9502 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-27 07:19:58 +08:00
Chenghao Zhang
18fbda5cdb
[None][feat] AutoDeploy: Add A_log fusion for Mamba layers ( #9422 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-26 14:39:20 -08:00
Chenghao Zhang
bc7b60e016
[None][feat] AutoDeploy: Remove redundant copies in mamba layers ( #9461 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-26 14:38:33 -08:00
yuanjingx87
356f67c1cb
[None][infra] Fail the pipeline when slurm ssh dropped ( #9157 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-26 09:35:04 -08:00
yuanjingx87
d7ef8849d2
[None][infra] Update allowed list 2025.11.25 ( #9468 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-26 09:32:05 -08:00
Aurelien Chartier
ef7ee6a940
[None][feat] Add environment variable to force spec-dec number of accepted tokens ( #9371 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-11-26 07:22:16 -08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model ( #9376 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
Enwei Zhu
1bf2d750a2
[None][chore] Upgrade CuteDSL to 4.3.0 ( #9444 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-26 14:53:09 +08:00
JunyiXu-nv
b7308a4000
[ https://nvbugs/5580099 ][fix] Cherry pick IMA issue fix from release/1.1 ( #9032 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-26 13:09:06 +08:00
Wanli Jiang
d100599ea7
[TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm ( #9246 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-26 11:12:35 +08:00
TensorRT LLM
b04421e5ba
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-26 03:08:38 +00:00
shuyixiong
d8acea1db3
[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights ( #9224 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-26 10:59:06 +08:00
QI JUN
5972119e1c
[None][ci] move some slow test cases of DGX-B200 to post merge ( #9467 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-26 10:48:53 +08:00
fredricz-20070104
6a64cb4c71
[TRTLLM-8936][test] Add disagg and wideep multi-node multi-gpu test cases ( #9356 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-26 10:34:49 +08:00
Yiqing Yan
1b9edf62c9
[None][chore] Bump version to 1.2.0rc5 ( #9455 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-26 08:37:53 +08:00
Chuang Zhu
0e9c7f8c07
[ https://nvbugs/5685143 ][fix] avoid cudaFree overlap with cuda graph ( #9438 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-25 16:20:29 -08:00
Suyog Gupta
e484bec82f
[None][chore] AutoDeploy add multi stream moe pass to default.yaml ( #9430 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-25 14:16:13 -08:00
Robin Kobus
32f53910ef
[TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode ( #9308 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-25 22:11:51 +01:00
Eran Geva
afc52d7b93
[ https://nvbugs/5647400 ] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. ( #9145 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 10:56:07 -08:00
mpikulski
899fda9e47
[TRTLLM-9490][feat] use FlashInfer's top_k_sampling_from_probs ( #9457 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:53:53 +01:00
mpikulski
c5f52ab304
[TRTLLM-8376][feat] top-p optimization (removes redundant softmax) ( #9411 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:46:48 +01:00
Fanrong Li
8da59103d6
[ https://nvbugs/5680905 ][fix] Relax the MMLU accuracy requirement for DS-v3.2 ( #9439 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-26 00:32:20 +08:00
Yan Chunwei
1f43dc8174
[None][ci] waive a test ( #9458 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-25 07:04:20 -08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Pengyun Lin
fa61825c74
[None][feat] Support custom chat template for tool calling ( #9297 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-11-25 22:07:04 +08:00
Tailing Yuan
51ef0379d2
[None][feat] Add a parser to layer-wise benchmarks ( #9440 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-25 05:45:16 -08:00
Fanrong Li
c36f144591
[None][chore] Fix trtllm-eval for PyTorchLLM ( #9427 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-25 04:49:03 -08:00
Shi Xiaowei
60786574db
[None][fix] Mitigate test timeout issues ( #9445 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-25 20:17:54 +08:00
Chao Ni
a2d9e6250a
[ https://nvbugs/5667922 ][fix] Update long context evaluation config ( #9426 )
...
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
2025-11-25 19:33:38 +08:00
Yueh-Ting (eop) Chen
a38d91aae2
[ https://nvbugs/5537996 ][fix] Let KV cache manager block initialization be aware whether it is doing a dry run or not ( #9093 )
...
Before this commit, the KV cache manager behaved the same way whether or not it was doing a dry run, which miscalculated the free memory available to allocate for the KV cache manager and caused a crash.
This commit fixes this by making KV cache manager initialization aware of whether it is doing a dry run. If it is, it uses the max_tokens value that has already been pre-calculated and stored in kv_cache_config.max_tokens.
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-25 17:27:11 +08:00
Anthony Chang
4742c130db
[None][feat] Improve TRTLLM MoE in small hidden size throughput cases ( #9377 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-25 09:09:27 +01:00
Yanchao Lu
ff02e0f05c
[None][ci] Move more test stages to use OCI machines ( #9395 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Matt Lefebvre <matthewelefebvre@gmail.com>
2025-11-25 15:59:13 +08:00
Eran Geva
6af01dc664
[ #8391 ][chore] test_perf.py to lock clocks read from gpu_configs.yml instead of max freq ( #9409 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 09:20:33 +02:00
Emma Qiao
15616e3ee5
[None][infra] Waive failed cases for main branch on 11/25 ( #9429 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-24 23:18:15 -08:00
Yukun He
e580da4155
[TRTLLM-7963][feat] Cold L2 cache when doing autotune benchmarking. ( #8779 )
...
The measured performance of some kernels is easily affected by whether the L2 cache is warm or cold. To obtain more precise profiling results during autotuning, the L2 cache is now cleared for every execution using a circular-buffer method.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-25 15:06:22 +08:00
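The circular-buffer method mentioned in the autotuning commit above can be illustrated with a minimal, hypothetical Python sketch. It assumes an LRU-like L2 cache and that a kernel's inputs can be duplicated into several copies; the helper names `num_rotating_buffers` and `run_rotating` are illustrative only and are not TensorRT-LLM APIs. The idea: keep enough copies of the inputs that, between two uses of the same copy, the other copies have read more bytes than the L2 capacity, so each execution starts from a cold cache.

```python
import math
from itertools import cycle
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def num_rotating_buffers(l2_cache_bytes: int, input_bytes: int) -> int:
    """Number of input copies k such that the other k - 1 copies cover L2.

    Between two uses of the same copy, (k - 1) * input_bytes bytes from the
    other copies are read; requiring that to be >= l2_cache_bytes gives
    k = ceil(l2_cache_bytes / input_bytes) + 1 (assumes LRU-like eviction).
    """
    if l2_cache_bytes <= 0 or input_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(l2_cache_bytes / input_bytes) + 1


def run_rotating(kernel: Callable[[T], None], buffers: Sequence[T],
                 iterations: int) -> List[int]:
    """Call `kernel` `iterations` times, cycling through `buffers` in order.

    Returns the buffer index used on each iteration (handy for checking).
    """
    used: List[int] = []
    indices = cycle(range(len(buffers)))
    for _ in range(iterations):
        i = next(indices)
        kernel(buffers[i])  # each call sees a copy not touched "recently"
        used.append(i)
    return used


if __name__ == "__main__":
    # Hypothetical sizes: 50 MiB of L2, 8 MiB of inputs per kernel call.
    k = num_rotating_buffers(50 * 2**20, 8 * 2**20)
    print(k)  # ceil(6.25) + 1 = 8
```

In a real GPU benchmark the copies would be device tensors and timing would need device synchronization; this sketch only shows the buffer-count arithmetic and the rotation order.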
William Zhang
a4049fc557
[ #9413 ][fix] Minor fixes to nemotron H and custom models in AD ( #9416 )
...
* Why?
There were a couple of issues with the recently merged custom model
injection for AutoDeploy and the reference implementation of Nemotron
H:
- `d_mlp` was left in despite being mathematically always null (could
lead to runtime issues during sharding).
- the custom model mapping was inherited by children factories.
* What?
This commit fixes these issues, and refactors the key of the custom
implementation to be based on the name of the configuration class as
well.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-24 20:17:33 -08:00
Suyog Gupta
efd503751f
[ #9271 ][perf] Enable multi-stream MOE optimization in AutoDeploy ( #9322 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-24 19:50:10 -08:00
kris1025
d1c724958d
[None][chore] unwaive ampere kernels test ( #9389 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-11-25 11:28:43 +08:00
TensorRT LLM
bf0d1dc6a8
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-25 03:21:14 +00:00
xinhe-nv
0a9ae2e3e6
[None][chore] Remove closed bugs ( #9381 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-24 18:49:57 -08:00
Yuxian Qiu
8a0295015f
[None][chore] Reduce nested nvtx ranges. ( #9347 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-25 09:58:41 +08:00
Fanrong Li
5a99c9734d
[TRTLLM-8777][feat] Update DeepGEMM to the latest commit to include optimizations for DeepSeek-v3.2 ( #9380 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-25 08:58:08 +08:00
QI JUN
786d308b88
[ https://nvbugs/5685428 ][fix] fix test_openai_chat_multimodal.py ( #9406 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 16:56:33 -08:00
bhsueh_NV
1a93583438
[None][feat] Support Yarn on QwQ-32B model ( #9059 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>
2025-11-25 07:27:28 +08:00
Yibin Li
1ce483c999
[TRTLLM-7967][feat] Adding Starcoder2 PyTorch Backend Support ( #8923 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-11-24 11:23:22 -08:00
YueWeng
336593cac5
[None][fix] Fix topk outIndices when using vectorized_process ( #9404 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-24 09:08:00 -08:00
Chuang Zhu
f95edb53e1
[None][fix] enhance warning in cacheTransBuffer ( #9390 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-24 02:17:54 -08:00
Emma Qiao
2c869f2bda
[None][infra] Waive failed cases for main ( #9400 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-24 17:42:19 +08:00
cheshirekow
6e5384d03c
[TRTLLM-9299][infra] Add third-party docs for python ( #9366 )
...
In this change we rename 3rdparty/README.md (which contains the process
playbook for C++ dependencies) to 3rdparty/cpp-thirdparty.md and add a
new 3rdparty/py-thirdparty.md file which contains the process playbook
for Python dependencies.
We also update the main 3rdparty/README.md file to serve as a
starting-point referring to both of these files.
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-23 22:58:25 -08:00
cheshirekow
2810be7b3b
[TRTLLM-9211][infra] Minor fixes to 3rdparty/CMakelists ( #9365 )
...
This change addresses the nitpick comments from coderabbit on the
previous pull request !8986. None of the changes appear to be critical
as the build is healthy without them, but they should provide some
protection against future breakages if we change the CMake version or
modify other build logic.
This change consists of the following:
1. Add GIT_SUBMODULE_RECURSE ON to FetchContent_Declare calls for
deepgemm and flashmla to ensure submodules are initialized in
cmake versions where it is not the default.
2. Modify error messages in deep_gemm and flash_mla CMakeLists to
indicate that submodule initialization failed if the expected
submodule directories are not present.
3. Remove the NVTX include directories if the build is configured
with NVTX_DISABLE off, to avoid potential confusion if NVTX is
included in the compile commands when disabled.
4. Fix a minor CMake syntax issue in cpp/CMakeLists.txt where a
message() call was missing parentheses around a string.
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-23 22:57:02 -08:00
Emma Qiao
af72d93fa9
[None][infra] Waive failed cases on main branch ( #9384 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-23 22:53:02 -08:00
Yukun He
960851f419
[None][chore] Remove unnecessary log in the short tuning profile ( #9387 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-24 12:31:26 +08:00
Yukun He
39076410a8
[ https://nvbugs/5676748 ][fix] Fix mismatched nvfp4 gemm sf shape. ( #9336 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-24 12:16:32 +08:00
brb-nv
c045e359a7
[ https://nvbugs/5637012 ][fix] Fix helix unit tests ( #9369 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-23 19:34:22 -08:00
TensorRT LLM
5a44994d05
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-24 03:17:53 +00:00
QI JUN
34a6d2d28f
[TRTLLM-9302][chore] Move build config from BaseLlmArgs to TrtLlmArgs ( #9249 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 10:54:41 +08:00
Yukun He
c3acf965a6
[TRTLLM-7963][fix] Several improvements of autotuning quality ( #9348 )
...
* Skip shape-profile generation if the profile is already found in the cache in tuning mode. This is a prerequisite for nested autotuning, because host overhead might otherwise be included when profiling the high-level op.
* Make profiling with CUDA graphs the default profiling method.
* Apply a heuristic that caps the number of profiling repeats based on the time measured over a few initial runs.
2025-11-24 10:38:45 +08:00
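The third bullet of the autotuning commit above (capping profiling repeats from a short measurement) could look roughly like the hypothetical sketch below. The time budget and the min/max bounds are assumed values, not taken from the commit, and `choose_repeat_count` / `estimate_avg_seconds` are illustrative names rather than AutoTuner APIs. It uses host wall-clock time; timing GPU kernels this way would additionally require synchronization or CUDA events.

```python
import time
from typing import Callable


def choose_repeat_count(avg_run_seconds: float, budget_seconds: float = 0.1,
                        min_repeats: int = 1, max_repeats: int = 100) -> int:
    """Pick a repeat count that keeps total profiling time near the budget.

    Fast candidates get more repeats (up to max_repeats); slow candidates are
    cut off early (down to min_repeats). All bounds are assumed defaults.
    """
    if avg_run_seconds <= 0:
        return max_repeats
    repeats = int(budget_seconds / avg_run_seconds)
    return max(min_repeats, min(max_repeats, repeats))


def estimate_avg_seconds(fn: Callable[[], None], probe_runs: int = 3) -> float:
    """Time a few runs of `fn` and return the mean wall-clock duration."""
    start = time.perf_counter()
    for _ in range(probe_runs):
        fn()
    return (time.perf_counter() - start) / probe_runs


if __name__ == "__main__":
    avg = estimate_avg_seconds(lambda: sum(range(10_000)))
    print(choose_repeat_count(avg))
```

Clamping on both sides keeps very fast kernels from being measured too few times while bounding the tuning cost of slow ones.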
Bo Li
fcfec93cad
[TRTLLM-9389][chore] Rename AlltoAll backend names ( #9329 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-23 13:52:57 -08:00
Chenghao Zhang
e1c9aa7d6a
[None][chore] AutoDeploy: Add the Nemotron MOE to CI ( #9328 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-23 12:12:12 -08:00
JadoTu
0582e54b61
[None][fix] modify qwen3-next sampling stop_tokens ( #9331 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-23 21:10:09 +08:00
William Zhang
11a0b276fb
[ #9230 ][feat] Slimmed down implementation of nemotron H ( #9235 )
...
* Why?
The reference Nemotron H code on HuggingFace is out of date,
and therefore buggy, and has several untested code paths.
This makes an already hairy patching system even hairier.
The proposal is to do away with those patches, and replace the
original implementation with one that is heavily slimmed down.
* What?
This PR sets the basis for an alternative path with such a
slimmed down implementation that:
- fixes bugs in the current HF implementation
- adds no new dependencies to TensorRT-LLM
- does away with unnecessary features for TensorRT-LLM/
AutoDeploy:
- no training related code (dropout, gradient checkpointing, etc.)
- no caching logic (we want to replace it with our own anyway)
- no attention masking where possible
- reuses existing AD custom ops for mamba SSM update /
causal conv1d / attention
In order for the above to be usable in the AD apparatus,
`AutoModelForCausalLMFactory` is extended to allow registration
of custom model implementations.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-23 03:13:32 -08:00
Yan Chunwei
1ef69ecbb1
[None][ci] waive two ray tests ( #9375 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-23 15:39:01 +08:00
TensorRT LLM
a761585d9c
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-23 03:06:43 +00:00
dongfengy
268ea9bb8a
[None][test] Add one-model and overlap-scheduling to eagle tests for GPTOSS ( #9312 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-21 22:52:53 -08:00
TensorRT LLM
15ceba8705
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-22 03:18:03 +00:00
Matt Lefebvre
fefa02fa95
[TRTINFRA-7326][infra] - Consume SlurmCluster sshPort for clusters with custom SSH port ( #9313 )
...
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-21 18:58:00 -08:00
Neta Zmora
3952a61681
[ #9388 ][fix] AutoDeploy: Fix cutlass BF16 MoE kernel invocation ( #9339 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-21 17:05:03 -08:00
Chenghao Zhang
564989865c
[TRTLLM-9082][feat] AutoDeploy: Move the moe Align kernel to AOT ( #9106 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-21 16:05:48 -08:00
Izzy Putterman
eb7792e875
[None][feat] Eagle: PostNorm and multilayer options ( #9233 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-11-21 17:39:00 -05:00
Enwei Zhu
13fbd4366a
[TRTLLM-9370][feat] Integration of CuteDSL NVFP4 grouped GEMM (Part 2: SwiGLU Fusion and Finalize Fusion) ( #9288 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-21 14:03:38 -08:00
cheshirekow
9b2abb8d28
[TRTLLM-9208][infra] Document the process for C++ deps ( #9016 )
...
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-21 09:22:11 -08:00
Ziyi Xiong
5df907b388
[ https://nvbugs/5590408 ][fix] Fallback to greedy sampling in two-model overlap scheduler ( #9321 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-21 10:19:59 -05:00
Nikita Korobov
f2ebaf288a
[None][feat] TRT-LLM Gen MoE optimize DeepSeek Fp8 activation kernel ( #9175 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-11-21 15:35:00 +01:00
HuiGao-NV
6dd2fcd7b3
[ https://nvbugs/5629833 ][fix] Don't fill tensors with 0 ( #9296 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-21 20:50:05 +08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve ( #9269 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
mpikulski
095b6864a8
[TRTLLM-8650][fix] beam search request validation ( #8433 ) ( #9228 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:08:45 -08:00
Yiqing Yan
8cd3b496e9
[None][chore] Bump version to 1.2.0rc4 ( #9363 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-21 18:28:12 +08:00
Emma Qiao
041564188c
[None][infra] Waive failed cases in main post-merge on 11/21 ( #9360 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-21 18:01:53 +08:00
QI JUN
b6483ef3e7
[None][ci] waive a test case of test_ad_build_small_multi.py ( #9355 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-21 16:25:04 +08:00
Ivy Zhang
28e9bf6167
[None][chore] add periodic junit xml path in conftest ( #9337 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-20 22:46:25 -08:00
xxi
cc0dc7c124
[TRTLLM-8957][feat] create communication related classes ( #8968 )
2025-11-20 22:32:42 -08:00
Yiqing Yan
2a27166b59
[TRTLLM-9183][infra] Add --waives-file in rerun pytest command ( #8971 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-21 13:40:45 +08:00
Zhanrui Sun
5138ef3227
[None][infra] Add fallback when getting the wheel from the build stage fails ( #9290 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-21 13:26:20 +08:00
QI JUN
e2a372a3b1
[None][ci] waive test_llm_context_only_timed_out_kv_cache_exhausted ( #9351 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-20 20:20:57 -08:00
TensorRT LLM
39e641872c
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-21 03:19:55 +00:00
Yingge He
b5863ed1e2
[TRI-332] [fix] Fix L0_backend_trtllm ( #9282 )
...
Signed-off-by: Yingge He <yinggeh@nvidia.com>
2025-11-20 18:55:37 -08:00
cheshirekow
1379cfac3a
[TRTLLM-9197][infra] Move thirdparty stuff to its own listfile ( #8986 )
...
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-20 16:44:23 -08:00
Kanghwan
b1c9936c36
[None][infra] Update goggles_action repository ( #9240 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-11-20 13:32:32 -08:00
tburt-nv
f8dd52621d
[None][chore] Upgrade starlette and FastAPI ( #9319 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-11-20 11:21:14 -08:00
Mike Iovine
69b4e52757
[None][chore] Update linter rules for mass integration
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Barry Kang
a3433dd54e
[ https://nvbugs/5325296 ][fix] Enable relaxed acceptance test on Blackwell ( #8709 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Zhanrui Sun
62e20a5441
[None][infra] Remove invalid waived tests that are not in the release branch ( #8841 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jin Li
6185225501
[ https://nvbugs/5488118 ][fix] Unwaive passed tests ( #8758 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Dom Brown
0c8de1f45d
[ https://nvbugs/5575841 ] [test] Move test_moe.py to serial tests to improve stability + unwaive FP4 MoE torch unit tests ( #8422 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
xiweny
05aabfbc1e
[ https://nvbugs/5601203 ] [fix]Restrict fp8 blockscale moe case ( #8583 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Chuang Zhu
8846dac9b4
[ https://nvbugs/5578175 ][fix] Fix block range index ( #8470 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
eca68e4465
[ https://nvbugs/5564465 ][fix] Overwrite only if default_max_tokens is legal ( #8538 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Eran Geva
3d66e56adb
[ https://nvbugs/5572320 ][fix] Ported test_ad_trtllm_bench.py from main ( #8671 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yukun He
9a79f32f7a
[ https://nvbugs/5608489 ][fix] Fix output unpack issues for Llama3/4 NVFP4 models. ( #8679 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Ivy Zhang
25c0624750
[None][test] Clean cache for certain easily hang cases ( #8619 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jie Li
36e244f35e
[ https://nvbugs/5587456 ][fix] Remove multimodal test cases using TRT backend ( #8611 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
348668e3ae
[ https://nvbugs/5575902 ][fix] set max_batch_size=1 to stabilize accuracy test result ( #8609 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Lizhi Zhou
33b0b945c7
[ https://nvbugs/5582277 ][fix] rework DisaggPPTerminationHandler to fix hang issue ( #8519 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yan Chunwei
b5f9fff1c1
[ https://nvbugs/5569754 ][fix] trtllm-llmapi-launch port conflict ( #8582 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Pengyun Lin
81fd9be87d
[ https://nvbugs/5575829 ][fix] Unwaive gpt-oss test ( #8576 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Bo Deng
4ca6fe83d8
[ https://nvbugs/5565549 ][fix] unwaive test_disaggregated_spec_dec_bat… ( #8500 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Jin Li
3454eacd74
[ https://nvbugs/5546510 ][fix] Move torch.cuda.Stream out of torch com… ( #8494 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Guoming Zhang
af3900a195
[ https://nvbugs/5504095 ][fix] Unwaive test_user_specify_workspace case. ( #8316 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Simeng Liu
9286223288
[ https://nvbugs/5515753 ][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. ( #8440 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
ee6944bfa2
[ https://nvbugs/5569713 ][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 ( #8429 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
yufeiwu-nv
0e746fad45
[ https://nvbugs/5667454 ][test] Fix Test Case as Chunked Attention not Supported on sm_120 ( #9260 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-20 00:58:42 -08:00
Liao Lanyu
04ad9f96fa
[ https://nvbugs/5667687 ][fix] Set correct lm_head_tp_size_upper_bound ( #9300 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-20 00:41:00 -08:00
Neta Zmora
1d6fbbf45d
[ #9236 ][feature] Make sharing of activation_type across SW layers more robust ( #9238 )
...
The C++, Python, and Python MoE layers all share the definition of ActivationType.
Currently this is done through redefinition, which is fragile and can break when new activation function types are added.
tensorrt_llm/_torch/utils.py
cpp/tensorrt_llm/kernels/cutlass_kernels/include/common.h
=>
tensorrt_llm/layers/moe.py
cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-20 16:06:58 +08:00
Emma Qiao
b018b2698d
[TRTLLM-9164][infra] Enable checking duplicate items in waives.txt in pre-commit ( #9265 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-20 15:47:23 +08:00
mpikulski
a39e8c5567
[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema ( #9305 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-20 08:32:23 +01:00
Yukun He
5d118e0326
[None][chore] Revise the description of enable_autotuner. ( #9320 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-19 22:59:37 -08:00
QI JUN
1bdd3ba173
[None][ci] waive test_disagg_server_restart ( #9326 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-19 22:34:03 -08:00
Yechan Kim
d5622b2689
[None][fix] Multimodal InputProcessor dummy builder fix ( #8916 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-19 22:32:21 -08:00
Chang Liu
79a6c9742b
[None][fix] Use fp32 for indexer weight_proj GEMM ( #9243 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-19 21:52:38 -08:00
Neta Zmora
028fc877a5
[ #9096 ][feature] Auto Deploy: configurable fused MoE backend ( #9194 )
...
Allow configuring Auto Deploy's MoE/FP8-MoE backend from an external YAML config file.
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-19 21:50:22 -08:00
Chenghao Zhang
cd44f80abd
[ #9316 ][feat] AutoDeploy: Add the accuracy test for Nemotron MOE models ( #9317 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-19 21:48:50 -08:00
TensorRT LLM
3004692949
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-20 03:37:32 +00:00
Bo Deng
2128f73d58
[TRTLLM-9247][infra] Upgrade NIXL to 0.7.1 ( #9055 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-20 11:01:02 +08:00
JunyiXu-nv
46dccb5e2d
[None][chore] Prevent negative max_tokens passed into tllm request ( #9037 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-20 09:58:13 +08:00
Yukun He
b6bced83c0
[TRTLLM-7963][feat] Use CUDAGraph to improve the tuning accuracy for AutoTuner. ( #9089 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-20 08:54:29 +08:00
Kanghwan
41e5870a70
[ #8476 ][chore] Update license ( #8807 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-11-19 15:05:25 -08:00
Fanrong Li
d4abb86f3e
[None][fix] fix EPLB for DeepSeek-V3.2-Exp ( #9245 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-19 13:45:54 -08:00
brb-nv
f6ec6e2222
[None][chore] Waive tests timing out on main ( #9315 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-19 13:10:06 -08:00
Faraz
49c45ebef1
[None][fix] change logging for weight loading on unified memory ( #9177 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
2025-11-19 14:31:19 -05:00
NVShreyas
1eae941d77
[ #9237 ][feat] enable iter stats in autodeploy ( #9278 )
...
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-11-19 19:29:29 +01:00
NVShreyas
a7c0b54ce7
[None][feat] add specdec to nemotron nas ( #8985 )
...
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-11-19 19:28:35 +01:00
Neta Zmora
7ab02ad7b5
[None][feature] AutoDeploy: tighter MoE UT thresholds ( #9195 )
...
Scale down the weights in the MoE test so that the output has a reasonable magnitude, allowing tighter atol and rtol.
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-19 08:37:51 -08:00
Bo Li
d8b05894ee
[None][perf] Adjust select_alltoall_method_type. ( #8950 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-19 07:43:55 -08:00
mpikulski
46dd9886bb
[ https://nvbugs/5661877 ][fix] fix test regression in TestBatchedSampling::test_samples ( #9215 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-19 01:44:44 -08:00
xinhe-nv
0f77fec932
[None][chore] Add failed cases into waives.txt ( #9289 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-19 17:03:43 +08:00
CarstyYou
ee941ac779
[ https://nvbugs/5456493 ][feat] add fp8 dense for sm120 ( #9174 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-19 14:40:34 +08:00
nvxuanyuc
a79c0dfb43
[None][fix] Update GLM model accuracy test ( #9286 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 21:59:01 -08:00
jiahanc
255e4ea9f0
[None][doc] Update DS-R1 example doc ( #9231 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-18 21:10:02 -08:00
Emma Qiao
67d3eb26af
[None][infra] Waive failed cases for main branch on 11/17 ( #9266 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-18 20:07:03 -08:00
ChristinaZ
941a54c66a
[None][feat] Update the indexer topK ( #9255 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 11:49:00 +08:00
xinhe-nv
286ace22ed
[None][chore] Add failed cases into waives.txt ( #9242 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 19:27:55 -08:00
TensorRT LLM
9135d580bf
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-19 03:25:00 +00:00
jellysnack
99ba723e20
[None][fix] logits device and shape issues in dynamic draft path ( #9079 )
...
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
2025-11-18 19:22:47 -08:00
Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error ( #9172 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00
Grzegorz Kwasniewski
7905d6c0da
[ #9098 ][feat] Simple sharding latent experts ( #9099 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-18 21:14:22 -05:00
ChristinaZ
fbf6c16cd2
[None][fix] Update the default invalid value for deepseek mode of routing ( #9222 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 10:14:06 +08:00
Grzegorz Kwasniewski
92f86a50d4
[ #9137 ][feat] Factory sharding as default ( #9144 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-18 21:12:03 -05:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resources are exhausted ( #9155 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt ( #9193 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00
Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM ( #8880 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00
Lizhi Zhou
c789000a62
[ https://nvbugs/5649010 ][fix] increase status-checking interval to avoid instability ( #9203 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00
Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests ( #9247 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00
Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example ( #8434 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00
Ziyi Xiong
7c4344b92e
[ https://nvbugs/5590408 ][fix] Exclude num of draft tokens from mMaxSeqLenKv ( #9210 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
Eran Geva
3ac11a6180
[ #9152 ][fix] AutoDeploy fused_allreduce_residual_rmsnorm to support demollm mode ( #9197 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-18 22:15:29 +02:00
Chenghao Zhang
f0b68e4c66
[None][feat] AutoDeploy: Perf improvement for small batch size ( #9163 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-18 12:11:12 -08:00
Nikita Korobov
fe569f0594
[None][feat] bias for FP4 TRT-LLM Gen MoE ( #9220 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-11-18 09:59:47 -08:00
mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding ( #9178 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00
Gal Hubara-Agam
36d3d8f608
[None][chore] Print device info in trtllm-bench report ( #8584 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-11-18 09:00:10 -08:00
Kaiyu Xie
d076aa44d3
[None] [tests] Unwaive wide ep related tests ( #9204 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). ( #8194 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check ( #9088 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00
Robin Kobus
9913dc25ae
[None][refactor] decoding inputs, part 2 ( #5799 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-18 14:38:51 +01:00
Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case ( #9165 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00
Chang Liu
8e001dd195
[None][fix] DeepSeek V3.2 indexer RoPE fix ( #9232 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-18 20:35:27 +08:00
Lizhi Zhou
07343bb11c
[None][chore] fix a deepseekv3 error when debug mode is on ( #9217 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-18 01:14:32 -08:00
ruodil
82480346aa
[ https://nvbugs/5652552 ][fix] add printing for llm args ( #9205 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-17 23:58:36 -08:00
Zero Zeng
43896af1b1
[None][chore] benchmark refactor ( #9207 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-17 23:29:28 -08:00
Stanley Sun
96cfdd8a72
[None][chore] Change trt-server to trtllm-server in opentelemetry readme ( #9173 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-17 22:02:24 -08:00
Gal Hubara-Agam
5e5300898b
[ #8732 ][feat] Add ReLU2 to TRTLLM Cutlass MoE BF16 kernels ( #9191 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-11-17 20:30:00 -08:00
TensorRT LLM
fd9916424f
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-18 03:23:16 +00:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM ( #8256 )
...
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases ( #9209 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00
Lucas Liebenwein
6d0a8edbbb
[None][chore] local imports for AutoDeploy in serve and bench ( #9199 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-18 08:14:32 +08:00
zackyoray
e3c9a97075
[None][feat] Add TRTLLM_NIXL_KVCACHE_BACKEND environment variable for NIXL backend selection ( #9075 )
...
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-11-17 15:39:55 -08:00
TensorRT LLM
2d6289b4b4
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-17 22:26:06 +00:00
yuanjingx87
ec36a3af7e
[None][infra] Fix lock file generation script ( #9180 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-17 11:53:56 -08:00
Matt Lefebvre
470d777744
[TRTINFRA-7280][infra] Support enroot/pyxis clusters in multi-node SLURM and enable oci-hsg GB200 in post-merge ( #9117 )
...
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-17 10:59:30 -08:00
Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler ( #8587 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00
Mike Iovine
6151a4c9d6
[None][feat] Add simple optimizations for MTP 2-model ( #9176 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-17 10:05:39 -05:00
Yiqing Yan
24f5cd7493
[TRTLLM-8000][infra] Catch error in merge waive list stage ( #7289 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-17 13:28:50 +08:00
Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue ( #9171 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00
Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound ( #9025 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00
Jinyang Yuan
6dc70aa0e5
[ https://nvbugs/5613089 ][fix] Fix the rank to access all_rank_chunk_size_list when chunked MoE is used ( #8723 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-11-17 10:01:08 +08:00
Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 ( #9192 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch ( #8975 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
Guoming Zhang
e0f69657c7
[None][fix] Update the attention layers counting for Qwen3-next. ( #9072 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-16 11:52:56 -08:00
Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 ( #9187 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00
brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge ( #9189 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00
JadoTu
3cde84581d
[None][fix] Make the sliced nvfp4 output contiguous ( #9123 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-15 20:00:54 +08:00
Thor Johnsen
64cd91ae0a
[None][infra] Add trt-llm-kv-cache-manager-devs as code owner for appropriate files ( #9182 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-11-15 16:46:14 +08:00
Erin
fe69243157
[None][chore] Add placement test for ray executor ( #9122 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-14 23:10:59 -08:00
Zhanrui Sun
bdcf837784
[TRTLLM-9079][infra] upgrade tritonserver DLFW 25.10 ( #8929 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-14 20:22:10 -08:00
yuanjingx87
83122bfd64
[None][infra] Update allowlist 2025.11.14 ( #9183 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 16:29:26 -08:00
yuanjingx87
73b8783903
[None][infra] Fix metadata.json generated by lock file generation pipeline ( #9179 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 12:28:20 -08:00
TensorRT LLM
cbabdae57d
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-14 18:54:51 +00:00
yuanjingx87
05b5336ab6
[None][infra] Lock generation pipeline update ( #9084 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 10:12:25 -08:00
Chang Liu
bed4e95e9f
[ https://nvbugs/5629887 ][fix] Add missing device count guard for DSv32 multiGPU tests ( #9159 )
2025-11-14 07:52:23 -08:00
xinhe-nv
49b7e6301a
[None][chore] Add failed cases into waives.txt ( #9156 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-14 06:28:22 -08:00
mpikulski
80bf840e69
[TRTLLM-9295][fix] unflake test_overlap_scheduler.py::test_overlap_scheduler_consis… ( #9146 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-14 11:36:22 +01:00
yuanjingx87
d72321a32e
[None][ci] Waive unittest/_torch/sampler/test_torch_sampler.py::TestBatchedSampling ( #9161 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-14 01:49:26 -08:00
Chenghao Zhang
f6f6e1f25d
[ #9102 ][feat] AutoDeploy: Support fp8 kv cache ( #9107 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 23:55:45 -08:00
Zero Zeng
c6cce398f5
[TRTLLM-9053][feat] Support accuracy test and install from wheel ( #9038 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-13 23:34:47 -08:00
dongxuy04
84483a238a
[None][doc] update docs for EPLB ( #9166 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-13 22:24:29 -08:00
Fanrong Li
25bd2e6917
[None][doc] Add DeepSeek-V3.2-Exp document ( #9141 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-13 22:01:58 -08:00
Lizhi Zhou
8bd779171e
[ https://nvbugs/5631254 ][fix] avoid torch.compile for multiple times ( #9135 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-13 21:49:52 -08:00
TensorRT LLM
e90dbaf572
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-14 03:40:28 +00:00
Suyog Gupta
d12cb9436d
[None][feat] Autodeploy add triton configs and optimize mamba prefill ( #9083 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-13 19:15:43 -08:00
QI JUN
3c950910a0
[None][ci] waive test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0] ( #9162 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-13 18:56:37 -08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module ( #8682 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Tailing Yuan
cc4c980e03
[None][feat] Add Qwen3-Next to layer-wise benchmarks ( #9065 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-14 10:03:00 +08:00
JunyiXu-nv
fdb0787e85
[None][chore] Support json_schema in response_format ( #8934 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-14 09:43:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server ( #8765 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Neta Zmora
34dc6869f3
[ #8732 ][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 ( #9011 )
...
Update TRTLLM Cutlass MoE kernels with ReLU2 activation.
Nemotron-6 requires the ReLU2 (i.e., squared ReLU) MoE activation function.
This PR adds it, along with a general API for setting the activation function.
The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954.
The PR also switches the Auto Deploy MoE backend for 16-bit and FP8 from
Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`).
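For reference, ReLU2 (squared ReLU) computes max(x, 0)^2 elementwise. The following is a minimal plain-Python sketch of that math only; it is not the Cutlass/CUDA kernel code changed here, and the function names `relu2` and `relu2_vec` are hypothetical.

```python
from typing import Iterable, List


def relu2(x: float) -> float:
    """Squared ReLU: clamp negatives to zero, then square."""
    r = max(x, 0.0)
    return r * r


def relu2_vec(xs: Iterable[float]) -> List[float]:
    """Apply squared ReLU elementwise (illustrative, not a fused kernel)."""
    return [relu2(x) for x in xs]


print(relu2_vec([-1.0, 0.0, 0.5, 2.0]))
```

Compared with plain ReLU, the positive branch grows quadratically while negative inputs still map to zero.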
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-13 16:54:45 -08:00
dongxuy04
a370643b26
[None][fix] support topk autotuner input for expert slot per group larger than 32 ( #9087 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-14 08:37:20 +08:00
Leslie Fang
daa31d78f4
[ https://nvbugs/5652552 ][fix] Log the llm args for main branch ( #9120 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-11-14 07:43:21 +08:00
Frida Hou
b51258acdd
[None][autodeploy] fix weight extraction for graph based quantized checkpoints ( #9109 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:14:24 -08:00
Frida Hou
e96a3d294d
[None][autodeploy] minor refactor to rmsnorm transforms ( #8657 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-13 13:13:58 -08:00
Jinyang Yuan
12f339f3bf
[None][fix] Fix the aux_stream in Llama4MinLatencyFusedMoE ( #9035 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-11-13 09:09:52 -08:00
Iman Tabrizian
9ef7eb70e0
[None][fix] Fix KV cache manager test warnings ( #9103 )
2025-11-13 07:23:04 -08:00
Ziyi Xiong
a7aaf50541
[TRTLLM-8084][feat] Enhance the overlap scheduler for two-model spec decoding ( #8706 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-13 10:20:16 -05:00
William Zhang
121140cfec
[None][fixes] Add tool call parsing fixes and Qwen3 coder parser ( #8817 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-13 04:34:38 -08:00
Kaiyu Xie
177ba7b0f1
[None] [fix] Disable UCC as WAR to MPI allgather issue before NGC PyTorch 25.12 upgrade ( #9126 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-13 02:25:30 -08:00
Lizhi Zhou
48a27c7bef
[ https://nvbugs/5633340 ][chore] unwaive test_auto_scaling.py::test_disagg_server_restart ( #9131 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-13 01:45:36 -08:00
Emma Qiao
d0ea417ec8
[None][infra] Waive failed tests for main 11/13 ( #9132 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-13 01:00:40 -08:00
xinhe-nv
548f5ce4bc
[None][fix] waive failed tests ( #9090 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 23:40:00 -08:00
xinhe-nv
8fa3c55c76
[None][chore] Remove closed bugs ( #9114 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-12 22:49:37 -08:00
ruodil
c86e36fe38
[None][test] add deepseek and qwen cases for rtx series ( #8839 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-12 22:28:02 -08:00
Chang Liu
c37924f37b
[None][fix] Clear indexer k cache reference before release cuda memory ( #9110 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-12 22:12:53 -08:00
HuiGao-NV
cde18c12da
[ https://nvbugs/5640873 ][fix] Move thop tests to pre-merge ( #9094 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-13 13:08:13 +08:00
Perkz Zheng
22c1748b80
[TRTLLM-8816][feat] add optimized trtllm-gen attention kernels on sm103 ( #9081 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-13 12:41:07 +08:00
Zhang Ge
49df731b96
[ #6507 ][fix] Fix precision issue due to KV layout mismatch for split/concat kernels ( #6917 )
...
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Yan Chunwei
4fd93bdc2c
[None][ci] Waive test_llm_rpc and test_llm_rpc_streaming ( #9118 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-12 19:55:09 -08:00
cheshirekow
3ab24df815
[TRTLLM-9209][infra] Upgrade precommit-hooks to v6.0.0 ( #9097 )
...
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-11-12 19:52:34 -08:00
TensorRT LLM
fc5a28c1db
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-13 03:35:54 +00:00
Venky
c79b27851d
[None] [infra] Update CODEOWNERS for pre-commit-config.yaml ( #9108 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-11-12 19:33:16 -08:00
Yan Chunwei
8a8883bc73
[None][chore] Waive test_llm_rpc_streaming ( #9113 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-13 11:06:26 +08:00
QI JUN
d1b003d31e
[TRTLLM-9212][chore] move MoeLoadBalancerConfig to llm_args.py ( #9002 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:47:35 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number ( #9003 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
QI JUN
3416efbc29
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill ( #9111 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:06:32 +08:00
Chenghao Zhang
f1d637ec69
[None][fix] AutoDeploy: Use tmp folder for the load_moe_align ( #9101 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-12 14:59:49 -08:00
dongxuy04
9241ccaf27
[None][feat] Enable EPLB for trtllm-gen and cutlass backend ( #8886 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-11-12 12:30:27 -08:00
Chenghao Zhang
5f26c31954
[ https://nvbugs/5636912 ][fix] AutoDeploy: Unwaive the test ( #9018 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-12 12:26:38 -08:00
Patrice Castonguay
8a751a0e56
[None][chore] Remove is_disaggregated param in executor request queue ( #9049 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-12 13:37:15 -05:00
Fanrong Li
780d4f9dc5
[None][feat] Add MTP>1 support for DS-v3.2 ( #9045 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-12 09:56:12 -08:00
Neta Zmora
53491ffdb1
[ #9023 ][feat] reduce AD graph optimization time for non-participating passes ( #9024 )
...
Shorten AD graph optimization time by 30% (measured on Nemotron-6):
- A bug in the transformation interface marked all passes as not clean, regardless of what the transformation reported.
- Fix how the optimization passes report the results of their actions. Many passes reported that the graph was not clean even when they did not participate in the optimization, and each graph-cleaning invocation can take several seconds.
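The idea behind the fix can be sketched as a pass runner that pays for the expensive cleanup only when a pass reports that it actually changed the graph. This is a toy illustration under stated assumptions: `PassInfo`, `run_passes`, the list-of-strings "graph", and the example passes are all hypothetical and are not AutoDeploy's actual transformation interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Graph = List[str]  # hypothetical toy stand-in for a real graph IR


@dataclass
class PassInfo:
    # Hypothetical per-pass report: how many rewrites happened and
    # whether the graph is still clean (i.e. needs no cleanup).
    num_matches: int = 0
    is_clean: bool = True


Pass = Callable[[Graph], Tuple[Graph, PassInfo]]


def run_passes(graph: Graph, passes: List[Pass],
               cleanup: Callable[[Graph], Graph]) -> Tuple[Graph, int]:
    """Run passes in order; invoke cleanup only after passes that dirtied the graph."""
    cleanups = 0
    for p in passes:
        graph, info = p(graph)
        if not info.is_clean:
            graph = cleanup(graph)
            cleanups += 1
    return graph, cleanups


def noop_pass(graph: Graph) -> Tuple[Graph, PassInfo]:
    # Did not participate, so it honestly reports a clean graph.
    return graph, PassInfo()


def fuse_pass(graph: Graph) -> Tuple[Graph, PassInfo]:
    # Toy rewrite: replace nodes "a" and "b" with a fused node "ab".
    if "a" in graph and "b" in graph:
        fused = [n for n in graph if n not in ("a", "b")] + ["ab"]
        return fused, PassInfo(num_matches=1, is_clean=False)
    return graph, PassInfo()


def dead_code_cleanup(graph: Graph) -> Graph:
    return [n for n in graph if n != "dead"]
```

With `[noop_pass, fuse_pass, noop_pass]`, cleanup runs once instead of three times; under the buggy behavior, where every pass was treated as not clean, it would run after each pass.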
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-12 09:05:53 -08:00
Iman Tabrizian
cdde15b275
[TRTLLM-8540][feat] Add support for disagg in DSv3.2 ( #8735 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-11-12 08:21:11 -08:00
mpikulski
264d38e6c5
[TRTLLM-9175][test] ensure sampling is async ( #9076 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-12 15:27:52 +01:00
yufeiwu-nv
b7a2574c60
[ https://nvbugs/5568991 ][test] Remove Phi-3 models ( #9066 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-12 03:16:36 -08:00
Timothy Gao
96132b4274
[None] [doc] Add Mixed Precision Context and Generation section to Disagg ( #8769 )
...
Signed-off-by: Timothy Gao <35588167+timothygao8710@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-11-11 23:46:12 -08:00
QI JUN
4003dc7574
[None][ci] waive some test cases of disaggregated serving ( #9085 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-12 15:06:21 +08:00
Emma Qiao
bb6eb9510d
[None][infra] Waive a failed case of disaggregated/test_disaggregated.py ( #9074 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 19:38:32 -08:00
Zhanrui Sun
0b25d240a1
[TRTLLM-9018][infra] add mirror for Build-Docker-Images stage ( #9063 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-12 11:38:03 +08:00
TensorRT LLM
1af9b2ec6a
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-12 03:26:28 +00:00
Jiagan Cheng
1a56722697
[None][fix] Remove unnecessary attention workspace memory check ( #9064 )
...
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-11-12 11:18:50 +08:00
QI JUN
fd703fbb7b
[None][ci] run speculative unit tests serially ( #9080 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 19:06:44 -08:00
Chang Liu
0b81173efa
[TRTLLM-9259][perf] Use torch.compile to fuse copy + layernorm within the LayerNorm module ( #9052 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-11-11 18:11:00 -08:00
Lucas Liebenwein
aca56097cb
[None][fix] AutoDeploy: update nano3 accuracy test ( #9061 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-11 12:26:31 -08:00
QI JUN
524754b6fd
[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner ( #7572 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 10:13:45 -08:00
Chenghao Zhang
ec9cf715a2
[None][feat] AutoDeploy: Perf improvement for mamba layers ( #8991 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-11 08:27:07 -08:00
Wanli Jiang
ebdd1cc8e0
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm ( #8840 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-11-11 07:48:23 -08:00
mpikulski
20fd305bb6
[None][fix] type annotation ( #9071 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 07:20:20 -08:00
mpikulski
b151de4a8f
[TRTLLM-8377][test] unit tests for TorchSampler batched sampling ( #9012 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-11 07:16:42 -08:00
Guoming Zhang
b894dc2d70
[None][fix] Display the GPU memory information in GiB unit. ( #9070 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-11 06:24:59 -08:00
mpikulski
979b3ae9ce
[TRTLLM-7723][feat] sampling using FlashInfer.sampling ( #8581 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 03:21:19 -08:00
HuiGao-NV
23c388c58b
[ https://nvbugs/5616189 ][fix] Make more cases use local cached models ( #8935 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-11 03:14:05 -08:00
Emma Qiao
22f1523f9e
[None][infra] Only print and don't fail the check if there are duplicated items in waives.txt ( #9068 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 03:04:59 -08:00
QI JUN
0ce22ce928
[None][ci] waive test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False] ( #9069 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-11 02:11:15 -08:00
elvischenv
62a30bca25
[None][chore] Add tensorrt_llm/scripts to .gitignore ( #8895 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-11-11 11:10:02 +01:00
Yiqing Yan
b7d51c5549
[None][chore] Remove duplicated waive test ( #9067 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-11 16:49:49 +08:00
Yuxian Qiu
7aeac97e4e
[ https://nvbugs/5622938 ][fix] Use async send_requests_to_next_pp. ( #9041 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-11 14:19:44 +08:00
Lucas Liebenwein
6bf4e59267
[ #8763 ][feature] AutoDeploy: configurable dtype for caching ( #8812 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-10 22:17:14 -08:00
jiahanc
de6088e363
[None][doc] update llama and llama4 example doc ( #9048 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 22:04:26 -08:00
Bo Deng
0b9bc5aae8
[None][infra] install mooncake in docker images ( #8447 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
Co-authored-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-11-11 13:34:27 +08:00
Emma Qiao
da1f0e2465
[None][infra] Waive failed tests on main 11/11 ( #9058 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-11 13:19:30 +08:00
xinhe-nv
fac522056c
[None][chore] Add failed cases into waives.txt ( #8998 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-11-11 12:40:59 +08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling ( #8988 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
TensorRT LLM
c61b44e594
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-11 03:36:08 +00:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py ( #8938 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
dongfengy
972c21c142
[None][chore] Clean up unused and confusing code in moe test ( #9019 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-11-10 18:52:21 -08:00
Liao Lanyu
1fd11455d8
[ https://nvbugs/5556998 ][fix] init_hf_modules in worker_main for models with trust_remote=true ( #8931 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-11 10:30:37 +08:00
Yechan Kim
0938a3ad2a
[ https://nvbugs/5644187 ][fix] Llava-Next MMMU bugfix and Phi4 test bugfix ( #9034 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-11 10:24:31 +09:00
Frida Hou
f40e1f7496
[ https://nvbugs/5625972 ][fix] Add context manager to fix FakeTensorProp ( #9047 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-10 16:25:58 -08:00
xiweny
50c486367a
[ https://nvbugs/5619396 ][fix] Add sm103 to CutlassFP8RowwiseGemm ( #9042 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-10 08:12:14 -08:00
mpikulski
edc91ba819
[None][fix] Improve type annotations on ResourceManager.get_resource_manager ( #9013 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-10 15:06:16 +01:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA ( #8882 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
xinhe-nv
f848d844d9
[None][chore] Add failed cases into waives.txt ( #9030 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-09 23:36:05 -08:00
bhsueh_NV
e8d4a56dd0
[None][fix] fix eagle3 accuracy issue on sm120 ( #8944 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-11-10 14:02:03 +08:00
Fanrong Li
a7033a9193
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 ( #8943 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-10 12:16:01 +08:00
Yiqing Yan
78fac1f665
[None][chore] Lock onnx version <1.20.0 and remove WAR for TRT 10.13 ( #9006 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-10 10:34:06 +08:00
Bo Li
67af7c15a5
[ https://nvbugs/5637037 ][fix] Update unwaive list. ( #9001 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-10 08:53:07 +08:00
Emma Qiao
183778d58a
[None][infra] Waive failed tests for main 11/07 ( #9008 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-08 08:51:35 -08:00
Emma Qiao
2af6a537ad
[TRTLLM-8999][infra] Reduce gb200 multi-node test stages ( #8778 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-08 06:34:24 -08:00
mpikulski
533add5056
[TRTLLM-8598][feat] enable n > 1 in OpenAI API with PyTorch backend ( #8951 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 17:47:35 -08:00
hvagadia
6ff82ea24e
[None][feat] Allow env variable to specify spawn process IPC address ( #8922 )
...
Signed-off-by: hvagadia <hvagadia@nvidia.com>
2025-11-07 15:45:57 -08:00
yuanjingx87
748c56a036
[None][infra] Update allowed list 2025.11.06 ( #8987 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-07 12:02:38 -08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op ( #8960 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
Guoming Zhang
c232ffd122
[None][doc] Replace the relative links with absolute links in README.md. ( #8995 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-08 00:23:42 +08:00
Patrice Castonguay
d8ea0b967f
[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout ( #8892 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-07 07:33:51 -08:00
Yuxian Qiu
7b82ba90da
[ https://nvbugs/5629790 ][chore] unwaive test. ( #8967 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-07 18:41:32 +08:00
Zhanrui Sun
e53be1564a
[TRTLLM-9213][infra] Fix boost issue ( #8996 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-07 01:27:05 -08:00
Yiqing Yan
c836ae5aaa
[None][chore] Bump version to 1.2.0rc3 ( #9004 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-07 01:24:32 -08:00
mpikulski
1944fb15af
[None][fix] add missing CLI option in multimodal example ( #8977 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:06:08 +01:00
mpikulski
5ef65872a3
[None][fix] type annotations in fuse_input_embeds ( #8976 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-07 09:04:08 +01:00
Stefan Niebler
326a201473
[ https://nvbugs/5508536 ][fix] Take Over ( #8627 ): Reintroduce: Move stop_criteria to sample_async ( #7041 ) ( #8794 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-11-07 09:01:15 +01:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely ( #8856 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Lizhi Zhou
b26e1617f2
[ https://nvbugs/5633340 ][fix] kill processes properly after test ( #8970 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-06 21:45:38 -08:00
Eran Geva
990e674b71
[None][fix] Switch AD AllReduce strategy to NCCL ( #8979 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-07 06:49:44 +02:00
xiweny
ee20e679a9
[ https://nvbugs/5636986 ][fix] Fix DeepGemmMoe get_buffer calls ( #8939 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-06 19:57:19 -08:00
Cao Dong
b53961e972
[None][feat] Return logprobs incrementally in torch backend ( #8785 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-07 10:23:39 +08:00
Simeng Liu
9f8d93f89a
[ https://nvbugs/5606136 ][ci] Remove tests for deprecating triton multimodal models. ( #8926 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-11-06 17:58:42 -08:00
Chang Liu
1c19fd6868
[ https://nvbugspro.nvidia.com/bug/5637012 ][fix] Bugfix when config is None for MLA ( #8978 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 09:37:19 +08:00
jthomson04
fcae852cef
[None][fix] Fix KV cache clearing with KV Connector API ( #8750 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-11-06 14:28:27 -08:00
Chenghao Zhang
1a78e7a3d6
[None][feat] AutoDeploy: Support Latent MOE for Nemotron ( #8955 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-06 12:40:19 -08:00
dhansen-nvidia
ada93f1187
[ https://nvbugs/5527655 ][feat] Add NUMA-aware CPU affinity autoconfig ( #8805 )
...
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-11-06 11:59:46 -08:00
Chenghao Zhang
ddf2d010e2
[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear ( #8820 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-06 11:00:10 -08:00
DylanChen-NV
b275635a9a
[ https://nvbugs/5498478 ][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill ( #8910 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-11-06 07:41:21 -08:00
shuyixiong
c73efe12e7
[None][chore] Use cached model in all ray tests ( #8962 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-06 15:14:15 +01:00
Fanrong Li
d246f62868
[ https://nvbugs/5630345 ] [chore] skip deepseek-v3.2 fp8 kv tests on pre-Blackwell architectures ( #8973 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-06 03:41:37 -08:00
yunruis
51545560da
[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation ( #8495 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-11-06 17:39:57 +08:00
Yilin Fan
b7798bfab8
[None][feat] Add trtllm_ prefix for exposed metrics ( #8845 )
...
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-11-06 15:27:18 +08:00
xinhe-nv
e822184cd7
[None][feat] add waive by sm version ( #8928 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-05 19:20:43 -08:00
TensorRT LLM
1c8c771974
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-06 03:16:35 +00:00
yuanjingx87
18a4b985f1
[None][infra] allow to choose repo when generate lock files ( #8659 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-05 19:06:29 -08:00
Yi Sun
cc12d33393
[None][feat] Deep Research Implemented with Scaffolding ( #8452 )
...
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2025-11-06 10:33:28 +08:00
JadoTu
6bbb43f2b9
[None][feat] Add qwen3-next nvfp4 support ( #8526 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-06 09:45:44 +08:00
Lucas Liebenwein
7a552c450a
[ https://nvbugs/5606166 ][fix] AutoDeploy: unwaive test for use tuples for cudagraph shape lookup ( #8957 )
...
also updated test waive for another nvbug
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-05 16:27:00 -08:00
Frida Hou
fb7f9831d3
[ #8924 ][fix] Fix AutoDeploy pattern matcher for torch 2.9 ( #8920 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-05 13:29:20 -08:00
Lucas Liebenwein
b181568d6f
[TRTLLM-8201][feat] Nemotron H MoE Sharding ( #8744 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-05 12:35:29 -08:00
Perkz Zheng
222bc911cd
[None][feat] add swapsMmaAb sparseMla kernels ( #8913 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-05 09:32:34 -08:00
Chang Liu
e57d83c5dc
[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt ( #8771 )
2025-11-05 07:57:09 -08:00
fredricz-20070104
fdd9e4fe00
[TRTLLM-7251][test] Get submit eplb slots empty key work ( #8945 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-05 05:21:02 -08:00
Fanrong Li
c2feed798a
[ https://nvbugs/5630345 ][chore] unwaive DS-v32 nvfp4 and fp8 tests ( #8887 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-05 03:49:23 -08:00
Chuang Zhu
595f78078c
[ https://nvbugs/5624367 ][fix] Fix disagg GPT-OSS test ( #8870 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-05 01:47:09 -08:00
Yiteng Niu
1ce83582f9
[None][infra] update github token name ( #8907 )
2025-11-05 00:55:28 -08:00
Yukun He
b9e5315dfb
[ https://nvbugs/5623960 ][fix] Fix the logger once key issue and further compress log in AutoTuner. ( #8873 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-05 15:25:43 +08:00