Mike Iovine
|
c53bc19f5e
|
[infra] Make test_chunked_prefill faster (#5248)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-17 04:19:47 +08:00 |
|
Simeng Liu
|
5c18160d27
|
chore: Waive CI failure. (#5252)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-06-16 20:47:05 +02:00 |
|
Izzy Putterman
|
e607768e45
|
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-06-17 02:26:08 +08:00 |
|
tomeras91
|
cea5dd1e38
|
[TRTLLM-5835][feat] Optimized Mamba2Mixer prefill (#5128)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-06-16 16:29:17 +03:00 |
|
Yilin Fan
|
dd29063538
|
[feat] Add llm args to tune python gc threshold (#5141)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-06-16 17:45:22 +08:00 |
|
Tao Li @ NVIDIA
|
03f1a6a3d8
|
Update DeepSeek R1 perf numbers to latest release/0.20 results (#5235)
|
2025-06-16 17:42:13 +08:00 |
|
Ivy Zhang
|
64b7f04fdc
|
[test] split nemotron test cases from examples_test_list (#5238)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-06-16 16:36:33 +08:00 |
|
xinhe-nv
|
802f22cd12
|
test: [CI] Add failed cases into waives.txt (#5221)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-16 16:11:53 +08:00 |
|
Yiqing Yan
|
8445416c39
|
Waive L0 tests (#5233)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-16 15:19:03 +08:00 |
|
Robin Kobus
|
b6ca677741
|
refactor: remove decoder request from decoder interface (#5129)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-16 09:12:30 +02:00 |
|
Anthony Chang
|
4f9fa9f21d
|
feat: MoE trtllm backend kernel update (#5183)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-06-16 14:46:13 +08:00 |
|
Chuang Zhu
|
1d2b0d3d80
|
use file lock to avoid port conflict (#5123)
|
2025-06-16 14:15:37 +08:00 |
|
Robin Kobus
|
dda64166cd
|
refactor: Scheduling based on KV cache state (#4865)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-16 08:14:58 +02:00 |
|
Wanli Jiang
|
0acf23185e
|
[Stress test] Add DeepSeek-R1 stress test (#5033)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-06-16 11:54:31 +08:00 |
|
Tracin
|
ef3fdc8051
|
feat: Add w4a8_mxfp4_fp8 quantization recipe. (#4867)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-06-16 11:30:57 +08:00 |
|
Yi Zhang
|
9b616db13b
|
test: Add fixture to skip tests based on MPI world size (#5028)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-06-16 11:25:01 +08:00 |
|
ruodil
|
2848e012ae
|
test: add llama4 models for perf test (#5187)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-16 11:24:35 +08:00 |
|
ruodil
|
3d22f27063
|
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-06-16 11:23:20 +08:00 |
|
Enwei Zhu
|
babdd9ce06
|
test: Add json_mode_eval for guided decoding evaluation (#5179)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-16 10:03:55 +08:00 |
|
Yilin Fan
|
7a5e0fd300
|
[fix] Fix Llama4 min-latency import error (#5209)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-06-16 10:03:07 +08:00 |
|
Yan Chunwei
|
c84e41fd9d
|
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-15 17:51:56 -07:00 |
|
amitz-nv
|
109c426077
|
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)
|
2025-06-15 18:54:04 +03:00 |
|
Fanrong Li
|
39bba63758
|
[TRTLLM-4983] feat: enable overlap scheduler between draft forwards (#4802)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-15 23:09:16 +08:00 |
|
qsang-nv
|
5a01ba5260
|
use cu for fmha_v2 (#4694)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
|
2025-06-15 18:40:44 +08:00 |
|
Omer Ullman Argov
|
4eade3ae33
|
[fix][test] Speedup Nemotron NAS unittests (#5202)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-15 11:26:03 +03:00 |
|
Fanrong Li
|
159ffc584e
|
fix: fix cuda graph max batch size for spec decoding cases. (#5076)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-15 14:57:28 +08:00 |
|
Kaiyu Xie
|
dce1dcc4f9
|
feat: Support post_proc for bench (#5122)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-15 13:02:38 +08:00 |
|
Enwei Zhu
|
63bc62ddf4
|
feat: Enable EPLB to existing MoE models (#5203)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-15 11:48:06 +08:00 |
|
Yuan Tong
|
6bce7337a9
|
perf: avoid dynamic import overhead in is_llm_response with duck typing (#5110)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-06-15 07:45:02 +08:00 |
|
ixlmar
|
e055af1bc9
|
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-15 01:28:26 +08:00 |
|
Aurelien Chartier
|
1389f5a4d3
|
feat: Add support for fp8 rowwise quantization (#4876)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>
|
2025-06-14 06:37:48 -07:00 |
|
2ez4bz
|
dc52b67492
|
linting(python): Enable ruff on more files (wave 1/N) (#5140)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-06-14 19:19:34 +08:00 |
|
Tailing Yuan
|
0b60da2c45
|
feat: large-scale EP(part 7: DeepEP integration) (#4792)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-14 19:12:38 +08:00 |
|
Robin Kobus
|
443b2eb51f
|
refactor: Speculative decoding buffers (#5091)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-14 11:39:32 +02:00 |
|
yunruis
|
b99c5ce8c1
|
Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560)
Signed-off-by: yunruis <yunruis@nvidia.com>
Signed-off-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
Signed-off-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
Co-authored-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
|
2025-06-14 17:36:22 +08:00 |
|
nv-guomingz
|
3b7b5a5ad5
|
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-14 14:23:13 +08:00 |
|
dongxuy04
|
97657bfda2
|
optimize memset before alltoall communication (#5188)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-06-14 10:49:47 +08:00 |
|
Aurelien Chartier
|
82e280f6f3
|
feat: add multi-node support for Triton with pytorch backend (#5172)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-06-13 13:27:58 -07:00 |
|
Enwei Zhu
|
5f2785fb90
|
fix: Fix waive list (#5205)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-13 23:33:23 +08:00 |
|
Yilin Fan
|
06342ffb4d
|
[feat] Implement model-agnostic one-engine eagle3 (#4778)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-06-13 08:11:41 -07:00 |
|
Mike Iovine
|
25aa3881d7
|
[nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len (#4879)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-13 11:06:36 -04:00 |
|
Perkz Zheng
|
3d87770e15
|
[https://nvbugspro.nvidia.com/bug/5295470] support headDim 256 for blackwell fmha kernels (#5164)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-06-13 23:01:01 +08:00 |
|
QI JUN
|
952f33dcad
|
CI: move all test cases of TensorRT backend into post merge (#5186)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-13 20:48:48 +08:00 |
|
Chuang Zhu
|
8e9937081d
|
ucxx only use ucp_feature_tag to avoid some issues on some platforms (#4994)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-06-13 19:14:25 +08:00 |
|
yunruis
|
e5be3a95b3
|
fix: fix license bug (#5200)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
|
2025-06-13 18:58:15 +08:00 |
|
yunruis
|
e96d6863d8
|
add doc for open-sourced cutlass kernels (#5194)
Signed-off-by: yunruis
|
2025-06-13 18:51:27 +08:00 |
|
brb-nv
|
089be8912a
|
feat: Basic skeleton for Gemma3 VLM (#5108)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-06-13 17:27:04 +08:00 |
|
xinhe-nv
|
30d9d0fa71
|
test: [CI] Add failed cases into waives.txt (#5178)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-13 16:38:51 +08:00 |
|
nv-guomingz
|
b959618579
|
refactor [BREAKING CHANGE]: remove the redundant use_kv_cache field from PytorchConfig (#5031)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-13 16:34:24 +08:00 |
|
yunruis
|
30c5b4183a
|
refactoring: port customized kernels with public cutlass version (#5027)
Signed-off-by: yunruis
Merge this to unblock others since the full CI has been run through
|
2025-06-13 16:19:31 +08:00 |
|