Commit Graph

1735 Commits

Author SHA1 Message Date
DylanChen-NV
74dca0aa7b
[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-09 23:16:42 +08:00
peaceh-nv
52684d79f7
Fix : fix moe regression for sm120 (#5823)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-09 21:25:11 +08:00
tomeras91
5aa958a11a
[TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H (#5371)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-09 11:30:15 +03:00
ixlmar
10e686466e
fix: use current_image_tags.properties in rename_docker_images.py (#5846)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-09 17:07:52 +09:00
Omer Ullman Argov
a32f7083b4
[ci] parallelize torch unittests (#5714)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-09 11:05:57 +03:00
Dom Brown
3e3b1769ad
[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-07-09 08:21:58 +01:00
dongxuy04
dd3c736c7e
chore: some refactor on WideEP (#5727)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-07-09 14:26:57 +08:00
chenfeiz0326
64fd64fcf2
[TRTLLM-6262] Fix Llama4 Scout FP4 crash issue (#5834)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-07-09 14:23:21 +08:00
Chang Liu
4df5f96c8d
[Bugfix] LLama4: fix for llama4 multimodal support (#5809) 2025-07-09 13:03:40 +09:00
Erin
e277766f0d
chores: merge examples for v1.0 doc (#5736)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
Xianjie Qiao
5ab1cf5ae6
Remove unnecessary benchmarking results (#5852)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-07-09 11:19:06 +08:00
Lucas Liebenwein
d14dd2f597
[AutoDeploy] re-enable waive for flaky AD test (#5867)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-07-09 11:47:48 +09:00
Bo Li
9d894bc0cb
fix: [https://nvbugspro.nvidia.com/bug/5375656] Unwaive for bug 5375656. (#5842)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-09 10:17:05 +08:00
brb-nv
2bd09ed2d4
fix: Skip rope scaling for local layers in Gemma3 VLM (#5857)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-09 10:10:33 +08:00
jiahanc
c24eb67054
Doc: fix link in llama4 Maverick example (#5864)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 11:09:58 +09:00
Wanli Jiang
e1fb1de4d9
feat: TRTLLM-6224 update xgrammar version to 0.1.19 (#5830)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-09 09:59:14 +08:00
Jhao-Ting Chen
e4c777df7d
Add is_fp8_output key to XQA kernel cubin hashing (solves Eagle3-one-engine Hopper fp8 bug) (#5813)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-07-09 09:26:27 +08:00
Venky
e27215ca03
test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) (#5654)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-08 18:16:21 -07:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
Omer Ullman Argov
d6d2ab2c99
[fix] Catch inference failures in trtllm-bench (#5841)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-09 03:53:03 +03:00
xavier-nvidia
b6013da198
Fix GEMM+AR fusion on blackwell (#5563)
Signed-off-by: xsimmons <xsimmons@nvidia.com>
2025-07-09 08:48:47 +08:00
Fridah-nv
a79b73f577
fix: [5376140] [AutoDeploy] Update unit tests: skip all_close assert for dropout in attention, increase tolerance for rope op test (#5855)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-07-09 09:13:31 +09:00
Iman Tabrizian
c508b994b6
Fix lost requests for disaggregated serving (#5815)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-09 08:42:45 +09:00
Yan Chunwei
e50d95c40d
chore [TRTLLM-6161]: add LLM speculative decoding example (#5706)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-09 07:33:11 +08:00
Pamela Peng
da8c7372d4
[TRTLLM-5366][feat]Add support for sm121 (#5524)
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

 Initial CI run failed a single step A30-CPP-3  due to timeout. Rerunning that step succeeded.
2025-07-08 14:27:00 -07:00
Chang Liu
08a3dfeb2b
[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava (#5814) 2025-07-08 09:53:11 -07:00
Dom Brown
e3ccca06e1
test: reduce redundant test cases for TRTLLM Gen FP8 MoE (#5845)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-07-09 00:40:33 +09:00
Kaiyu Xie
bb5b16fcb9
feat: Return context response immediately when stream_interval > 1 (#5836)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-09 00:19:57 +09:00
Yiteng Niu
3079e8cf0c
[TRTLLM-5878] update nspect version (#5832)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-07-08 22:00:09 +08:00
Raayan Dhar
e3268a4221
[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg (#5732)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-07-08 09:39:58 -04:00
Yukun He
e104f8bbb5
[5305318] fix: Fix the accuracy issue when reduce_fusion is enabled for GEMMA model. (#5801)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-08 19:51:05 +08:00
Yegor
b01d1c28f7
[feat] Detokenize option in /v1/completions request (#5382)
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
2025-07-08 19:36:04 +08:00
Tailing Yuan
ba0aea1da6
Fix a quote error introduced in #5534 (#5816)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-08 18:48:32 +08:00
Yiteng Niu
541ab77189
update namelist in blossom-ci (#5838)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-07-08 18:07:07 +08:00
Yiqing Yan
ec0d7e64b9
[Infra] - Waive L0 test (#5837)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-08 17:54:06 +08:00
xinhe-nv
89bbb230cc
tests: waive failed cases on main (#5781)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-08 19:44:12 +10:00
Tailing Yuan
035155df7c
Fix: ignore nvshmem_src_*.txz from confidentiality-scan (#5831)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-08 17:17:29 +09:00
xiweny
eaf8bec88b
fix: Disaggregate serving with attention DP (#4993)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-07-08 16:15:03 +08:00
nv-guomingz
c8fa08da5c
doc: update cuda_graph_config usage part in DS R1 docs (#5796)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-08 16:54:46 +09:00
Yiqing Yan
5203a0f6df
chore: bump version to 1.0.0rc3 (#5819)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-08 16:04:40 +09:00
Enwei Zhu
55f86ce7ab
[NvBug 5362426] fix: Fix prompt adapter TP2 case (#5782)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-08 16:01:36 +09:00
Venky
9258187e98
Waive some test_llama_eagle3 unittests (#5811)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-08 15:35:27 +09:00
Po-Wei (Vincent)
864de5b8b2
[None][infra] Set the label community action to only run on upstream TRTLLM (#5806)
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
2025-07-08 15:33:42 +09:00
Zhenhuan Chen
dee6644ed9
feat(scaffolding): add streaming scaffolding_llm.generate_async support (#5345)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-08 15:08:40 +09:00
JieXin Liang
664bf95892
[fix] improve fp4_block_scale_moe_runner type check (#5681)
Signed-off-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
2025-07-08 14:32:14 +09:00
liji-nv
95978e3044
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-08 12:42:15 +08:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818) 2025-07-08 13:15:30 +09:00
Yechan Kim
5bc3a15f10
feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-07 18:03:12 -07:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
davidclark-nv
a1235ee978
[feat] Adds optional module cache for TRT-LLM Gen Gemm interfaces (#5743)
Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-07-07 13:34:55 -07:00