Commit Graph

2732 Commits

Author SHA1 Message Date
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache (#7612)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Yanchao Lu
70aa4e28c1
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 23:48:04 +08:00
Yanchao Lu
89fc136972
[None][ci] Some improvements for Slurm CI (#7689)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 16:56:32 +08:00
Zhanrui Sun
1f43854496
[TRTLLM-6791][infra] Add check for uploading stage name and avoid overriding test result tar file (#6742)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-13 01:15:33 +08:00
Zhanrui Sun
7d73a89ad0
[TRTLLM-7169][infra] Fix Slurm multi-node test showing "Submit Test Results" in the test name (#6856)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-12 18:46:19 +08:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
Guoming Zhang
ef676fc71f
[https://nvbugs/5513192][fix] Add the missing param for kv_cache_tran… (#7679)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-11 19:00:16 +08:00
Chang Liu
3a9847eb84
[https://nvbugs/5498165][fix] fix permission error for config file lock (#7656)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-11 10:36:51 +08:00
Fan - Yunfan
e3117731b3
[None][fix] Fix the incorrect header file import in dataType.h (#7133)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
2025-09-11 08:59:04 +08:00
QI JUN
656f229b58
[None][ci] move some test cases from l40s to a30 (#7684)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-11 07:22:34 +08:00
Kanghwan
aa152ce8cf
[None][infra] Adjust labeling llm prompt for bug issues (#7385)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-09-11 05:10:31 +08:00
Emma Qiao
9986070044
[None][infra] Waive failed cases on main 0910 (#7676)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-11 01:43:29 +08:00
Dom Brown
fc9d426589
[https://nvbugs/5505402] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-10 18:30:48 +01:00
v-shobhit
0652514c6d
[None][feat] Use a shell context to install dependancies (#7383)
Signed-off-by: Shobhit Verma <shobhitv@nvidia.com>
Signed-off-by: v-shobhit <161510941+v-shobhit@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
2025-09-10 09:57:37 -07:00
nvamyt
222e01662c
[https://nvbugs/5488212][waive] Waive failed tests for L20 (#7664)
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-10 22:32:15 +08:00
Leslie Fang
d219a4f225
[None][chore] remove executor config in kv cache creator (#7526)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-10 21:14:44 +08:00
Linda
a4312ba743
[https://nvbugs/5477359][fix] Nanobind: Allow none types for fields in result (#7672)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-10 14:13:46 +01:00
xinhe-nv
207c5258c4
[https://nvbugs/5494698][fix] skip gemma3 27b on blackwell (#7505)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-10 21:09:27 +08:00
Bo Deng
bf57829acf
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-10 17:35:51 +08:00
Yiqing Yan
76c5e1a12f
[None][infra] Bump version to 1.1.0rc5 (#7668)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-10 16:06:54 +08:00
Kanghwan
758c22f832
[#7208][fix] Fix config type of MedusaConfig (#7320)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-09-09 23:25:17 -07:00
Frida Hou
bbb5ae3349
[#5861][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-09-10 13:00:06 +08:00
Zheyu Fu
c353ff342e
[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-09-10 12:53:59 +08:00
Chuang Zhu
f412f5c4b0
[None][fix]UCX zmq ip support ipv6 (#7530)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-10 10:24:41 +08:00
fredricz-20070104
ef620f3579
[https://nvbugs/5410687][test] Add deepseek r1-w4afp8 quickstart (#7645)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-10 10:21:01 +08:00
Guoming Zhang
beefd6413e
[None][fix] fix post-merge issue raised by #5488 (#7655)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-10 09:26:27 +08:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
Jin Li
d49374bc45
[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-09 12:18:56 -04:00
QI JUN
a0e1604898
[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-09 11:06:32 -04:00
Linda
0566df672d
[TRTLLM-6707][fix] nanobind fix for executor exit call (#7565)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-09 14:56:04 +01:00
Richard Huo
dcd110cfac
[None][chore] add TorchLlmArgs to the connector api (#7493)
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
2025-09-09 09:05:59 -04:00
NVJiangShao
cc7593987b
[https://nvbugs/5434424][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615)
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-09-09 08:58:15 -04:00
William Tambellini
a6ed0d17d6
[#6798][fix] fix compilation error in ub_allocator in single device build (#6874)
Signed-off-by: William Tambellini <wtambellini@sdl.com>
2025-09-09 07:13:53 -04:00
Liao Lanyu
af403848d7
[https://nvbugs/5445466][fix] unwaive DS R1 test cases with bug already fixed (#7429)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-09 17:25:49 +08:00
Perkz Zheng
da6cb541a2
[None][feat] Optimize MLA kernels with separate reduction kernels (#7597)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-09 16:58:44 +08:00
tomeras91
6e712dd1cc
[None][fix] enable NvFP4/FP8 quantization for Nemotron-H architecture (#7589)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-09-09 11:42:22 +03:00
Linda
9cb5410067
[https://nvbugs/5454559][fix] handle bias term in fuse_gate_mlp (#7449)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-09 10:26:17 +02:00
xinhe-nv
8a52015f50
[None][chore] Remove closed bugs (#7591)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-09 04:08:42 -04:00
Guoming Zhang
62b564ac3c
[None][fix] add the missing import raised by #7607 (#7639)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-09 03:42:42 -04:00
William Zhang
c53d1814a7
[None][feat] Extend VLM factory and add Mistral3 factory (#7583)
This commit:

* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3
  to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name
  conversions into account.
* fixes a sampling bug (the `end_id` needs to be take into account when
  sampling, but it is not included in the stop words' token IDs).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-09 02:47:18 -04:00
William Tambellini
6ba1c8421c
[#6529][feat] CMake option to link statically with cublas/curand (#7178)
Close #6529.

Signed-off-by: William Tambellini <wtambellini@sdl.com>
2025-09-09 14:26:45 +08:00
Zhanrui Sun
7a62df5f0b
[TRTLLM-4366][infra] Don't call reinstall_rockylinux_cuda when the base CUDA image is up to dated (#5980)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-09 02:15:39 -04:00
Tomer Shmilovich
ecc0e687c6
[None][feat] Nixl support for GDS (#5488)
Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com>
Signed-off-by: Guy Lev <glev@nvidia.com>
Co-authored-by: Guy Lev <glev@nvidia.com>
2025-09-09 13:00:38 +08:00
Guoming Zhang
7f3f658d5f [None][doc] Rename TensorRT-LLM to TensorRT LLM. (#7554)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Guoming Zhang
35dac55716 [None][doc] Update kvcache part (#7549)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Guoming Zhang
f53fb4c803 [TRTLLM-5930][doc] 1.0 Documentation. (#6696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Yiqing Yan
5c616da2fd
[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-09 11:33:09 +08:00
Wanli Jiang
1e0669d27a
[https://nvbugs/5453709][fix] Remove transformers version limit in Qwen2VL (#7152)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-09 10:38:20 +08:00
Iman Tabrizian
d96c54d8ae
[None][test] Skip eagle3 test (#7627)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-08 17:23:53 -04:00
dongfengy
fdd5bd49fc
[https://nvbugs/5481080][fix] Fix GPTOSS W4A16 reference (#7323)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-08 13:59:28 -07:00