Chang Liu | 308776442a | 2025-07-12 12:56:37 +09:00
[nvbug/5308432] fix: extend triton exit time for test_llava (#5971)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

juney-nvidia | 63cf929188 | 2025-07-12 10:30:17 +09:00
Added code owners for LLM API (#5960)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>

Thor Johnsen | 041f1fa513 | 2025-07-11 16:20:41 -07:00
[TRTLLM-6264] Fix flaky test_e2e.py::test_openai_lora (#5885)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

2ez4bz | 6304866ce8 | 2025-07-11 15:13:51 -07:00
[refactor] Move vision parts from processor to model for Gemma3 (#5888)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

xinhe-nv | 509363d858 | 2025-07-11 19:48:19 +10:00
tests: update sanity tests & fix tests (#5906)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Shi Xiaowei | f4e0425a7b | 2025-07-11 18:02:22 +09:00
doc: update the link of the diagram (#5953)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

Shi Xiaowei | 49359574c1 | 2025-07-11 17:39:05 +09:00
[TRTLLM-5673] Doc: ensure the disagg doc is up to date (#5938)

ChristinaZ | c5fb692a7d | 2025-07-11 16:37:56 +08:00
Refactor the remaining routing logic in the routing kernels for the MoE TRT-LLM backend (#5771)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>

Shi Xiaowei | 37293e4dfd | 2025-07-11 16:41:45 +09:00
blog: add qwen3 disagg perf metrics (#5822)

William Tambellini | fbb4cc7379 | 2025-07-11 10:59:44 +08:00
[TRTLLM-4770][feat] Enhance cpp executor cmake to listen to ENABLE_MULTI_DEVICE (#5104)
Signed-off-by: William Tambellini <wtambellini@sdl.com>

brb-nv | 0385f89abc | 2025-07-10 17:24:10 -07:00
test: Fix Gemma3 unit tests due to transformers upgrade (#5921)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Void | 854655f2f7 | 2025-07-11 08:18:54 +08:00
deepEP fp4 post quant all2all dispatch (#5881)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>

Frank | aa4eebe973 | 2025-07-10 17:15:30 -07:00
[enhance] Add the ability to write a request timeline. (#5258)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>

Zhihan Jiang | 682acd40da | 2025-07-11 07:51:43 +08:00
[nvbugs/5321981] Cherrypick fix: Fix the Llama3.1 405B hanging issue. (#5698) (#5925)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>

2ez4bz | c19840235d | 2025-07-10 10:45:27 -07:00
[fix] Fix mistral unit tests due to transformers upgrade (#5904)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

Iman Tabrizian | c32c9e2fad | 2025-07-10 10:21:19 -07:00
doc: Add instructions for running gemma in disaggregated serving (#5922)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

Linda | 4d071eb2d1 | 2025-07-11 00:48:50 +09:00
feat: binding type build argument (pybind, nanobind) (#5802)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

wili | 2e3cf42e03 | 2025-07-10 11:37:30 -04:00
[refactor] Simplification of Speculative decoding configs (#5639)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

Zhanrui Sun | 67a39dbd63 | 2025-07-10 23:24:46 +09:00
infra: [TRTLLM-6054][TRTLLM-5804] Fix two known NSPECT high vulnerability issues and reduce image size (#5434)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

narutolhy | 41ef1ade19 | 2025-07-10 22:18:01 +09:00
feat: enable kvcache to be reused during request generation (#4028)
Signed-off-by: narutolhy <582909902@qq.com>

Kaiyu Xie | 7b09a415c1 | 2025-07-10 19:36:26 +08:00
fix: Make the bench serving script compatible with different usages (#5905)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

Jinyang Yuan | 8b9a030a5c | 2025-07-10 20:07:32 +09:00
[fix] Fix MoE workspace info by storing Torch tensor itself instead of data_ptr (#5900)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

Yiqing Yan | 3aa53ec36c | 2025-07-10 18:33:17 +08:00
[None] - Waive L0 tests (#5915)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

Enwei Zhu | 055c4a9fe6 | 2025-07-10 16:30:00 +08:00
[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

CarstyYou | dc32f9ae73 | 2025-07-10 15:16:18 +08:00
[fix] Fix cases where tileN is not divisible by 16, and support DeepGEMM BMM on SM89 (#5531)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>

Anthony Chang | 7d21b55b5a | 2025-07-10 14:06:50 +08:00
[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

Aurelien Chartier | 3ec3ff1d82 | 2025-07-09 21:30:34 -07:00
chore: remove support for llmapi + TRT backend in Triton (#5856)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

QI JUN | e289a98d5a | 2025-07-10 12:32:59 +09:00
Avoid nesting NCCL groups in allgather and reduce-scatter ops (#5866)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Yan Chunwei | 07f6da763d | 2025-07-10 11:31:35 +08:00
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Hanjun Cho | 6490a27ad7 | 2025-07-10 10:26:06 +08:00
[feat] Add TensorRT-Engine Qwen3 (dense) model support (#5650)
Signed-off-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>
Signed-off-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>

Venky | f57b3d6829 | 2025-07-10 09:53:31 +08:00
Waive unittest failures introduced by PR #5345 (removal of ScaffoldingOutput class) (#5886)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

peaceh-nv | 76c3a12bcb | 2025-07-10 09:20:30 +08:00
[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

brb-nv | 3209b31665 | 2025-07-10 06:18:04 +09:00
feat: Custom masking utils for Gemma3 VLM (#5853)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

2ez4bz | 87fe44fd29 | 2025-07-09 13:17:40 -07:00
feat(models): Mistral3.1 VLM pytorch backend support (#5529)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

Chang Liu | b61a717275 | 2025-07-10 05:12:53 +09:00
[1/N][TRTLLM-5195][feat] Share PyTorch tensor between processes (#5396)

Wanli Jiang | 3f7cedec7c | 2025-07-09 09:32:24 -07:00
Update transformers to 4.53.0 (#5747)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

DylanChen-NV | 74dca0aa7b | 2025-07-09 23:16:42 +08:00
[NVBUG-5304516/5319741] Qwen2.5VL FP8 support (#5029)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>

peaceh-nv | 52684d79f7 | 2025-07-09 21:25:11 +08:00
fix: fix moe regression for sm120 (#5823)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

tomeras91 | 5aa958a11a | 2025-07-09 11:30:15 +03:00
[TRTLLM-5838][fix] fix max batch size and max tokens in kv cache estimations for Nemotron-H (#5371)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

ixlmar | 10e686466e | 2025-07-09 17:07:52 +09:00
fix: use current_image_tags.properties in rename_docker_images.py (#5846)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

Omer Ullman Argov | a32f7083b4 | 2025-07-09 11:05:57 +03:00
[ci] parallelize torch unittests (#5714)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

Dom Brown | 3e3b1769ad | 2025-07-09 08:21:58 +01:00
[TRTLLM-5881] feat: Integrate TRT-LLM Gen FP4 block scale MoE with Pytorch workflow kernel autotuner (#5764)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

dongxuy04 | dd3c736c7e | 2025-07-09 14:26:57 +08:00
chore: some refactor on WideEP (#5727)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

chenfeiz0326 | 64fd64fcf2 | 2025-07-09 14:23:21 +08:00
[TRTLLM-6262] Fix Llama4 Scout FP4 crash issue (#5834)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

Chang Liu | 4df5f96c8d | 2025-07-09 13:03:40 +09:00
[Bugfix] Llama4: fix for llama4 multimodal support (#5809)

Erin | e277766f0d | 2025-07-08 21:00:42 -07:00
chore: merge examples for v1.0 doc (#5736)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

Xianjie Qiao | 5ab1cf5ae6 | 2025-07-09 11:19:06 +08:00
Remove unnecessary benchmarking results (#5852)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>

Lucas Liebenwein | d14dd2f597 | 2025-07-09 11:47:48 +09:00
[AutoDeploy] re-enable waive for flaky AD test (#5867)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Bo Li | 9d894bc0cb | 2025-07-09 10:17:05 +08:00
fix: [https://nvbugspro.nvidia.com/bug/5375656] Unwaive for bug 5375656. (#5842)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

brb-nv | 2bd09ed2d4 | 2025-07-09 10:10:33 +08:00
fix: Skip rope scaling for local layers in Gemma3 VLM (#5857)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>