Harry Mellor
|
e28e369f78
|
Make Mergify comment less spammy (#44666)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 10:56:52 -07:00 |
|
yzong-rh
|
703fb17b13
|
[Bugfix] GPT-OSS instruction rendering (#44330)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-06-05 13:52:32 -04:00 |
|
Sting Lin
|
b593396c7a
|
Upgrade tpu-inference to v0.21.0 (#44621)
Signed-off-by: StingLin <sting.lin@cienet.com>
|
2026-06-05 16:12:49 +00:00 |
|
Flame
|
91e17d4315
|
Fix sarvam forward compatibility with transformers v5 (#38804)
Signed-off-by: vikrantpalle <vikrantpalle@gmail.com>
|
2026-06-05 11:51:44 -04:00 |
|
TJian
|
aa6fb8a329
|
[Bugfix] [ROCm] [Critical] fallback to regular abi for ROCm (#44648)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-06-05 15:51:17 +00:00 |
|
Effi Ofer
|
6a894574bf
|
Add objectstore as a secondary tier to multi-tier kv cache offloading (#41968)
Signed-off-by: Effi Ofer <effi.ofer@gmail.com>
|
2026-06-05 18:05:41 +03:00 |
|
Yan Ma
|
7f003a1285
|
Support MiniCPMV batched preprocessing (#44609)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-06-05 15:05:31 +00:00 |
|
Harry Mellor
|
ef0df7dbd6
|
[CI] Bump mypy version 1.19.1 -> 1.20.2 (#44647)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:56:27 +00:00 |
|
Harry Mellor
|
a80af24356
|
Speed up docs build (#44635)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:51:44 +00:00 |
|
Harry Mellor
|
c66b19800b
|
[CI] Bump mistral-common (#44649)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-06-05 14:18:50 +00:00 |
|
rishitdholakia13
|
6a11d72df7
|
[Reasoning][Structured Outputs] Add Command A plus tags for structural tags (#44588)
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-06-05 06:51:20 -07:00 |
|
Woosuk Kwon
|
02d2da0748
|
[DSV4] Move more ops out of eager breakpoint (#44561)
|
2026-06-05 06:42:41 -07:00 |
|
adhithyamulticoreware
|
bbb6c274c8
|
[Bugfix] Fix gemma4 crash on CPU: guard mem_get_info call (#44615)
Signed-off-by: ADHITHYA BALAKRISHNAN <adhithya.balakrishnan@multicorewareinc.com>
|
2026-06-05 12:47:56 +00:00 |
|
Harry Mellor
|
62215e72c6
|
Remove KV cache scale boilerplate from model weight loading methods (#43167)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-06-05 05:19:04 -07:00 |
|
Tianyu Zhang
|
7fe7800fa4
|
[BUG] Fix FP64 Gumbel precision coverage (#43150)
Signed-off-by: tianyu-z <zhangtianyupro@gmail.com>
Signed-off-by: Tianyu Zhang <53099276+tianyu-z@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-06-05 19:04:14 +08:00 |
|
HueCodes
|
8a83e6f2d7
|
[Rust Frontend] Batch auto-abort requests by engine (#44591)
Signed-off-by: Hugh Ryan <197298026+HueCodes@users.noreply.github.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
|
2026-06-05 02:59:09 -07:00 |
|
Chunyang Wen
|
efc347f1b2
|
docs: fix tokenizer optimization typo (#44066)
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
|
2026-06-05 02:12:49 -07:00 |
|
Nicolò Lucchesi
|
d98b8f371c
|
[NixlConnector] Initiate deprecation cycle for kv_both role (#43874)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-06-05 11:08:17 +02:00 |
|
Chao-Ju Chen
|
e64237ae82
|
[Rust Frontend] Support include_reasoning=false (#44391)
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
|
2026-06-05 16:47:50 +08:00 |
|
XuZhou
|
d61d8566ec
|
[Bugfix] Update mistral tokenizer test for continue_final_message fix (#44622)
Signed-off-by: Xu Zhou <xuzhou9417@163.com>
Co-authored-by: Xu Zhou <xuzhou9417@163.com>
|
2026-06-05 16:13:26 +08:00 |
|
Uranus
|
d2f70da116
|
fix: pad dummy run query_start_loc (#44603)
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
|
2026-06-05 00:43:04 -07:00 |
|
XuZhou
|
6542d48964
|
[Bugfix] Fix test_invocations flaky failure with newer openai SDK (#44618)
Signed-off-by: Xu Zhou <xuzhou9417@163.com>
Co-authored-by: Xu Zhou <xuzhou9417@163.com>
|
2026-06-05 07:36:20 +00:00 |
|
Ting SUN
|
ca73293fa6
|
[Bugfix][Rust Frontend] Fix UTF-8 char-boundary panic in incremental detokenizer (#44620)
Signed-off-by: Ting Sun <suntcrick@gmail.com>
|
2026-06-05 07:36:17 +00:00 |
|
Vic Wen
|
ef3af56a97
|
Fix LLM.wait_for_completion output type docstring (#44617)
Signed-off-by: viiccwen <viiccwen@gmail.com>
|
2026-06-05 00:16:38 -07:00 |
|
Tuukka Sarvi
|
b4a6f26c90
|
[ROCm][perf] Use workspace manager for sparse indexer allocations (#41002)
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
Signed-off-by: Tuukka Sarvi <tuukka.sarvi@amd.com>
Co-authored-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-06-04 23:46:29 -07:00 |
|
Han Lin
|
165b7864d0
|
[ROCM] [FEAT] Integrate Aiter hipBLASLt GEMM online tuning (#40426)
Signed-off-by: hanlin12 <hanlin12@amd.com>
Signed-off-by: Han Lin <hanlin12@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-06-04 23:45:36 -07:00 |
|
Li, Jiang
|
c505cd93ef
|
[CI/Build] Disable CPU-Compatibility Tests (#44605)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-06-05 13:14:26 +08:00 |
|
qizixi
|
96229fa99e
|
[KVConnector][1/N] PP-aware handshake aggregation and intermediate-PP output plumbing (#43720)
Signed-off-by: zixi-qi <zixi@inferact.ai>
|
2026-06-04 22:04:19 -07:00 |
|
Luciano Martins
|
da1daf40bf
|
[Bugfix] Exclude vision embedder from quantization in Gemma4 Unified (#44571)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2026-06-04 20:47:38 -07:00 |
|
Woosuk Kwon
|
4efd6ffde0
|
[DSV4] Refactor DeepseekV4Attention (#44569)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-06-04 20:23:07 -07:00 |
|
Chris Leonard
|
56aff0dd15
|
[10/n] Migrate cuda_view and silu_and_mul_per_block_quant kernels to torch stable ABI. (#44334)
|
2026-06-04 20:14:43 -07:00 |
|
zofia
|
063ce98fb7
|
[XPU][MoE] support block_fp8_moe on xpu (#42139)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>
|
2026-06-05 08:36:58 +08:00 |
|
Bugen Zhao
|
62d6f06e3d
|
[Rust Frontend] Skip loading multimodal processor if --language-model-only is specified (#44500)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
|
2026-06-04 17:02:54 -07:00 |
|
Schwinn Saereesitthipitak
|
b7c5baf63d
|
fix: keep DeepSeek V4 RoPE cache on inv_freq device (#43926)
Signed-off-by: Schwinn Saereesitthipitak <schwinns@nvidia.com>
Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com>
|
2026-06-05 02:30:29 +04:00 |
|
Jiangyun Zhu
|
a55fccfc7c
|
[mamba] unify KDA conv states into one cache to match 2-state SSM layout (#44539)
|
2026-06-04 20:38:05 +02:00 |
|
Wentao Ye
|
41a4829f22
|
[Logs Refactor] Optimize shutdown logs, easier to follow and consistent (#43707)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-06-04 14:36:32 -04:00 |
|
Tushar Jain
|
38fd2405f3
|
use split_group for pytorch process group creation (#41980)
Signed-off-by: Tushar Jain <tushar00jain@users.noreply.github.com>
Co-authored-by: Tushar Jain <tushar00jain@users.noreply.github.com>
|
2026-06-04 14:36:07 -04:00 |
|
Agata Dobrzyniewicz
|
a947f7a420
|
[Kernel][Test] Extend lightning_attn and awq_triton kernel tests to XPU (#43307)
Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>
|
2026-06-04 14:25:59 -04:00 |
|
bnellnm
|
439203d32c
|
[Bugfix] Fix test_cutlass_moe.py (#44380)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-06-04 14:18:52 -04:00 |
|
Taneem Ibrahim
|
8d9536a775
|
[Misc] Add unit tests for pooler head classes (#44471)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-06-04 17:59:25 +00:00 |
|
Fadi Arafeh
|
3da29aa4a5
|
[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#34894)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-06-04 16:27:17 +00:00 |
|
Divakar Verma
|
06f94633e7
|
[ROCm][CI] Add test for Aiter unified attn kernel (#44436)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 16:15:05 +00:00 |
|
JianweiZheng
|
99ef652907
|
[Bugfix] Reject non-positive values for ParallelConfig int knobs (#44057)
Signed-off-by: jwzheng96 <jianweizheng@pku.edu.cn>
Signed-off-by: JianweiZheng <32029023+jwzheng96@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-06-04 11:46:50 -04:00 |
|
Tyler Michael Smith
|
4cc78c9d5d
|
[Core] Freeze garbage collector in workers after model initialization (#44363)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-06-04 08:39:04 -07:00 |
|
tc-mb
|
3dbb4e0ace
|
[Bugfix] MiniCPM-V-4.6 video inference crash: placeholder count mismatches visual embedding count (#44509)
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
|
2026-06-04 08:22:30 -07:00 |
|
Zvi Kons
|
b21443e23c
|
Add model support for granite speech plus (#43519)
Signed-off-by: Zvi Kons[WSL] <zvi@il.ibm.com>
Signed-off-by: Zvi Kons (BlueVela) <zvi@il.ibm.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
2026-06-04 14:47:48 +00:00 |
|
Michael Goin
|
06ee2d8433
|
[Quant] Support compressed-tensors WNA8O8Int linears and WNInt embeddings (#44340)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-06-04 07:40:33 -07:00 |
|
Yongye Zhu
|
b5235fca2e
|
[DSv4] Adding TRTLLM gen attention kernel (#43827)
|
2026-06-04 07:35:09 -07:00 |
|
Andreas Karatzas
|
3e77036768
|
[ROCm][CI] Specifying time outs for the lm eval models (#44255)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 22:35:00 +08:00 |
|
Andreas Karatzas
|
6f68ca3e91
|
[ROCm][CI] Stabilize memory-release in the Hybrid model generation tests (#44046)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-06-04 22:34:24 +08:00 |
|