Chauncey
|
87f12e5c7c
|
[Frontend] Responses API supports chat_template_kwargs (#43761)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-05-29 07:58:19 +00:00 |
|
kliuae
|
ab7521d77c
|
[ROCm][DSv4] Remove device pipeline stall in sparse attention (#43898)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
|
2026-05-29 15:42:40 +08:00 |
|
Tianmu Li
|
94d3f4d205
|
[CPU Backend] CPU top-k and top-p sampling kernels using Triton (#43633)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-29 15:02:39 +08:00 |
|
Yintong Lu
|
04516eabc8
|
[XPU] add gelu_tanh to xpu moe backend supported activations (#42822)
Signed-off-by: yintong-lu <yintong.lu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-05-29 14:37:20 +08:00 |
|
Kevin H. Luu
|
648c3ebee6
|
[CI] Separate non-root smoke tests from image build step (#43712)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-05-28 23:34:16 -07:00 |
|
Chris Leonard
|
22a58640b4
|
[9/n] Migrate attention and cache kernels to torch stable ABI (continued) (#43717)
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
|
2026-05-29 04:44:45 +00:00 |
|
Wentao Ye
|
710f077617
|
[Refactor] Remove dead code (#43234)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-29 00:29:56 -04:00 |
|
Itay Etelis
|
d63108fb18
|
[kv_offload] Skip decode-phase blocks in CPU offload (#43797)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
|
2026-05-29 06:39:43 +03:00 |
|
Qiming Zhang
|
9636709372
|
[XPU] add scale transpose to prepare_fp8_moe_layer_for_xpu and bump up kernels (#43277)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-05-29 03:22:51 +00:00 |
|
Weida Hong
|
dfe8ba7c80
|
Adjust design around encoder_cudagraph_forward (#42288)
Signed-off-by: Weida Hong <wdhongtw@google.com>
|
2026-05-29 03:02:52 +00:00 |
|
Jared Wen
|
212deff2ec
|
[feat] add GlmgaProcessor specific logits in glm4_1v.py (#43575)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
|
2026-05-29 02:56:02 +00:00 |
|
Woosuk Kwon
|
7bd45da585
|
[DSv4] Move mHC tilelang kernels & Don't use CustomOP in dsv4/nvidia (#43905)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-05-29 10:25:02 +08:00 |
|
Vadim Gimpelson
|
bf18d7e0b4
|
[Misc][NUMA] Auto-bind to PCT priority cores on DGX B300 + widen EngineCore across shard NUMA nodes (#43270)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Cursor <noreply@cursor.com>
|
2026-05-29 10:07:44 +08:00 |
|
Bugen Zhao
|
1521173c17
|
[Rust Frontend] Add /version endpoint using engine-reported value (#43854)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
|
2026-05-29 00:32:27 +00:00 |
|
ltd0924
|
b690b2bb67
|
[Model] Support Step-3.7-Flash (#43859)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-28 17:01:48 -07:00 |
|
yzong-rh
|
325a1ec4fb
|
[CI] Enable prefix caching in BFCL benchmark (#43925)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-05-28 23:36:31 +00:00 |
|
Harshal Janjani
|
69c9f19957
|
fix(frontend): Add multimodal placeholders to Gemma4 tool message template (#41459)
Signed-off-by: Harshal Janjani <harshaljanjani@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
|
2026-05-28 14:48:12 -07:00 |
|
rasmith
|
9769e2df2a
|
[AMD][CI][BugFix] Fix Distributed Compile Unit Tests (2xH100-2xMI300) group (#43120)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-05-28 14:39:01 -07:00 |
|
Michael Goin
|
03f03f9630
|
Refactor output filename handling in ci-fetch-log.sh (#43901)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-05-28 14:20:12 -07:00 |
|
Benjamin Chislett
|
9202ea6fda
|
[Spec Decode] Allow causal DFlash (#43445)
|
2026-05-28 21:18:44 +00:00 |
|
Woosuk Kwon
|
69b8956dcd
|
[Model Refactoring] Remove unnecessary torch op registration for DSv4 (#43891)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-05-28 14:04:55 -07:00 |
|
Ronen Schaffer
|
a3ed5ab10c
|
[KV Offload] Add per-request offloading policy via on_new_request lifecycle hook (#43205)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-28 20:45:18 +00:00 |
|
Nick Hill
|
7e53283b1c
|
[Core] Cleanup KVConnector handling with PP + fix MRV2 (#43732)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-28 13:12:03 -07:00 |
|
Raj Joshi
|
9090368b65
|
[Feat] Add support for per GPU worker RDMA NIC selection (#42083)
Signed-off-by: Raj Joshi <rajjoshi@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-28 12:45:23 -07:00 |
|
Harry Mellor
|
085ac221a3
|
Deprecate JAISLMHeadModel (#43784)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-28 18:29:12 +00:00 |
|
Hua Huang
|
9006204e90
|
[MM][CG] Avoid over-padding Qwen2.5-VL encoder cudagraph window metadata (#42796)
Signed-off-by: Hua Huang <huah@nvidia.com>
|
2026-05-28 11:22:56 -07:00 |
|
JohnQinAMD
|
ed7fe831da
|
[ROCm] Enable the aiter top-k/top-p sampler by default (#43331)
Signed-off-by: John Qin <yanyuan.qin@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-05-28 13:19:59 -05:00 |
|
Nicolò Lucchesi
|
5b115bb8a3
|
[Attention][AMD] Standardize kv layout to blocks first for AMD (#43660)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-05-28 12:28:50 -05:00 |
|
Mike G
|
53a2088675
|
Allow native KV cache dtype in Triton cache update (#43330)
Signed-off-by: Michael Gschwind <mgschwind@nvidia.com>
Co-authored-by: Michael Gschwind <mgschwind@nvidia.com>
|
2026-05-28 16:51:40 +00:00 |
|
Chao-Ju Chen
|
099024762c
|
[Rust Frontend] Optimize multimodal prompt expansion (#43670)
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
|
2026-05-28 09:46:18 -07:00 |
|
MaciejBalaNV
|
9aa131f944
|
Add Cosmos3 Reasoner model (#43356)
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-05-28 09:43:55 -07:00 |
|
Micah Williamson
|
1b5437cec8
|
[ROCm] Bump ROCm to 7.2.3 (#43136)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-05-28 09:42:43 -07:00 |
|
Jason Elie Bou Kheir
|
3207e7680e
|
[XPU][MoE] Add WNA16 oracle backend for GPTQ sym-int4 (xpu_fused_moe) (#41426)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-05-28 16:30:48 +00:00 |
|
Matthias Gehre
|
a9ec46d4b7
|
[ROCm][Perf] Support N=5 in wvSplitK skinny GEMM kernels for speculative decoding (#40687)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-05-28 16:28:21 +00:00 |
|
Ronen Schaffer
|
4bfa0f2b14
|
[KV Offload] Rename SecondaryTierManager.get_finished() to get_finished_jobs() (#43870)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-05-28 16:00:18 +00:00 |
|
Vadim Gimpelson
|
5d126dd155
|
[Bugfix] Exclude Ray DP from #42585's deferred port allocation (#43864)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-05-28 15:55:14 +00:00 |
|
Majid
|
c08ebebf30
|
[Perf] Add do_not_specialize to Mamba SSD chunk kernels (#43803)
Signed-off-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: Majid Taheri Andani <tahemaji@amazon.com>
|
2026-05-28 15:40:02 +00:00 |
|
Wentao Ye
|
be4062fd6c
|
[Bug] Fix tests/distributed/test_elastic_ep.py - assert False (#43813)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-28 11:00:56 -04:00 |
|
Will.hou
|
577d693838
|
[rust] fix: aggregate is_sleeping and reset_prefix_cache across DP engines (#43429)
Signed-off-by: Will.hou <1205157517@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-28 07:56:56 -07:00 |
|
Bugen Zhao
|
61a1e30473
|
[Rust Frontend] Reduce Gemma4 tool parser args scan complexity (#43850)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
|
2026-05-28 14:52:29 +00:00 |
|
Bugen Zhao
|
3a282230ee
|
[Rust Frontend] Add hy_v3 tool parser (#43872)
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
|
2026-05-28 14:42:47 +00:00 |
|
Li, Jiang
|
20d69d100a
|
[CPU] Migrate cpu_awq into awq_marlin (#43841)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-05-28 22:36:31 +08:00 |
|
Simon Danielsson
|
552eb81918
|
[Bugfix][ROCm] Resolve MoRI connector hangs at high concurrency (#40344)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2026-05-28 14:30:21 +00:00 |
|
Woosuk Kwon
|
9957e4d240
|
[Model Refactoring] Remove torch compile dependency in DSv4 (#43746)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-05-28 14:26:25 +00:00 |
|
Angelo Ruocco
|
864990e8d9
|
Add token-offset based selective offload in OffloadConnector (#39983)
Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2026-05-28 14:11:02 +00:00 |
|
zexplorerhj
|
f3b2a819f7
|
[Perf][KDA] Fuse gate softplus, chunk-local cumsum, and RCP_LN2 scaling (#43667)
Signed-off-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>
|
2026-05-28 13:47:08 +00:00 |
|
Wentao Ye
|
64e1218673
|
[Perf] Optimize moe permute by pre-allocate buffer, 9~14% kernel performance improvement (#43014)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-28 06:18:26 -07:00 |
|
Julien Denize
|
02606b0b09
|
[BUGFIX] Multimodal benchmark with MistralTokenizer (#42965)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
|
2026-05-28 05:36:24 -07:00 |
|
Harry Mellor
|
19af4e6dd4
|
Fix OlmoHybridForCausalLM not initialising (#43846)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-28 05:33:31 -07:00 |
|
omerpaz95
|
811d805195
|
[EC Connector] Add shutdown API to EC Connector. (#42423)
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
|
2026-05-28 12:28:01 +00:00 |
|