Andreas Karatzas
|
4db300e95f
|
[ROCm][CI] Removed problematic command override mechanism (#42807)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-16 17:35:05 +08:00 |
|
Zhewen Li
|
657b42b592
|
[Docker][KVConnector] Build mooncake-transfer-engine from source (#42114)
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: khluu <khluu000@gmail.com>
|
2026-05-16 00:26:25 -07:00 |
|
Michael Goin
|
de2d76f352
|
[Build] Switch CUDA 12.9 wheel builds to PyTorch manylinux_2_28 base (#41668)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-05-15 13:46:16 -07:00 |
|
Andreas Karatzas
|
d735968f6d
|
[ROCm][CI] Stage B gating (#42025)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-15 01:49:27 -07:00 |
|
Cyrus Leung
|
2676ab1e0b
|
[Deprecation] Remove old locations of get_tokenizer and resolve_hf_chat_template (#35024)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-05-15 00:13:32 -07:00 |
|
Louie Tsai
|
e30f39c4f1
|
Update Intel Xeon model list and vLLM Benchmark Suite BKMs (#42607)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-05-15 05:14:03 +00:00 |
|
Chengyi Nie
|
fa2a33b893
|
[Quant] Consolidate GPTQ: rename gptq_marlin.py to auto_gptq.py (#38288)
Signed-off-by: Chengyi Nie <cnie@roblox.com>
Co-authored-by: Chengyi Nie <cnie@roblox.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-15 08:25:52 +08:00 |
|
zhanqiuhu
|
24337fb860
|
PD disagg with NIXL Connector: GDN support (Qwen3.5) (#41869)
Signed-off-by: Zhanqiu Hu <zhu@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-05-14 16:33:01 +02:00 |
|
Libin Tang
|
9946c38b7f
|
[XPU] Fix double-transpose in XPUFP8ScaledMMLinearKernel for W8A8 quant method (#41689)
Signed-off-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-14 17:17:39 +08:00 |
|
liuzhenwei
|
b26558d4a3
|
[CI][XPU] skip ut of offload connector (#42598)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-05-14 13:13:53 +08:00 |
|
Michael Goin
|
2f821faeae
|
[Spec Decode] Support hybrid attention models in extract_hidden_states (#39949)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-05-13 10:45:53 -07:00 |
|
Wentao Ye
|
e35c0d4c63
|
[Feature] Support compile mode for batch invariance on SM80 (#42456)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-13 11:02:39 -04:00 |
|
Kevin H. Luu
|
f6e868fbdf
|
[CI] Use uv with Python 3.12 for PyPI wheel upload (#42470)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-05-13 02:12:06 -07:00 |
|
Yifan Qiao
|
9ce74042d3
|
[Bugfix][SimpleCPUOffloadBackend] Dedup in-flight CPU offload stores across scheduler steps (#41289)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-05-13 01:53:32 -07:00 |
|
Alec
|
07534b8782
|
[PD] Bump NIXL connector dependency to 1.x (#42364)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
|
2026-05-12 18:05:01 -07:00 |
|
Kevin H. Luu
|
8c4fc4202a
|
[CI] Inline build artifact annotations in release pipeline (#42357)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-12 15:57:43 -07:00 |
|
Giancarlo Delfin
|
fe5b4e0fe7
|
[Model Runner V2] Apply synthetic mode to probabilistic rejection sampler (#41035)
|
2026-05-12 13:37:03 -07:00 |
|
Kevin H. Luu
|
379f0ec369
|
[CI] Migrate 6 verified jobs from gpu_1_queue to h200_18gb MIG (#42446)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-05-12 11:52:01 -07:00 |
|
shanjiaz
|
6ccb10d794
|
Added peagle speculators support (#41826)
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
|
2026-05-12 07:55:57 -07:00 |
|
Michael Goin
|
d077622d60
|
[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython (#41516)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-12 10:27:29 -04:00 |
|
Kevin H. Luu
|
e1c8776e90
|
[CI] Move DockerHub and PyPI publish steps to end of release pipeline (#42355)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-12 09:17:42 +00:00 |
|
Kevin H. Luu
|
1ff9d33535
|
[CI] Migrate remaining B200 jobs to b200-k8s with test fixes (#42387)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-12 02:00:37 -07:00 |
|
Kevin H. Luu
|
f69644caf8
|
[CI] Migrate more B200 jobs to b200-k8s queue (#42356)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-12 00:38:31 -07:00 |
|
wang.yuqi
|
a0dc7a0f36
|
[CI] Consolidate Speech to Text tests (#42274)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-05-11 19:50:17 +00:00 |
|
Flora Feng
|
639cbfd274
|
[CI] Add tests/parser to CI coverage (#41877)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-05-11 19:08:54 +00:00 |
|
haosdent
|
17ed5e61f5
|
[CI] Make Python-only Installation optional (#42293)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-05-11 09:47:16 +00:00 |
|
Jee Jee Li
|
05d610e5cd
|
[CI/Build] Reduce LoRA model tests. (#42266)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-11 14:49:08 +08:00 |
|
Andreas Karatzas
|
0a309b5ee9
|
[ROCm] Cap Triton paged attention block size to fix ROCm shared memory OOM (#38502)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-10 10:03:00 +00:00 |
|
Jee Jee Li
|
84f7a55340
|
[CI] Trigger LoRA test when changing MoE code. (#42196)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
|
2026-05-10 01:26:09 -07:00 |
|
Andreas Karatzas
|
f2840120f6
|
[ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics (#41313)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-10 14:50:16 +08:00 |
|
Andreas Karatzas
|
fb1ac806c5
|
[ROCm][CI] Stabilize ROCm shutdown and distributed compile CI (#41573)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-10 03:47:40 +00:00 |
|
Lucas Wilkinson
|
b1728c1e66
|
[Attention][Cleanup] Remove tree attention (#42121)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-05-08 18:36:19 -07:00 |
|
Kevin H. Luu
|
0c2e9d4892
|
[CI] Narrow misc.yaml source dependencies (#42059)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-08 15:10:12 -07:00 |
|
Kevin H. Luu
|
d2f22dfc9f
|
[CI] Narrow engine.yaml source dependencies (#42055)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-08 14:55:33 -07:00 |
|
Kevin H. Luu
|
f4dd5c116c
|
[CI] Narrow Platform Tests (CUDA) source dependencies (#42054)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-08 14:54:06 -07:00 |
|
Kevin H. Luu
|
f47ccc8b1c
|
[CI] Narrow pytorch.yaml compile job source dependencies (#42057)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-05-08 14:43:17 -07:00 |
|
liuzhenwei
|
f2bbd575e2
|
[CI][XPU] Skip fork-dependent logits processor test (#42013)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-05-08 06:10:19 -07:00 |
|
Chaojun Zhang
|
19df11f5d1
|
[CI][XPU] Ignore some lora tests from LoRA Intel CI pipeline (#42010)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
|
2026-05-08 17:34:27 +08:00 |
|
haosdent
|
36b2c79d4b
|
[CI][Bugfix] Drop duplicated examples/ prefix in tensorize_vllm_model command (#42039)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-05-08 02:23:22 -07:00 |
|
wang.yuqi
|
1d694e78c9
|
[Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-07 19:42:12 -07:00 |
|
Chaojun Zhang
|
805e9f7b77
|
[XPU] Fix lora bugs & enable UTs under tests/lora (#38206)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
|
2026-05-07 05:58:00 -07:00 |
|
Li, Jiang
|
b3945cc316
|
[CPU] Bump up to the latest CPU kernels (#41924)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-05-07 05:45:59 -07:00 |
|
Fadi Arafeh
|
b20731d0ae
|
[CI][Arm] skip e2e model tests if HF_TOKEN is not set (#41919)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-05-07 11:31:50 +00:00 |
|
Yuwen Zhou
|
713b28bd0b
|
[CPU] Add FP8 W8A16 MoE support (#41314)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
|
2026-05-06 23:17:07 -07:00 |
|
Fadi Arafeh
|
51f22dcfd0
|
[Feat][CPU] Enable Gated DeltaNet Attention (Qwen 3.5 / 3.6) (#41025)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
|
2026-05-07 12:57:09 +08:00 |
|
Micah Williamson
|
7a576e2c72
|
[ROCm][CI] Remove TORCH_NCCL_BLOCKING_WAIT=1 After Bugfix In ROCm 7.2 (#41840)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-05-06 16:37:11 -07:00 |
|
Flora Feng
|
f3f8efa73a
|
[CI] Enable gemma4 parser test on CI (#41857)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-05-06 20:25:34 +00:00 |
|
Nicolò Lucchesi
|
e43a791284
|
[Bugfix][CI] Fix Disaggregated test area path (#41794)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-05-06 17:41:24 +08:00 |
|
Yuwen Zhou
|
809b98e5b7
|
[CPU] Add FP8 W8A16 linear support (#41186)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
|
2026-05-06 07:05:27 +00:00 |
|
Andreas Karatzas
|
91740ca5ea
|
[ROCm][CI] Refine gating tests (#37243)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-05-05 22:05:20 -07:00 |
|