1373 Commits

Author SHA1 Message Date
Andreas Karatzas 4db300e95f [ROCm][CI] Removed problematic command override mechanism (#42807)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-16 17:35:05 +08:00
Zhewen Li 657b42b592 [Docker][KVConnector] Build mooncake-transfer-engine from source (#42114)
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: khluu <khluu000@gmail.com>
2026-05-16 00:26:25 -07:00
Michael Goin de2d76f352 [Build] Switch CUDA 12.9 wheel builds to PyTorch manylinux_2_28 base (#41668)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-15 13:46:16 -07:00
Andreas Karatzas d735968f6d [ROCm][CI] Stage B gating (#42025)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-15 01:49:27 -07:00
Cyrus Leung 2676ab1e0b [Deprecation] Remove old locations of get_tokenizer and resolve_hf_chat_template (#35024)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-05-15 00:13:32 -07:00
Louie Tsai e30f39c4f1 Update Intel Xeon model list and vLLM Benchmark Suite BKMs (#42607)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-05-15 05:14:03 +00:00
Chengyi Nie fa2a33b893 [Quant] Consolidate GPTQ: rename gptq_marlin.py to auto_gptq.py (#38288)
Signed-off-by: Chengyi Nie <cnie@roblox.com>
Co-authored-by: Chengyi Nie <cnie@roblox.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-15 08:25:52 +08:00
zhanqiuhu 24337fb860 PD disagg with NIXL Connector: GDN support (Qwen3.5) (#41869)
Signed-off-by: Zhanqiu Hu <zhu@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-05-14 16:33:01 +02:00
Libin Tang 9946c38b7f [XPU] Fix double-transpose in XPUFP8ScaledMMLinearKernel for W8A8 quant method (#41689)
Signed-off-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-14 17:17:39 +08:00
liuzhenwei b26558d4a3 [CI][XPU] skip ut of offload connector (#42598)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2026-05-14 13:13:53 +08:00
Michael Goin 2f821faeae [Spec Decode] Support hybrid attention models in extract_hidden_states (#39949)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 10:45:53 -07:00
Wentao Ye e35c0d4c63 [Feature] Support compile mode for batch invariance on SM80 (#42456)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-13 11:02:39 -04:00
Kevin H. Luu f6e868fbdf [CI] Use uv with Python 3.12 for PyPI wheel upload (#42470)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 02:12:06 -07:00
Yifan Qiao 9ce74042d3 [Bugfix][SimpleCPUOffloadBackend] Dedup in-flight CPU offload stores across scheduler steps (#41289)
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-05-13 01:53:32 -07:00
Alec 07534b8782 [PD] Bump NIXL connector dependency to 1.x (#42364)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
2026-05-12 18:05:01 -07:00
Kevin H. Luu 8c4fc4202a [CI] Inline build artifact annotations in release pipeline (#42357)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-12 15:57:43 -07:00
Giancarlo Delfin fe5b4e0fe7 [Model Runner V2] Apply synthetic mode to probabilistic rejection sampler (#41035)
2026-05-12 13:37:03 -07:00
Kevin H. Luu 379f0ec369 [CI] Migrate 6 verified jobs from gpu_1_queue to h200_18gb MIG (#42446)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 11:52:01 -07:00
shanjiaz 6ccb10d794 Added peagle speculators support (#41826)
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
2026-05-12 07:55:57 -07:00
Michael Goin d077622d60 [Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython (#41516)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 10:27:29 -04:00
Kevin H. Luu e1c8776e90 [CI] Move DockerHub and PyPI publish steps to end of release pipeline (#42355)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-12 09:17:42 +00:00
Kevin H. Luu 1ff9d33535 [CI] Migrate remaining B200 jobs to b200-k8s with test fixes (#42387)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-12 02:00:37 -07:00
Kevin H. Luu f69644caf8 [CI] Migrate more B200 jobs to b200-k8s queue (#42356)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-12 00:38:31 -07:00
wang.yuqi a0dc7a0f36 [CI] Consolidate Speech to Text tests (#42274)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-05-11 19:50:17 +00:00
Flora Feng 639cbfd274 [CI] Add tests/parser to CI coverage (#41877)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-11 19:08:54 +00:00
haosdent 17ed5e61f5 [CI] Make Python-only Installation optional (#42293)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-05-11 09:47:16 +00:00
Jee Jee Li 05d610e5cd [CI/Build] Reduce LoRA model tests. (#42266)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-11 14:49:08 +08:00
Andreas Karatzas 0a309b5ee9 [ROCm] Cap Triton paged attention block size to fix ROCm shared memory OOM (#38502)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-10 10:03:00 +00:00
Jee Jee Li 84f7a55340 [CI] Trigger LoRA test when changing MoE code. (#42196)
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
2026-05-10 01:26:09 -07:00
Andreas Karatzas f2840120f6 [ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics (#41313)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-10 14:50:16 +08:00
Andreas Karatzas fb1ac806c5 [ROCm][CI] Stabilize ROCm shutdown and distributed compile CI (#41573)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-10 03:47:40 +00:00
Lucas Wilkinson b1728c1e66 [Attention][Cleanup] Remove tree attention (#42121)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-05-08 18:36:19 -07:00
Kevin H. Luu 0c2e9d4892 [CI] Narrow misc.yaml source dependencies (#42059)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-08 15:10:12 -07:00
Kevin H. Luu d2f22dfc9f [CI] Narrow engine.yaml source dependencies (#42055)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-08 14:55:33 -07:00
Kevin H. Luu f4dd5c116c [CI] Narrow Platform Tests (CUDA) source dependencies (#42054)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-08 14:54:06 -07:00
Kevin H. Luu f47ccc8b1c [CI] Narrow pytorch.yaml compile job source dependencies (#42057)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-08 14:43:17 -07:00
liuzhenwei f2bbd575e2 [CI][XPU] Skip fork-dependent logits processor test (#42013)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2026-05-08 06:10:19 -07:00
Chaojun Zhang 19df11f5d1 [CI][XPU]Ignore some lora tests from LoRA Intel CI pipeline (#42010)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
2026-05-08 17:34:27 +08:00
haosdent 36b2c79d4b [CI][Bugfix] Drop duplicated examples/ prefix in tensorize_vllm_model command (#42039)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-05-08 02:23:22 -07:00
wang.yuqi 1d694e78c9 [Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
Chaojun Zhang 805e9f7b77 [XPU] Fix lora bugs & enable UTs under tests/lora (#38206)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
2026-05-07 05:58:00 -07:00
Li, Jiang b3945cc316 [CPU] Bump up to the latest CPU kernels (#41924)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-05-07 05:45:59 -07:00
Fadi Arafeh b20731d0ae [CI][Arm] skip e2e model tests if HF_TOKEN is not set (#41919)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-05-07 11:31:50 +00:00
Yuwen Zhou 713b28bd0b [CPU] Add FP8 W8A16 MoE support (#41314)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
2026-05-06 23:17:07 -07:00
Fadi Arafeh 51f22dcfd0 [Feat][CPU] Enable Gated DeltaNet Attention (Qwen 3.5 / 3.6) (#41025)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
2026-05-07 12:57:09 +08:00
Micah Williamson 7a576e2c72 [ROCm][CI] Remove TORCH_NCCL_BLOCKING_WAIT=1 After Bugfix In ROCm 7.2 (#41840)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-05-06 16:37:11 -07:00
Flora Feng f3f8efa73a [CI] Enable gemma4 parser test on CI (#41857)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-05-06 20:25:34 +00:00
Nicolò Lucchesi e43a791284 [Bugfix][CI] Fix Disaggregated test area path (#41794)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-05-06 17:41:24 +08:00
Yuwen Zhou 809b98e5b7 [CPU] Add FP8 W8A16 linear support (#41186)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
2026-05-06 07:05:27 +00:00
Andreas Karatzas 91740ca5ea [ROCm][CI] Refine gating tests (#37243)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-05 22:05:20 -07:00