1373 Commits

Author SHA1 Message Date
Kevin H. Luu f653761252 [CI] Route part of B200 jobs to b200-k8s (#41453)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: OpenAI Codex <noreply@openai.com>
2026-05-05 19:00:30 -07:00
Andreas Karatzas 4a8ae26e53 [ROCm][CI] Use vLLM generation defaults for DeepSeek prefetch-offload eval (#41575)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-05-06 01:08:12 +00:00
Kevin H. Luu 1333864408 [CI] Automate Docker Hub release image publishing (#40415)
Signed-off-by: khluu <khluu000@gmail.com>
2026-05-06 00:15:23 +00:00
Artem Perevedentsev 8b9ea2f881 [Feature] Add Triton kernel JIT compilation monitor for inference (#40137)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2026-05-05 14:08:57 +04:00
Gregory Shtrasberg e724b0ea8d [ROCm] ROCm7.2.2 + profiler fix + AITER 0.1.12.post2 (#41386)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Rohan138 <rohanpotdar138@gmail.com>
2026-05-04 13:07:19 -05:00
Michael Goin 4f7309fcc0 [CI] Add ci-fetch-log.sh helper for Buildkite job logs (#41517)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 15:23:59 -07:00
Michael Goin cfd2573f23 [Build] Switch CUDA 13.0 wheel builds to PyTorch manylinux_2_28 base (#41416)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-05-02 05:51:28 -07:00
Michael Goin 3ccc1ff495 [Eval][CI] Add basic mrcr eval to tests/evals/ (#40164)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-05-01 12:00:38 -04:00
vllmellm 529c671e80 [ROCm][FEAT] AITER Fused Allreduce + RMSNorm (#37646)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
Signed-off-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com>
Co-authored-by: junkang1991 <junkangchow@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-05-01 23:07:18 +08:00
Stefano Castagnetta 92a7c121b6 [CI] Add MTP coverage: Qwen3.5 correctness + no-sync spec decode (#40472)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-30 12:24:09 -07:00
Chenxi Qian 54146a9bf9 [Bugfix] correct h matrix layout in chunk_kda output kernel (#40956)
Signed-off-by: ChenxiQian <chenxi.qian.cq@outlook.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-30 16:22:41 +08:00
Kevin H. Luu 0ab67c0222 [CI] Add key field to all test_areas pipeline steps (#41201)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-29 16:59:16 -07:00
Rishi Puri ccfb620c62 Create tests/distributed/test_mnnvl_alltoall.py (#35241)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Claude <claude@anthropic.com>
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
2026-04-29 21:56:56 +00:00
yzong-rh 93da1fe97a [CI] Add temperature to bfcl eval, default greedy (#41059)
Signed-off-by: Yifan Zong <yzong@redhat.com>
2026-04-29 14:01:57 -07:00
Artem Perevedentsev b92ef9ec5a [Perf] Enable FlashInfer top-k/top-p sampler by default (#40376)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
2026-04-29 19:10:34 +04:00
Alec 3f1a4bb639 build: embed image provenance metadata in vLLM containers (#40653)
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
2026-04-29 03:07:41 -07:00
haosdent ef70057ca7 [CI][CPU] Split CPU-Distributed Tests into per-scenario labels (#41203)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-04-29 01:28:45 -07:00
Shengqi Chen e48cb85185 [CI/Build] Auto-detect manylinux ABI tag for nightly wheels (#41149)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-29 00:37:14 -07:00
wang.yuqi a8208e6a81 [Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
Kunshang Ji 407b34be26 [xpu] bump up vllm-xpu-kernel v0.1.7 (#41019)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-04-28 08:04:54 +08:00
wang.yuqi 8d8062d0a7 [Examples] Resettle generate examples. (#36464)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-27 07:48:37 +00:00
ojhaanshika 592ae6805c Cutlass W4A16 (Machete) Tests (#35450)
Signed-off-by: Anshika Ojha <anshikao@nvidia.com>
2026-04-27 05:15:29 +00:00
Dmitry Tokarev 6dec49f27e [Build] Bump CUDA to 13.0.2 to match PyTorch 2.11.0 (#40669)
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
2026-04-24 10:27:11 +00:00
Shanshan Shen b5587e1013 [CI/Build] Add e2e test for ViT CUDA graph (#40780)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-04-24 18:12:14 +08:00
xiangdong 01acf96c6f [XPU][CI] Fix Docker cleanup races on Intel CI runners (#40761)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
2026-04-24 14:08:45 +08:00
Nicolò Lucchesi 8824f50f1f [CI] Split disaggregated tests into own test-area (#40623)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-04-23 23:20:12 +08:00
xiangdong 01cb41dcf5 [XPU][CI]Temporary disable 3 cases on Intel GPU in CI (#40683)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
2026-04-23 21:42:22 +08:00
Shengqi Chen 3ed5231c6a [Build] Switch default CUDA to 13.0, update CUDA architecture lists, clean up stale build-args (#39878)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 15:51:28 +08:00
Rishi Puri 9f39b380d0 [Bugfix] Fix spec decode test failures on Blackwell (SM100+) (#39546)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2026-04-21 18:21:19 +00:00
xiangdong b2a5518679 [XPU][CI] Add misc, engine and lora cases on Intel GPU in CI (#39887)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-04-21 22:30:46 +08:00
Sage Moore def8f52200 [CI][EPLB] Add Async EPLB end-to-end integration test to CI (#40168)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-04-20 10:22:54 -04:00
Andreas Karatzas a943839e9a [ROCm][CI] Introducing new MI300 nodes (#39531)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-04-20 16:09:58 +08:00
Kevin H. Luu 629d45eacb [ci] Make ecr authenticate non blocking (#40305)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-04-19 15:37:53 -07:00
Michael Goin a8bffaa133 [Kernel] Add MXFP4 W4A4 CUTLASS MoE kernel for SM100 (#37463)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-04-17 16:42:32 -07:00
Ryan Rock 58da4ee047 [AMD][CI] Update DeepEP branch (#38396)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2026-04-17 14:30:20 -05:00
Li, Jiang d02421a7db [CPU] Refactor CPU affinity and memory management (#39781)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-17 21:01:08 +08:00
Sumanth R Hegde adf9bb3c57 [CI] Add weight transfer tests to CI (#39821)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-04-16 15:51:45 -04:00
Li, Jiang 324a3d2bd8 [CI/Build] Improve stability of CPU tests (#39966)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-04-16 21:50:36 +08:00
Yanan Cao edc3648966 [Kernel][Helion] Fix inductor fusion of Helion HOP (#39944)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 04:41:26 -07:00
Fadi Arafeh 445b7093fd [perf][cpu] Accelerate BF16 GELU with LUT impl on Arm CPUs (#37469)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-15 22:26:17 -07:00
Harry Mellor 03f8d3a548 Update to transformers v5 (#30566)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: khluu <khluu000@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
2026-04-15 16:29:15 -07:00
zhanqiuhu 0b790a2501 [Speculative Decoding] Add DFlash speculators config parsing (#38300)
Signed-off-by: Zhanqiu Hu <zhu@redhat.com>
2026-04-15 16:22:15 -04:00
Kevin H. Luu 102d51c9f3 [CI] Only build release Docker images when NIGHTLY=1 (#39882)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:01:13 +00:00
Monishver 21e5a9f48e Bug/test eagle dp v2 (#39838)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
2026-04-15 17:48:12 +00:00
Vibhav Agarwal f4b42df048 [Attention Backend] TurboQuant: 2-bit KV cache compression with 4x capacity (#38479)
Signed-off-by: vibhavagarwal5 <vibhavagarwal5@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-04-14 19:57:13 -07:00
Andrey Talman b569620f72 [CI] Add PyTorch nightly build and test pipeline (#37226)
Signed-off-by: atalman <atalman@fb.com>
2026-04-14 17:13:24 -07:00
Wentao Ye 2ad1029233 [Bug] Fix batch invariance nvfp4 support (#39820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-04-14 17:08:17 -04:00
bnellnm e1e318af01 [MoE Refactor] Remove MoE DP chunking (#39107)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-04-14 09:48:05 -04:00
Monishver 8213e8f880 Bug/test eagle dp v0 (#38938)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-04-13 20:50:08 +00:00
Andreas Karatzas 4e4ad41d11 [ROCm][CI] Removed stale tests and extended acceptance test (#39651)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-04-13 10:40:26 +08:00