109 Commits

Author SHA1 Message Date
wang.yuqi b623f7ea95 [Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-06-02 06:30:21 -07:00
Ilya Markov 4aaba00f92 [EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-05-29 18:07:16 +00:00
wang.yuqi 1b26fa361e [Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-25 13:44:39 +08:00
wang.yuqi 257af77bc2 [Docs] Reorganize online serving docs. (#41907)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-19 14:43:18 +08:00
wang.yuqi 1d694e78c9 [Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-07 19:42:12 -07:00
wang.yuqi 51c1ee9b7c [Examples] Resettle Disaggregated examples. (#40759)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-06 01:20:38 -07:00
Wentao Ye 577b9623e6 [Bug] Fix status update address for non-MOE model within external dp mode (#40839)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-05-04 16:37:16 -07:00
Chauncey ae3b4deb8a [Doc] Add Codex usage example (#41358)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-05-01 22:27:43 -07:00
wang.yuqi a8208e6a81 [Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-28 00:33:41 -07:00
wang.yuqi 8d8062d0a7 [Examples] Resettle generate examples. (#36464)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-04-27 07:48:37 +00:00
Ilya Markov 50dd4cb427 [EPLB] Add nixl-based eplb communicator (#36276)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
2026-04-20 10:24:23 +00:00
z1ying d0697cc7b6 [Doc] Add Realtime Transcription section to supported_models.md (#39845)
Signed-off-by: Ziying Tao <tzzying@outlook.com>
2026-04-18 03:26:14 +00:00
wang.yuqi 4e8c3f1c19 [Frontend][last/5] Improve pooling entrypoints | clean up. (#39675)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-16 07:53:23 -07:00
Vedant V Jhaveri 2e56975657 Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-31 16:02:11 -07:00
bnellnm 91be5f9be3 [MoE Refactor] Rename "naive" all2all backend (#36294)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-03-19 15:50:34 -04:00
wang.yuqi f9e2a38386 [Docs] Reorganize pooling docs. (#35592)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-19 11:25:47 +00:00
Walter Beller-Morales 061980c36a [Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-03-16 19:55:53 -04:00
leo-cf-tian 2754231ba3 [Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
2026-03-15 23:45:32 -07:00
Nick Hill 262b76a09f [Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 01:20:34 +00:00
Alex Brooks 65a4da1504 [Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-03-09 05:46:23 +00:00
wang.yuqi dcf8862fd4 [Examples][1/n] Resettle basic examples. (#35579)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:22:53 -07:00
Wentao Ye 384425f84e [Dependency] Remove default ray dependency (#36170)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-08 20:06:22 -07:00
Harry Mellor a0f44bb616 Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:05:24 -07:00
lif 00b814ba5a [V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
2026-03-07 22:09:55 +08:00
Martin Hickey b602e4f299 [Doc] Fix link to Llama chat template for usability (#35525)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-27 17:51:09 +00:00
Tyler Michael Smith eb19955c37 [WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 14:30:10 -08:00
wang.yuqi 22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
Patrick von Platen 15e0bb9c42 [Streaming -> Realtime] Rename all voxtral related classes, fn, files (#33415)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-01-31 04:49:00 +00:00
Patrick von Platen 10152d2194 [Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-30 18:41:29 +08:00
graftim d697581a7c [Doc] Update outdated link to Ray documentation (#32660)
Signed-off-by: graftim <38649219+graftim@users.noreply.github.com>
2026-01-29 00:56:06 -08:00
Didier Durand 31b25f6516 [Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 16:52:03 +08:00
ruizcrp c0d820457a Auth_token added in documentation as it is required (#32988)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-24 03:03:05 +00:00
sangbumlikeagod 9b77bb790d [Frontend] add logprob, compression_rate to 'verbose_json' features (#31059)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2026-01-23 16:35:13 +00:00
wang.yuqi 05f3d714db [Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
wang.yuqi 328cbb2773 [Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00
wang.yuqi c88860d759 [Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-19 14:07:46 +00:00
wang.yuqi 4ae77dfd42 [Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00
Andrew Bennett f243abc92d Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
2026-01-13 03:41:47 +00:00
RickyChen / 陳昭儒 a5f89ae296 [Doc] Add documentation for offline API docs feature (#32134)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
2026-01-12 10:33:48 +00:00
wang.yuqi 60446cd684 [Model] Improve multimodal pooling examples (#32085)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-12 07:54:09 +00:00
Chauncey 1da3a5441a [Docs]: update claude code url (#31971)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-08 14:04:55 +00:00
Michael Goin 6b2a672e47 [Doc] Add Claude code usage example (#31188)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-08 13:50:23 +08:00
Jakub Zakrzewski 23daef548d [Frontend] Support using chat template as custom score template for reranking models (#30550)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
Michael Goin 6d518ffbaa [CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests (#31182)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-22 15:40:35 -08:00
Andrew Xia 4c054d89aa [Doc][ResponsesAPI] add documentation (#30840)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-17 01:53:02 -08:00
Didier Durand 1a55cfafcb [Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
Isotr0py 7c16f3fbcc [Doc] Add documents for multi-node distributed serving with MP backend (#30509)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-13 18:02:29 +00:00
lif ddbfbe5278 [Docs] Clarify Expert Parallel behavior for attention and MoE layers (#30615)
Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-13 08:37:59 -09:00
Harry Mellor 93db3256a4 Give pooling examples better names (#30488)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 16:22:58 +00:00
Seiji Eicher b9e0951f96 [docs] Improve wide-EP performance + benchmarking documentation (#27933)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-12-10 22:15:54 +00:00