wang.yuqi
|
b623f7ea95
|
[Frontend] Consolidate dev entrypoints. (#44170)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-06-02 06:30:21 -07:00 |
|
Ilya Markov
|
4aaba00f92
|
[EPLB] Make async EPLB default (#43219)
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-05-29 18:07:16 +00:00 |
|
wang.yuqi
|
1b26fa361e
|
[Docs] Reorganize offline inference docs. (#43552)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-25 13:44:39 +08:00 |
|
wang.yuqi
|
257af77bc2
|
[Docs] Reorganize online serving docs. (#41907)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-05-19 14:43:18 +08:00 |
|
wang.yuqi
|
1d694e78c9
|
[Examples][last/6] Resettle examples. (#41084)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-07 19:42:12 -07:00 |
|
wang.yuqi
|
51c1ee9b7c
|
[Examples] Resettle Disaggregated examples. (#40759)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-06 01:20:38 -07:00 |
|
Wentao Ye
|
577b9623e6
|
[Bug] Fix status update address for non-MOE model within external dp mode (#40839)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-05-04 16:37:16 -07:00 |
|
Chauncey
|
ae3b4deb8a
|
[Doc] Add Codex usage example (#41358)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-05-01 22:27:43 -07:00 |
|
wang.yuqi
|
a8208e6a81
|
[Examples] Resettle features examples. (#40995)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-28 00:33:41 -07:00 |
|
wang.yuqi
|
8d8062d0a7
|
[Examples] Resettle generate examples. (#36464)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-27 07:48:37 +00:00 |
|
Ilya Markov
|
50dd4cb427
|
[EPLB] Add nixl-based eplb communicator (#36276)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
|
2026-04-20 10:24:23 +00:00 |
|
z1ying
|
d0697cc7b6
|
[Doc] Add Realtime Transcription section to supported_models.md (#39845)
Signed-off-by: Ziying Tao <tzzying@outlook.com>
|
2026-04-18 03:26:14 +00:00 |
|
wang.yuqi
|
4e8c3f1c19
|
[Frontend][last/5] Improve pooling entrypoints | clean up. (#39675)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-16 07:53:23 -07:00 |
|
Vedant V Jhaveri
|
2e56975657
|
Generative Scoring (#34539)
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-31 16:02:11 -07:00 |
|
bnellnm
|
91be5f9be3
|
[MoE Refactor] Rename "naive" all2all backend (#36294)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-03-19 15:50:34 -04:00 |
|
wang.yuqi
|
f9e2a38386
|
[Docs] Reorganize pooling docs. (#35592)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 11:25:47 +00:00 |
|
Walter Beller-Morales
|
061980c36a
|
[Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-16 19:55:53 -04:00 |
|
leo-cf-tian
|
2754231ba3
|
[Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
|
2026-03-15 23:45:32 -07:00 |
|
Nick Hill
|
262b76a09f
|
[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-12 01:20:34 +00:00 |
|
Alex Brooks
|
65a4da1504
|
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
2026-03-09 05:46:23 +00:00 |
|
wang.yuqi
|
dcf8862fd4
|
[Examples][1/n] Resettle basic examples. (#35579)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:22:53 -07:00 |
|
Wentao Ye
|
384425f84e
|
[Dependency] Remove default ray dependency (#36170)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-08 20:06:22 -07:00 |
|
Harry Mellor
|
a0f44bb616
|
Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:05:24 -07:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Martin Hickey
|
b602e4f299
|
[Doc] Fix link to Llama chat template for usability (#35525)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-27 17:51:09 +00:00 |
|
Tyler Michael Smith
|
eb19955c37
|
[WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-26 14:30:10 -08:00 |
|
wang.yuqi
|
22b64948f6
|
[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-09 06:42:38 +00:00 |
|
Patrick von Platen
|
15e0bb9c42
|
[Streaming -> Realtime] Rename all voxtral related classes, fn, files (#33415)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2026-01-31 04:49:00 +00:00 |
|
Patrick von Platen
|
10152d2194
|
[Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-30 18:41:29 +08:00 |
|
graftim
|
d697581a7c
|
[Doc] Update outdated link to Ray documentation (#32660)
Signed-off-by: graftim <38649219+graftim@users.noreply.github.com>
|
2026-01-29 00:56:06 -08:00 |
|
Didier Durand
|
31b25f6516
|
[Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 16:52:03 +08:00 |
|
ruizcrp
|
c0d820457a
|
Auth_token added in documentation as it is required (#32988)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-24 03:03:05 +00:00 |
|
sangbumlikeagod
|
9b77bb790d
|
[Frontend] add logprob, compression_rate to 'verbose_json' features (#31059)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
|
2026-01-23 16:35:13 +00:00 |
|
wang.yuqi
|
05f3d714db
|
[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-23 12:03:44 +00:00 |
|
wang.yuqi
|
328cbb2773
|
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-22 10:32:44 +00:00 |
|
wang.yuqi
|
c88860d759
|
[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-19 14:07:46 +00:00 |
|
wang.yuqi
|
4ae77dfd42
|
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-16 06:17:04 +00:00 |
|
Andrew Bennett
|
f243abc92d
|
Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
|
2026-01-13 03:41:47 +00:00 |
|
RickyChen / 陳昭儒
|
a5f89ae296
|
[Doc] Add documentation for offline API docs feature (#32134)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
|
2026-01-12 10:33:48 +00:00 |
|
wang.yuqi
|
60446cd684
|
[Model] Improve multimodal pooling examples (#32085)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-12 07:54:09 +00:00 |
|
Chauncey
|
1da3a5441a
|
[Docs]: update claude code url (#31971)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-08 14:04:55 +00:00 |
|
Michael Goin
|
6b2a672e47
|
[Doc] Add Claude code usage example (#31188)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-08 13:50:23 +08:00 |
|
Jakub Zakrzewski
|
23daef548d
|
[Frontend] Support using chat template as custom score template for reranking models (#30550)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-23 11:19:16 +00:00 |
|
Michael Goin
|
6d518ffbaa
|
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests (#31182)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-22 15:40:35 -08:00 |
|
Andrew Xia
|
4c054d89aa
|
[Doc][ResponsesAPI] add documentation (#30840)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-12-17 01:53:02 -08:00 |
|
Didier Durand
|
1a55cfafcb
|
[Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-12-14 02:14:37 -08:00 |
|
Isotr0py
|
7c16f3fbcc
|
[Doc] Add documents for multi-node distributed serving with MP backend (#30509)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-13 18:02:29 +00:00 |
|
lif
|
ddbfbe5278
|
[Docs] Clarify Expert Parallel behavior for attention and MoE layers (#30615)
Signed-off-by: majiayu000 <1835304752@qq.com>
|
2025-12-13 08:37:59 -09:00 |
|
Harry Mellor
|
93db3256a4
|
Give pooling examples better names (#30488)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 16:22:58 +00:00 |
|
Seiji Eicher
|
b9e0951f96
|
[docs] Improve wide-EP performance + benchmarking documentation (#27933)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-10 22:15:54 +00:00 |
|