obscura/vllm - vllm

Mirror of https://github.com/vllm-project/vllm.git, synced 2026-06-06 00:16:14 +00:00

Author	SHA1	Message	Date
Chunyang Wen	efc347f1b2	docs: fix tokenizer optimization typo (#44066) Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>	2026-06-05 02:12:49 -07:00
Chunyang Wen	f191d5630e	docs: clarify ITL acronym in optimization docs (#43922) Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>	2026-05-29 07:40:05 -07:00
Nick Hill	0f66623b0d	[Frontend] Rework fastokens integration (#43168) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-05-21 15:36:58 -07:00
wang.yuqi	257af77bc2	[Docs] Reorganize online serving docs. (#41907) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-05-19 14:43:18 +08:00
AlonKejzman	2a16ece2d3	tokenizer: Add fastokens support (#41741) Signed-off-by: AlonKejzman <alonkeizman@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-05-07 22:49:42 +08:00
wang.yuqi	a8208e6a81	[Examples] Resettle features examples. (#40995) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-04-28 00:33:41 -07:00
pschlan-amd	0098db9ec1	[ROCm] Implement GPU-to-NUMA-node detection (#40015) Signed-off-by: Patrick Schlangen <pschlan@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-04-23 10:08:48 -05:00
Shengqi Chen	75e01a39a1	[Feature] NUMA binding support for GPU workers (#38635) Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Co-authored-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-04-08 09:55:24 -07:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00
Michael Goin	c39ee9ee2b	[Docs] Add sections on process architecture and minimum CPU resources (#33940) It seems users can be confused about vLLM's performance when running with very small amounts of CPU cores available. We are missing a clear overview of what vLLM's process architecture is, so I added this along with some diagrams in arch_overview.md, and included a section on CPU resource recommendations in optimization.md Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-06 15:26:43 +00:00
Matthew Bonanni	77c4f45c6c	[7/N][Attention][Docs] Add documentation for attention backends (#32477) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-28 17:20:22 -05:00
Vincent Gimenes	0b53bec60b	[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109) Signed-off-by: Vincent Gimenes <147169146+VincentG1234@users.noreply.github.com>	2026-01-27 03:05:02 +00:00
Harry Mellor	0b544e6476	[Docs] Fix some snippets (#31378) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-26 12:47:41 +00:00
Didier Durand	1a55cfafcb	[Doc]: fixing typos in various files (#30540) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-12-14 02:14:37 -08:00
Cyrus Leung	389aa1b2eb	[Doc] Update more docs with respect to V1 (#29188) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-23 10:58:48 +08:00
Rob Mulla	dd39f91edb	[Doc] cleanup TPU documentation and remove outdated examples (#29048) Signed-off-by: Rob Mulla <rob.mulla@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-21 00:05:59 +00:00
Harry Mellor	97cfa99d59	[Docs] Take env var definition out of folded admonition (#29005) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 03:32:04 -08:00
Cyrus Leung	89d3679221	[Doc] Fix failing doc build (#28772) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-15 05:33:27 -08:00
Rob Mulla	70bfbd7b16	Docs update tpu install instructions (#27824) Signed-off-by: Rob Mulla <rob.mulla@gmail.com> Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-31 10:29:55 -07:00
Harry Mellor	483ea64611	[Docs] Replace all explicit anchors with real links (#27087) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:22:06 -07:00
Harry Mellor	4ffd6e8942	[Docs] Reduce custom syntax used in docs (#27009) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 20:05:34 -07:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Cyrus Leung	ef9676a1f1	[Doc] ruff format some Python examples (#26767) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 03:21:53 -07:00
Wenlong Wang	43ab8cfaa5	[MM][Doc] Add documentation for configurable mm profiling (#26200) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-10-08 23:21:20 -07:00
Cyrus Leung	633f943e30	[Doc] Update Batch-level DP docs (#25757) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-26 02:37:40 -07:00
YiwenC	52bc9d5b3e	[Model] enable data parallel for InternVL vision encoder (#23909) Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu> Signed-off-by: YiwenC <54658925+666even666@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-17 21:11:46 -07:00
dongluw	a5b84f1cbf	[Core] Shared memory based object store for Multimodal data caching and IPC (#20452) Signed-off-by: donglu <donglu@cohere.com>	2025-09-12 07:54:17 -07:00
co63oc	1bd007f234	fix some typos (#24071) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-02 20:44:50 -07:00
WeiQing Chen	2f0bab3f26	[Model] Support dp on ViT on GLM-4.5V (#23168) Signed-off-by: David Chen <530634352@qq.com>	2025-09-02 10:48:18 +00:00
WeiQing Chen	a0e0efd6bd	[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817) Signed-off-by: Junhong <liujunhong11@huawei.com> Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Junhong <liujunhong11@huawei.com> Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-09-01 16:56:56 +00:00
Jiangyun Zhu	3a6acad431	[Model] Enable encoder DP for MiniCPM-V (#23948) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-08-30 06:31:26 -07:00
Cyrus Leung	fe8d7b6f03	[Model] Interface to enable batch-level DP support (#23733) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-27 06:41:22 -07:00
Michael Yao	1f7a9c95e4	[Docs] Fix a 1-2-3 list and style issues in tpu.md (#23729) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-08-27 05:37:52 -07:00
Michael Yao	5bd9f84158	[Docs] Fix an admonition important (#23726) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-08-27 02:50:09 -07:00
Cyrus Leung	69244e67e6	[Core] Use key-only cache for `BaseMultiModalProcessor` (#23018) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-27 14:19:13 +08:00
Didier Durand	7c04779afa	[Doc]: fix various spelling issues in multiple files (#23636) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-26 14:05:29 +00:00
Cyrus Leung	e269be2ba2	[Doc] Add caution for API server scale-out (#23550) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 06:14:15 -07:00
WeiQing Chen	23c939fd30	[Model] Support DP for ViT on MiniCPM-V-4 (#23327) Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>	2025-08-23 02:14:41 +00:00
Cyrus Leung	5cc54f7c5b	[Doc] Fix batch-level DP example (#23325) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-21 06:16:38 -07:00
Cyrus Leung	5efd6905bc	[CLI][Doc] Formalize `--mm-encoder-tp-mode` (#23190) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-20 23:42:28 +08:00
Tialo	2c3f557f08	[Doc] use power of 2 (#23172)	2025-08-19 03:16:23 -07:00
Harry Mellor	bc1d02ac85	[Docs] Add comprehensive CLI reference for all large `vllm` subcommands (#22601) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 00:13:33 -07:00
Harry Mellor	00976db0c3	[Docs] Fix warnings in docs build (#22588) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-10 05:49:51 -07:00
Harry Mellor	56186474f6	[Docs] Reduce noise in docs and `--help` from the JSON tip (#22567) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 08:31:32 -07:00
Cyrus Leung	139d155781	[Frontend] Use engine argument to control MM cache size (#22441) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 09:47:10 -07:00
Cyrus Leung	766bc8162c	[Core] Store only the keys for multi-modal data in P0 (#22198) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 01:45:04 -07:00
Chen Zhang	76080cff79	[DOC] Fix path of v1 related figures (#21868) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-29 19:45:18 -07:00
Harry Mellor	ba5c5e5404	[Docs] Switch to better markdown linting pre-commit hook (#21851) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 19:45:08 -07:00


61 Commits