Ziyi Xiong
|
de472828b9
|
[TRTLLM-6637][feat] Resolve KV cache divergence issue (#6628)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-08-09 23:15:04 +08:00 |
|
Daniel Cámpora
|
efca359b66
|
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-08-07 22:19:37 -04:00 |
|
Iman Tabrizian
|
82276167e6
|
[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-08-07 17:28:14 -07:00 |
|
pcastonguay
|
453a06e6ab
|
[TRTLLM-6881][feat] Include attention dp rank info with KV cache events (#6563)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-08-07 14:17:07 +02:00 |
|
hlu1
|
8207d5fd39
|
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-08-07 03:04:18 -04:00 |
|
amitz-nv
|
85af62184b
|
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6510)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-08-07 09:05:36 +03:00 |
|
ixlmar
|
1ebceb790d
|
[TRTLLM-5508][feat] check input tokens + improve error handling (#5170)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-08-05 18:27:43 +01:00 |
|
Olya Kozlova
|
13cc1c4878
|
[TRTLLM-5271][feat] best_of/n for pytorch workflow (#5997)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
|
2025-08-04 14:08:06 +02:00 |
|
Yuan Tong
|
a2f271c8e0
|
[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-08-04 13:51:01 +08:00 |
|
Robin Kobus
|
918fedf952
|
[None][refactor] Simplify finish reasons handling in DecoderState (#6524)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-08-02 07:17:43 +02:00 |
|
Robin Kobus
|
d3c14682f0
|
refactor: Remove unused buffers and bindings from sampler (#6484)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-08-01 00:43:03 -04:00 |
|
Michal Guzek
|
08d57123f9
|
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-25 18:10:40 -04:00 |
|
Linda
|
9a99e6d6d7
|
fix: integration tests with nanobind (#6326)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-25 09:23:20 +08:00 |
|
Linda
|
60073731ca
|
fix: bindings unit tests for nanobind (#6221)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-22 14:51:43 +01:00 |
|
Linda
|
3efad2e58c
|
feat: nanobind bindings (#6185)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-21 08:56:57 +01:00 |
|
amitz-nv
|
98428f330e
|
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-07-20 08:00:14 +03:00 |
|
Robin Kobus
|
ec2b953e7e
|
refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-18 12:12:08 +02:00 |
|
Iman Tabrizian
|
b75e53ab69
|
Revert "feat: nanobind bindings (#5961)" (#6160)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-18 10:12:54 +08:00 |
|
Linda
|
5bff317abf
|
feat: nanobind bindings (#5961)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-17 22:42:52 +08:00 |
|
Chuang Zhu
|
44c70c88f9
|
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-17 17:42:07 +08:00 |
|
qixiang-99
|
e09e409dfb
|
Fix: Enhance ModelConfig for kv cache size calculations (#5868)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
|
2025-07-16 14:41:31 -07:00 |
|
Robin Kobus
|
6d4b045d1f
|
refactor: Remove enforced sorted order of batch slots (#3502)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-14 17:23:02 +02:00 |
|
Linda
|
4d071eb2d1
|
feat: binding type build argument (pybind, nanobind) (#5802)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-11 00:48:50 +09:00 |
|
xavier-nvidia
|
b6013da198
|
Fix GEMM+AR fusion on blackwell (#5563)
Signed-off-by: xsimmons <xsimmons@nvidia.com>
|
2025-07-09 08:48:47 +08:00 |
|
Daniel Cámpora
|
1260e2f33f
|
feat: Optimize TRTLLM Sampler perf single beam single step (#5550)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-07-07 15:44:47 +02:00 |
|
Robin Kobus
|
ae27261094
|
refactor: decoding inputs (#5679)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-06 08:21:02 +02:00 |
|
jthomson04
|
1b588f8390
|
feat: KV events for sliding window attention (#5580)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
|
2025-07-05 06:05:20 +08:00 |
|
Stefan Niebler
|
d1112aac37
|
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-05 01:35:13 +09:00 |
|
Robin Kobus
|
07f9cf1519
|
fix: Improve chunking test and skip empty kernel calls (#5710)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-04 09:08:15 +02:00 |
|
Robin Kobus
|
1a3bd140ed
|
chore: Remove unused isFullContextRequest method (#5666)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-03 15:08:09 +02:00 |
|
qixiang-99
|
ca7b6ec8d8
|
Feat/pytorch vswa kvcachemanager (#5151)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
|
2025-07-02 15:58:00 +08:00 |
|
Robin Kobus
|
9bdc5951f8
|
refactor: decoder state setup (#5093)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-30 11:09:43 +02:00 |
|
Robin Kobus
|
a8141a4513
|
refactor: Speculative decoding buffers part 2 (#5316)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-27 17:41:48 +02:00 |
|
Aurelien Chartier
|
833c0dea4a
|
[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI (#5497)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-06-27 17:03:05 +02:00 |
|
wili
|
56cdfe5c6c
|
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-06-27 23:00:17 +08:00 |
|
Robin Kobus
|
8dfa31c71d
|
refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-26 19:45:52 +08:00 |
|
dongxuy04
|
490d2e5819
|
feat: large-scale EP(part 8: Online EP load balancer integration for PCIe fp8) (#5226)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-06-25 22:25:13 -07:00 |
|
Robin Kobus
|
e2a8cbc80b
|
refactor: manage cache indirection in decoder state (#5315)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-24 09:15:59 +02:00 |
|
Robin Kobus
|
b3045c44b9
|
refactor: remove TrtGptModelOptionalParams (#5165)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-20 10:31:40 +02:00 |
|
jellysnack
|
0623ffe3bc
|
feat: Add LLGuidance Support for PyTorch Backend (#5214)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-18 19:33:34 +08:00 |
|
Robin Kobus
|
627062c265
|
refactor: Update decoder buffer and logits management (#4450)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-18 08:10:32 +08:00 |
|
QI JUN
|
f899c4d294
|
Re-implement LlmResponse in Python to reduce host overhead of pybind (#5224)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-17 21:28:09 +08:00 |
|
Robin Kobus
|
b6ca677741
|
refactor: remove decoder request from decoder interface (#5129)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-16 09:12:30 +02:00 |
|
Robin Kobus
|
443b2eb51f
|
refactor: Speculative decoding buffers (#5091)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-14 11:39:32 +02:00 |
|
liji-nv
|
10ab9791ec
|
[fix] Do not reuse dummy request KVCache (#4804)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-06-12 15:24:50 +08:00 |
|
Netanel Haber
|
e692779ead
|
Solve underallocation in VSWA+/VGQA (#4667)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-06-12 12:12:46 +08:00 |
|
Tracin
|
6c91f1c7ac
|
Mxfp8xmxfp4 quant mode(#4978)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-10 22:01:37 +08:00 |
|
Daniel Cámpora
|
d68b8180d3
|
feat: port MakeDecodingBatchInputOutput to python in TRTLLMSampler (#4828)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-06-10 07:28:34 +08:00 |
|
Chang Liu
|
f70815c945
|
[TRTLLM-5007][feat] Add multimodal hashing support (image hashing) (#4145)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
|
2025-06-10 01:59:56 +08:00 |
|
dongxuy04
|
1e369658f1
|
feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4) (#4818)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-06-08 10:25:18 +08:00 |
|