5eae3184fa  2025-08-06 22:12:27 +08:00  Yan Chunwei
  [None][chore] add missing tests to test list (#6590)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

79fc2f48c0  2025-08-06 20:30:35 +08:00  Pengyun Lin
  [None][chore] Enhance trtllm-serve example test (#6604)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

1ebceb790d  2025-08-05 18:27:43 +01:00  ixlmar
  [TRTLLM-5508][feat] check input tokens + improve error handling (#5170)
  Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

61da2daeb4  2025-08-05 07:14:24 -07:00  Venky
  [TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
  Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

c9eebcb454  2025-08-05 07:47:41 +00:00  Haohang Huang
  [TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
  Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
  Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>

87e4e9f468  2025-08-04 04:56:57 -04:00  brb-nv
  [None][chore] Add unit test for Gemma3 lora (#6560)
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

a15e33351d  2025-08-04 15:09:51 +08:00  Pengyun Lin
  [None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens (#6259)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

b9fe0fa7ec  2025-08-04 01:46:07 -04:00  Leslie Fang
  [None][infra] Enable test of chunked prefill with logit post processor (#6483)
  Signed-off-by: leslie-fang25 <leslief@nvidia.com>

31802de0b0  2025-08-01 15:25:18 -07:00  Richard Huo
  [None][fix] Serialize the window_size in the kv event (#6526)
  Signed-off-by: richardhuo-nv <rihuo@nvidia.com>

0c42f54a39  2025-07-31 13:39:35 -04:00  shaharmor98
  Bugfix/fix nemotron nas lora support (#6380)
  Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>

1ee7a08d2b  2025-07-31 09:26:38 +03:00  amitz-nv
  [5830][feat] Improve LoRA cache memory control (#6220)
  Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>

03e38c9087  2025-07-30 11:11:06 -04:00  nv-guomingz
  chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

a5540acfce  2025-07-30 04:33:08 -04:00  nv-guomingz
  chore: add trtllm-serve json schema example into doc. (#6418)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

ad662ddcdd  2025-07-29 16:16:52 -04:00  Yan Chunwei
  chore: disallow arbitrary in llm_args.Configs (#6367)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

7efe3cb0cd  2025-07-29 10:16:59 -07:00  Michal Guzek
  [fix] Add detokenization-based stop word logic to LLM API (#5948)
  Signed-off-by: moraxu <mguzek@nvidia.com>
  Signed-off-by: Michal Guzek <mguzek@nvidia.com>

45d441e60c  2025-07-28 15:57:07 +08:00  Yan Chunwei
  [TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

b8d4cb8beb  2025-07-25 12:55:56 -04:00  nv-guomingz
  feat: Support JSON Schema in OpenAI-Compatible API (#6321)
  Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>

3805976e90  2025-07-25 08:55:44 -04:00  pcastonguay
  fix: Fixing kv_cache_events unit tests [nvbug 5362412] (#6265)
  Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

a0aecf0476  2025-07-25 09:37:41 +00:00  xiaoqi
  [feat]: support logit_bias (#5354)
  Signed-off-by: xq25478 <xq25478@qq.com>
  Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
  Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
  Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
  Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
  Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

9538c8d0e5  2025-07-22 19:42:45 -07:00  Venky
  Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019)

48ddc3d4b9  2025-07-22 12:48:00 +08:00  Pengyun Lin
  [fix]: Revert commit 388b491 (#6143)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

f194b65f3e  2025-07-22 12:48:00 +08:00  Yan Chunwei
  fix [nvbug/5351244]: address remote mpi session submit (#5664)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

db77d83a2a  2025-07-22 12:28:38 +08:00  Bo Li
  bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206)
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

9832bef07d  2025-07-21 21:09:43 +08:00  Pengyun Lin
  [BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

98428f330e  2025-07-20 08:00:14 +03:00  amitz-nv
  [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
  Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>

77acb4f753  2025-07-18 17:34:34 +08:00  Emma Qiao
  [Infra] - Waive failed tests in post-merge (#6176)
  Signed-off-by: qqiao <qqiao@nvidia.com>

9b45499caa  2025-07-17 18:05:45 +08:00  nv-guomingz
  test: update max_beam_width to 1 due to torchsampler changes. (#6101)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

21efb50068  2025-07-17 17:46:10 +08:00  Enwei Zhu
  [TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

9354114f68  2025-07-16 12:41:45 -04:00  Wanli Jiang
  fix: Update trtllm args issues with extra nested config (#5996)
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

7568deb2f1  2025-07-16 16:05:38 +08:00  Yan Chunwei
  [nvbug/5387226] chore: add propogation for trust_remote_code to AutoConfig (#6001)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

4e4d18826f  2025-07-15 15:50:03 +09:00  nv-guomingz
  chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

aa97fbb2ad  2025-07-14 20:21:46 +09:00  Kaiyu Xie
  [Nvbug/5383670] fix: switch test case to non-fp4 ckpt for more GPU coverage (#5882)
  Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

3fcaa8a310  2025-07-14 17:17:30 +08:00  Pengyun Lin
  [nvbug 5327706][fix] fix mgmn postprocess error (#5835)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

3e1fd983c3  2025-07-14 17:17:30 +08:00  Yan Chunwei
  [nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

388b4919b8  2025-07-14 17:17:30 +08:00  Pengyun Lin
  [nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

6992616c1f  2025-07-14 17:17:30 +08:00  Pengyun Lin
  [nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201)
  Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

c9e7f831dc  2025-07-14 16:42:23 +08:00  dominicshanshan
  Breaking change: perf: [TRTLLM-4662] Enable cuda graph by default (#5480)
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

ce39409530  2025-07-14 10:23:20 +08:00  QI JUN
  fix cancel request logic (#5800)
  Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

2e3cf42e03  2025-07-10 11:37:30 -04:00  wili
  [refactor] Simplification of Speculative decoding configs (#5639)
  Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
  Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

3aa53ec36c  2025-07-10 18:33:17 +08:00  Yiqing Yan
  [None] - Waive L0 tests (#5915)
  Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

055c4a9fe6  2025-07-10 16:30:00 +08:00  Enwei Zhu
  [NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825)
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

07f6da763d  2025-07-10 11:31:35 +08:00  Yan Chunwei
  [TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
  Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

bb5b16fcb9  2025-07-09 00:19:57 +09:00  Kaiyu Xie
  feat: Return context response immediately when stream_interval > 1 (#5836)
  Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

b01d1c28f7  2025-07-08 19:36:04 +08:00  Yegor
  [feat] Detokenize option in /v1/completions request (#5382)
  Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
  Signed-off-by: Yegor Yershov <yegor6741@gmail.com>

ec0d7e64b9  2025-07-08 17:54:06 +08:00  Yiqing Yan
  [Infra] - Waive L0 test (#5837)
  Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

55f86ce7ab  2025-07-08 16:01:36 +09:00  Enwei Zhu
  [NvBug 5362426] fix: Fix prompt adapter TP2 case (#5782)
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

0be41b6524  2025-07-08 13:15:30 +09:00  nv-guomingz
  Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)

5a8173c121  2025-07-08 08:52:36 +08:00  nv-guomingz
  chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

5bc3a15f10  2025-07-07 18:03:12 -07:00  Yechan Kim
  feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522)
  Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

9db2e9ee47  2025-07-07 14:58:32 +08:00  Bo Li
  fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>