Kaiyu Xie | bb5b16fcb9 | 2025-07-09 00:19:57 +09:00
feat: Return context response immediately when stream_interval > 1 (#5836)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

Yegor | b01d1c28f7 | 2025-07-08 19:36:04 +08:00
[feat] Detokenize option in /v1/completions request (#5382)
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>

Yiqing Yan | ec0d7e64b9 | 2025-07-08 17:54:06 +08:00
[Infra] - Waive L0 test (#5837)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

Enwei Zhu | 55f86ce7ab | 2025-07-08 16:01:36 +09:00
[NvBug 5362426] fix: Fix prompt adapter TP2 case (#5782)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

nv-guomingz | 0be41b6524 | 2025-07-08 13:15:30 +09:00
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)

Yechan Kim | 5bc3a15f10 | 2025-07-07 18:03:12 -07:00
feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

nv-guomingz | 5a8173c121 | 2025-07-08 08:52:36 +08:00
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Bo Li | 9db2e9ee47 | 2025-07-07 14:58:32 +08:00
fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

Shunkangz | 32339d1b20 | 2025-07-04 18:58:24 +09:00
Raise shutdown error for each request (#4936)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>

Emma Qiao | a0135c0f6f | 2025-07-04 13:14:13 +08:00
[Infra] - Waive failed cases on release/0.21 (#5674)
Signed-off-by: qqiao <qqiao@nvidia.com>

nv-guomingz | d0b3d2ac65 | 2025-07-04 13:14:13 +08:00
fix: https://nvbugs/5362398 (#5609)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yan Chunwei | 77288d3671 | 2025-07-04 13:14:13 +08:00
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Omer Ullman Argov | c72856188c | 2025-07-03 08:06:10 -04:00
[ci] small multigpu speedups (#5643)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

Emma Qiao | 2a5fdebf10 | 2025-07-02 22:05:07 -04:00
[Infra] - Waive failed tests for main 0702 (#5671)
Signed-off-by: qqiao <qqiao@nvidia.com>

Yan Chunwei | 2d69b55fe8 | 2025-07-02 14:21:37 +08:00
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Emma Qiao | 178fc3f655 | 2025-07-01 20:12:55 +08:00
[Infra][release/0.21] - waive failed tests (#5537)
Signed-off-by: qqiao <qqiao@nvidia.com>

Yan Chunwei | ee7fcbf20e | 2025-07-01 20:12:55 +08:00
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Emma Qiao | 65c2b93284 | 2025-07-01 05:01:32 -04:00
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>

nv-guomingz | 6e48ac25a6 | 2025-06-30 12:23:14 -04:00
chore: remove cuda_graph_ prefix from cuda_graph_config field members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

nv-guomingz | 578430e64c | 2025-06-30 11:05:40 +08:00
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1 (cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Talor Abramovich | 70e34a3291 | 2025-06-29 12:46:30 +00:00
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
Signed-off-by: Talor Abramovich <talora@nvidia.com>

Emma Qiao | 9db769ee62 | 2025-06-29 11:06:14 +08:00
[Infra] - Add import pytest (#5565)
Signed-off-by: qqiao <qqiao@nvidia.com>

Aurelien Chartier | 833c0dea4a | 2025-06-27 17:03:05 +02:00
[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI (#5497)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

Emma Qiao | 980030c816 | 2025-06-27 13:55:49 +08:00
[Infra] - Waive failed case in post-merge (#5536)
Signed-off-by: qqiao <qqiao@nvidia.com>

Yibin Li | 0f3bd7800e | 2025-06-27 09:58:41 +08:00
[TRTLLM-4971]: Use safe deserialization in ParallelConfig (#4630)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

Omer Ullman Argov | 6bae76d7ca | 2025-06-26 14:31:38 +03:00
[fix][ci] move torch tests to run under torch stage (#5473)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>

QI JUN | 3a2c4ca77b | 2025-06-26 04:32:46 +08:00
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Netanel Haber | 3ca2f6ac51 | 2025-06-25 15:52:06 +08:00
start OAIServer with max_beam_width=1 for TorchSampler (#5427)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>

Emma Qiao | 475272046a | 2025-06-24 17:19:31 +08:00
[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

Yan Chunwei | 9bd42ecf9b | 2025-06-20 03:01:10 +08:00
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Yan Chunwei | 3946e798db | 2025-06-19 06:13:53 +08:00
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Wanli Jiang | 3a02489e86 | 2025-06-18 15:12:49 +08:00
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Enwei Zhu | babdd9ce06 | 2025-06-16 10:03:55 +08:00
test: Add json_mode_eval for guided decoding evaluation (#5179)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Yan Chunwei | c84e41fd9d | 2025-06-15 17:51:56 -07:00
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

ixlmar | e055af1bc9 | 2025-06-15 01:28:26 +08:00
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

Yibin Li | b79eb34bfe | 2025-06-13 11:37:50 +08:00
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

Bo Li | 1b79041f5d | 2025-06-11 09:38:10 +08:00
fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlock. (#4264)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

Yukun He | 137fe35539 | 2025-06-09 19:19:16 +08:00
fix: Fix warmup phase batch size out of range. (#4986)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>

QI JUN | 1b963c17c0 | 2025-06-06 14:19:56 +08:00
CI: waive test_llm_multi_node_with_postproc (#4977)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

QI JUN | b8c5e3892b | 2025-06-05 17:43:30 +08:00
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

QI JUN | 91e8d43d66 | 2025-06-05 16:44:56 +08:00
CI: waive test_llm_get_queued_stats (#4945)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

xinhe-nv | 1c3091c63b | 2025-06-05 14:33:03 +08:00
tests: [TRTQA-2906] add benchmark serving tests (#4901)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>

Netanel Haber | ddbaa5ef80 | 2025-06-05 13:30:17 +08:00
Only pass fast_build=true to non-pytorch backend (#4920)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

Yan Chunwei | 8e0d96fcc6 | 2025-06-05 08:00:32 +08:00
fix: LLM invalid arg in a test (#4922)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Yan Chunwei | ac20159d32 | 2025-06-04 13:17:29 +08:00
fix: build_config in TorchLlmArgs and avoid invalid args (#4600)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Shi Xiaowei | b13f8c9cba | 2025-06-04 09:31:39 +08:00
Fix: NVBug 5302895 (#4835)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

pcastonguay | 01f29ce38b | 2025-06-03 08:33:08 -04:00
[nvbug 5294316] fix: Fix queued request stats (#4714)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

Enwei Zhu | 0087bd27ba | 2025-06-01 09:11:55 +08:00
[fix] Fix SamplingParams check on n and best_of (#4655)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Enwei Zhu | ee916da8f1 | 2025-05-30 15:43:00 +08:00
test: Waive test_llm_loading_from_ckpt_for_tp2 (#4797)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Jhao-Ting Chen | fcadce9f8d | 2025-05-29 12:23:25 -07:00
[fix] Eagle-2 LLMAPI pybind argument fix. (#3967)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>