Emma Qiao
65c2b93284
[Infra] - Add some timeout and unwaive a test which dev fixed ( #5631 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-01 05:01:32 -04:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config field members. ( #5585 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) ( #5014 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Talor Abramovich
70e34a3291
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve ( #5376 )
Signed-off-by: Talor Abramovich <talora@nvidia.com>
2025-06-29 12:46:30 +00:00
Emma Qiao
9db769ee62
[Infra] - Add import pytest ( #5565 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-29 11:06:14 +08:00
Aurelien Chartier
833c0dea4a
[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI ( #5497 )
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-27 17:03:05 +02:00
Emma Qiao
980030c816
[Infra] - Waive failed case in post-merge ( #5536 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-27 13:55:49 +08:00
Yibin Li
0f3bd7800e
[TRTLLM-4971]: Use safe deserialization in ParallelConfig ( #4630 )
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-06-27 09:58:41 +08:00
Omer Ullman Argov
6bae76d7ca
[fix][ci] move torch tests to run under torch stage ( #5473 )
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 14:31:38 +03:00
QI JUN
3a2c4ca77b
chore: split _build_model method for TorchLlm and TrtLlm ( #5418 )
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 04:32:46 +08:00
Netanel Haber
3ca2f6ac51
start OAIServer with max_beam_width=1 for TorchSampler ( #5427 )
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-06-25 15:52:06 +08:00
Emma Qiao
475272046a
[Infra] - Waive failed tests in post-merge and increase some timeout setting ( #5424 )
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-24 17:19:31 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Yan Chunwei
3946e798db
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances ( #4727 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-19 06:13:53 +08:00
Wanli Jiang
3a02489e86
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support ( #5159 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-18 15:12:49 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation ( #5179 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args ( #4972 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
ixlmar
e055af1bc9
chore: improve disagg test failure detection ( #4738 )
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-15 01:28:26 +08:00
Yibin Li
b79eb34bfe
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn ( #5074 )
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-06-13 11:37:50 +08:00
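The log records only the title of #5074. As a minimal sketch of the general technique, assuming IPC messages are pickled and both processes share a secret key, HMAC-SHA256 authentication of each payload could look like this; `sign_payload` and `verify_payload` are hypothetical helpers, not TensorRT-LLM functions.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # 32-byte tag


def sign_payload(obj, key: bytes) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickled bytes."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def verify_payload(message: bytes, key: bytes):
    """Check the tag first; only unpickle bytes produced by a holder of the key."""
    tag, data = message[:DIGEST_SIZE], message[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest runs in constant time, so the check does not leak the tag.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed")
    return pickle.loads(data)
```

Verifying the tag before calling `pickle.loads` rejects messages from any peer without the key before an object is reconstructed; it does not make pickle safe against a peer that does hold the key.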
Bo Li
1b79041f5d
fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlock. ( #4264 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-11 09:38:10 +08:00
Yukun He
137fe35539
fix: Fix warmup phase batch size out of range. ( #4986 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-09 19:19:16 +08:00
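The log gives no detail on the warmup fix in #4986. A sketch of the general idea only, assuming warmup iterates over a list of candidate batch sizes that must never exceed the configured maximum; `warmup_batch_sizes` is a hypothetical helper.

```python
def warmup_batch_sizes(candidates, max_batch_size):
    """Return the batch sizes to warm up, never exceeding max_batch_size."""
    if max_batch_size < 1:
        raise ValueError(f"max_batch_size must be >= 1, got {max_batch_size}")
    # Drop out-of-range candidates instead of warming up shapes the engine cannot run.
    sizes = sorted({b for b in candidates if 1 <= b <= max_batch_size})
    # Assumption: also warm up the maximum itself so the largest shape is covered.
    if max_batch_size not in sizes:
        sizes.append(max_batch_size)
    return sizes
```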
QI JUN
1b963c17c0
CI: waive test_llm_multi_node_with_postproc ( #4977 )
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-06 14:19:56 +08:00
QI JUN
b8c5e3892b
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" ( #4949 )
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 17:43:30 +08:00
QI JUN
91e8d43d66
CI: waive test_llm_get_queued_stats ( #4945 )
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 16:44:56 +08:00
xinhe-nv
1c3091c63b
tests: [TRTQA-2906] add benchmark serving tests ( #4901 )
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-05 14:33:03 +08:00
Netanel Haber
ddbaa5ef80
Only pass fast_build=true to non-pytorch backend ( #4920 )
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-06-05 13:30:17 +08:00
Yan Chunwei
8e0d96fcc6
fix: LLM invalid arg in a test ( #4922 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-05 08:00:32 +08:00
Yan Chunwei
ac20159d32
fix: build_config in TorchLlmArgs and avoid invalid args ( #4600 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-04 13:17:29 +08:00
Shi Xiaowei
b13f8c9cba
Fix: NVBug 5302895 ( #4835 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-04 09:31:39 +08:00
pcastonguay
01f29ce38b
[nvbug 5294316] fix: Fix queued request stats ( #4714 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-03 08:33:08 -04:00
Enwei Zhu
0087bd27ba
[fix] Fix SamplingParams check on n and best_of ( #4655 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-01 09:11:55 +08:00
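#4655 does not spell out the rule it enforces. Assuming the common OpenAI-style semantics, where `best_of` candidates are generated and the top `n` returned (so `best_of` defaults to `n` and may not be smaller), the check can be sketched with a hypothetical dataclass; this is not the real `SamplingParams` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingParamsSketch:
    """Hypothetical stand-in for a sampling-parameter object."""
    n: int = 1
    best_of: Optional[int] = None

    def __post_init__(self):
        if self.n < 1:
            raise ValueError(f"n must be >= 1, got {self.n}")
        # Assumption: an unset best_of means "generate exactly n candidates".
        if self.best_of is None:
            self.best_of = self.n
        # The top n are picked from best_of candidates, so best_of < n is invalid.
        if self.best_of < self.n:
            raise ValueError(f"best_of ({self.best_of}) must be >= n ({self.n})")
```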
Enwei Zhu
ee916da8f1
test: Waive test_llm_loading_from_ckpt_for_tp2 ( #4797 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-30 15:43:00 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. ( #3967 )
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
Yiqing Yan
7f29a70f53
Waive L0 test ( #4748 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-29 11:05:27 +08:00
Yan Chunwei
ac17142495
chore: rename ExecutorBindingsWorker/Proxy ( #4716 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 10:32:35 +08:00
Erin
820c39041f
chore: [nvbug_5273941] unwaive test_llm_loading_from_ckpt_for_tp2 ( #4725 )
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-29 06:54:32 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Pengyun Lin
971d16a2ee
[TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT backend ( #4623 )
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-28 11:36:44 +08:00
Yiqing Yan
2fee408536
Waive L0 tests ( #4645 )
* Waive L0 tests
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* Apply suggestions from code review
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 11:05:01 +08:00
bhsueh_NV
6527c055cf
chore: fix bug of llama lora test ( #4566 )
* fix bug of llama lora test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* Update test_llm.py
fix bug detected by pre-commit
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-23 14:06:40 +08:00
coldwaterq
1cf0e672e7
fix: [nvbugs/5066257] serialization improvements ( #3869 )
* added a restricted pickler and unpickler in a separate serialization function.
Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
* updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects, which made debugging harder, added tests.
Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
* removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list.
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
* cleaned up a couple files to reduce conflicts with main.
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
* fix unit tests
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* reorder BASE_ZMQ_CLASSES list alphabetically
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* fix tests and move LogitsProcessor registration to base class
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* revert changes to import log of tensorrt_llm._torch.models
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
* fix tests and move LogitsProcessor registration to base class
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* additional comments for multiprocess approved list sync
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* add dataclass from tests
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
---------
Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Co-authored-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-05-23 13:06:29 +08:00
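The restricted unpickler described in #3869 follows the standard pattern from the Python `pickle` documentation: subclass `pickle.Unpickler` and override `find_class` so only allowlisted globals can be loaded. A minimal sketch, where `APPROVED_CLASSES` is a hypothetical stand-in for a list like `BASE_ZMQ_CLASSES` (whose real contents and format are not shown in this log):

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs that may be resolved while unpickling.
APPROVED_CLASSES = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allowlist."""

    def __init__(self, file, approved):
        super().__init__(file)
        self.approved = approved

    def find_class(self, module, name):
        # Every class or function reference in the stream is resolved here.
        if (module, name) in self.approved:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not approved")


def restricted_loads(data: bytes, approved=APPROVED_CLASSES):
    """Drop-in replacement for pickle.loads that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data), approved).load()
```

Built-in containers such as `dict` and `list` are encoded with dedicated opcodes and never reach `find_class`, so the allowlist only needs to name classes that pickle looks up by module and name.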
Kaiyu Xie
2898d268f9
feat: add health_generate route to openai serving (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856 ) ( #4349 )
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Dhruv Singal <dhruvsingalabc@gmail.com>
2025-05-22 11:46:06 +08:00
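The log does not describe what `health_generate` checks. Assuming it is meant to confirm that the server can actually produce tokens (not just that the process is up), a minimal sketch runs a one-token generation under a timeout; `generate` is a hypothetical async callable standing in for the engine, and the status codes are illustrative.

```python
import asyncio


async def health_generate(generate, timeout_s: float = 5.0) -> int:
    """Return 200 if a tiny generation finishes within timeout_s, else 503.

    `generate` is a hypothetical async callable: generate(prompt, max_tokens) -> str.
    """
    try:
        await asyncio.wait_for(generate("ping", max_tokens=1), timeout=timeout_s)
    except Exception:
        # Timeouts and engine errors both mean the server cannot serve requests.
        return 503
    return 200
```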
Yan Chunwei
4798d088d9
chore: Partition LlmArgs into TorchLlmArgs and TrtLlmArgs ( #3823 )
* partition LlmArgs
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* update backend
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-22 09:40:56 +08:00
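A minimal sketch of the partitioning idea in #3823, assuming shared options live on a base class and backend-specific knobs on subclasses; all class and field names here are hypothetical, not the real `TorchLlmArgs`/`TrtLlmArgs` fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseArgsSketch:
    """Options shared by every backend (hypothetical field names)."""
    model: str
    tensor_parallel_size: int = 1


@dataclass
class TorchArgsSketch(BaseArgsSketch):
    """PyTorch-backend-only knobs live here."""
    backend: str = "pytorch"
    use_cuda_graph: bool = False


@dataclass
class TrtArgsSketch(BaseArgsSketch):
    """TensorRT-engine-only knobs live here."""
    backend: str = "tensorrt"
    max_batch_size: Optional[int] = None


def make_args(backend: str, **kwargs) -> BaseArgsSketch:
    """Pick the argument class for the backend; unknown fields raise TypeError."""
    cls = {"pytorch": TorchArgsSketch, "tensorrt": TrtArgsSketch}[backend]
    return cls(**kwargs)
```

Because each subclass declares only its own knobs, passing a PyTorch-only option to the TensorRT arguments fails loudly with `TypeError` instead of being silently ignored.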
Yan Chunwei
174c5188a2
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu ( #4428 )
* add test
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-20 20:16:14 +08:00
Yan Chunwei
5b1c88de8d
chore: cleanup perf_evaluator code ( #3833 )
* chore: cleanup perf_evaluator code
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* up
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-19 13:21:36 +08:00
Pengyun Lin
039f7e3118
[ https://nvbugspro.nvidia.com/bug/5243740 ][fix] deduce default max_tokens for trtllm-serve ( #4265 )
* Deduce default max_tokens for trtllm-serve
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
* Improve executor_config.max_seq_len assignment in TRT workflow
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
* Enhance error message
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
* Add deduced max_tokens test
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
---------
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-19 00:34:40 +08:00
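The commit states only that a default `max_tokens` is deduced and that the `executor_config.max_seq_len` assignment was improved. Assuming the default is the room left in the sequence after the prompt, the arithmetic can be sketched with a hypothetical helper:

```python
from typing import Optional


def deduce_max_tokens(prompt_len: int, max_seq_len: int,
                      requested: Optional[int] = None) -> int:
    """Hypothetical helper: default max_tokens to the space left after the prompt."""
    if prompt_len >= max_seq_len:
        raise ValueError(
            f"prompt length {prompt_len} leaves no room for output "
            f"within max_seq_len {max_seq_len}")
    budget = max_seq_len - prompt_len
    if requested is None:
        return budget  # no explicit limit: use all remaining room
    # Assumption: an explicit request larger than the budget is clamped, not rejected.
    return min(requested, budget)
```

For example, a 100-token prompt with `max_seq_len=4096` leaves a default of 3996 new tokens. Clamping an oversized explicit request, rather than rejecting it, is a design choice of this sketch.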
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API ( #4180 )
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Yechan Kim
c6e2111f4e
feat: enhance trtllm serve multimodal ( #3757 )
* feat: enhance trtllm serve multimodal
1. made the load_image and load_video asynchronous
2. add image_encoded input support to be compatible with genai-perf
3. support text-only input on multimodal models (currently Qwen2-VL & Qwen2.5-VL)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix bandit
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* trimming utils
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* trimming for test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* genai perf command fix
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* command fix
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* refactor chat_utils
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* stress test genai-perf command
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-05-15 16:16:31 -07:00
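The commit mentions `image_encoded` input and asynchronous `load_image`/`load_video` without further detail. A sketch of both ideas, assuming encoded images arrive as `data:<mime>;base64,<payload>` URLs (the form OpenAI-style chat requests use for inline images); `decode_data_url` and `load_image_bytes` are hypothetical names.

```python
import asyncio
import base64


def decode_data_url(url: str) -> bytes:
    """Decode a 'data:<mime>;base64,<payload>' URL into raw bytes."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload, validate=True)


async def load_image_bytes(url: str) -> bytes:
    # Decode in a worker thread so the event loop keeps serving other requests.
    return await asyncio.to_thread(decode_data_url, url)
```

`asyncio.to_thread` requires Python 3.9 or later.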
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
pcastonguay
9643be5f20
[TRTLLM-5050][feat] Enable per-request stats with PyT backend ( #4156 )
* feat: Add per-request stats support with PyT backend
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding unit test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing stats unit test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing test with overlap
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-12 21:35:15 -04:00