Martin Marciniszyn Mehringer
47a765d732
doc: Include NGC release containers in quick-start-guide.md (#5334)
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-06-19 15:41:57 +08:00
Emma Qiao
75aa06b446
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360) (#5362)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 15:29:05 +08:00
Emma Qiao
473125679c
[Infra] Cherry-pick for 5080 / 5090 gpu name fixing (#5332)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-18 20:54:55 +08:00
Kaiyu Xie
7965842954
[doc] Update Perf-Overview.MD with V0.20 Release Data (cherry-pick #5176) (#5324)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-06-18 17:44:03 +08:00
Yingge He
109f28ed3f
test: Deprecate gpt_model_type "v1" static batching from triton_backend L0_backend_trtllm (#5229)
...
Signed-off-by: Yingge He <yinggeh@nvidia.com>
2025-06-17 14:47:03 +08:00
ruodil
e05b3ff427
test: add deepseek_v3_lite rcca cases (#5225)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-16 13:39:26 +08:00
nv-guomingz
cbc6455266
doc:add release notes for v0.20.0 (#5150)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-16 09:27:57 +08:00
ruodil
3f284f1a3a
test: add deepseek rcca cases (#5195)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-15 16:20:15 +08:00
Kaiyu Xie
746394e990
[TRTLLM-5516] perf: replicate dummy request for cuda graph padding (cherry-pick #4729) (#5190)
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-14 00:36:15 +08:00
Fanrong Li
bfa3b59bb6
[https://nvbugs/5277592][fix] fix cuda graph padding for spec decoding (only for 0.20) (#5058)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-11 02:14:14 +08:00
Ivy Zhang
b626186241
tests: fix some typo and limitation on test cases (#4989)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-10 10:47:50 +08:00
Yechan Kim
9f5b23ae77
fix: [nvbugs/5324954, nvbugs/5304229] fix Qwen2-VL video and Qwen2.5-VL image test case (#4976)
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-09 15:25:26 +08:00
Yukun He
0b4f7182fb
[5289904] chore: Unwaive test for Qwen model. (#4657)
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-06-09 14:06:59 +08:00
Kaiyu Xie
d90fe3c69c
doc: Minor fixes and clarification (#4975)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-09 14:06:09 +08:00
Yukun He
5ee14657b4
[5310329] chore: Unwaive test_e2e.py::test_openai_reasoning. (#4981)
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-06-09 14:05:21 +08:00
Yanchao Lu
9f45e806b2
[Infra] - Update JNLP container config (#5009)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-08 18:09:35 +08:00
Stefan Niebler
a6e53bf4e0
ci: waive testcase [NVBUG 5247271] (#4992)
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-06-08 16:47:06 +08:00
Fanrong Li
213af8f80a
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5004 (#5005)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-08 11:23:56 +08:00
liji-nv
ff4212377c
[fix] Fix illegal mem access and possible accuracy lose (#4943)
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-08 11:19:42 +08:00
Robin Kobus
20425deb3b
[https://nvbugs/5238105] fix: ModelRunnerCpp num_return_sequences (#3951)
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-06 12:31:11 +02:00
Frank
6cbeb7724b
[https://nvbugspro.nvidia.com/bug/5323820] Fix chunking equation for disabled case. (#4964)
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-06-06 15:51:10 +08:00
Yukun He
fa20ffc5d4
[5310329] fix: Fix warmup phase batch size out of range. (#4912)
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-06-06 12:26:05 +08:00
Erin
e9d360180c
fix: [nvbug 5321627] handle cases when TRT backend return more logits than output tokens (#4921)
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-06-06 07:12:42 +08:00
Yiqing Yan
f9082a7168
Downgrade NCCL version from 2.26.5 to 2.25.1 (#4931)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-05 14:03:39 +08:00
Zheng Duan
c4c7dd3517
fix: cache-aware router related test fix (#4911)
...
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-05 13:07:24 +08:00
Gabriel Wu
df0aeae0cd
Fix DeepGEMM NVCC Path (#4886)
...
Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com>
2025-06-05 11:55:37 +08:00
Stanley Sun
a23cdc4c1b
test: fix potential teardown error (#4908)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-06-05 10:39:57 +08:00
WeiHaocheng
500e28281e
[TRTLLM-5340] fix: remove the accuracy assert on run_majority_vote_ai… (#4907)
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-06-05 06:40:46 +08:00
Daniel Cámpora
64d5eba9c7
Fix: max_num_sequences calculation with overlap scheduling into release/0.20 (#4889)
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-04 22:33:12 +08:00
Yuxian Qiu
3af8159133
fix: [nvbugs/5312750] Keep embed_tokens for last pp rank if tie_word_embeddings. (#4902)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-06-04 19:49:08 +08:00
Stanley Sun
33cd27f114
test: fix rss increasement test case issue (#4868)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-06-04 10:35:06 +08:00
Yiqing Yan
b1ce7f0765
Waive L0 test (#4862)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-03 18:37:21 +08:00
Yiqing Yan
95e6ad579d
Waive L0 test (#4857)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-03 15:58:26 +08:00
Yechan Kim
565abb6887
fix: [nvbugs/5298600] fix illegal memory access on mrope_position_deltas (#4830)
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-03 14:56:50 +08:00
Fanrong Li
6e46e13523
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4379 (#4833)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 12:30:01 +08:00
Fanrong Li
82d918b93e
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4536 (#4834)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 12:29:54 +08:00
Yanchao Lu
36116f09f6
[Infra] - Better utilize multi-GPU CI resources (#4850)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-03 12:25:20 +08:00
ruodil
7c47714a39
test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test (#4796)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 10:20:55 +08:00
Stanley Sun
b58556e2d9
test: remove invalid triton integration test cases (#4801)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 09:39:23 +08:00
Michal Guzek
4e68be2da7
[TRTLLM-4932] Remove moe- related arguments from Llama-3_1-Nemotron-Ultra-253B-v1 CLI accuracy test (#4808)
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-02 12:16:28 -07:00
Faraz
10d5af06e0
[NVBUG-5291971] JIT path for XQA (#4675)
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-06-02 16:24:59 +02:00
pcastonguay
ddd704f39c
fix: Fix queued req stats for release/0.20 (#4806)
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-02 08:32:24 -04:00
brb-nv
7a2cd255bc
fix: Skip dummy medusa/eagle tests when WORLD_SIZE env variable is missing (#4786)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-06-02 02:21:24 -07:00
QI JUN
555118f783
[https://nvbugs/5303634] skip evaluating empty batch_input_ids in summarize.py (#4676)
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-02 16:16:05 +08:00
Yan Chunwei
55170ec83a
fix: llmapi-launch add add trtllm-bench test with engine building (#4… (#4550)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-01 08:38:01 +08:00
Iman Tabrizian
00e0837e5c
Remove disaggregated cuda graph waived test (#4707)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-31 07:24:00 +08:00
Yanchao Lu
86779213db
[Docs] - Add date and commit info (#4448) (#4752)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-30 15:58:49 +08:00
Yiqing Yan
830d68d101
Waive l0 tests (#4795)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-30 15:56:58 +08:00
Ivy Zhang
9980e73afa
tests: waive failed case (#4785)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-30 11:24:25 +08:00
xinhe-nv
1bc3dfa490
tests: fix 5250460 (#4751)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-30 10:13:45 +08:00