Commit Graph

422 Commits

Author SHA1 Message Date
Barry Kang
20b42912ce
[TRTLLM-3330][feat] Support DeepSeek-R1 W4A8 on Hopper (#4123)
Support DeepSeek-R1 W4A8 on Hopper

Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-05-14 15:48:07 +08:00
Zongfei Jing
bb17649517
test: Add UT for moe trtllmgen (#4258)
* Add ut for moe trtllmgen

Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

* Update tests/unittest/_torch/modeling/test_modeling_deepseek.py

Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

---------

Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-05-14 15:22:58 +08:00
bhsueh_NV
1a9298bc66
CI: add fp8/fp4 ci on Qwen3-30B-A3B (#4266)
add fp8/fp4 ci on Qwen3-30B-A3B

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-14 14:38:04 +08:00
brb-nv
8280c3d4f2
feat: Support Gemma3-1b-it in Pytorch workflow (#3999)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 14:02:44 +08:00
Yi Zhang
86ae506b9d
[fix] Enable pp tests (#3978)
Fix misrebase issue

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-05-14 10:51:20 +08:00
Fridah-nv
21dbd163a7
[TRTLLM-5188] fix: [AutoDeploy] unwaive AD build test (#4273)
* unwaive small build test

Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>

* unwaive multi-GPU/integration tests

Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>

* fix for torch.compile+flashinfer attention

Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>

---------

Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>
2025-05-14 10:40:12 +08:00
brb-nv
1ef117688c
test: Validate FP8 and LoRA for Gemma3 (#3670)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-13 17:28:02 -07:00
Iman Tabrizian
f408de2d99
Waive disagg kv cache load balancer test (#4276)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-14 06:03:24 +08:00
brb-nv
cd5b3d21a0
feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 03:47:22 +08:00
Yiqing Yan
290649b6aa
[Infra] Waive L0 test (#4269)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 23:06:13 +08:00
Yiqing Yan
bfa16a63d4
[Infra] Waive L0 test (#4268)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 22:43:17 +08:00
dominicshanshan
44d6adfb68
Waive stress test. (#4262)
* Waive stress test.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 21:01:57 +08:00
Enwei Zhu
8f68d56cc1
[https://nvbugs/5220763] [test] Unwaive Mixtral FP8 TP2 test (#4252)
unwaive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 15:55:33 +08:00
Yiqing Yan
fda8b0277a
[Infra][TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 (#4049)
* [TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix review

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* update images

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* Update jenkins/L0_Test.groovy

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* update image name

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-13 14:59:12 +08:00
ruodil
d555fe2530
test: fix for perf test script issue (#4230)
fix for perf test script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-13 10:29:20 +08:00
xinhe-nv
0cebc16139
test: [CI] Add failed cases into waives.txt (#4205)
waive tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:22:42 +08:00
xinhe-nv
7ebae4dcaa
test: [CI] Add failed cases into waives.txt (#4203)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:08:02 +08:00
pcastonguay
9643be5f20
[TRTLLM-5050][feat] Enable per-request stats with PyT backend (#4156)
* feat: Add per-request stats support with PyT backend

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Adding unit test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing stats unit test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing test with overlap

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-12 21:35:15 -04:00
Simeng Liu
286a789549
feat: Add heuristic for GroupRMSNorm kernel selection. (#4047)
* feat: Add heuristic for GroupRMSNorm kernel selection.

Implements a logistic regression model to dynamically select between:
- GroupRMSNormBaseKernel: Allocates warps proportional to the sum of dimensions
  (better SM occupancy in most cases)
- GroupRMSNormLargeBatch: Allocates warps proportional to the max dimension
  (better block scheduling in large-batch scenarios)

The selection heuristic considers batch size, allocated warps, and scheduling
efficiency on the current GPU architecture. Models for Compute Capability
9.x and 10.x are trained on nsys kernel runtime data.
The default kernel selection is the base kernel.

The Python operator group_rms_norm uses the heuristic by default.
Users can also explicitly choose the base or large-batch kernel.

Signed-off-by: Simeng Liu <simengl@nvidia.com>

* Address the comments.

Signed-off-by: Simeng Liu <simengl@nvidia.com>

---------

Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-13 08:52:53 +08:00
Enwei Zhu
035d915fea
[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090)
* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* normalize mtp_nextn

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update test_durations

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 07:41:51 +08:00
wili
eba3623a54
Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979)
* feat/vbws-part4-v1.8: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* feat/vbws-part4-v1.9: fix incorrect output when using short output length

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.1: remove useless variables

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.2: fix incorrect output when using short output length

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.3: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.4: rebase

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

* v1.9.5: remove API change

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

---------

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-12 22:32:29 +02:00
Enwei Zhu
c31ca1688c
[https://nvbugs/5214229] [fix] Unwaive lm_head quantization case (#4222)
unwaive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 20:23:06 +08:00
Zheng Duan
c9e2a963e0
feat: add kv cache aware router (#3831)
* kv cache aware router

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* add tests

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* router config

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* eviction test

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

add test

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* eviction detect in worker test

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* move worker tests to single gpu

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* reduce memory fraction

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

* fix partial block

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

---------

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding (#4066)
* finish

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* exc overlap scheduler

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix api ref

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
Yechan Kim
3e9bda3a09
[feat] Support HyperCLOVAX-SEED-Text language part (#3902)
* feat: support HyperCLOVAX-SEED-Text language part

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add Pytorch flow and remove test file

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* revert summarize

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix summarize

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove from pytorch example

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-05-12 16:05:14 +08:00
Zhenhuan Chen
9212e9a740
[TRTLLM-4911] feat(scaffolding): make sampling_params settable only by the controller (#4151)
feat(scaffolding): make sampling_params settable only by the controller

Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-05-12 15:29:09 +08:00
Ivy Zhang
ee92edf2b4
[https://nvbugspro.nvidia.com/bug/5270564][test] skip pre-Hopper for llama4 (#4211)
skip pre-Hopper for llama4

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-12 15:27:15 +08:00
ruodil
9c03a7ab74
test: add llama_3.2_1B model and fix for test lora script issue (#4139)
* test: add llama_v3.1_8b_fp8, llama_v3.1_405b, and llama_nemotron_49b models to the perf test, and change the original llama models' dtype from float16 to bfloat16 per README.md

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* add llama_3.2_1B model and fix for lora script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-12 14:51:59 +08:00
xinhe-nv
849d9c343c
tests: https://nvbugs/5219534 remove failed tests from test list (#4113)
remove unsupported tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-12 14:13:40 +08:00
Yiqing Yan
3c54e84e47
[Infra] Waive L0 test (#4212)
Waive L0 test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-12 11:37:49 +08:00
QI JUN
f021afa241
[CI] waive two multi-gpu test cases (#4206)
waive two multi-gpu test cases

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-12 08:04:48 +08:00
Enwei Zhu
7db368c72c
test: Remove CNN Dailymail tasks in favor of GSM8K (#4187)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-10 09:02:07 +08:00
Dom Brown
2d0f93a054
Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027)
* Refactor: Restructure C++ tests for better modularisation of non-shared code

Start cleanup of pytest code for C++ tests

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

Clean up names and remove references to test_cpp.py

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

Move multi-GPU code

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

Update doc and try un-waiving

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Update multi GPU file check

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Address minor multi-GPU setup bug

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

---------

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-09 19:16:51 +01:00
Mike Iovine
4b8ba7ad61
[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue (#4069)
[fix] Fix llama 4 test lists

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-09 22:45:14 +08:00
Tracin
446f62bbab
chore: Deprecate evaltool (#4173)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-09 20:31:53 +08:00
ruodil
bf5b2a2e0a
test: amend regex match for perf throughput (#4186)
amend regex match for perf throughput

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 17:33:25 +08:00
WeiHaocheng
0f01826dde
feat: support task collection to collect information (#3328) (#3824)
Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>
2025-05-09 17:09:01 +08:00
xinhe-nv
9082411a50
test: [CI] Add failed cases into waives.txt (#4165)
waive OOM tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-09 16:56:30 +08:00
ruodil
5ce5b81281
test: amend default pytorch extra-llm-api-config.yml in perf test (#4176)
* amend default pytorch extra-llm-api-config.yml

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* add print info to separate cases in output log

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 16:46:48 +08:00
xinhe-nv
1d26a3fd7c
test: skip tests on b200 (#3913)
* skip tests on b200

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* skip phi-3-128k

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-09 14:51:55 +08:00
Fanrong Li
77f8e43592
[fix] Fix relaxed acceptance to support enabling it in context phase (#4126)
* fix relaxed acceptance to support enabling this feature in the context phase.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

* fix sample_and_accept_draft_tokens unit test.

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-09 14:11:14 +08:00
Bo Li
e3cf3fd15f
test: Add fp8kv to DS-v3-lite integration tests. (#3950)
* Add fp8 kv cache tests to DSV3-Lite integration tests.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update gsm8k.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update CI list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update TestDeepSeekR1.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Fix test list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Need quant_config besides pytorch_config.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list (bug 5239087).

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Correct test name.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update waive list.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

---------

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <bobboli0202@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-09 13:35:04 +08:00
Ivy Zhang
c91d03fa0a
test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440)
* add mistral-7b-v0.1 torch flow test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mistral

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mixtral case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove api function test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mistral nemo cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mixtral cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove awq llmapi test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix partial comments

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix path

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update thres

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove duplicate test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:32:02 +08:00
Ivy Zhang
c2d4c2adb6
[https://nvbugspro.nvidia.com/bug/5260676]test: skip fp8 quantization case for pre-ada (#4095)
skip pre ada

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:30:16 +08:00
Yi Zhang
91bf5e6a8e
[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support (#3804)
Add Piecewise CUDA Graph Support

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-05-09 11:04:01 +08:00
Stanley Sun
fb31f91e15
test: add qwen3 and disaggregated serving accuracy tests to qa test list (#4083)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-09 11:03:02 +08:00
Yukun He
5b61486d87
chore: Clean up the legacy DeepseekAllreudceFusionOp. (#4081)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-09 10:20:41 +08:00
pcastonguay
836c142e1b
[feat] Allow overriding cli args with yaml file in trtllm-serve (#4164)
feat: Allow overriding cli args with yaml file in trtllm-serve

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-08 21:19:05 -04:00
Mike Iovine
9afe510367
[fix] Fix llama4 + eagle3 (#3998)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-08 19:20:27 -04:00
chenfeiz0326
7f5716ef83
Cherry-pick trtllm-gen from feat/llama4 to main (#4086)
* feat: TRT-LLM Gen FP8 MoE Llama4

Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>

* feat: TRT-LLM Gen llama4 MoE Top1 routing

Signed-off-by: Jiqun Tu <jtu@nvidia.com>

* feat: add per tensor FP8 TRT-LLM Gen GEMMs

Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>

* Update

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

* Update

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

* Add license for cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/gemmCubins

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

* Add guard for routingIndicesClusterKernel

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

* Guard sm90+ for routingkernels

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

* Guard sm90+ for routingkernels

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

---------

Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Nikita Korobov <nkorobov@nvidia.com>
Co-authored-by: Jiqun Tu <jtu@nvidia.com>
2025-05-08 14:13:01 -07:00