Commit Graph

292 Commits

Author SHA1 Message Date
bhsueh_NV
ec4190fb71
infra: Add qwen3 235B tests into QA (#4483)
* add qwen3 qa test

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* add qwen3 test into qa list

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-20 17:37:09 +08:00
ruodil
b5edf13b33
test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro (#4282)
* add cases for rtx_pro_6000 and update test filter

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-20 10:58:05 +08:00
Michal Guzek
0a342a42f7
[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant (#4362)
* Add CLI TestLlama3_3_70BInstruct acc tests

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add tests to qa lists

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add comment

Signed-off-by: moraxu <mguzek@nvidia.com>

* Fix test names

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update yaml files

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update cli file

Signed-off-by: moraxu <mguzek@nvidia.com>

---------

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-20 09:48:14 +08:00
xinhe-nv
402385588d
test: [CI] Add failed cases into waives.txt (#4429)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive id

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-20 09:43:55 +08:00
Yuxian Qiu
c8e062bfd3
fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-19 14:25:36 -07:00
Venky
bb02d86b54
test(perf): Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (TRT flow, trtllm-bench) (#4128)
* changes to run llama-v3.3-nemotron-super-49b

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* yapf

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* address review comments pt 1

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* re-add cpp super tests 

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-19 12:00:48 -07:00
Faraz
7656af1b57
[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) (#4335)
* add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* update cutlass versions

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* added internal cutlass with fix and docker update

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* added mixtral to pro 6000

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

---------

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 08:56:21 -07:00
liji-nv
58e405624a
[https://nvbugs/5123103][fix] Fix torch compile for DeepSeekV3 (#3952)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-19 22:12:25 +08:00
Iman Tabrizian
c6074c47da
Add llama4 disagg accuracy tests (#4336)
* Add llama4 disagg accuracy tests

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Make it async and add GSM8K benchmark

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 21:55:08 +08:00
Shi Xiaowei
001704cc6a
fix: temp disable the problem test (#4445)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-19 21:54:32 +08:00
Dom Brown
c45f414bbf
Test: Improve model re-use in C++ DGX tests for CI stability (#4263)
* Fix padded vocab size for Llama

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Refactor multi GPU llama executor tests, and reuse the built model engines

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Fix test list typo

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Further WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* WIP

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Update test lists and readme

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Try parametrize for asymmetric

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Parametrize + skip unsupported combinations

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

* Update test list

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

* Reduce environment duplicated code

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

---------

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-19 14:20:21 +01:00
Shi Xiaowei
df2798e0c3
feat: NIXL interface integration (#3934)
NIXL interfaces

Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-19 18:18:22 +08:00
Kaiyu Xie
a43914619f
fix: wrong argument name enable_overlap_scheduler (#4433)
Fix wrong argument

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-19 15:02:22 +08:00
Yan Chunwei
5b1c88de8d
chore: cleanup perf_evaluator code (#3833)
* chore: cleanup perf_evaluator code

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* up

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-19 13:21:36 +08:00
Ivy Zhang
58d2508b89
tests: Add test cases for rcca cases (#4347)
* add qwen2_0_5_instruct cp4 test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add qwen2.5 fp8 kvcache test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add ds distill qwen cpp runner test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 12:06:43 +08:00
Ivy Zhang
c4a0d768b5
tests: add qa test mentioned in docs (#4357)
* add nemotron-h and llama_70b cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add llm decoder quick_start case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add qwen3 quickstart test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add trtllm_decoder accuracy test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove quickstart test for llm_decoder

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix import error

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* nemotronh fp8 trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove nemotronh-fp8

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 10:06:51 +08:00
Faraz
791c209006
[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) (#4363)
* added nemotron 49b fp8 for B40 release

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* add tests to QA list

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* pre-commit changes

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

---------

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 09:30:24 +08:00
Iman Tabrizian
7de90a66bc
Remove vila test (#4376)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 09:02:39 +08:00
Yanchao Lu
0d7269e2a7
[Infra][Docs] - Some clean-up for the CI pipeline and docs (#4419)
* [Docs] - Some clean-up for the docs

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

* [Infra] - Some clean-up for the CI pipeline

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

---------

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 00:07:45 +08:00
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API (#4180)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Venky
fb663b637a
Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) (#4195)
* add ll-nm-nano tests that map to nim requirements

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

* prune some pytorch cases (fp8)

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

* removing pyt backend test changes

- When validating the pytorch tests with the isl/osl/conc/quant settings (that is done for cpp backend too), seeing hangs that need further debugging.
- Therefore don't want to block this PR, hence removing them.
- Seeing

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-17 22:46:21 +08:00
Yuxian Qiu
cc1bba1686
test: Waive tests for nvbugs/5286795. (#4409)
* Waive tests for nvbugs/5286795.

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

---------

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-17 19:41:05 +08:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible (#3439)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00
hlu1
befb93cbff
[Deepseek] Add accuracy test references for fp8 kvcache (#4374)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-17 11:23:00 +08:00
liji-nv
fb437ed709
[CI] waive accuracy/test_cli_flow.py::TestTinyLlama1_1BChat::test_pp4 (#4397)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-16 20:18:07 +08:00
Emma Qiao
27bdd0c82d
[TRTLLM-4886][infra]Try another timeout opt to exit test thread directly instead of gracefully (#4341)
* Try another timeout opt to kill test thread

Signed-off-by: qqiao <qqiao@nvidia.com>

* Return true when try to delete non-existing result file

Signed-off-by: qqiao <qqiao@nvidia.com>

* quick test for the result file

Signed-off-by: qqiao <qqiao@nvidia.com>

* Change back the global timeout setting

Signed-off-by: qqiao <qqiao@nvidia.com>

* Try to kill test in internal pytest

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-16 17:56:40 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 (#4255)
* fix: Fix/fused moe 0.19 (#3799)

* fix bug of stream init

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix bug

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* fix: Add pre-download of checkpoint before benchmark. (#3772)

* Add pre-download of checkpoint before benchmark.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Add missing remote code flag.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Move from_pretrained to throughput benchmark.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Move download and use snapshot_download.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Removed trusted flag.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Fix benchmark command in iteration log test.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

---------

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* [https://nvbugspro.nvidia.com/bug/5241495][fix] CUDA Graph padding with overlap scheduler (#3839)

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fuse

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* TRTLLM-4875 feat: Add version switcher to doc (#3871)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* waive a test (#3897)

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939)

Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>

* fix: remote mpi session abort (#3884)

* fix remote mpi session

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* skip fp8 gemm for pre-hopper (#3931)

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* [https://nvbugspro.nvidia.com/bug/5247148][fix] Attention DP with overlap scheduler (#3975)

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update multigpu list

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix namings

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* Doc: Fix H200 DeepSeek R1 perf doc (#4006)

* fix doc

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

* update perf number

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

---------

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

* Fix the perf regression caused by insufficient cache warmup. (#4042)

Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* doc: Update 0.19.0 release notes (#3976)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060)

The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* Update switcher (#4098)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* doc: update release notes (#4108)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* docs:update 0.19 doc. (#4120)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

* docs:add torch flow supported model list. (#4129)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

* doc: Release V0.19 Perf Overview Update (#4166)

Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>

* Fix readme of autodeploy.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Update tensorrt_llm/_torch/pyexecutor/llm_request.py

Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>

* Revert mgmn worker node.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Change to disable_overlap_scheduler.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
HuiGao-NV
d5578b37fc
Change the method to calculate kv memory size in tests (#4332)
* Change the method to calculate kv memory size in tests
* Set larger peak memory size to llama case

Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-16 15:35:40 +08:00
xinhe-nv
500b43e90c
test: [CI] remove closed bugs (#4345)
update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-16 13:47:42 +08:00
Stanley Sun
11aa50d1ea
test: add kv cache aware test cases to qa test list (#4257)
add kv cache_aware test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-16 12:47:01 +08:00
QI JUN
c4cd403af9
[CI] waive test_chunked_prefill test cases (#4380)
waive test_chunked_prefill

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-16 10:27:20 +08:00
Iman Tabrizian
4c7191af67
Move Triton backend to TRT-LLM main (#3549)
* Move TRT-LLM backend repo to TRT-LLM repo

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Address review comments

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* debug ci

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Update triton backend

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* Fixes after update

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-16 07:15:23 +08:00
yuxianq
4f8afe4cc6
feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-16 04:16:53 +08:00
Venky
adb0839a33
test(perf): Add Phi-4-mini-instruct to perf tests (#4267)
* add phi-4-mini-instruct

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* trim tests

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-15 21:27:03 +08:00
yuxianq
0e87fcc228
refactor: use x is None instead of x == None. (#4244)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-15 20:00:04 +08:00
Yanchao Lu
5ce1102a02
Revert "[test] add qa test mentioned in docs" (#4355)
Revert "[test] add qa test mentioned in docs (#4248)"

This reverts commit b0ce1371ee.
2025-05-15 18:47:30 +08:00
Stanley Sun
9d3e05486b
test: add qa test list for rtx5090 and rtx_pro_6000 (#4254)
* add test list for rtx5090 and rtx_pro_6000

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* add 2gpu llama70b test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* remove duplicate and invalid test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* add 2gpus test cases

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

---------

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-15 17:57:31 +08:00
zhhuang-nv
d6b741ddfe
[fix] test_no_kv_cache_reuse for overlap_scheduler (#4350)
fix test_no_kv_cache_reuse for overlap_scheduler

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-15 16:43:53 +08:00
xinhe-nv
14bfb5e0d6
test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus (#4283)
* update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-15 15:57:44 +08:00
zhhuang-nv
97bc680cd8
feat: support kv cache reuse for MLA (#3571)
* support kv cache reuse for MLA

load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* add CI test

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2

Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* use GPTJ style RoPE for MLA

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix rebase error and some docs

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix kv_lens

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* tiny fix

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix torch compile

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix: use normal device memory instead of pinned memory for unit test

Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

* fix L0 tests

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* fix torch compile after rebase

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments again

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

---------

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default (#4174)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
dominicshanshan
404fbe9b32
[https://nvbugs/5277113][fix]genai-perf API change stress test (#4300)
* fix bug 5277113.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* fix bug 5277113 and 5278517.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-15 14:12:34 +08:00
Ivy Zhang
b0ce1371ee
[test] add qa test mentioned in docs (#4248)
* add nemotron-h and llama_70b cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* trial

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add llm decoder quick_start case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add qwen3 quickstart test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add trtllm_decoder accuracy test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove quickstart test for llm_decoder

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-15 13:37:11 +08:00
hlu1
3ea42e7519
[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus (#4346)
Reorganize TestDeepSeekR1::test_nvfp4_8gpus

Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-15 13:09:13 +08:00
Mike Iovine
f9adac3dea
[feat] Enable chunked context for flashinfer (#4132)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-15 10:59:38 +08:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092)
* chore: Remove GptSession/V1 from TRT workflow

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove stateful decoders

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession buffers

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession utils

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession kernels

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove V1 GPT models from tests

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove gptSessionBenchmark from scripts and docs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove gptSession IO classes

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession from test lists

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession from docs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove useless encoder test

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove mActualBatchSize from DecoderState

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove static batching from ExecutorTest

- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
Faraz
42de79d49e
test: Added tests for Llama3.1-70B-BF16 on SM120 (#4198)
* Added tests for Llama3.1-70B-BF16 on SM120

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

* solve conflicts add more tests

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>

---------

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-14 11:57:49 -04:00
Yanchao Lu
504f4bf779
[Infra] - Update the upstream PyTorch dependency to 2.7.0 (#4235)
[Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-14 22:28:13 +08:00
Kaiyu Xie
6c45586c51
chore: Remove deprecated Python runtime benchmark (#4171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-14 18:41:05 +08:00
HuiGao-NV
f4059c6e2e
Add test case for kv memory estimation (#4158)
* Add test case for kv memory estimation
* Dump running log into file and parse kv cache memory size from file
* Set bigger peak memory size for mixed percision case and test_ptp_quickstart_advanced_eagle3 case
* Revert change to usage of fraction
* use context manager to guard temp files

Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-14 18:39:25 +08:00