Yi Zhang
9b616db13b
test: Add fixture to skip tests based on MPI world size ( #5028 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-16 11:25:01 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation ( #5179 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
nv-guomingz
3b7b5a5ad5
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) ( #5032 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-14 14:23:13 +08:00
xinhe-nv
d9be419f45
tests: update tests for b200 ( #5180 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-13 11:25:33 +08:00
Michal Guzek
53983ad273
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests ( #4933 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 15:06:28 +08:00
bhsueh_NV
505678a286
update the free_gpu_mem_fraction for H100 qwen3 qa test ( #5114 )
...
Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>
2025-06-12 14:40:57 +08:00
Michal Guzek
0daa70999a
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs ( #4961 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 14:32:04 +08:00
xinhe-nv
11b94feff8
test: skip disaggregated tests on arm ( #5070 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-11 17:00:10 +08:00
Stanley Sun
74b0e71ef4
test: add more disaggregated serving tests into QA testlist ( #5036 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-06-10 09:24:53 +08:00
liji-nv
1d4f748773
[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … ( #5017 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-09 17:50:57 +08:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 ( #4898 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
Ivy Zhang
7dce328ad6
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow ( #4940 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
2025-06-07 11:18:32 +08:00
Fanrong Li
75d020cf07
fix: fix cuda graph padding for spec decoding ( #4853 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-06 22:21:42 +08:00
Anthony Chang
eeb555e37b
chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM ( #4826 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-06 16:13:54 +08:00
ixlmar
a1526356aa
[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests ( #4859 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-05 10:46:29 +01:00
Yi Zhang
1fca654bfd
tests: Update gb200 test case ( #4754 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-04 18:49:20 +08:00
Nikita Korobov
8043d7a03c
feat: update DeepSeek FP8 TRT-LLM Gen cubins ( #4643 )
...
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-06-03 14:07:54 -07:00
Robin Kobus
b9263a8e10
fix: max_num_sequences calculation with overlap scheduling ( #4532 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-03 09:31:22 +02:00
Fanrong Li
380a5d1690
[ https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue ( #4536 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 10:03:34 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. ( #3967 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
Ivy Zhang
ed3c67e34a
tests: [ https://nvbugspro.nvidia.com/bug/5289908 ] run maverick bf16 on blackwell ( #4722 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 22:05:51 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
amirkl94
fbec0c3552
Release 0.20 to main ( #4577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
xinhe-nv
59f7622281
test: rcca https://nvbugs/5223130 ( #4510 )
...
* add rcca tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip tests on blackwell
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 09:59:47 +08:00
Enwei Zhu
88190faa34
feat: large-scale EP(part 4: Static EP load balancer integration) ( #4615 )
...
* MoeLoadBalancerConfig
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* MoeLoadBalancer integration
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* config file
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-26 18:25:11 +08:00
hlu1
4a236d107d
[Fix][Deepseek] Fix bugs in TestDeepSeekR1 ( #4413 )
...
[Deepseek] Fix bugs in TestDeepSeekR1
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-24 09:52:57 +08:00
zhhuang-nv
8452775db8
[TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA ( #4535 )
...
* optimize kv cache reuse workflow for MLA
write kv cache first and only call up-projection GEMM once
relax contiguous requirements of k/v for setting paged kv cache
return two contiguous tensors when loading MLA KV Cache
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* support fp8 kv cache for MLA kv cache reuse
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
---------
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-23 19:47:50 +08:00
Anthony Chang
bbea2647b1
Qwen3 supports TRTLLM FP4 MoE backend ( #4530 )
...
* MoE TRTLLM backend for Qwen3
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* add extra moe_backend to test
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* address comments
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* conditionally compile kernels on newer archs
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* missing positional arg
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* Update the routing kernels
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* Revise usage of TLLM_LOG_ERROR
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* Add unit test for Qwen3 moe (trtllm_gen backend)
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* improve weight processing speed of moe_backend=TRTLLM; roughly 2x
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* tidy and minor fix
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* temporarily disable accuracy test that has known issue
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
---------
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
Co-authored-by: Christina Zhang <christinaz@nvidia.com>
2025-05-23 18:31:08 +08:00
Mike Iovine
14fc48ada7
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler ( #4402 )
...
[fix] Fix chunked prefill + overlap scheduler
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-23 04:38:22 +08:00
xinhe-nv
22c01d5b21
test: [CI] Add failed cases into waives.txt ( #4549 )
...
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* fix test issues
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-22 17:18:53 +08:00
Michal Guzek
9033dd987d
[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct ( #4415 )
...
Add phi-4-mini CLI acc test
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-22 09:56:48 +08:00
QI JUN
2372589689
Chore: waive torch compile test cases of deepseek v3 lite ( #4508 )
...
waive torch compile test cases of deepseek v3 lite
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 10:43:31 +08:00
bhsueh_NV
ec4190fb71
infra: Add qwen3 235B tests into QA ( #4483 )
...
* add qwen3 qa test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* add qwen3 test into qa list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-20 17:37:09 +08:00
Michal Guzek
0a342a42f7
[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant ( #4362 )
...
* Add CLI TestLlama3_3_70BInstruct acc tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests to qa lists
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add comment
Signed-off-by: moraxu <mguzek@nvidia.com>
* Fix test names
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update cli file
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-20 09:48:14 +08:00
liji-nv
58e405624a
[ https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 ( #3952 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-19 22:12:25 +08:00
Iman Tabrizian
c6074c47da
Add llama4 disagg accuracy tests ( #4336 )
...
* Add llama4 disagg accuracy tests
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Make it async and add GSM8K benchmark
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 21:55:08 +08:00
Ivy Zhang
58d2508b89
tests: Add test cases for rcca cases ( #4347 )
...
* add qwen2_0_5_instruct cp4 test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen2.5 fp8 kvcache test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add ds distill qwen cpp runner test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 12:06:43 +08:00
Ivy Zhang
c4a0d768b5
tests: add qa test mentioned in docs ( #4357 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix import error
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* nemotronh fp8 trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix name
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove nemotronh-fp8
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 10:06:51 +08:00
hlu1
befb93cbff
[Deepseek] Add accuracy test references for fp8 kvcache ( #4374 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-17 11:23:00 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 ( #4255 )
...
* fix: Fix/fused moe 0.19 (#3799 )
* fix bug of stream init
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix: Add pre-download of checkpoint before benchmark. (#3772 )
* Add pre-download of checkpoint before benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Add missing remote code flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move from_pretrained to throughput benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move download and use snapshot_download.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Removed trusted flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Fix benchmark command in iteration log test.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5241495 ][fix] CUDA Graph padding with overlap scheduler (#3839 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fuse
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* TRTLLM-4875 feat: Add version switcher to doc (#3871 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* waive a test (#3897 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939 )
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* fix: remote mpi session abort (#3884 )
* fix remote mpi session
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* skip fp8 gemm for pre-hopper (#3931 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5247148 ][fix] Attention DP with overlap scheduler (#3975 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update multigpu list
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix namings
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Doc: Fix H200 DeepSeek R1 perf doc (#4006 )
* fix doc
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* update perf number
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
---------
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* Fix the perf regression caused by insufficient cache warmup. (#4042 )
Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* doc: Update 0.19.0 release notes (#3976 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060 )
The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Update switcher (#4098 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* doc: update release notes (#4108 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* docs:update 0.19 doc. (#4120 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* docs:add torch flow supported model list. (#4129 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* doc: Release V0.19 Perf Overview Update (#4166 )
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
* Fix readme of autodeploy.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Update tensorrt_llm/_torch/pyexecutor/llm_request.py
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
* Revert mgmn worker node.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Change to disable_overlap_scheduler.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
QI JUN
c4cd403af9
[CI] waive test_chunked_prefill test cases ( #4380 )
...
waive test_chunked_prefill
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-16 10:27:20 +08:00
yuxianq
4f8afe4cc6
feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism ( #4034 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-16 04:16:53 +08:00
Yanchao Lu
5ce1102a02
Revert "[test] add qa test mentioned in docs" ( #4355 )
...
Revert "[test] add qa test mentioned in docs (#4248 )"
This reverts commit b0ce1371ee .
2025-05-15 18:47:30 +08:00
zhhuang-nv
d6b741ddfe
[fix] test_no_kv_cache_reuse for overlap_scheduler ( #4350 )
...
fix test_no_kv_cache_reuse for overlap_scheduler
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-15 16:43:53 +08:00
zhhuang-nv
97bc680cd8
feat: support kv cache reuse for MLA ( #3571 )
...
* support kv cache reuse for MLA
load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* add CI test
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* use GPTJ style RoPE for MLA
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix rebase error and some docs
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix kv_lens
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* tiny fix
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: use normal device memory instead of pinned memory for unit test
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* fix L0 tests
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile after rebase
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments again
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
---------
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
Ivy Zhang
b0ce1371ee
[test] add qa test mentioned in docs ( #4248 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-15 13:37:11 +08:00
hlu1
3ea42e7519
[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus ( #4346 )
...
Reorganize TestDeepSeekR1::test_nvfp4_8gpus
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-15 13:09:13 +08:00
Mike Iovine
f9adac3dea
[feat] Enable chunked context for flashinfer ( #4132 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-15 10:59:38 +08:00
DylanChen-NV
206f82115d
[bug/5247505] fix: CP accuracy on Blackwell ( #4188 )
...
* fix xqa params for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* try adding B200 multi gpu test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add accuracy tests for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
---------
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-05-14 17:40:50 +08:00