Yanchao Lu
f4cdbfcdf0
None - Some clean-ups for the automation pipeline ( #5245 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 21:08:24 +08:00
QI JUN
ccd9adbe33
CI: move multi-gpu test cases of tensorrt backend to h200 ( #5272 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:37:37 +08:00
QI JUN
517c1ecf72
move some test cases of TensorRT backend back ( #5232 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:03:11 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation ( #5179 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
Tailing Yuan
0b60da2c45
feat: large-scale EP(part 7: DeepEP integration) ( #4792 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-14 19:12:38 +08:00
QI JUN
952f33dcad
CI: move all test cases of TensorRT backend into post merge ( #5186 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-13 20:48:48 +08:00
Iman Tabrizian
01bd4c00b4
Add two MTP disaggregated test ( #4546 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-13 12:17:45 +08:00
Shi Xiaowei
88cba5f354
test: waive the NIXL related tests ( #5153 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-12 17:02:27 +08:00
Fanrong Li
4d070d3862
chore: fix typo in tests ( #5092 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-12 15:11:26 +08:00
Zheng Duan
580a92521e
test: conditional disagg and cache aware balancing for deepseek v3 ( #4522 )
...
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-11 09:44:29 +08:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 ( #4898 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
Yanchao Lu
9e05613679
[Infra] - Update JNLP container config ( #5008 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-08 16:44:09 +08:00
QI JUN
5ee0de7f2a
Resubmit #4894 ( #4969 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-08 04:42:15 +08:00
Fanrong Li
75d020cf07
fix: fix cuda graph padding for spec decoding ( #4853 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-06 22:21:42 +08:00
Anthony Chang
eeb555e37b
chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM ( #4826 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-06 16:13:54 +08:00
QI JUN
ec50684d80
Revert "fix a bug of global cuda graph dummy request" ( #4970 )
2025-06-06 08:54:45 +08:00
QI JUN
154f7cc40a
fix a bug of global cuda graph dummy request ( #4894 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 19:47:40 +08:00
Shunkangz
3eae58ca36
Add disaggregated unittest ( #4899 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-05 19:14:31 +08:00
Lucas Liebenwein
f9d45e03a4
[AutoDeploy] deprecate CI post-merge tests and keep them for local testing ( #4892 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 08:27:17 +08:00
Yi Zhang
1fca654bfd
tests: Update gb200 test case ( #4754 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-04 18:49:20 +08:00
Iman Tabrizian
141467d4b6
Add pre-merge Triton backend tests ( #4842 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-03 00:47:58 -04:00
Fanrong Li
380a5d1690
[ https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue ( #4536 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 10:03:34 +08:00
Yanchao Lu
8166649d03
[Infra] - Minor clean-up and test Ubuntu mirrors ( #4829 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-02 20:18:20 +08:00
amirkl94
8039ef45d3
CI: Performance regression tests update ( #3531 )
2025-06-01 09:47:55 +03:00
Emma Qiao
c945e92fdb
[Infra]Remove some old keyword ( #4552 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-31 13:50:45 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. ( #3967 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
yuanjingx87
2c48ff5898
[feat] add b200 support via slurm ( #4709 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 14:49:46 +08:00
Yan Chunwei
33a9ba55f5
fix: test trtllm-bench mgmn ( #4613 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 14:43:47 +08:00
QI JUN
058f83e47b
CI: move post-merge multi GPU test of PyTorch backend to H200 ( #4733 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:15:56 +08:00
amirkl94
fbec0c3552
Release 0.20 to main ( #4577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
xinhe-nv
59f7622281
test: rcca https://nvbugs/5223130 ( #4510 )
...
* add rcca tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip tests on blackwell
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 09:59:47 +08:00
yuanjingx87
732d92ff62
[Infra] - Multi-GPU testing support with Slurm ( #4454 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 19:44:19 +08:00
Enwei Zhu
88190faa34
feat: large-scale EP(part 4: Static EP load balancer integration) ( #4615 )
...
* MoeLoadBalancerConfig
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* MoeLoadBalancer integration
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* config file
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-26 18:25:11 +08:00
Anthony Chang
bbea2647b1
Qwen3 supports TRTLLM FP4 MoE backend ( #4530 )
...
* MoE TRTLLM backend for Qwen3
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* add extra moe_backend to test
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* address comments
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* conditionally compile kernels on newer archs
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* missing positional arg
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* Update the routing kernels
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* Revise usage of TLLM_LOG_ERROR
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* Add unit test for Qwen3 moe (trtllm_gen backend)
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* improve weight processing speed of moe_backend=TRTLLM; roughly 2x
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* tidy and minor fix
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* temporarily disable accuracy test that has known issue
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
---------
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
Co-authored-by: Christina Zhang <christinaz@nvidia.com>
2025-05-23 18:31:08 +08:00
pcastonguay
d7d455e7ea
[feat][TRTLLM-5018] Dis serving python runtime trt backend ( #4243 )
...
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-22 22:01:06 -04:00
Mike Iovine
14fc48ada7
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler ( #4402 )
...
[fix] Fix chunked prefill + overlap scheduler
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-23 04:38:22 +08:00
HuiGao-NV
bc9f1dbede
fix[nvbug-5228840]: Remove test cases of feature not supported anymore ( #3972 )
...
* Remove waived cases
* Remove test cases of not supported feature
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-22 11:18:58 +08:00
Chuang Zhu
44cfd757b2
Agent interface impl for NIXL ( #4125 )
...
* agentConnection
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
recv
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
agentState
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
NIXL interfaces
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
update cmakelists
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
nixl improve
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
remove cppzmq
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
transferAgent remove register
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
work for cache Test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
reduce sleep time
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
intergarte
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
nixl env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix rebase error
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
cpp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
stash for send metaData
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
loadRemoteMD after fetchRemoteMD
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
workaround for mixed gen and context
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
test_env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
avoid port conflict in test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* format
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use std::string
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* typo
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* fix transferAgentTest
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 09:09:41 +08:00
Dom Brown
1cffa99792
test: Split test_simple into mpi_utils and cache transceiver tests for DGX ( #4451 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-22 04:26:21 +08:00
Yan Chunwei
9199793848
fix: llmapi-launch add add trtllm-bench test with engine building ( #4091 )
...
* add trtllm-bench mgmn test
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-21 10:18:01 +08:00
Zheng Duan
77a0189554
feat: conditional disaggregation in disagg server ( #3974 )
2025-05-21 09:57:46 +08:00
Faraz
7656af1b57
[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) ( #4335 )
...
* add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* update cutlass versions
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* added internal cutlass with fix and docker update
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* added mixtral to pro 6000
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 08:56:21 -07:00
liji-nv
58e405624a
[ https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 ( #3952 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-19 22:12:25 +08:00
Iman Tabrizian
c6074c47da
Add llama4 disagg accuracy tests ( #4336 )
...
* Add llama4 disagg accuracy tests
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Make it async and add GSM8K benchmark
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 21:55:08 +08:00
Dom Brown
c45f414bbf
Test: Improve model re-use in C++ DGX tests for CI stability ( #4263 )
...
* Fix padded vocab size for Llama
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Refactor multi GPU llama executor tests, and reuse the built model engines
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Fix test list typo
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Further WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update test lists and readme
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Try parametrize for asymmetric
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Parametrize + skip unsupported combinations
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
* Update test list
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
* Reduce environment duplicated code
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-19 14:20:21 +01:00
Yan Chunwei
5b1c88de8d
chore: cleanup perf_evaluator code ( #3833 )
...
* chore: cleanup perf_evaluator code
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* up
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-19 13:21:36 +08:00
Faraz
791c209006
[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) ( #4363 )
...
* added nemotron 49b fp8 for B40 release
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* add tests to QA list
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* pre-commit changes
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 09:30:24 +08:00
Iman Tabrizian
7de90a66bc
Remove vila test ( #4376 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 09:02:39 +08:00
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API ( #4180 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible ( #3439 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00