Commit Graph

572 Commits

Author SHA1 Message Date
Enwei Zhu
ee916da8f1
test: Waive test_llm_loading_from_ckpt_for_tp2 (#4797)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-30 15:43:00 +08:00
xinhe-nv
53794b26f8
test: skip test_llm_hf_gemma_quantization_1gpu_vswa on A100 (#4779)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-30 15:12:12 +08:00
Aurelien Chartier
36b87b8671
chore: fix llm_root when LLM_ROOT is not set (#4741)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-29 19:44:34 -07:00
Jinyang Yuan
5339d367ce
[perf] Reduce the workspace size of FP4 activation scales for MoE (#4303)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-30 09:03:52 +08:00
Yilin Fan
31bb650298
Cherry pick feat/llama4 to main (#4739)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-05-30 05:28:40 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. (#3967)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
yuanjingx87
2c48ff5898
[feat] add b200 support via slurm (#4709)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 14:49:46 +08:00
Yan Chunwei
33a9ba55f5
fix: test trtllm-bench mgmn (#4613)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 14:43:47 +08:00
ruodil
500aca4f44
test: remove perf test l40s/l20 oom test cases and unwaive tests (#4755)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 13:58:47 +08:00
QI JUN
058f83e47b
CI: move post-merge multi GPU test of PyTorch backend to H200 (#4733)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:15:56 +08:00
Yiqing Yan
7f29a70f53
Waive L0 test (#4748)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-29 11:05:27 +08:00
Yan Chunwei
ac17142495
chore: rename ExecutorBindingsWorker/Proxy (#4716)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 10:32:35 +08:00
Arthur Rasmusson
812b1abf86
feature: KV Cache GPUDirect Storage (#3209)
Signed-off-by: Arthur Rasmusson <47877520+arthurrasmusson@users.noreply.github.com.>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-28 23:27:43 +00:00
Erin
820c39041f
chore: [nvbug_5273941] unwaive test_llm_loading_from_ckpt_for_tp2 (#4725)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-29 06:54:32 +08:00
Aurelien Chartier
6cf1e4d0a9
chore: add -f to pkill calls (#4711)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-29 02:54:31 +08:00
Ivy Zhang
ed3c67e34a
tests: [https://nvbugspro.nvidia.com/bug/5289908] run maverick bf16 on blackwell (#4722)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 22:05:51 +08:00
xinhe-nv
93283484c2
test: [CI] Add failed cases into waives.txt (#4688)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-28 22:04:35 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
amirkl94
fbec0c3552
Release 0.20 to main (#4577)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
Pengyun Lin
971d16a2ee
[TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT backend (#4623)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-28 11:36:44 +08:00
Yuxian Qiu
5700a4ffcd
feat: Add vanilla MOE. (#4682)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-28 10:44:14 +08:00
xinhe-nv
bb3d998eb1
test: [CI] remove closed bugs (#4638)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 18:07:59 +08:00
Lucas Liebenwein
5cdd6bb10f
[AutoDeploy] Increased Model Coverage Mass Migration Week 1 (#4468)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 16:43:15 +08:00
Yiqing Yan
f6c50293d2
[Infra][TRTLLM-3929] Rerun failure tests (#3264)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 16:13:23 +08:00
Yiqing Yan
92a7984945
Waive L0 tests (#4686)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 15:07:02 +08:00
xinhe-nv
59f7622281
test: rcca https://nvbugs/5223130 (#4510)
* add rcca tests

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* skip tests on blackwell

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 09:59:47 +08:00
yuanjingx87
732d92ff62
[Infra] - Multi-GPU testing support with Slurm (#4454)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 19:44:19 +08:00
Enwei Zhu
88190faa34
feat: large-scale EP(part 4: Static EP load balancer integration) (#4615)
* MoeLoadBalancerConfig

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* MoeLoadBalancer integration

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* config file

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-26 18:25:11 +08:00
Emma Qiao
6f626af386
[TRTLLM-4535][infra]: Add marker TIMEOUT for test level (#3905)
* Add marker for TIMEOUT

Signed-off-by: qqiao <qqiao@nvidia.com>

* Remove workspace after tests

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add missed property

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add some debug info

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix errors

Signed-off-by: qqiao <qqiao@nvidia.com>

* Testing

Signed-off-by: qqiao <qqiao@nvidia.com>

* Special process for unittests

Signed-off-by: qqiao <qqiao@nvidia.com>

* Move special proecessing unittests to test generating stage

Signed-off-by: qqiao <qqiao@nvidia.com>

* Process for the whole test list

Signed-off-by: qqiao <qqiao@nvidia.com>

* Test more

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add another test case

Signed-off-by: qqiao <qqiao@nvidia.com>

* Change back the setting for testing

Signed-off-by: qqiao <qqiao@nvidia.com>

* Revert another config file

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add descriptionf or timeout in test readme

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-25 23:30:40 -07:00
Yiqing Yan
2fee408536
Waive L0 tests (#4645)
* Waive L0 tests

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* Apply suggestions from code review

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 11:05:01 +08:00
hlu1
4a236d107d
[Fix][Deepseek] Fix bugs in TestDeepSeekR1 (#4413)
[Deepseek] Fix bugs in TestDeepSeekR1

Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-24 09:52:57 +08:00
Yanchao Lu
20c15fc04f
Fix invalid testcase name (#4626)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-24 00:40:00 +08:00
dominicshanshan
ca3eaf4070
[nvbug/5028235][fix]pytest bindings tokens logtis comparison. (#4424)
* fix bug 5028235.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* fix bug 5028235 and update comments.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Update tests/unittest/bindings/test_executor_bindings.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Remove redundant code.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Update based on review comments.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-23 20:41:00 +08:00
Robin Kobus
15a59e57f6
[nvbugs/5301492] ci: waive test_workers_kv_cache_aware_router (#4617)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-23 20:14:28 +08:00
zhhuang-nv
8452775db8
[TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535)
* optimize kv cache reuse workflow for MLA

write kv cache first and only call up-projection GEMM once
relax contiguous requirements of k/v for setting paged kv cache
return two contiguous tensors when loading MLA KV Cache

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* support fp8 kv cache for MLA kv cache reuse

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

* resolve comments

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>

---------

Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-23 19:47:50 +08:00
Anthony Chang
bbea2647b1
Qwen3 supports TRTLLM FP4 MoE backend (#4530)
* MoE TRTLLM backend for Qwen3

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* add extra moe_backend to test

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* address comments

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* conditionally compile kernels on newer archs

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* missing positional arg

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* Update the routing kernels

Signed-off-by: Christina Zhang <christinaz@nvidia.com>

* Revise usage of TLLM_LOG_ERROR

Signed-off-by: Christina Zhang <christinaz@nvidia.com>

* Add unit test for Qwen3 moe (trtllm_gen backend)

Signed-off-by: Christina Zhang <christinaz@nvidia.com>

* improve weight processing speed of moe_backend=TRTLLM; roughly 2x

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* tidy and minor fix

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

* temporarily disable accuracy test that has known issue

Signed-off-by: Anthony Chang <anchengc@nvidia.com>

---------

Signed-off-by: Anthony Chang <anchengc@nvidia.com>
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
Co-authored-by: Christina Zhang <christinaz@nvidia.com>
2025-05-23 18:31:08 +08:00
Yiqing Yan
3ca05330f9
Waive L0 test (#4609)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-23 15:54:11 +08:00
Bo Li
9ae705af1b
perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482)
* Add Julien's origina kernel.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Get rid of UpdateKVCache functionality.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Add kernels.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Add torch OP.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Update cmake.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Torch OP must use double as argument dtype.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Add unittest.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Add unittest.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Fix misaligned access when head_dim=64.
In this case, numElemsPerThread=2, numVecPerThread=0. But the store code incorrectly perform vectorized store, some threads (e.g., lane1) issue store to address that is not aligned to 64 bit.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Remove unroll (compiler can do that).
Cleanup code.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Add switch for interleave.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Refactor vectorized load/store.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Implement is_neox. Result not correct yet.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Fix is_neox=True.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

* Add q_weight and k_weight.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

---------

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-05-23 15:31:04 +08:00
bhsueh_NV
6527c055cf
chore: fix bug of llama lora test (#4566)
* fix bug of llama lora test

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* Update test_llm.py

fix bug detected by pre-commit

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-23 14:06:40 +08:00
coldwaterq
1cf0e672e7
fix: [nvbugs/5066257] serialization improvments (#3869)
* added a restricted pcikler and depickler in a sepparate serialization function.

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>

* updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects that made debugging harder, added tests.

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>

* removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list.

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* cleaned up a couple files to reduce conflicts with main.

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* fix unit tests

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* reorder BASE_ZMQ_CLASSES list alphabetically

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* fix tests and move LogitsProcessor registration to base class

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* revert changes to import log of tensorrt_llm._torch.models

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* fix tests and move LogitsProcessor registration to base class

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* additional comments for multiprocess approved list sync

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* add dataclass from tests

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

---------

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Co-authored-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-05-23 13:06:29 +08:00
CarstyYou
ef280e687e
[feat] support fp8 blockscale gemm on sm89 (#4481)
* [feat] integrate ada blockwise gemm

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [fix] align scale M

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [feat] swizzle mma output

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [test] add ut for sm89

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [delete] remove useless comments

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [chore] codestyle

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [fix] fix review comments

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [chore] fix license

Signed-off-by: CarstyYou <xiy@nvidia.com>

* [chore] fix license

Signed-off-by: CarstyYou <xiy@nvidia.com>

---------

Signed-off-by: CarstyYou <xiy@nvidia.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-05-23 10:39:10 +08:00
Enwei Zhu
d7443b6068
[https://nvbugspro.nvidia.com/bug/5181262] [test] Unwaive Mistral Nemo test (#4515)
unwaive

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-23 10:14:00 +08:00
pcastonguay
d7d455e7ea
[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)
* feat: Enabling dis serving with TRT backend with Python runtime

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing formatting

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing disagg mtp test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-22 22:01:06 -04:00
Mike Iovine
14fc48ada7
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler (#4402)
[fix] Fix chunked prefill + overlap scheduler

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-23 04:38:22 +08:00
Venky
c713eb5799
test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) (#4446)
ultra

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-22 13:07:33 -07:00
xinhe-nv
22c01d5b21
test: [CI] Add failed cases into waives.txt (#4549)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* fix test issues

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-22 17:18:53 +08:00
ruodil
1a45890dae
test: waive hanging cases for perf test (#4562)
waive hanging cases

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 15:50:05 +08:00
Kaiyu Xie
2898d268f9
feat: add health_generate route to openai serving (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856) (#4349)
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Dhruv Singal <dhruvsingalabc@gmail.com>
2025-05-22 11:46:06 +08:00
HuiGao-NV
bc9f1dbede
fix[nvbug-5228840]: Remove test cases of feature not supported anymore (#3972)
* Remove waived cases
* Remove test cases of not supported feature

Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-22 11:18:58 +08:00
Aurelien Chartier
f491244c84
feat: add dataset support for benchmark_core_model with LLMAPI (#4457)
* feat: add dataset support for benchmark_core_model with LLMAPI

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-21 19:18:43 -07:00