Mike Iovine
0bc520f15e
fix: Limit llama4 context length to 8k ( #3778 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-04-23 08:55:10 -07:00
shaharmor98
49262a62a5
add passing E2E LoRA flow ( #3788 )
...
Signed-off-by: Shahar Mor <smor@nvidia.com>
2025-04-23 18:38:06 +03:00
Enwei Zhu
a51b3cf7a6
[TRTLLM-4763][test] Accuracy test improvement (Part 3.6): Deprecate mmlu_llmapi.py ( #3802 )
...
* cleanup mmlu_llmapi.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* polish
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-23 23:05:13 +08:00
Zhanrui Sun
bfc4e55ded
infra: [TRTLLM-4417] Support auto trigger special test stage for special file change ( #3478 )
...
* infra: Support auto trigger special test stage for special file change
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Fix review
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Fix review
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-23 20:32:19 +08:00
Enwei Zhu
8f2b2eaf83
test: Add DeepSeek-V3-Lite GSM8K tests ( #3771 )
...
* tmp
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update ref
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update waives
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-23 16:54:48 +08:00
xinhe-nv
b82d72bc37
update waive list ( #3696 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-23 14:18:57 +08:00
Yechan Kim
11d35656bf
fix: nvbugs/5234029 fix Qwen2.5-VL image test ( #3726 )
...
* fix: nvbugs/5234029 fix Qwen2.5-VL image test case by adding more answer candidates
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove qwen2.5_vl from waive list
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-04-23 14:09:39 +08:00
xinhe-nv
80d8fdefd6
add test_mistral_large_hidden_vocab_size tests ( #3716 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-23 13:40:11 +08:00
Zongfei Jing
1e5af736ea
Add smart router for moe ( #3641 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-23 12:21:59 +08:00
Yiqing Yan
cc161dd83d
Waive L0 tests ( #3784 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-23 11:22:11 +08:00
shaharmor98
5fff8f0935
Add running E2E LoRA flow ( #3648 )
...
* add passing E2E LoRA flow
Signed-off-by: Shahar Mor <smor@nvidia.com>
* add experimental feature
Signed-off-by: Shahar Mor <smor@nvidia.com>
* fix llm_args definition
Signed-off-by: Shahar Mor <smor@nvidia.com>
* decreased manually size of max loras to address OOM
Signed-off-by: Shahar Mor <smor@nvidia.com>
---------
Signed-off-by: Shahar Mor <smor@nvidia.com>
2025-04-23 11:19:41 +08:00
QI JUN
257abfbc51
move pytorch tests of LLM API into separate test files ( #3745 )
...
* move pytorch tests of LLM API into separate test files
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* polish
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-22 14:36:59 -07:00
Lucas Liebenwein
06b914e0f9
feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs ( #3589 )
...
* generalizing cudagraph to multiple dynamic inputs
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* fix for failing test
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-04-23 03:38:51 +08:00
Emma Qiao
442386d302
infra: Add test stages for sm120 ( #3533 )
...
* Add test stages for sm120
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update chip name and config name
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Split tests to gb202 and gb203
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Don't flash driver for rtx-5090
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Skip the failed cases
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Change the test stage names
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing
Signed-off-by: qqiao <qqiao@nvidia.com>
* Skip failed case on gb202
Signed-off-by: qqiao <qqiao@nvidia.com>
* Fix condition to dynamic driver flashing
Signed-off-by: qqiao <qqiao@nvidia.com>
---------
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-04-23 01:26:12 +08:00
Yukun He
0ae7017342
Unify two versions of AllReduce custom op ( #3032 )
...
* Rewrite unit test for unified allreduce op. Removing the legacy unit test.
* Revise formats, fusion_op bindings. Put all tensors as optional inputs.
* Move the MoeAllreduceOp to a separate custom op.
* Move all the fusion patterns to the new version of the AllReduce fusion kernel. Remove the AllReduce strategy config. Revise the AllReduce strategies and fusion pattern definitions.
* Add more TODOs, fix minor bugs, and remove legacy code.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-04-22 21:58:42 +08:00
Ivy Zhang
47d2f16bb8
waive gemma on L20 ( #3767 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-04-22 17:52:49 +08:00
ruodil
9223000765
waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log ( #3657 )
...
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-22 14:51:45 +08:00
xinhe-nv
ba216341f4
update waive list ( #3683 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-22 11:09:41 +08:00
Yi Zhang
98966cb45e
test: Unwaive Llama 3.1 with torch compile test ( #3475 )
...
* Fix log info
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
* Revert "test: Waive torch compile tests (#3471 )"
This reverts commit 410f56357e .
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
* Update test_llm_api_pytorch.py
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
---------
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-04-22 10:41:56 +08:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA ( #3483 )
...
* add gsm8k
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix gsm8k
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add gpqa
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* conditional import lm_eval
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* gpqa in lm_eval
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* system prompt
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* shuffle
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update AA prompt and regex
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* revert AA prompt and regex
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* integration to tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add DS-R1
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix and clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* clean up
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* free_gpu_memory_fraction=0.8
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
Yan Chunwei
231b39015c
unwaive multi_node test ( #3715 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-21 21:26:07 +08:00
Barry Kang
d87b009d8d
Fix ModelOpt Mixtral AWQ OOM ( #3714 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-04-21 19:14:14 +08:00
Zheng Duan
ae48abefc1
bind block key and hasher ( #3712 )
2025-04-21 18:50:57 +08:00
Iman Tabrizian
af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache ( #3095 )
...
* Fix hang bug when KV cache is low
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Review comments
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Fix attentiondp typo
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Add CI test for this case
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* fix: Fix the insertion order for responder futures
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* fix: Fix disagg CPP
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
Stanley Sun
852dd0c1be
test: add llama3.2 ptp test case ( #3363 )
...
* add llama3.2 ptp test case
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* update test list
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
---------
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-04-21 15:15:45 +08:00
Zhenhuan Chen
2672f13d77
test: fix cublas_scaled_mm with aligned workspace size ( #3600 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-04-21 14:51:42 +08:00
yuxianq
faef37782a
fix: Remove ParallelConfig. ( #3678 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-21 14:14:08 +08:00
liji-nv
a51f7559a3
fix: update test_user_buffers_mm_add_prologue atol ( #3711 ) ( #3713 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-04-21 11:24:20 +08:00
Yiqing Yan
6f7f262779
Waive L0 tests ( #3709 )
...
* Waive L0 tests
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* the test is fixed in PR 3711
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
---------
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-21 11:24:00 +08:00
hlu1
31624b079a
feat: [Deepseek] Add trtllm-gen MOE FP4 MOE backend ( #3387 )
...
* Add TRT-LLM Gen MOE to Deepseek
fix fused moe rebase bug.
Fix atol in test_fp4_gemm_quantize.py
fix fused moe rebase bug.
Fix FusedMoe.
Disable 2nd routing kernel preexit
Bump routing reduction to fp32
Disable PDL for fc1
[DEBUG] Lift token limit to 16k
[Bugfix] Token limit to 16k + fp32 routing + tanh
Make fp8 tileN 8
Fix FP8 MoE + Remove redundant temp output for FP4
[FP8-only] Avoid wasting CTAs for activation kernel
fix: unblock FP8 weightloading with trtllm-gen
Remove max_token limit for trtllm-gen path
perf: avoid type-conversion and fill_ from aten
Minor fix
Signed-off-by: Hao Lu <haolu@nvidia.com>
* Fix rebase issues
Signed-off-by: Hao Lu <haolu@nvidia.com>
* Fix compile issue
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* CI clean
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Hao Lu <haolu@nvidia.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-21 10:01:33 +08:00
Emma Qiao
48db263d9a
infra: Add test list name check ( #3097 )
...
* Add steps to check test names
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct test-db command
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Switch to use a trt-llm image
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update go path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct go path
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move the test list check to test ci
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct file path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix path again
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix get path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix typo
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Skip test list check for ARM
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix expression
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Change back unrelated file
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct qa test names
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Remove a stage
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update jenkins/L0_Test.groovy
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move some steps to a python script
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix script path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Split commands and debug
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix typo
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix typo
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Also correct case name in waives list
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move check script to another folder
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update qa list after rebase
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix rebase
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Remove the perf tests under QA
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Some tests already fixed after rebase to TOT
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
---------
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-04-20 23:02:16 +08:00
QI JUN
d51ae53940
move the rest models into examples/models/core directory ( #3555 )
...
* move rest models to examples/models/core directory
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update multimodal readme
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix example path
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix cpp test
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix tensorrt test
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-19 20:48:59 -07:00
brb-nv
c35d2a7532
test: Get Eagle tests working ( #3593 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-20 00:50:57 +08:00
nv-guomingz
e70961f541
test: update waives.txt for nvbug 5219532 ( #3672 )
...
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-19 18:57:39 +08:00
Iman Tabrizian
61ee983488
fix: Fix disaggregated load balance test ( #3689 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-19 10:40:40 +08:00
hlu1
c861b6cf17
Clean up modeling_deepseek.py ( #3640 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-18 17:54:33 -07:00
Iman Tabrizian
a2f190f306
chore: Waive disaggregated load balance ( #3687 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-18 16:04:33 -07:00
Yechan Kim
5460d18b10
feat: trtllm-serve multimodal support ( #3590 )
...
* feat: trtllm-serve multimodal support
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove disable argument
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove disable
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add and separate tests and move the doc
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove block_reuse arg from serve.py
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
pcastonguay
ae5671644a
feat: Disaggregated router class ( #3584 )
...
* Add draft scheduler class
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor the design
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* feat: Introduce router class for disaggregated server
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Add unit tests for router class
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding tests for disagg_utils
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing missing import
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Addressing MR review comments
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-19 00:34:12 +08:00
QI JUN
b9fce42717
enable test_ptp_quickstart_advanced_mixed_precision ( #3667 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-18 05:06:24 -07:00
Zheng Duan
bce7ea8c38
test: add kv cache event tests for disagg workers ( #3602 )
2025-04-18 18:30:19 +08:00
Yan Chunwei
2a09826ec4
fix hmac in remote mpi session ( #3649 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-18 17:47:51 +08:00
HuiGao-NV
d3608d6818
Remove dummy forward path ( #3669 )
...
2025-04-18 16:17:50 +08:00
Dom Brown
dbd9a83b0d
feat: Integrate GPUDirect Storage (GDS) into Executor API ( #3582 )
...
* feat: Integrate GPUDirect Storage (GDS) into Executor API
Squash of several dev commits
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-04-18 15:59:21 +08:00
Erin
4fedf0be5c
unwaive test for nvbug_5150466 ( #3552 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-04-18 15:15:58 +08:00
Emma Qiao
2f48985b9c
infra: Add step to generate new duration file ( #3298 )
...
* Add step to generate new duration file
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Install python in earlier step
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Clone repo and add debug info
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Remove debug info and only generate duration for post-merge
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Test for the new duration file
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update the duration file format
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move generate_duration.py to scripts folder and add try-catch to avoid any breakage
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
---------
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
2025-04-18 12:56:31 +08:00
peaceh-nv
88cff61fa1
chore: Split more tests out of gpt tests ( #3524 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-04-18 12:04:57 +08:00
dongfengy
b71a0f76b4
test: Add llama 4 to ci ( #3520 )
...
* Add llama 4 to ci
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
* Only test trtllm
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
* Disable Maverick
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
---------
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-04-18 11:25:52 +08:00
Iman Tabrizian
fc88d67675
chore: Refactor test_disaggregated.py ( #3154 )
...
* Refactor test_disaggregated.py
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Address review comments
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Remove waived tests
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* fix: Fix streaming endpoint chat completions
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-18 11:04:06 +08:00
rakib-hasan
ff3b741045
feat: adding multimodal (only image for now) support in trtllm-bench ( #3490 )
...
* feat: adding multimodal (only image for now) support in trtllm-bench
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* fix: add in load_dataset() calls to maintain the v2.19.2 behavior
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* re-adding prompt_token_ids and using that for prompt_len
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* updating the datasets version in examples as well
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* api changes are not needed
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* moving datasets requirement and removing a missed api change
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* addressing review comments
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* refactoring the quickstart example
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
---------
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-04-18 07:06:16 +08:00