Yi Zhang
98966cb45e
test: Unwaive Llama 3.1 with torch compile test ( #3475 )
...
* Fix log info
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
* Revert "test: Waive torch compile tests (#3471 )"
This reverts commit 410f56357e .
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
* Update test_llm_api_pytorch.py
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
---------
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-04-22 10:41:56 +08:00
Kaiyu Xie
a32389b4cd
fix: Remove unnecessary max call ( #3574 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-22 10:33:50 +08:00
rakib-hasan
74c13ea84f
datasets API change : datasets.load_metric => evaluate.load ( #3741 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-04-22 08:23:48 +08:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA ( #3483 )
...
* add gsm8k
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix gsm8k
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add gpqa
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* conditional import lm_eval
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* gpqa in lm_eval
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* system prompt
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* shuffle
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update AA prompt and regex
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* revert AA prompt and regex
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* integration to tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add DS-R1
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix and clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* clean up
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* free_gpu_memory_fraction=0.8
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
bhsueh_NV
0c07d4dc21
Fix/executor bugs ( #3681 )
...
* fix bugs of py executor
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bugs of py executor
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* revert changes about mpi_barrier()
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-22 07:23:27 +08:00
Kaiyu Xie
943f3ff8f6
Revert "Report number of context tokens in one iteration ( #3691 )" ( #3740 )
...
This reverts commit e0446a4dc0 .
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-22 01:21:43 +08:00
Yan Chunwei
231b39015c
unwaive multi_node test ( #3715 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-21 21:26:07 +08:00
Barry Kang
d87b009d8d
Fix ModelOpt Mixtral AWQ OOM ( #3714 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-04-21 19:14:14 +08:00
Zheng Duan
ae48abefc1
bind block key and hasher ( #3712 )
2025-04-21 18:50:57 +08:00
Iman Tabrizian
af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache ( #3095 )
...
* Fix hang bug when KV cache is low
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Review comments
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Fix attentiondp typo
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Add CI test for this case
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* fix: Fix the insertion order for responder futures
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* fix: Fix disagg CPP
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
Stanley Sun
852dd0c1be
test: add llama3.2 ptp test case ( #3363 )
...
* add llama3.2 ptp test case
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* update test list
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
---------
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-04-21 15:15:45 +08:00
Jinyang Yuan
bc2b01d1dd
chore: update FMHA cubin files ( #3680 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-04-21 15:04:04 +08:00
Zhenhuan Chen
2672f13d77
test: fix cublas_scaled_mm with aligned workspace size ( #3600 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-04-21 14:51:42 +08:00
katec846
eeb605abd6
feat: Offloading Multimodal embedding table to CPU in Chunked Prefill Mode ( #3380 )
...
* Feat: Offload ptable to cpu if enable_chunk_context
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Feat: offload ptable to cpu for chunk context mode
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Fix and add comment
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Update Readme for multimodal and add a new param mm_embedding_offloading
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* fix: Correct prompt table offloading condition in PromptTuningBuffers
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Clean up the code
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Add comments to explain copy from cpu <-> gpu using pinned memory
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Fix namings based on comments
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Fix format based on precommit
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
* Modify --mm_embedding_offloading flag
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
---------
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-21 14:31:01 +08:00
yuxianq
faef37782a
fix: Remove ParallelConfig. ( #3678 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-21 14:14:08 +08:00
HuiGao-NV
e0446a4dc0
Report number of context tokens in one iteration ( #3691 )
...
Report number of context tokens in one iteration
2025-04-21 13:45:28 +08:00
yuxianq
591f3d2be8
fix: Support TLLM_OVERRIDE_LAYER_NUM for llama4. ( #3679 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-21 12:28:56 +08:00
liji-nv
a51f7559a3
fix: update test_user_buffers_mm_add_prologue atol ( #3711 ) ( #3713 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-04-21 11:24:20 +08:00
Yiqing Yan
6f7f262779
Waive L0 tests ( #3709 )
...
* Waive L0 tests
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* the test is fixed in PR 3711
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
---------
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-21 11:24:00 +08:00
hlu1
31624b079a
feat: [Deepseek] Add trtllm-gen MOE FP4 MOE backend ( #3387 )
...
* Add TRT-LLM Gen MOE to Deepseek
fix fused moe rebase bug.
Fix atol in test_fp4_gemm_quantize.py
fix fused moe rebase bug.
Fix FusedMoe.
Disable 2nd routing kernel preexit
Bump routing reduction to fp32
Disable PDL for fc1
[DEBUG] Lift token limit to 16k
[Bugfix] Token limit to 16k + fp32 routing + tanh
Make fp8 tileN 8
Fix FP8 MoE + Remove redundant temp output for FP4
[FP8-only] Avoid wasting CTAs for activation kernel
fix: unblock FP8 weightloading with trtllm-gen
Remove max_token limit for trtllm-gen path
perf: avoid type-conversion and fill_ from aten
Minor fix
Signed-off-by: Hao Lu <haolu@nvidia.com>
* Fix rebase issues
Signed-off-by: Hao Lu <haolu@nvidia.com>
* Fix compile issue
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* CI clean
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Hao Lu <haolu@nvidia.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-21 10:01:33 +08:00
Emma Qiao
48db263d9a
infra: Add test list name check ( #3097 )
...
* Add steps to check test names
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct test-db command
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Switch to use a trt-llm image
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update go path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct go path
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move the test list check to test ci
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct file path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix path again
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix get path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix typo
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Skip test list check for ARM
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix expression
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Change back unrelated file
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Correct qa test names
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Remove a stage
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update jenkins/L0_Test.groovy
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move some steps to a python script
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix script path
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Split commands and debug
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix typo
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix typo
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Also correct case name in waives list
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move check script to another folder
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update qa list after rebase
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Fix rebase
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Remove the perf tests under QA
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Some tests already fixed after rebase to TOT
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
---------
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-04-20 23:02:16 +08:00
Naveassaf
f7c2eb4fa2
Update Nemotron Super and Ultra in Supported Models and add an example ( #3632 )
...
* Update Nemotron Super and Ultra in Supported Models and add an example
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
* Update README link to match new examples structure
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
---------
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-04-20 21:14:33 +08:00
hlu1
17eba98445
Refactor Deepseek tp_size calculation ( #3695 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-19 23:55:19 -07:00
QI JUN
d51ae53940
move the rest of the models into examples/models/core directory ( #3555 )
...
* move rest models to examples/models/core directory
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update multimodal readme
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix example path
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix cpp test
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix tensorrt test
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-19 20:48:59 -07:00
brb-nv
c35d2a7532
test: Get Eagle tests working ( #3593 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-20 00:50:57 +08:00
nv-guomingz
e70961f541
test: update waives.txt for nvbug 5219532 ( #3672 )
...
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-19 18:57:39 +08:00
yuxianq
5346f53250
feat: Introduce feature properties for attention backend. ( #3659 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-19 12:37:27 +08:00
Iman Tabrizian
61ee983488
fix: Fix disaggregated load balance test ( #3689 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-19 10:40:40 +08:00
hlu1
c861b6cf17
Clean up modeling_deepseek.py ( #3640 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-18 17:54:33 -07:00
Iman Tabrizian
a2f190f306
chore: Waive disaggregated load balance ( #3687 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-18 16:04:33 -07:00
Yechan Kim
5460d18b10
feat: trtllm-serve multimodal support ( #3590 )
...
* feat: trtllm-serve multimodal support
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove disable argument
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove disable
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add and separate tests and move the doc
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove block_reuse arg from serve.py
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
mayani-nv
ce8329646f
Update run.py for draft_target_model ( #3615 )
...
This change makes the draft target model work without a mismatch in the vocab size
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Co-authored-by: rakib-hasan <rhasan@nvidia.com>
2025-04-19 01:01:50 +08:00
pcastonguay
ae5671644a
feat: Disaggregated router class ( #3584 )
...
* Add draft scheduler class
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor the design
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* feat: Introduce router class for disaggregated server
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Add unit tests for router class
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding tests for disagg_utils
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing missing import
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Addressing MR review comments
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-19 00:34:12 +08:00
QI JUN
b9fce42717
enable test_ptp_quickstart_advanced_mixed_precision ( #3667 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-18 05:06:24 -07:00
Zheng Duan
bce7ea8c38
test: add kv cache event tests for disagg workers ( #3602 )
2025-04-18 18:30:19 +08:00
Yan Chunwei
2a09826ec4
fix hmac in remote mpi session ( #3649 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-18 17:47:51 +08:00
HuiGao-NV
d3608d6818
Remove dummy forward path ( #3669 )
...
Remove dummy forward path
2025-04-18 16:17:50 +08:00
Dom Brown
dbd9a83b0d
feat: Integrate GPUDirect Storage (GDS) into Executor API ( #3582 )
...
* feat: Integrate GPUDirect Storage (GDS) into Executor API
Squash of several dev commits
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-04-18 15:59:21 +08:00
Zheyu Fu
90a28b917f
feat: Add Dynasor-CoT in scaffolding examples. ( #3501 )
...
Signed-off-by: Zheyu Fu <zheyufu2@gmail.com>
Co-authored-by: Junda Chen <32371474+GindaChen@users.noreply.github.com>
Co-authored-by: Yichao Fu <57950249+fuyichao2000@users.noreply.github.com>
Co-authored-by: Andy Dai <zhongdongmin@nvidia.com>
2025-04-18 07:48:01 +00:00
Erin
4fedf0be5c
unwaive test for nvbug_5150466 ( #3552 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-04-18 15:15:58 +08:00
Yuan Tong
0b0e6d8a0a
refactor: Clean up CMakeLists.txt ( #3479 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-04-18 14:39:29 +08:00
Emma Qiao
2f48985b9c
infra: Add step to generate new duration file ( #3298 )
...
* Add step to generate new duration file
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Install python in earlier step
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Clone repo and add debug info
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Remove debug info and only generate duration for post-merge
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Test for the new duration file
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Update the duration file format
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
* Move generate_duration.py to scripts folder and add try-catch to avoid any breakage
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
---------
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
2025-04-18 12:56:31 +08:00
peaceh-nv
88cff61fa1
chore: Split more tests out of gpt tests ( #3524 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-04-18 12:04:57 +08:00
dongfengy
b71a0f76b4
test: Add llama 4 to ci ( #3520 )
...
* Add llama 4 to ci
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
* Only test trtllm
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
* Disable Maverick
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
---------
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-04-18 11:25:52 +08:00
Iman Tabrizian
fc88d67675
chore: Refactor test_disaggregated.py ( #3154 )
...
* Refactor test_disaggregated.py
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Address review comments
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Remove waived tests
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* fix: Fix streaming endpoint chat completions
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-18 11:04:06 +08:00
Chang Liu
b8818b45be
fix: llama4: address couple of issues in llama4 attention module ( #3491 )
...
* fix attn module for llama4
* Address comments
* Rebase to accommodate latest attn refactor and refactor l4attn
* Remove aux_stream from classic attn
* Use RMSNorm for L2Norm
* Update tensorrt_llm/_torch/models/modeling_llama.py
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Chang Liu <lc9114@gmail.com>
* Add typing information for _attn_qkv
* Remove redundant comment
* Simplify llama4 DecoderLayer logic
---------
Signed-off-by: Chang Liu <lc9114@gmail.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-04-18 01:54:59 +00:00
Jackch-NV
1b2b112d44
fix sage attention headsize check error in bertAttentionPlugin.cpp ( #3660 )
...
Signed-off-by: Jackch-NV <69230184+Jackch-NV@users.noreply.github.com>
2025-04-18 09:28:04 +08:00
rakib-hasan
ff3b741045
feat: adding multimodal (only image for now) support in trtllm-bench ( #3490 )
...
* feat: adding multimodal (only image for now) support in trtllm-bench
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* fix: add in load_dataset() calls to maintain the v2.19.2 behavior
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* re-adding prompt_token_ids and using that for prompt_len
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* updating the datasets version in examples as well
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* api changes are not needed
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* moving datasets requirement and removing a missed api change
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* addressing review comments
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* refactoring the quickstart example
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
---------
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-04-18 07:06:16 +08:00
QI JUN
26ebd95302
chore: update multi gpu trigger file list ( #3665 )
...
* update multi gpu trigger file list
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update multi gpu trigger file list
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-17 11:19:01 -07:00
QI JUN
91660939fd
tests: waive test_llm_multi_node ( #3664 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-18 01:59:16 +08:00