xinhe-nv
500b43e90c
test: [CI] remove closed bugs ( #4345 )
...
update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-16 13:47:42 +08:00
Barry Kang
0e14941b7f
[fix] Fixed incorrect mixed precision MoE conversion ( #4351 )
...
Fix for mixed precision MoE conversion
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-05-16 13:43:41 +08:00
Tracin
46c5a56444
Support dynamic per-tensor FP8 ( #4250 )
...
* Support dynamic per-tensor FP8
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
* Update test cases.
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
---------
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-16 13:33:58 +08:00
Stanley Sun
11aa50d1ea
test: add kv cache aware test cases to qa test list ( #4257 )
...
add kv cache_aware test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-16 12:47:01 +08:00
WeiHaocheng
54d28718c7
feat: support benchmark on scaffolding ( #3328 ) ( #4286 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-05-16 12:28:49 +08:00
Zhanrui Sun
23a63ef9c1
update README version ( #4381 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-05-16 10:36:39 +08:00
QI JUN
c4cd403af9
[CI] waive test_chunked_prefill test cases ( #4380 )
...
waive test_chunked_prefill
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-16 10:27:20 +08:00
NVJiangShao
6cc3f2093a
Fix bias shape in weightOnlyGroupwiseQuantMatmulPlugin for TRT workflow ( #4348 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: AIDC-AI <AIDC-AIB@365fanyi.com>
2025-05-16 10:02:30 +08:00
yuxianq
a1daa22970
doc: Add docstring for Attention and MLA module. ( #4354 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-05-16 09:37:04 +08:00
QI JUN
13cdf98278
[CI] update multi-gpu test triggering file list ( #4378 )
...
update multi-gpu test triggering file list
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-16 09:05:44 +08:00
Suyog Gupta
b0f7522c82
[AutoDeploy]feat: Add an AutoDeploy compile backend that only calls torch.compile ( #4240 )
...
* add a torch-compile backend
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* readme changes
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* plumb torch-compile through build_and_run_ad.py
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* add torch-cudagraph backend
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* update readme
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* update readme
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
* further enhanced compiler backends
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* further enhance readme
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* better specified defaults in simple_config.py
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* fix typo in simple_config.py
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* updated deepseek-v3 support
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* revert accidental deletion in AD Readme
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-16 08:38:15 +08:00
rakib-hasan
25407249a5
[TRTLLM-5054][fix] Removing repeated loading of input processor ( #4161 )
...
removing repeated loading of input processor
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-05-16 08:04:58 +08:00
Lucas Liebenwein
4883121477
[AutoDeploy] fix: disable overlap scheduler until supported ( #4365 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-15 16:19:30 -07:00
Yechan Kim
c6e2111f4e
feat: enhance trtllm serve multimodal ( #3757 )
...
* feat: enhance trtllm serve multimodal
1. made the load_image and load_video asynchronous
2. add image_encoded input support to be compatible with genai-perf
3. support text-only on multimodal models (currently, Qwen2-VL & Qwen2.5-VL)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix bandit
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* trimming utils
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* trimming for test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* genai perf command fix
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* command fix
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* refactor chat_utils
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* stress test genai-perf command
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-05-15 16:16:31 -07:00
Iman Tabrizian
4c7191af67
Move Triton backend to TRT-LLM main ( #3549 )
...
* Move TRT-LLM backend repo to TRT-LLM repo
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Address review comments
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* debug ci
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Update triton backend
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Fixes after update
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-16 07:15:23 +08:00
Erin
c44cf34373
fix: update checks that broke medusa tests when use_py_session=True ( #4339 )
...
fix check
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-15 15:47:28 -07:00
yuxianq
4f8afe4cc6
feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism ( #4034 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-16 04:16:53 +08:00
Venky
5ebe32f06f
enh: Enable option in trtllm-bench build subcommand to avoid loading weights ( #4142 )
...
* expose load_format
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* yapf
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
2025-05-16 03:50:53 +08:00
Venky
adb0839a33
test(perf): Add Phi-4-mini-instruct to perf tests ( #4267 )
...
* add phi-4-mini-instruct
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* trim tests
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-15 21:27:03 +08:00
yuxianq
0e87fcc228
refactor: use x is None instead of x == None. ( #4244 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-15 20:00:04 +08:00
Yanchao Lu
5ce1102a02
Revert "[test] add qa test mentioned in docs" ( #4355 )
...
Revert "[test] add qa test mentioned in docs (#4248 )"
This reverts commit b0ce1371ee .
2025-05-15 18:47:30 +08:00
Stanley Sun
9d3e05486b
test: add qa test list for rtx5090 and rtx_pro_6000 ( #4254 )
...
* add test list for rtx5090 and rtx_pro_6000
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* add 2gpu llama70b test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* remove duplicate and invalid test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* add 2gpus test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
---------
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-15 17:57:31 +08:00
zhhuang-nv
d6b741ddfe
[fix] test_no_kv_cache_reuse for overlap_scheduler ( #4350 )
...
fix test_no_kv_cache_reuse for overlap_scheduler
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-15 16:43:53 +08:00
Yuan Tong
593f65ff6a
fix: better method to help torch find nvtx3 ( #4110 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-05-15 16:42:30 +08:00
ixlmar
4ee82fc0fd
chore: reduce code duplication ( #4297 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-05-15 09:25:37 +01:00
Zongfei Jing
f0ca60a95d
Add allreduce and rmsnorm fusion for qwen3 ( #4304 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-05-15 16:22:11 +08:00
xinhe-nv
14bfb5e0d6
test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus ( #4283 )
...
* update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-15 15:57:44 +08:00
zhhuang-nv
97bc680cd8
feat: support kv cache reuse for MLA ( #3571 )
...
* support kv cache reuse for MLA
load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* add CI test
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* use GPTJ style RoPE for MLA
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix rebase error and some docs
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix kv_lens
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* tiny fix
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: use normal device memory instead of pinned memory for unit test
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* fix L0 tests
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile after rebase
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments again
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
---------
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
dominicshanshan
404fbe9b32
[ https://nvbugs/5277113 ][fix]genai-perf API change stress test ( #4300 )
...
* fix bug 5277113.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
* fix bug 5277113 and 5278517.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
---------
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-15 14:12:34 +08:00
Fridah-nv
d008d6412f
feat:[AutoDeploy] Update MoE pattern matcher to drop expert selection logic ( #3283 )
...
* update matcher to match expert compute first, then extract other args with LCA
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
* support 3D and 2D input in torch.ops.moe.trtllm_fused_moe
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
* update custom ops to support 3D and 2D inputs
Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>
* update deepseek patch
Signed-off-by: Ubuntu <201670829+Fridah-nv@users.noreply.github.com>
---------
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-05-15 13:53:09 +08:00
Ivy Zhang
b0ce1371ee
[test] add qa test mentioned in docs ( #4248 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-15 13:37:11 +08:00
hlu1
3ea42e7519
[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus ( #4346 )
...
Reorganize TestDeepSeekR1::test_nvfp4_8gpus
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-15 13:09:13 +08:00
nv-guomingz
e76cf9d9fe
fix: https://nvbugs/5234033 enable starcoder trt-flow with transforme… ( #3909 )
...
fix: https://nvbugs/5234033 enable starcoder trt-flow with transformer 4.51.3.
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-15 11:16:45 +08:00
Zhanrui Sun
5dc3b539ba
infra: Downgrade the gcc toolset version from 13 to 11 ( #4114 )
...
* Downgrade the gcc toolset version from 13 to 11
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Update rocky8 images
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-05-15 11:08:51 +08:00
Zeyu WANG
2681b26e48
[TRTLLM-2795] feat: Add yarn support for other models in trt-flow ( #3840 )
...
Add yarn support for general models (e.g. llama, qwen) other than deepseek in trt-flow.
Signed-off-by: Zeyu Wang <zeyuw@nvidia.com>
2025-05-15 11:03:57 +08:00
Mike Iovine
f9adac3dea
[feat] Enable chunked context for flashinfer ( #4132 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-15 10:59:38 +08:00
qsang-nv
0fd59d64ab
infra: open source fmha v2 kernels ( #4185 )
...
* add fmha repo
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix format
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix code style
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix header
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix header kernel_traits.h
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* add .gitignore file
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* add SLIDING_WINDOW_ATTENTION
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix style
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* fix format
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* update setup.py
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
* update build_wheel.py
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
---------
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
2025-05-15 10:56:34 +08:00
QI JUN
498ce8a056
Revert "feat: Low Precision Allreduce for PCIe based GPU" ( #4340 )
...
Revert "feat: Low Precision Allreduce for PCIe based GPU (#3851 )"
This reverts commit 5e634dd1bd .
2025-05-15 09:52:39 +08:00
Simeng Liu
efe0972efb
doc: Add tensorrtllm_backend serving documentation in the Deepseek-V3 README ( #4338 )
...
Add tensorrtllm_backend serving option in the Deepseek-V3 README
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-15 09:31:28 +08:00
hlu1
7fb0af9320
[fix] Remove stale cublas heuristics ( #4326 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-14 17:35:51 -07:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow ( #4092 )
...
* chore: Remove GptSession/V1 from TRT workflow
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove stateful decoders
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession buffers
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession utils
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession kernels
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove V1 GPT models from tests
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSessionBenchmark from scripts and docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSession IO classes
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from test lists
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove useless encoder test
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove mActualBatchSize from DecoderState
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove static batching from ExecutorTest
- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
sugunav14
7c828d767f
feat: [AutoDeploy] DSV3 mla attn ref op ( #4272 )
...
* raw ref op + new patch untested
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
* Added mla attn ref op and unit tests for attn + module patches
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
* update stray changes in deepseek.py
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
* Updated stale documentation
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
* removed stray update in sdpa return shapes
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
---------
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
2025-05-15 01:58:20 +08:00
Faraz
42de79d49e
test: Added tests for Llama3.1-70B-BF16 on SM120 ( #4198 )
...
* Added tests for Llama3.1-70B-BF16 on SM120
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* solve conflicts add more tests
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-14 11:57:49 -04:00
Yanchao Lu
504f4bf779
[Infra] - Update the upstream PyTorch dependency to 2.7.0 ( #4235 )
...
[Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-14 22:28:13 +08:00
Robin Kobus
c67da1fbaa
fix: Eagle decoding in TRT flow ( #4229 )
...
* fix: EagleBuffers lifetime issue
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Clean up Eagle kernel parameters
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* fix: Eagle draft tokens init
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Add check for updated sequence length in TrtGptModelInflightBatching
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* fix: Skip check for beam search
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 16:10:49 +02:00
Kaiyu Xie
6c45586c51
chore: Remove deprecated Python runtime benchmark ( #4171 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-14 18:41:05 +08:00
HuiGao-NV
f4059c6e2e
Add test case for kv memory estimation ( #4158 )
...
* Add test case for kv memory estimation
* Dump running log into file and parse kv cache memory size from file
* Set bigger peak memory size for mixed precision case and test_ptp_quickstart_advanced_eagle3 case
* Revert change to usage of fraction
* use context manager to guard temp files
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-14 18:39:25 +08:00
xinhe-nv
f2bfe2f84f
test: [CI] remove closed bugs ( #4207 )
...
update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-14 17:59:05 +08:00
DylanChen-NV
206f82115d
[bug/5247505] fix: CP accuracy on Blackwell ( #4188 )
...
* fix xqa params for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* try adding B200 multi gpu test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add accuracy tests for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
---------
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-05-14 17:40:50 +08:00