Yukun He
179c7dc501
[ https://nvbugs/5536131 ][fix] Fix illegal access issue when scale is not provided in Llama3/4. ( #7960 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Mike Iovine
c88913dc03
[ https://nvbugs/5541545 ][fix] Remove test_llama4 ( #8031 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-08 15:20:15 -07:00
Yan Chunwei
5342c607cd
[ https://nvbugs/5516710 ][fix] fix Llama 3.3 TP PP case ( #7717 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yanchao Lu
5c8b022d1e
[None][ci] Test waives for the release/1.0 branch 09/15 ( #7700 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Simeng Liu
99995846b3
[ https://nvbugs/5470782 ][chore] Remove the skip statement in 1.0 rele… ( #7573 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Chang Liu
75b8a90816
[None][fix] Fix llama4 multimodal by skipping request validation ( #6957 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-20 21:58:53 -04:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
Bo Li
26f413ad90
[ https://nvbugs/5450262 ][fix] Fix unsupported alltoall use case ( #6882 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Yanchao Lu
2013034948
[Test] - Waive or fix few known test failures ( #5769 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 21:14:16 +08:00
Emma Qiao
a0135c0f6f
[Infra] - Waive failed cases on release/0.21 ( #5674 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-04 13:14:13 +08:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) ( #5014 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Mike Iovine
73389d6531
[fix] Fix llama 4 long context ( #4809 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-04 07:48:08 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
amirkl94
fbec0c3552
Release 0.20 to main ( #4577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible ( #3439 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
Mike Iovine
906cddffb0
[infra] Improve llama4 parallelism test coverage ( #3821 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-02 16:15:04 -04:00
milesial
362a8272f8
feat: llama4 input processor ( #3383 )
...
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-25 16:47:14 -07:00
Mike Iovine
0bc520f15e
fix: Limit llama4 context length to 8k ( #3778 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-04-23 08:55:10 -07:00
dongfengy
b71a0f76b4
test: Add llama 4 to ci ( #3520 )
...
* Add llama 4 to ci
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
* Only test trtllm
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
* Disable marverick
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
---------
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-04-18 11:25:52 +08:00
Mike Iovine
5bdf997963
Add Llama 4 ( #3302 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-04-09 03:35:21 +08:00
Enwei Zhu
8ee019f8c4
test: Accuracy test improvement (Part 3.4): Move LLaMA tests ( #3350 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 15:07:57 +08:00
Enwei Zhu
ba019a43d6
test: Accuracy test improvement (Part 3.3): Move DeepSeek tests ( #3260 )
...
add skip
fix
fix
update
update test list
fixqa list
move bf16 to postmerge
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 07:19:04 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python ( #3025 )
...
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve distagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
Zongfei Jing
8d48b96545
reduce test cases for deepseek ( #3211 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-02 13:57:55 +08:00
dongjiyingdjy
22ff81b047
fix:fix illeagel memory access when mtp >= 2 ( #3006 )
...
* fix - fix illeagel memory access when mtp > 2
---------
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-01 13:36:45 +08:00
Erin
c75d7cd684
move BuildConfig functional args to llmargs ( #3036 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-03-29 02:20:18 +08:00
Aurelien Chartier
3de82c41cd
Pytorch PP + attention DP support ( #3044 )
...
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-28 00:11:19 +08:00
xiweny
6979afa6f2
test: reorganize tests folder hierarchy ( #2996 )
...
1. move TRT path tests to 'trt' folder
2. optimize some import usage
2025-03-27 12:07:53 +08:00
Dom Brown
f995a92a31
CI: Waive for https://nvbugspro.nvidia.com/bug/5189673 ( #3100 )
...
* Waive for https://nvbugspro.nvidia.com/bug/5189673
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update waive
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-03-26 19:13:43 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00