Enwei Zhu
3cf7066350
test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) ( #3219 )
...
* remove test_llm_models_multi_gpu.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* qwen 2.5
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* upgrade
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-02 17:29:57 +08:00
Zongfei Jing
8d48b96545
reduce test cases for deepseek ( #3211 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-02 13:57:55 +08:00
wili
34e63d07e6
feat: Variable-Beam-Width-Search (VBWS) Part2 ( #3133 )
...
* feat: Variable-Beam-Width-Search Part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part2, fix CPP tests
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part3, simplify CPP tests
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part4, move beam_width_array param
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search, fix CI error
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2, fix pre-commit
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2, fix review
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-02 12:31:28 +08:00
Zongfei Jing
c7548ad72c
perf: Add optimizations for deepseek in min latency mode ( #3093 )
...
* Add optimizations for deepseek min latency
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Update internal cutlass kernel libs
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Format code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Resolve conflicts
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-02 09:05:24 +08:00
Chang Liu
1d3a5d38af
fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions ( #3144 )
...
* Update fp8 sf layout for blackwell and enable fp8 gemm e2e
* Add test case when m needs to be padded
* Better comment
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Add TODO for fp8 quant kernel
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Enable DCO check
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Fix lint
---------
Signed-off-by: Chang Liu <liuc@nvidia.com>
2025-04-01 13:08:29 -07:00
WeiHaocheng
ff35af77ea
feat: refactor scaffolding worker and support openai api worker ( #3166 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
2025-04-01 18:31:52 +08:00
dongjiyingdjy
22ff81b047
fix: fix illegal memory access when mtp >= 2 ( #3006 )
...
* fix - fix illegal memory access when mtp > 2
---------
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-01 13:36:45 +08:00
bhsueh_NV
322ac565fc
chore: clean some ci of qa test ( #3083 )
...
* move some models to examples/models/contrib
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* update the document
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove arctic, blip2, cogvlm, dbrx from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove tests of dit, mmdit and stdit from qa test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove grok, jais, sdxl, skywork, smaug from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* re-organize the glm examples
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix issues after running pre-commit
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix some typos in glm_4_9b readme
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-31 14:30:41 +08:00
liji-nv
e0d0dde058
None - Add one-shot version for UB AR NORM FP16/BF16 ( #2995 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-03-31 11:16:03 +08:00
Mike Iovine
5416966ddb
Add initial EAGLE-3 implementation ( #3035 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-03-29 22:31:24 +08:00
Erin
c75d7cd684
move BuildConfig functional args to llmargs ( #3036 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-03-29 02:20:18 +08:00
Aurelien Chartier
3de82c41cd
PyTorch PP + attention DP support ( #3044 )
...
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-28 00:11:19 +08:00
xiweny
6979afa6f2
test: reorganize tests folder hierarchy ( #2996 )
...
1. move TRT path tests to 'trt' folder
2. optimize some import usage
2025-03-27 12:07:53 +08:00
Suyog Gupta
047f2b234d
perf: [AutoDeploy] Enable AutoDeploy as a backend in trtllm-bench ( #3041 )
...
* Enable AutoDeploy as a backend in trtllm-bench
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* update how caches are resized
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* fix: files permission from 100755 to 100644
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some comments
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Fix function name
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* refactor
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Remove spurious change
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Add Cursor-generated docstrings
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* re-enable ad test
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some perf cleanup
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* debug ci
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* ensure that overlap scheduler is enabled
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Reorder the tests
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
---------
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-26 14:33:14 -07:00
wili
3e035f2219
v1.2 ( #3082 )
...
Signed-off-by: wili <wili@nvidia.com>
2025-03-26 23:31:29 +08:00
Dom Brown
f995a92a31
CI: Waive for https://nvbugspro.nvidia.com/bug/5189673 ( #3100 )
...
* Waive for https://nvbugspro.nvidia.com/bug/5189673
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update waive
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-03-26 19:13:43 +08:00
Enwei Zhu
224469b096
test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references ( #3069 )
...
* committed APIs validation
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* clean name
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* separate
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add TODOs
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix naming
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-26 18:14:35 +08:00
peaceh-nv
5e272eef81
feat: reduce TRT engine build time in testing ( #3014 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-03-26 13:02:54 +08:00
Yuan Tong
53adb3cb4e
test: waive flaky test_kv_cache_event_async_api ( #3062 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-03-25 18:41:30 +08:00
Netanel Haber
da0b0e0ee3
fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size ( #2983 )
...
* fix variable window size reuse - disable when *min attention window* starts sliding, not max
* isPreCyclic -> isCyclic, and invert logic, for clarity
* getDecoderState()
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-03-24 22:49:52 +08:00
Yan Chunwei
531b98ed62
feat: Add several pure python configs to LlmArgs ( #2997 )
...
* add SchedulerConfig
* add PeftCacheConfig
2025-03-24 16:16:17 +08:00
nv-guomingz
ec4f43a0ab
test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… ( #2987 )
...
* test: remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases.
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* update test case per review comments
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
---------
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 14:18:06 +08:00
bhsueh_NV
7413cb555a
relax the limitation of setuptools ( #2992 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-24 13:36:10 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00