Yechan Kim
943218b54a
feat: Add Qwen2.5-VL and refactor Qwen2-VL ( #3156 )
...
* feat: Add Qwen2.5-VL and refactor Qwen2-VL
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix yapf and codespell
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix test_e2e
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* generalize get_rope_index
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix qwen2.5-vl on README
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix image test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-10 04:09:03 +08:00
danielafrimi
47f5cf6c0d
lora_tests ( #3201 )
...
LoRA tests and layers
Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Co-authored-by: Ubuntu <dafrimi@nvidia.com>
2025-04-09 18:06:52 +03:00
WeiHaocheng
6eee15900e
feat: Enhance the integrated robustness of scaffolding with __init__.py #3305 ( #3312 )
...
Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>
2025-04-09 21:13:47 +08:00
sugunav14
64abb01a36
Fix failing DSV3 unit tests ( #3385 )
...
* Skipping DSV3 module patch unit tests
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
* update tests
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
* Fixed failing unit test
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
---------
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
2025-04-09 11:57:05 +08:00
Iman Tabrizian
8401722245
test: Add single gpu disaggregated tests ( #3295 )
...
* test: Add single gpu disaggregated tests
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Add deepseek with overlap tests
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Use updated prompt
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Move test to disaggregated folder
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-09 09:34:45 +08:00
Mike Iovine
5bdf997963
Add Llama 4 ( #3302 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-04-09 03:35:21 +08:00
yuxianq
7225bd8b91
chore: Refine attention backend interface. ( #3271 )
...
Refine attention backend interface.
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-09 02:34:53 +08:00
wili
54ad95eaa8
Feat: Variable-Beam-Width-Search (VBWS) part3 ( #3338 )
...
* feat/Variable-Beam-Width-Search-Part3, v1.0
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat/Variable-Beam-Width-Search-Part3, v1.1
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat/Variable-Beam-Width-Search-Part3, v1.2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-08 23:51:27 +08:00
sugunav14
84fc07b011
feat: [TRTLLM-3510] DeepseekV3 support in AutoDeploy ( #3281 )
...
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
2025-04-08 21:47:57 +08:00
pcastonguay
02f446a9ff
chore: Adding DS V3-lite tests with overlap + cuda graph ( #3342 )
...
* chore: Adding DS V3-lite tests with overlap + cuda graph
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing pre-commit
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-08 09:36:09 -04:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. ( #3270 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
liji-nv
dca6397d1e
feat: Introduce UB allocator for pytorch flow ( #3257 )
...
* Instead of allocating UserBuffers at the beginning of runtime, UB buffers are now managed by a global allocator. The allocator dynamically assigns a free UB buffer, or allocates a new one, for a torch tensor, which makes UserBuffers easier to use (a toy allocator sketch follows this entry).
* In the common use case, the UserBuffers are allocated during the warm-up stage, so there is no dynamic allocation during inference.
* The UB fusion pattern is rewritten using the new UB allocator. It contains the following passes:
1. Fuse quant with allreduce, replace it with the UB implementation, and insert a copy_to_userbuffers. The normal allreduce currently does not support FP8 quant, so this has to be done in the UB pass.
2. Convert all supported allreduce ops to UB and insert copy_to_userbuffers.
3. Fuse the op before the allreduce with the copy_to_userbuffers, so the op writes directly into the userbuffer.
4. Remove the userbuffers finalize if the output is connected to another UB allreduce.
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-04-08 18:39:49 +08:00
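The entry above describes the UB allocator as a global pool that hands a torch tensor a free pre-registered buffer when one exists and allocates a new one otherwise, so that after warm-up no further allocation is needed. A minimal Python sketch of that pooling pattern, using hypothetical names (UBAllocator, Buffer) rather than the actual TensorRT-LLM API:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Buffer:
    """Stand-in for a registered UserBuffer identified by shape/dtype."""
    key: Tuple
    in_use: bool = False


@dataclass
class UBAllocator:
    """Global pool: hand out a free buffer when possible, allocate otherwise."""
    pool: Dict[Tuple, List[Buffer]] = field(default_factory=dict)

    def acquire(self, shape, dtype) -> Buffer:
        key = (tuple(shape), dtype)
        for buf in self.pool.setdefault(key, []):
            if not buf.in_use:              # reuse a free UB buffer
                buf.in_use = True
                return buf
        buf = Buffer(key, in_use=True)      # allocate a new one (warm-up path)
        self.pool[key].append(buf)
        return buf

    def release(self, buf: Buffer) -> None:
        buf.in_use = False                  # buffer returns to the pool for reuse


# After warm-up has seen every shape once, inference only reuses buffers.
allocator = UBAllocator()
a = allocator.acquire((1024, 4096), "bf16")
allocator.release(a)
assert allocator.acquire((1024, 4096), "bf16") is a
```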
Chuang Zhu
cdb0906be4
disagg test single h100 ( #3353 )
2025-04-08 17:45:35 +08:00
amirkl94
e04f6a1b9b
fix: Fix p-tuning test bug ( #3326 )
...
* fix: Fix p-tuning test bug
* A change in the vocab_size calculation for T5Tokenizer, introduced in transformers version 4.34, caused incorrect virtual tokens to be added for p-tuning: instead of adding tokens outside the vocabulary, tokens inside the vocabulary were added (see the sketch after this entry).
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-04-08 17:14:00 +08:00
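For the p-tuning bug described above, the key invariant is that virtual prompt tokens must use IDs outside the real vocabulary; computing them from the wrong vocab_size base makes them collide with real tokens. A small illustrative sketch with hypothetical sizes (not the actual test code):

```python
def make_virtual_token_ids(vocab_size: int, num_vtokens: int) -> list:
    # Virtual prompt-tuning tokens must not collide with real tokens,
    # so their IDs start at vocab_size and count upward.
    return list(range(vocab_size, vocab_size + num_vtokens))


# Hypothetical sizes: if the tokenizer reports a smaller vocab_size than the
# true one (as the T5Tokenizer change in transformers 4.34 effectively did
# for this test), the generated IDs land *inside* the real vocabulary.
true_vocab_size = 32100
wrong_vocab_size = 32000

good = make_virtual_token_ids(true_vocab_size, num_vtokens=8)
bad = make_virtual_token_ids(wrong_vocab_size, num_vtokens=8)

assert all(t >= true_vocab_size for t in good)
assert any(t < true_vocab_size for t in bad)   # collides with real tokens
```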
Enwei Zhu
8ee019f8c4
test: Accuracy test improvement (Part 3.4): Move LLaMA tests ( #3350 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 15:07:57 +08:00
Yukun He
c678774c99
feat: Apply the new torch-flow compatible AutoTuner to both Fused MoE and NVFP4 Linear operators. ( #3151 )
...
* Several optimizations and fixings on the Autotuner.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Apply the new Python side Autotuner on current linear for nvFP4 data type.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Apply the new Python side Autotuner on MoE op
* Remove routers from cache key to improve inference perf
* Prevent unnecessary code profiling. Use the do_preparation keyword to select which part should be executed before evaluating any tactic.
* Remove try-catch inside moe profiling process.
* Move the default-tactic -1 to 0 transform into the cpp runner.
* Revise relevant tests.
* Predefine the bucketizing strategy for fused_moe
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Add specific_profile support for AutoTuner to bypass the standard cache search process for perf optimization
* Add specific_profile for moe
* Add specific profile for linear
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Fixing and revising according to reviewer's suggestions.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Use lru_cache for inference perf optimization (a toy cache sketch follows this entry).
* Revert gen_custom_cache_key feature
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Replace runner with runner id to achieve a serializable cache.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Code clean up and minor fixings.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Move all tunable runners and custom ops into torch_custom_ops.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Treat min_latency_mode as an independent dynamic tensor. Modify get_valid_tactics to suit it.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
---------
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-04-08 14:28:36 +08:00
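As a rough illustration of the AutoTuner idea in the entry above — profile candidate tactics once per (runner id, bucketized shape) key, cache the winner, and keep lookups cheap on the hot path — here is a toy, self-contained Python sketch; the tactic table and helpers are hypothetical, not the real TensorRT-LLM interfaces:

```python
import time
from functools import lru_cache

# Hypothetical tactic table: in the real AutoTuner these would be kernel
# configurations exposed by a tunable runner, looked up by a runner *id*
# so that the tuning cache stays serializable.
TACTICS = {0: lambda n: sum(range(n)), 1: lambda n: n * (n - 1) // 2}


def _bucketize(n: int) -> int:
    # Round sizes up to a power of two so nearby shapes share one cache entry.
    b = 1
    while b < n:
        b *= 2
    return b


@lru_cache(maxsize=None)                   # cheap lookups on the hot path
def best_tactic(runner_id: int, bucket: int) -> int:
    timings = {}
    for tactic_id, fn in TACTICS.items():  # profile each candidate once
        start = time.perf_counter()
        fn(bucket)
        timings[tactic_id] = time.perf_counter() - start
    return min(timings, key=timings.get)


def run(runner_id: int, n: int) -> int:
    tactic = best_tactic(runner_id, _bucketize(n))   # cache hit after warm-up
    return TACTICS[tactic](n)


print(run(runner_id=0, n=1000), run(runner_id=0, n=1200))
```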
Gabriel Wu
f1655afb0d
feat: enable DeepGEMM by default ( #3341 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-04-08 13:58:57 +08:00
Fanrong Li
62e0876e39
Waive unittest/trt/model/test_mamba.py::TestMamba::test_loaders_mamba_130m_hf_from_checkpoint. Will fix it later. ( #3356 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-07 22:36:35 -07:00
MinaHuai
31422e7e46
add tp=2 ci test for vision encoder ( #3319 )
...
Signed-off-by: mhuai <mhuai@nvidia.com>
2025-04-07 21:46:08 -07:00
Gabriel Wu
42c8574e93
fix: revert extra cmake var ( #3351 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-08 11:57:16 +08:00
pcastonguay
add5e5cd93
feat: Add option to run disaggregated serving without ctx servers,… ( #3243 )
...
* feat: Add option to run disaggregated serving without ctx servers, to benchmark gen only
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing comment in sanity check
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-07 21:56:03 -04:00
Void
efe2ecfb37
fix: runtime error in est_deepseek_allreduce.py ( #3226 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-04-08 09:19:47 +08:00
Enwei Zhu
ba019a43d6
test: Accuracy test improvement (Part 3.3): Move DeepSeek tests ( #3260 )
...
add skip
fix
fix
update
update test list
fix qa list
move bf16 to postmerge
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 07:19:04 +08:00
Gabriel Wu
376731013d
feat: use NVRTC for DeepGEMM JIT compilation ( #3239 )
...
* feat: use NVRTC for DeepGEMM JIT compilation
Signed-off-by: Zihua Wu
* fix: add license
Signed-off-by: Zihua Wu
* feat: store NVRTC JIT results in memory by default
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* test: set timeout to 7200
Signed-off-by: Zihua Wu
---------
Signed-off-by: Zihua Wu
2025-04-07 20:29:23 +08:00
YueWeng
aab6214801
test: fix conflicting test names ( #3316 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-04-07 20:10:01 +08:00
pansicheng
ef1ba468a1
feat: support abort disconnected requests ( #3214 )
...
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
2025-04-07 16:14:58 +08:00
QI JUN
a2fad51011
chore: waive a timeout multi-GPU test case ( #3310 )
...
* debug CI timeout issue
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* waive timeout case
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-07 14:04:54 +08:00
brb-nv
017361c26c
test: Waive non-Llama Eagle tests ( #3309 )
2025-04-07 09:25:41 +08:00
tburt-nv
7a659885e3
chore: remove usernames from comments ( #3291 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-04-05 13:44:28 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python ( #3025 )
...
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve disagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
qixiang-99
0d4d50a745
feat: no-cache attention in PyTorch workflow ( #3085 )
...
* init trtllm attn no cache
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: fix the seq_len issue and attn metadata prepare for qwen reward model test
fix: fix minor bugs after rebase
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: remove unnecessary debug logs and clean up commented code
refactor: update max_seq_len documentation and remove max_seq_len from the decoder model constructor in PyTorchModelEngine
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: update calculate_ref_result function to accept tensor inputs and mask type, enhance test_attention_no_cache to support FULL and CAUSAL masks
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: remove unused BERT attention metadata conversion method and add type assertion for no cache attention in PyTorchModelEngine
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: remove use_kv_cache parameter from attention function and related classes, update documentation for KV cache handling
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: implement setAttentionMaskType method for better mask type handling and remove unused conversion function
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: streamline KV cache handling by replacing direct member access with useKVCache method and simplify token per block assignment
remove Debug code.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: Resolve comments for Python code
Simplify no cache attention metadata preparation and streamline related attributes in TrtllmAttentionMetadata
Removed the private method for converting to no cache attention metadata and integrated its logic into the prepare method. Updated the test for BERT sequence classification to reflect these changes and ensure proper handling of attention metadata.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* docs: Add is_dummy_attention field to attention metadata for simulation operations
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: add KVCacheParams to attention backend interface and import relevant metadata classes
Updated the attention backend interface to include KVCacheParams and imported TrtllmAttentionMetadata and VanillaAttentionMetadata in model_engine.py for enhanced functionality.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: fix rebase format issue
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: extend attention mask type handling in MHARunnerFixedParams
Added support for additional attention mask types (BIDIRECTIONAL, BIDIRECTIONALGLM, BLOCKSPARSE) in the MHARunnerFixedParams structure to fix the mapping issue between ContextAttentionMaskType and AttentionMaskType
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: enhance attention mask type handling in TllmGenFmhaRunnerParams
Updated the setAttentionMaskType method to include a switch-case structure for better handling of attention mask types, ensuring proper mapping and error handling for invalid types (see the sketch after this entry).
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
---------
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
2025-04-05 01:54:32 +08:00
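A minimal Python sketch of the mask-type mapping pattern described above: an explicit map from ContextAttentionMaskType to AttentionMaskType with error handling for invalid types. Only BIDIRECTIONAL, BIDIRECTIONALGLM, and BLOCKSPARSE are named in the entry; the other members here are illustrative stand-ins, and the real implementation is a C++ switch-case:

```python
from enum import Enum, auto


class ContextAttentionMaskType(Enum):
    PADDING = auto()          # illustrative member
    CAUSAL = auto()           # illustrative member
    BIDIRECTIONAL = auto()
    BIDIRECTIONALGLM = auto()
    BLOCKSPARSE = auto()


class AttentionMaskType(Enum):
    PADDING = auto()
    CAUSAL = auto()
    BIDIRECTIONAL = auto()
    BIDIRECTIONALGLM = auto()
    BLOCKSPARSE = auto()


# Explicit mapping; anything unmapped raises instead of silently falling through.
_MASK_TYPE_MAP = {
    ContextAttentionMaskType.PADDING: AttentionMaskType.PADDING,
    ContextAttentionMaskType.CAUSAL: AttentionMaskType.CAUSAL,
    ContextAttentionMaskType.BIDIRECTIONAL: AttentionMaskType.BIDIRECTIONAL,
    ContextAttentionMaskType.BIDIRECTIONALGLM: AttentionMaskType.BIDIRECTIONALGLM,
    ContextAttentionMaskType.BLOCKSPARSE: AttentionMaskType.BLOCKSPARSE,
}


def set_attention_mask_type(mask_type: ContextAttentionMaskType) -> AttentionMaskType:
    try:
        return _MASK_TYPE_MAP[mask_type]
    except KeyError:
        raise ValueError(f"Invalid attention mask type: {mask_type}") from None
```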
QI JUN
059a34468c
fix deepseek multi gpu tests timeout ( #3285 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-04 16:19:02 +08:00
yuanjings-nvda
5776b99b70
fix vila test ( #3042 )
...
Signed-off-by: Yuanjing Shi <yuanjings@nvidia.com>
2025-04-04 14:30:06 +08:00
shaharmor98
ee4aab72ec
feat: Support PeftCacheManager in Torch ( #3186 )
...
* Add PeftCacheManager implementation
Signed-off-by: Shahar Mor <smor@nvidia.com>
2025-04-04 12:38:08 +08:00
Pengyun Lin
f25c7cefb4
doc: refactor trtllm-serve examples and doc ( #3187 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-04 11:40:43 +08:00
Tracin
bb6c338730
AWQ support Modelopt ckpts. ( #3258 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-04 08:10:35 +08:00
pcastonguay
b763051ba4
chore: Refactor disaggregated serving scripts ( #3073 )
...
* chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Updating README, removing launch script
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding scripts to populate urls section of disagg config based on SLURM env vars
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-03 14:55:05 -04:00
Yibin Li
32ae1564bd
update FP4 quantize layout ( #3045 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-04-03 13:13:54 -04:00
Yukun He
d138795485
Fix minor issues in test_autotuner.py and loosen the cache check for test gemms. ( #3261 )
...
This test can cause nondeterministic failures in CI due to unexpected kernel profiling results. A longer delay time or clearing the cache does not solve the issue, so the test checks are loosened to avoid these false alarms.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-04-03 18:24:08 +08:00
xinhe-nv
2005e5aaaf
remove tests from qa test lists ( #3256 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-03 16:06:39 +08:00
xiweny
174a5af779
doc: refine integration test guide ( #3215 )
...
* doc: refine integration test guide
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-04-03 15:36:13 +08:00
Zhanrui Sun
7f03125098
test: [TRTLLM-3994] Support only run pytorch tests ( #3013 )
...
* [TRTLLM-3994] Support only run pytorch tests
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Move perf test to TensorRT backend
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Fix review
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-03 13:46:09 +08:00
pcastonguay
b5b83009ff
chore: Reenabling get_stats_async test which seems to have been fixed by recent commit ( #3246 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-04-02 20:57:31 -07:00
Jinyang Yuan
2fdfa39ea8
fix: Fix an error related to dummy request when MTP is used ( #3146 )
2025-04-03 11:08:12 +08:00
Chuang Zhu
f5bf74bc7f
enable some disagg test ( #3203 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-03 06:10:48 +08:00
Enwei Zhu
3cf7066350
test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) ( #3219 )
...
* remove test_llm_models_multi_gpu.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* qwen 2.5
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* upgrade
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-02 17:29:57 +08:00
Zongfei Jing
8d48b96545
reduce test cases for deepseek ( #3211 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-02 13:57:55 +08:00
wili
34e63d07e6
feat: Variable-Beam-Width-Search (VBWS) Part2 ( #3133 )
...
* feat: Variable-Beam-Width-Search Part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part2, fix CPP tests
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part3, simplify CPP tests
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part4, move beam_width_array param
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search, fix CI error
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2, fix pre-commit
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2, fix review
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-02 12:31:28 +08:00
Yiqing Yan
c19b7f7c2a
waive L0 test ( #3217 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-02 11:16:22 +08:00
Chuang Zhu
bc5811da65
chore: Ucx ip port remove mpi depend ( #3101 )
...
* initial ucx support
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixes to support dynamic loading and UCX connection establishment - not stable yet
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* update
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* more connection bring-up fixes - failing on connection vector build
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* executor test pass
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* update
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* passed full benchmark
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* changing to TLLM_THROW and removing cout
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* stopping the progress thread in the ucxComm destructor
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixing the build with ENABLE_UCX=0 to not build the ucx target at all, removing the ucxConnection includes from the cache transceiver, and deleting commented-out dead code
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fix copyrights
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* adding a UCX flavor to the cache transceiver test and inserting it into the CI pipeline
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* allowing sending IPs of non-IB interfaces
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* setting UCX port reuse for the tests in pipeline
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* code review fixes
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* querying ep after GID message is sent to avoid UCX Errors
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixing more CR issues
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* querying the ep so it does not fail if the ep is not connected yet
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* remove mpi dependency and debug
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* debug to info
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* mpirun n 2
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* remove the MPI comm split in disaggOrchestrator mode
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* waive disagg_mtp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use future instead of thread
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use future_promise instead of cv wait
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* connectionId type
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* improve test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* improve test 2
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* gtest_skip
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-04-02 09:42:29 +08:00
Zongfei Jing
c7548ad72c
perf: Add optimizations for deepseek in min latency mode ( #3093 )
...
* Add optimizations for deepseek min latency
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Update internal cutlass kernel libs
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Format code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Resolve conflicts
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-02 09:05:24 +08:00
brb-nv
1fe3e30356
Add support for Phi-4-mini ( #2990 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-02 08:34:39 +08:00
Chang Liu
1d3a5d38af
fix: Update FP8 sf layout for Blackwell and relax blockwise GEMM assertions ( #3144 )
...
* Update fp8 sf layout for blackwell and enable fp8 gemm e2e
* Add test case when m needs to be padded
* Better comment
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Add TODO for fp8 quant kernel
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Enable DCO check
Signed-off-by: Chang Liu <liuc@nvidia.com>
* Fix lint
---------
Signed-off-by: Chang Liu <liuc@nvidia.com>
2025-04-01 13:08:29 -07:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval ( #3167 )
...
* add eval_llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
tmp commit
port to CLI tool
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
setup llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix spec_dec_algo
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
_update_from_hf_quant_config
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
migrate test_pytorch.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 block scales
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 rowwise
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
adj alpha
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move test_pytorch.py cases
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
rename test_accuracy.py to test_cli.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix cnn_dailymail
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* renaming to cli flow
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename MMLU
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add error
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
amirkl94
bf02b9144f
feature: Add LoRA support for gemma ( #3068 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-04-01 19:15:55 +08:00
WeiHaocheng
ff35af77ea
feat: refactor scaffolding worker and support openai api worker ( #3166 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: fredw <20514172+WeiHaocheng@users.noreply.github.com>
2025-04-01 18:31:52 +08:00
bhsueh_NV
d34202273b
fix bug of glm-4-9b CI ( #3184 ) (nvbug_5196515)
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-01 16:58:42 +08:00
brb-nv
727d78e785
Support prequantized fp8 ckpt for nemotron-mini-4b-instruct ( #3046 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 14:52:09 +08:00
dongjiyingdjy
22ff81b047
fix: fix illegal memory access when mtp >= 2 ( #3006 )
...
* fix - fix illegal memory access when mtp > 2
---------
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-01 13:36:45 +08:00
brb-nv
1901bfcf76
test: Add Eagle tests with untrained heads ( #2991 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 11:41:59 +08:00
Frank
8bb3eea285
perf: Readd iteration logging for trtllm-bench. ( #3039 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-01 08:13:09 +08:00
Iman Tabrizian
e8731ba3b7
fix: disable cuda graph and MTP for overlap tests ( #3155 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-03-31 11:35:35 -07:00
bhsueh_NV
322ac565fc
chore: clean some ci of qa test ( #3083 )
...
* move some models to examples/models/contrib
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* update the document
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove arctic, blip2, cogvlm, dbrx from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove tests of dit, mmdit and stdit from qa test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove grok, jais, sdxl, skywork, smaug from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* re-organize the glm examples
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix issues after running pre-commit
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix some typo in glm_4_9b readme
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-31 14:30:41 +08:00
xinhe-nv
86f3b59f81
update waive list ( #3094 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: larryx <larryx@nvidia.com>
2025-03-31 11:42:45 +08:00
liji-nv
e0d0dde058
None - Add one-shot version for UB AR NORM FP16/BF16 ( #2995 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-03-31 11:16:03 +08:00
Mike Iovine
5416966ddb
Add initial EAGLE-3 implementation ( #3035 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-03-29 22:31:24 +08:00
Erin
c75d7cd684
move BuildConfig functional args to llmargs ( #3036 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-03-29 02:20:18 +08:00
Aurelien Chartier
3de82c41cd
Pytorch PP + attention DP support ( #3044 )
...
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
2025-03-28 00:11:19 +08:00
Fanrong Li
644a01cbbe
test: Add gpqa tests for DeepSeek models ( #3063 )
...
* Add gpqa accuracy test script
* Add gpqa accuracy tests
* Update DeepSeek-v3 doc
* Update qa test list
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-03-27 19:47:06 +08:00
xiweny
6979afa6f2
test: reorganize tests folder hierarchy ( #2996 )
...
1. move TRT path tests to 'trt' folder
2. optimize some import usage
2025-03-27 12:07:53 +08:00
Yan Chunwei
82edd90350
fix gpus_per_node in trtllm-bench when world_size < device_count ( #3007 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-27 09:31:40 +08:00
Dom Brown
60d4dacc47
Port multi GPU changes to GitHub ( #3027 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-03-27 05:55:03 +08:00
Suyog Gupta
047f2b234d
perf: [AutoDeploy] Enable AutoDeploy as a backend in trtllm-bench ( #3041 )
...
* Enable AutoDeploy as a backend in trtllm-bench
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* update how caches are resized
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* fix: files permission from 100755 to 100644
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some comments
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Fix function name
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* refactor
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Remove spurious change
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Add cursor generated doc strings
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* re-enable ad test
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some perf cleanup
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* debug ci
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* ensure that overlap scheduler is enabled
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Reorder the tests
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
---------
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-26 14:33:14 -07:00
wili
3e035f2219
v1.2 ( #3082 )
...
Signed-off-by: wili <wili@nvidia.com>
2025-03-26 23:31:29 +08:00
Dom Brown
f995a92a31
CI: Waive for https://nvbugspro.nvidia.com/bug/5189673 ( #3100 )
...
* Waive for https://nvbugspro.nvidia.com/bug/5189673
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update waive
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-03-26 19:13:43 +08:00
Enwei Zhu
224469b096
test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references ( #3069 )
...
* committed APIs validation
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* clean name
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* separate
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add TODOs
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix naming
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-26 18:14:35 +08:00
Ivy Zhang
3e116c9687
test: add random image test for llama-3.2-11b-vision ( #3055 )
...
* add random image test for llama-3.2-11b-vision
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
* rename case
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
---------
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
Co-authored-by: Larry <larryx@nvidia.com>
CI passed: https://nv/trt-llm-cicd/job/helpers/job/PR_Github/522/
2025-03-26 15:38:16 +08:00
Aurelien Chartier
0ec7b5701f
chore: Handle qwen2audio inputs ids expansion during processing ( #3080 )
...
* Handle qwen2audio inputs ids expansion during processing
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
* remove more dead code
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
* fix yapf
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
---------
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-03-26 15:00:27 +08:00
Yechan Kim
3c7cb6629c
Add EXAONE-Deep ( #3054 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-03-26 14:24:04 +08:00
kxdc
e6cb34d921
test: fix QA TRT integration testlist mismatch issue ( #3090 )
...
Incorrect test-db context caused empty test list output.
Fix by typo correction: `llm_trt_*` -> `trt_llm_*`.
Signed-off-by: kxdc <xink@nvidia.com>
2025-03-26 14:03:21 +08:00
peaceh-nv
5e272eef81
feat: reduce trt engine build time in testing ( #3014 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-03-26 13:02:54 +08:00
Anurag Mukkara
7361c7d401
Add second possible output ( #3043 )
...
Signed-off-by: Anurag Mukkara <amukkara@nvidia.com>
2025-03-25 12:59:27 -07:00
Enwei Zhu
f93ac9672e
clean ( #3061 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-03-25 21:55:08 +08:00
Chuang Zhu
110c6fc0f0
wait long time for disagg test ( #2998 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-03-25 20:52:38 +08:00
Yuan Tong
53adb3cb4e
test: waive flaky test_kv_cache_event_async_api ( #3062 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-03-25 18:41:30 +08:00
bhsueh_NV
5724c61934
chore: fix bug of model paths in confset.py ( #3011 )
...
* fix bugs in the model paths for models in examples/models/contrib/
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug of code layout
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug of test_multimodal.py
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* add gptj_example_root back
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 17:00:44 +08:00
xiweny
aacb8d66f4
doc: document running CI stage locally ( #3060 )
...
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-03-25 16:18:17 +08:00
QI JUN
a8ec1cc4ea
remove examples/test_gptj.py::test_llm_gptj_fp8_manage_weights_summary test case ( #3057 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-03-25 15:41:27 +08:00
Yan Chunwei
69feafc947
fix: amend the test list ( #3056 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 14:17:36 +08:00
bhsueh_NV
ed84f8f923
fix bug of test_phi ( #3050 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-25 13:12:06 +08:00
Yan Chunwei
c29cebf79d
Deprecate model_api examples ( #2999 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-25 09:37:20 +08:00
Enwei Zhu
705eef68c2
test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite ( #2982 )
...
* Accuracy test improvement (Part 2)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* WAR OOM
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
update
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-25 07:34:10 +08:00
Netanel Haber
da0b0e0ee3
fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size ( #2983 )
...
* fix variable window size reuse - disable when the *min attention window* starts sliding, not the max (see the sketch after this entry)
* isPreCyclic -> isCyclic, and invert logic, for clarity
* getDecoderState()
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-03-24 22:49:52 +08:00
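A one-line sketch of the corrected condition from the fix above (hypothetical helper, illustrative only): KV cache reuse has to stop as soon as the smallest attention window starts sliding, not the largest one.

```python
def kv_cache_reuse_allowed(num_tokens: int, attention_window_sizes) -> bool:
    # Blocks begin to be evicted once the *minimum* window starts sliding,
    # so reuse must be gated on min(...), not max(...), of the window sizes.
    return num_tokens <= min(attention_window_sizes)


assert kv_cache_reuse_allowed(100, [128, 4096])
assert not kv_cache_reuse_allowed(200, [128, 4096])  # an old max() check would wrongly allow reuse here
```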
Yan Chunwei
531b98ed62
feat: Add several pure python configs to LlmArgs ( #2997 )
...
* add SchedulerConfig
* add PeftCacheConfig
2025-03-24 16:16:17 +08:00
nv-guomingz
ec4f43a0ab
test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… ( #2987 )
...
* test:remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_seek_v2 test cases.
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* update test case per review comments
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
---------
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-03-24 14:18:06 +08:00
bhsueh_NV
7413cb555a
relax the limitation of setuptools ( #2992 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-24 13:36:10 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
...
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
be17881062
Update TensorRT-LLM ( #2582 )
2024-12-16 21:50:47 -08:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM ( #2562 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM ( #2532 )
...
* blossom-ci.yml: run vulnerability scan on blossom
* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec
---------
Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM ( #2502 )
...
* Update TensorRT-LLM
---------
Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00
Kaiyu Xie
535c9cc673
Update TensorRT-LLM ( #2460 )
2024-11-19 18:30:34 +08:00
Kaiyu Xie
c629546ce4
Update TensorRT-LLM ( #2436 )
2024-11-12 15:27:49 +08:00
Kaiyu Xie
b7868dd1bd
Update TensorRT-LLM ( #2413 )
2024-11-05 16:27:06 +08:00
Kaiyu Xie
f14d1d433c
Update TensorRT-LLM ( #2389 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
2024-10-29 22:24:38 +08:00
Kaiyu Xie
1730a587d8
Update TensorRT-LLM ( #2363 )
...
* Update TensorRT-LLM
---------
Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>
2024-10-22 20:27:35 +08:00
Kaiyu Xie
75057cd036
Update TensorRT-LLM ( #2333 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Puneesh Khanna <puneesh.khanna@tii.ae>
Co-authored-by: Ethan Zhang <26497102+ethnzhng@users.noreply.github.com>
2024-10-15 15:28:40 +08:00
Kaiyu Xie
8681b3a4c0
open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 ( #2297 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Bhuvanesh Sridharan <bhuvanesh.sridharan@sprinklr.com>
Co-authored-by: Qingquan Song <ustcsqq@gmail.com>
2024-10-08 12:19:19 +02:00
Dan Blanaru
48686bca3a
open source 7f370deb0090d885d7518c2b146399ba3933c004 ( #2273 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Qingquan Song <ustcsqq@gmail.com>
2024-09-30 13:51:19 +02:00
Kaiyu Xie
e153372759
Update TensorRT-LLM ( #2253 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Ivan Sorokin <isorokin@nvidia.com>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-09-24 17:27:31 +02:00
Kaiyu Xie
fe7dc6ad4e
Update TensorRT-LLM ( #2230 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Yi Wang <yi.wang.2005@gmail.com>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-09-17 14:39:09 +08:00
Kaiyu Xie
31ac30e928
Update TensorRT-LLM ( #2215 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Sherlock Xu <65327072+Sherlock113@users.noreply.github.com>
2024-09-10 18:21:22 +08:00
Kaiyu Xie
78f5c2936b
Update TensorRT-LLM ( #2184 )
2024-09-03 12:14:23 +02:00
石晓伟
b8fc6633ba
Update TensorRT-LLM ( #2156 )
...
Co-authored-by: Bruno Magalhaes <bruno.magalhaes@synthesia.io>
2024-08-27 18:20:59 +08:00
石晓伟
32ed92e449
Update TensorRT-LLM
...
Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com>
Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
2024-08-20 18:55:15 +08:00
Kaiyu Xie
74b324f667
Update TensorRT-LLM ( #2110 )
2024-08-13 22:34:33 +08:00
Kaiyu Xie
be9cd719f7
Update TensorRT-LLM ( #2094 )
...
* Update TensorRT-LLM
---------
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
Co-authored-by: Tayef Shah <tayefshah@gmail.com>
Co-authored-by: lfz941 <linfanzai941@gmail.com>
2024-08-07 16:44:43 +08:00
Kaiyu Xie
a681853d38
Update TensorRT-LLM ( #2053 )
2024-07-30 21:25:01 +08:00
Kaiyu Xie
93293aa46d
open source 315e9f5ccd286e906d4c0d402fefbf2f69a1febe ( #2033 )
2024-07-26 16:19:24 +08:00
Kaiyu Xie
5fa9436e17
Update TensorRT-LLM ( #2016 )
2024-07-24 19:50:28 +08:00
Kaiyu Xie
bca9a33b02
Update TensorRT-LLM ( #2008 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Timur Abishev <abishev.timur@gmail.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
Co-authored-by: Saeyoon Oh <saeyoon.oh@furiosa.ai>
Co-authored-by: hattizai <hattizai@gmail.com>
2024-07-23 23:05:09 +08:00
Kaiyu Xie
2d234357c6
Update TensorRT-LLM ( #1954 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Altair-Alpha <62340011+Altair-Alpha@users.noreply.github.com>
2024-07-16 15:30:25 +08:00
Kaiyu Xie
a96cccafcf
Update TensorRT-LLM ( #1918 )
2024-07-09 14:42:22 +08:00
Kaiyu Xie
9dbc5b38ba
Update TensorRT-LLM ( #1891 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Marks101 <markus.schnoes@gmx.de>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
2024-07-04 14:37:19 +08:00
Kaiyu Xie
9691e12bce
Update TensorRT-LLM ( #1835 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2024-06-25 21:10:30 +08:00
石晓伟
2a115dae84
Update TensorRT-LLM ( #1793 )
...
Co-authored-by: DreamGenX <x@dreamgen.com>
Co-authored-by: Ace-RR <78812427+Ace-RR@users.noreply.github.com>
Co-authored-by: bprus <39293131+bprus@users.noreply.github.com>
Co-authored-by: janpetrov <janpetrov@icloud.com>
2024-06-18 18:18:23 +08:00
Kaiyu Xie
db4edea1e1
Update TensorRT-LLM ( #1763 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Kota Tsuyuzaki <bloodeagle40234@gmail.com>
Co-authored-by: Pzzzzz <hello-cd.plus@hotmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
2024-06-11 16:59:02 +08:00
Kaiyu Xie
b777bd6475
Update TensorRT-LLM ( #1725 )
...
* Update TensorRT-LLM
---------
Co-authored-by: RunningLeon <mnsheng@yeah.net>
Co-authored-by: Tlntin <TlntinDeng01@Gmail.com>
Co-authored-by: ZHENG, Zhen <zhengzhen.z@qq.com>
Co-authored-by: Pham Van Ngoan <ngoanpham1196@gmail.com>
Co-authored-by: Nathan Price <nathan@abridge.com>
Co-authored-by: Tushar Goel <tushar.goel.ml@gmail.com>
Co-authored-by: Mati <132419219+matichon-vultureprime@users.noreply.github.com>
2024-06-04 20:26:32 +08:00
Kaiyu Xie
f430a4b447
Update TensorRT-LLM ( #1688 )
...
* Update TensorRT-LLM
---------
Co-authored-by: IbrahimAmin <ibrahimamin532@gmail.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
Co-authored-by: Pzzzzz <hello-cd.plus@hotmail.com>
Co-authored-by: CoderHam <hemant@cohere.com>
Co-authored-by: Konstantin Lopuhin <kostia.lopuhin@gmail.com>
2024-05-28 20:07:49 +08:00
Kaiyu Xie
5d8ca2faf7
Update TensorRT-LLM ( #1639 )
...
* Update TensorRT-LLM
---------
Co-authored-by: vonjackustc <fga@mail.ustc.edu.cn>
2024-05-21 17:51:02 +08:00
Kaiyu Xie
bf0a5afc92
Update TensorRT-LLM ( #1598 )
...
* Update TensorRT-LLM
2024-05-14 16:43:41 +08:00
Kaiyu Xie
89ba1b1a67
Update TensorRT-LLM ( #1554 )
2024-05-07 23:34:28 +08:00
Kaiyu Xie
06c0e9b1ec
Update TensorRT-LLM ( #1530 )
2024-04-30 17:19:10 +08:00
Kaiyu Xie
66ef1df492
Update TensorRT-LLM ( #1492 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Loki <lokravi@amazon.com>
2024-04-24 14:44:22 +08:00
Kaiyu Xie
71d8d4d3dc
Update TensorRT-LLM ( #1455 )
2024-04-16 19:40:08 +08:00
Kaiyu Xie
035b99e0d0
Update TensorRT-LLM ( #1427 )
...
* Update TensorRT-LLM
---------
Co-authored-by: meghagarwal <16129366+megha95@users.noreply.github.com>
2024-04-09 17:03:34 +08:00
石晓伟
850b6fa1e7
Update TensorRT-LLM ( #1358 )
...
Co-authored-by: Kaiyu <26294424+kaiyux@users.noreply.github.com>
2024-03-26 20:47:14 +08:00
Kaiyu Xie
66ca3378c6
Update TensorRT-LLM ( #1315 )
2024-03-19 17:36:42 +08:00
Kaiyu Xie
4bb65f216f
Update TensorRT-LLM ( #1274 )
...
* Update TensorRT-LLM
---------
Co-authored-by: meghagarwal <16129366+megha95@users.noreply.github.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-03-12 18:15:52 +08:00
Kaiyu Xie
728cc0044b
Update TensorRT-LLM ( #1233 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-03-05 18:32:53 +08:00
Kaiyu Xie
655524dd82
Update TensorRT-LLM ( #1168 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Bhuvanesh Sridharan <bhuvan.sridharan@gmail.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-02-27 17:37:34 +08:00
Kaiyu Xie
0f041b7b57
Update TensorRT-LLM ( #1098 )
...
* Update TensorRT-LLM
* update submodule
* Remove unused binaries
2024-02-18 15:48:08 +08:00
Kaiyu Xie
0ab9d17a59
Update TensorRT-LLM ( #1055 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-02-06 18:38:07 +08:00
Kaiyu Xie
e06f537e08
Update TensorRT-LLM ( #1019 )
...
* Update TensorRT-LLM
---------
Co-authored-by: erenup <ping.nie@pku.edu.cn>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-31 21:55:32 +08:00
Kaiyu Xie
b57221b764
Update TensorRT-LLM ( #941 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-23 23:22:35 +08:00
Kaiyu Xie
c89653021e
Update TensorRT-LLM (20240116) ( #891 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Eddie-Wang1120 <81598289+Eddie-Wang1120@users.noreply.github.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-16 20:03:11 +08:00
Kaiyu Xie
d879430b04
Update TensorRT-LLM ( #846 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-09 21:03:35 +08:00
Kaiyu Xie
deaae40bd7
Update TensorRT-LLM ( #787 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-02 17:54:32 +08:00
Kaiyu Xie
d37b507f41
Update TensorRT-LLM main branch ( #754 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-12-27 17:41:24 +08:00
Kaiyu Xie
a75618df24
Update TensorRT-LLM ( #667 )
...
* Update TensorRT-LLM
---------
Co-authored-by: 0xymoro <jerrymeng100@gmail.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-12-15 22:14:51 +08:00
Kaiyu Xie
f7eca56161
Update TensorRT-LLM ( #613 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: zhang-ge-hao <842720660@qq.com>
2023-12-08 17:49:24 +08:00
Kaiyu Xie
71f60f6df0
Update TensorRT-LLM ( #524 )
2023-12-01 22:27:51 +08:00
Kaiyu Xie
711a28d9bf
Update TensorRT-LLM ( #465 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-11-24 22:12:26 +08:00
Kaiyu Xie
6755a3f077
Update TensorRT-LLM ( #422 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Tltin <TltinDeng01@gmail.com>
Co-authored-by: zhaohb <zhaohbcloud@126.com>
Co-authored-by: Bradley Heilbrun <brad@repl.it>
Co-authored-by: nqbao11 <nqbao11.01@gmail.com>
Co-authored-by: Nikhil Varghese <nikhil@bot-it.ai>
2023-11-18 00:05:54 +08:00
Kaiyu Xie
b2fd493c16
Update TensorRT-LLM ( #349 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-11-10 22:30:31 +08:00
Kaiyu Xie
f044eb8d94
Update TensorRT-LLM ( #302 )
...
* Update TensorRT-LLM
---------
Co-authored-by: wangruohui <12756472+wangruohui@users.noreply.github.com>
2023-11-07 19:51:58 +08:00
Kaiyu Xie
4de32a86ae
Update TensorRT-LLM ( #188 )
...
* Update batch manager
* Update src
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: jdemouth-nvidia <11447840+jdemouth-nvidia@users.noreply.github.com>
2023-10-30 16:06:41 +08:00
Kaiyu Xie
d8b408e6dc
Update TensorRT-LLM ( #148 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-10-27 12:10:00 +08:00
Kaiyu Xie
75b6210ff4
Kaiyu/update main ( #5 )
...
* Update
* Update
2023-10-18 22:38:53 +08:00
Kevin Xie
027cd518e3
Update
2023-10-10 23:22:17 -07:00
Kevin Xie
6e9e318e91
Update code
2023-09-28 09:00:05 -07:00
Kaiyu Xie
23bc5b7c49
Initial commit
2023-09-20 00:29:41 -07:00