bhsueh_NV
9d7d48faeb
fix: disable the kv cache reuse for prompt tuning test ( #3474 )
...
* disable the kv cache reuse for prompt tuning test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* unwaive the waived tests
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-14 14:35:47 +08:00
brb-nv
44090a5388
Add support for Phi-4-MM ( #3296 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-14 14:24:10 +08:00
pcastonguay
fe6f14b2b1
fix: Fixing issue with first gen token being returned twice in streaming ( #3427 )
...
* fix: Fixing issue with first gen token being returned twice with streaming
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing not_expectring_strings in test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-13 22:45:09 -04:00
dominicshanshan
5d3180be82
feat: Add stress test for TRT-LLM ( #3250 )
...
Signed-off-by: Wangshanshan <dominicw@nvidia.com>
2025-04-13 10:24:25 +08:00
pcastonguay
145a126a28
chore: Unwaive DS + overlap disagg test ( #3339 )
...
* chore: Unwaive DS + overlap disagg test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing pre-commit
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing pre-commit
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-12 13:33:38 -04:00
yuxianq
29c5085400
fix: Fix PP for llama. ( #3449 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-12 17:20:27 +08:00
Iman Tabrizian
3041bbdab3
fix: Fix disagg MTP with overlap ( #3406 )
...
* fix: disagg overlap with MTP
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Review comment
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-12 12:27:24 +08:00
Enwei Zhu
5e2923bb92
test: Automatically clean checkpoints and engines ( #3468 )
...
* auto clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix tempdir
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-12 09:56:29 +08:00
Enwei Zhu
cf9ceea890
test: Add DeepSeek-V3-Lite PP=4 cases ( #3454 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-12 00:09:12 +08:00
Shunkangz
ea050084ad
feat: Add support of chat completion in PD ( #2985 )
...
* Add support of chat completion in PD
Add support of include_usage in PD
Reformat
* Remove redundant code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Add chat completion test
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor code
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-11 17:53:28 +08:00
Ivy Zhang
20e54e5c89
test: add cuda visible device constraint for phi_1gpu test ( #3364 )
...
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
2025-04-11 17:14:52 +08:00
Ivy Zhang
d998832b33
test: add torch flow test case in qa test list ( #3404 )
...
Signed-off-by: Ivy Zhang <yanzh@nvidia.com>
2025-04-11 16:57:41 +08:00
Enwei Zhu
410f56357e
test: Waive torch compile tests ( #3471 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-11 13:38:05 +08:00
Iman Tabrizian
d7f45e50c6
test: disable attention DP tests for single GPU ( #3395 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-11 01:38:17 +08:00
amitz-nv
a6a2ae6cc1
chore: Rename nvsmall to nemotron nas ( #3447 )
...
* Rename nvsmall to nemotron NAS
* Revert nvsmall to nemotron_nas rename in paths in tests that access llm_models_root/nvsmall/tests
* Add NemotronNAS to pytorch supported models table
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-04-10 23:16:52 +08:00
brb-nv
c59abae436
feat: Add Gemma3 text-only model support ( #3247 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-10 12:34:58 +08:00
QI JUN
b5473f7eca
waive llama3.1 8B test cases with pipeline parallelism ( #3433 )
...
* waive llama3.1 8B test cases with pipeline parallelism
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-10 11:07:58 +08:00
peaceh-nv
215fb20567
chore: split GptExecutor tests out of gpt tests to reduce single test time ( #3412 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-10 09:08:15 +08:00
Yechan Kim
943218b54a
feat: Add Qwen2.5-VL and refactor Qwen2-VL ( #3156 )
...
* feat: Add Qwen2.5-VL and refactor Qwen2-VL
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix yapf and codespell
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix test_e2e
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* generalize get_rope_index
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix qwen2.5-vl on README
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix image test
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-10 04:09:03 +08:00
Iman Tabrizian
8401722245
test: Add single gpu disaggregated tests ( #3295 )
...
* test: Add single gpu disaggregated tests
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Add deepseek with overlap tests
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Use updated prompt
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
* Move test to disaggregated folder
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
---------
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-09 09:34:45 +08:00
Mike Iovine
5bdf997963
Add Llama 4 ( #3302 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-04-09 03:35:21 +08:00
yuxianq
7225bd8b91
chore: Refine attention backend interface. ( #3271 )
...
Refine attention backend interface.
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-09 02:34:53 +08:00
wili
54ad95eaa8
Feat: Variable-Beam-Width-Search (VBWS) part3 ( #3338 )
...
* feat/Variable-Beam-Width-Search-Part3, v1.0
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat/Variable-Beam-Width-Search-Part3, v1.1
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat/Variable-Beam-Width-Search-Part3, v1.2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-08 23:51:27 +08:00
pcastonguay
02f446a9ff
chore: Adding DS V3-lite tests with overlap + cuda graph ( #3342 )
...
* chore: Adding DS V3-lite tests with overlap + cuda graph
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing pre-commit
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-08 09:36:09 -04:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. ( #3270 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
Chuang Zhu
cdb0906be4
disagg test single h100 ( #3353 )
2025-04-08 17:45:35 +08:00
amirkl94
e04f6a1b9b
fix: Fix p-tuning test bug ( #3326 )
...
* fix: Fix p-tuning test bug
* A change in the vocab_size calculation for T5Tokenizer,
introduced in transformers version 4.34, caused addition of incorrect vtokens for ptuning.
In general, instead of adding tokens which are outside the vocabulary, tokens inside the vocabulary were added.
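The failure mode described above can be sketched as follows. This is a minimal illustration, not the code from the fix: the helper names and the two vocabulary sizes are hypothetical, chosen only to show how an under-reported vocab_size makes the virtual token IDs overlap real ones.

```python
def virtual_token_ids(vocab_size: int, num_virtual_tokens: int) -> list[int]:
    """Place p-tuning virtual tokens immediately after the real vocabulary."""
    return list(range(vocab_size, vocab_size + num_virtual_tokens))


def ids_inside_vocab(ids: list[int], true_vocab_size: int) -> list[int]:
    """Return the IDs that collide with real tokens (IDs below the true size)."""
    return [i for i in ids if i < true_vocab_size]


# Hypothetical sizes: the embedding table really has 32100 entries,
# but the tokenizer now reports only 32000.
TRUE_VOCAB_SIZE = 32100
REPORTED_VOCAB_SIZE = 32000

good_ids = virtual_token_ids(TRUE_VOCAB_SIZE, 4)
bad_ids = virtual_token_ids(REPORTED_VOCAB_SIZE, 4)

# With the correct size no virtual token collides with a real token.
print(ids_inside_vocab(good_ids, TRUE_VOCAB_SIZE))
# With the under-reported size every virtual token lands inside the vocabulary.
print(ids_inside_vocab(bad_ids, TRUE_VOCAB_SIZE))
```

Under these assumptions, the virtual token IDs must be derived from the size of the embedding table actually in use rather than from a tokenizer property whose meaning can change between library versions.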
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-04-08 17:14:00 +08:00
Enwei Zhu
8ee019f8c4
test: Accuracy test improvement (Part 3.4): Move LLaMA tests ( #3350 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 15:07:57 +08:00
Gabriel Wu
42c8574e93
fix: revert extra cmake var ( #3351 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-08 11:57:16 +08:00
pcastonguay
add5e5cd93
feat: Add option to run disaggregated serving without ctx servers,… ( #3243 )
...
* feat: Add option to run disaggregated serving without ctx servers, to benchmark gen only
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing comment in sanity check
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-07 21:56:03 -04:00
Enwei Zhu
ba019a43d6
test: Accuracy test improvement (Part 3.3): Move DeepSeek tests ( #3260 )
...
add skip
fix
fix
update
update test list
fix qa list
move bf16 to postmerge
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 07:19:04 +08:00
Gabriel Wu
376731013d
feat: use NVRTC for DeepGEMM JIT compilation ( #3239 )
...
* feat: use NVRTC for DeepGEMM JIT compilation
Signed-off-by: Zihua Wu
* fix: add license
Signed-off-by: Zihua Wu
* feat: store NVRTC JIT results in memory by default
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* feat: refinement
Signed-off-by: Zihua Wu
* test: set timeout to 7200
Signed-off-by: Zihua Wu
---------
Signed-off-by: Zihua Wu
2025-04-07 20:29:23 +08:00
YueWeng
aab6214801
test: fix conflicting test names ( #3316 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-04-07 20:10:01 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python ( #3025 )
...
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve disagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
qixiang-99
0d4d50a745
feat: no-cache attention in PyTorch workflow ( #3085 )
...
* init trtllm attn no cache
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: fix the seq_len issue and attn metadata prepare for qwen reward model test
fix: fix minor bugs after rebase
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: remove unnecessary debug logs and clean up commented code
refactor: update max_seq_len documentation and remove max_seq_len for decoder model constructor in PyTorchModelEngine
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: update calculate_ref_result function to accept tensor inputs and mask type, enhance test_attention_no_cache to support FULL and CAUSAL masks
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: remove unused BERT attention metadata conversion method and add type assertion for no cache attention in PyTorchModelEngine
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: remove use_kv_cache parameter from attention function and related classes, update documentation for KV cache handling
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: implement setAttentionMaskType method for better mask type handling and remove unused conversion function
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: streamline KV cache handling by replacing direct member access with useKVCache method and simplify token per block assignment
remove Debug code.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: Resolve comments for Python code
Simplify no cache attention metadata preparation and streamline related attributes in TrtllmAttentionMetadata
Removed the private method for converting to no cache attention metadata and integrated its logic into the prepare method. Updated the test for BERT sequence classification to reflect these changes and ensure proper handling of attention metadata.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* docs: Add is_dummy_attention field to attention metadata for simulation operations
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* refactor: add KVCacheParams to attention backend interface and import relevant metadata classes
Updated the attention backend interface to include KVCacheParams and imported TrtllmAttentionMetadata and VanillaAttentionMetadata in model_engine.py for enhanced functionality.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: fix rebase format issue
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: extend attention mask type handling in MHARunnerFixedParams
Added support for additional attention mask types (BIDIRECTIONAL, BIDIRECTIONALGLM, BLOCKSPARSE) in the MHARunnerFixedParams structure to fix the mapping issue between ContextAttentionMaskType and AttentionMaskType
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
* fix: enhance attention mask type handling in TllmGenFmhaRunnerParams
Updated the setAttentionMaskType method to include a switch-case structure for better handling of attention mask types, ensuring proper mapping and error handling for invalid types.
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
---------
Signed-off-by: Qixiang Lin <qixiangl@nvidia.com>
2025-04-05 01:54:32 +08:00
Pengyun Lin
f25c7cefb4
doc: refactor trtllm-serve examples and doc ( #3187 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-04 11:40:43 +08:00
Tracin
bb6c338730
AWQ support Modelopt ckpts. ( #3258 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-04 08:10:35 +08:00
pcastonguay
b763051ba4
chore: Refactor disaggregated serving scripts ( #3073 )
...
* chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Updating README, removing launch script
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding scripts to populate urls section of disagg config based on SLURM env vars
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-03 14:55:05 -04:00
xinhe-nv
2005e5aaaf
remove tests from qa test lists ( #3256 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-03 16:06:39 +08:00
Enwei Zhu
3cf7066350
test: Accuracy test improvement (Part 3.2): Move Qwen tests (NvBug 5135332) ( #3219 )
...
* remove test_llm_models_multi_gpu.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* qwen 2.5
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* upgrade
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-02 17:29:57 +08:00
Chuang Zhu
bc5811da65
chore: Ucx ip port remove mpi depend ( #3101 )
...
* initial ucx support
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixes to support dynloading and ucx connection establishment - not stable yet
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* update
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* more connection bringup fixes - failing on connection vector build
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* executor test pass
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* update
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* passed full benchmark
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* changing to TLLM_THROW and removing cout
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* stopping progress thread at ucxComm destructor
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixing build with ENABLE_UCX=0 to not build ucx target at all and removing includes for ucxConnection for cache transceiver, also delete commented cold code
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fix copyrights
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* adding ucx flavor to cache transceiver test and insert into the CI pipeline
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* allowing sending non ib interfaces IPs
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* setting UCX port reuse for the tests in pipeline
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* code review fixes
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* querying ep after GID message is sent to avoid UCX Errors
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* fixing more CR issues
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* querying ep to not fail if ep_not_connected yet
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
* remove mpi dependency and debug
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* debug to info
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* mpirun n 2
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* remove mpi comm split when disaggOrchestrator mode
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* waive disagg_mtp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use future instead of thread
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use future_promise instead of cv wait
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* connectionId type
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* improve test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* improve test 2
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* gtest_skip
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-04-02 09:42:29 +08:00
brb-nv
1fe3e30356
Add support for Phi-4-mini ( #2990 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-02 08:34:39 +08:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval ( #3167 )
...
* add eval_llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
tmp commit
port to CLI tool
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
setup llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix spec_dec_algo
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
_update_from_hf_quant_config
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
migrate test_pytorch.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 block scales
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 rowwise
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
adj alpha
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move test_pytorch.py cases
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
rename test_accuracy.py to test_cli.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix cnn_dailymail
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* renaming to cli flow
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename MMLU
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add error
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
bhsueh_NV
d34202273b
fix bug of glm-4-9b ci ( #3184 ) bug nvbug_5196515
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-01 16:58:42 +08:00
brb-nv
727d78e785
Support prequantized fp8 ckpt for nemotron-mini-4b-instruct ( #3046 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 14:52:09 +08:00
brb-nv
1901bfcf76
test: Add Eagle tests with untrained heads ( #2991 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-01 11:41:59 +08:00
Frank
8bb3eea285
perf: Readd iteration logging for trtllm-bench. ( #3039 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-01 08:13:09 +08:00
Iman Tabrizian
e8731ba3b7
fix: disable cuda graph and MTP for overlap tests ( #3155 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-03-31 11:35:35 -07:00
bhsueh_NV
322ac565fc
chore: clean some ci of qa test ( #3083 )
...
* move some models to examples/models/contrib
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* update the document
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove arctic, blip2, cogvlm, dbrx from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove tests of dit, mmdit and stdit from qa test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove grok, jais, sdxl, skywork, smaug from qa test list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* re-organize the glm examples
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix issues after running pre-commit
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix some typos in glm_4_9b readme
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-03-31 14:30:41 +08:00
Mike Iovine
5416966ddb
Add initial EAGLE-3 implementation ( #3035 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-03-29 22:31:24 +08:00