ruodil
bf5b2a2e0a
test: amend regex match for perf throughput ( #4186 )
...
amend regex match for perf throughput
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 17:33:25 +08:00
chenfeiz0326
ffc13bd325
Cherry-pick: Use multi-threading to load MoE expert weights ( #4137 )
...
* Use multi-threading to load MoE expert weights
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
* Update code formatting
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Update code formatting
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
---------
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Po-Han Huang <pohanh@nvidia.com>
2025-05-09 17:29:24 +08:00
WeiHaocheng
0f01826dde
feat: support task collection for to collect information ( #3328 ) ( #3824 )
...
Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>
2025-05-09 17:09:01 +08:00
xinhe-nv
9082411a50
test: [CI] Add failed cases into waives.txt ( #4165 )
...
wavie oom tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-09 16:56:30 +08:00
Fanrong Li
0cf0fce5d3
[fix] Fix add_dummy_requests for spec decoding cases ( #4084 )
...
* fix add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* add max_seq_len to eagle3 test and fix add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix prompt_len in add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* add prepare_resource condition in add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* add some description of token_nums to add_dummy_requests and fix token_nums in torch compile warmup.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix available_tokens.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-09 16:52:51 +08:00
ruodil
5ce5b81281
test: amend default pytorch extra-llm-api-config.yml in perf test ( #4176 )
...
* amend default pytorch extra-llm-api-config.yml
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* add print info to separate cases in output log
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 16:46:48 +08:00
Shi Xiaowei
87f0f79554
fix: library path of nixl ( #4184 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-09 16:31:55 +08:00
Zhanrui Sun
e30c76c530
infra: Fix pipeline step error in post merge ( #3948 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-05-09 15:26:15 +08:00
xinhe-nv
1d26a3fd7c
test: skip tests on b200 ( #3913 )
...
* skip tests on b200
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip phi-3-128k
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-09 14:51:55 +08:00
Fanrong Li
77f8e43592
[fix] Fix relaxed acceptance to support enabling it in context phase ( #4126 )
...
* fix relaxed acceptance to support enable this feature in context phase.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix sample_and_accept_draft_tokens unit test.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-09 14:11:14 +08:00
Bo Li
e3cf3fd15f
test: Add fp8kv to DS-v3-lite integration tests. ( #3950 )
...
* Add fp8 kv cache tests to DSV3-Lite integration tests.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update gsm8k.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update CI list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update TestDeepSeekR1.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Fix test list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Need quant_config besides pytorch_config.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update waive list (bug 5239087).
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update waive list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Correct test name.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update waive list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
---------
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <bobboli0202@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-09 13:35:04 +08:00
Ivy Zhang
c91d03fa0a
test: move mistral / mixtral test cases in QA test list into the new accuracy test suite ( #3440 )
...
* add mistral-7b-v0.1 torch flow test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* rearrange mistral
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* rearrange mixtral case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove api function test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* move mistral nemo cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* move mixtral cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix failure
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix name
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix failure cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update list
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove awq llmapi test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* adjust threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix ci
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix partial comments
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix path
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update thres
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove duplicate test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix ci
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:32:02 +08:00
Ivy Zhang
c2d4c2adb6
[ https://nvbugspro.nvidia.com/bug/5260676 ]test: skip fp8 quantization case for pre-ada ( #4095 )
...
skip pre ada
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:30:16 +08:00
Yukun He
c9cac432dc
chore: Fix pipeline break caused by previous PR ( #4081 ) rebase + pipeline reuse ( #4169 )
...
Fix import break caused by rebase.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-09 12:51:02 +08:00
Mike Iovine
d80dc40135
[nvbug/5262268][fix] Fix trtllm-bench for llama 4 ( #4104 )
...
[fix] Fix trtllm-bench for llama 4
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
2025-05-08 21:27:57 -07:00
NVJiangShao
57b2fe2019
[ #4085 ][fix] Fix apply_per_channel_scale for extremely large input sequence length. ( #4089 )
...
Fix apply_per_channel_scale for extremely large input seq length.
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: crazy-JiangDongHua <759421566@qq.com>
2025-05-09 11:57:01 +08:00
Erin
cdf5ae1547
fix: change pp broadcast pattern for LPs ( #4130 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-08 20:07:13 -07:00
Yi Zhang
91bf5e6a8e
[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support ( #3804 )
...
Add Piecewise CUDA Graph Support
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-05-09 11:04:01 +08:00
Stanley Sun
fb31f91e15
test: add qwen3 and disaggregated serving accuracy tests to qa test list ( #4083 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-09 11:03:02 +08:00
Yukun He
5b61486d87
chore: Clean up the legacy DeepseekAllreudceFusionOp. ( #4081 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-09 10:20:41 +08:00
bhsueh_NV
700d09ab65
[TRTLLM-5147][Qwen3] fix: fix bug of attention dp on qwen3_moe model ( #4141 )
...
* fix bug of attention dp on qwen3
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix pre-commit changes
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug of attention dp 8
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-09 09:29:39 +08:00
pcastonguay
836c142e1b
[feat] Allow overriding cli args with yaml file in trtllm-serve ( #4164 )
...
feat: Allow overriding cli args with yaml file in trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-08 21:19:05 -04:00
dongxuy04
7147efb2e8
fix: alltoall padding for chunked MoE ( #4157 )
...
fix alltoall padding for chunked MoE
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-05-09 09:01:35 +08:00
forrestl
9477661f4c
Support RingAttention in the BertAttention plugin and the DiT model ( #3661 )
...
support ring attn for bert_attention plugin and dit model
Signed-off-by: ChunhuanLin <lch_xdu@163.com>
2025-05-09 08:06:54 +08:00
Mike Iovine
9afe510367
[fix] Fix llama4 + eagle3 ( #3998 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-08 19:20:27 -04:00
Frank
57afbf6b79
Fix incorrect conversion. ( #4112 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-05-09 06:34:52 +08:00
Lucas Liebenwein
48ed38a2ac
[fix] [AutoDeploy] flashinfer usage on H100 ( #4162 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-09 06:00:57 +08:00
chenfeiz0326
7f5716ef83
Cherry-pick trtllm-gen from feat/llama4 to main ( #4086 )
...
* feat: TRT-LLM Gen FP8 MoE Llama4
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* feat: TRT-LLM Gen llama4 MoE Top1 routing
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
* feat: add per tensor FP8 TRT-LLM Gen GEMMs
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* Update
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Update
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add license for cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/gemmCubins
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add guard for routingIndicesClusterKernel
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
---------
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Nikita Korobov <nkorobov@nvidia.com>
Co-authored-by: Jiqun Tu <jtu@nvidia.com>
2025-05-08 14:13:01 -07:00
Yukun He
bb7bcc75c2
feat: Fallback to NCCL for various patterns when input size is large. ( #4080 )
...
* Fallback to NCCL for various patterns when input size is large.
Move the previous implementation to cpp side.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Revising.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
---------
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-08 11:13:13 -07:00
shaharmor98
7d94c9561f
feat: support multi lora adapters and TP ( #3885 )
...
* support multi lora, tp
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-08 23:45:45 +08:00
Shi Xiaowei
99313af242
infra: Add NIXL into the Dockerfile ( #3981 )
2025-05-08 20:38:05 +08:00
Yuan Tong
5b93273156
feat: adopt new logprob definition in PyTorch flow ( #4057 )
...
feat: align logprob definition of PyTorch flow
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
2025-05-08 20:16:40 +08:00
Enwei Zhu
74df12bbaa
[TRTLLM-4480][doc] Documentation for new accuracy test suite and trtllm-eval ( #3946 )
...
* fix formula
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update doc
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* 1st version
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* polish
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-08 19:35:23 +08:00
nv-guomingz
4dfa3ccf43
chore: enhance the cmake experience by ignoring the additional semicolon ( #3992 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-08 18:43:36 +08:00
Ivy Zhang
7666bec7c4
[TRTQA-2861][test]: add nemotron and llama4 cases into qa test ( #4053 )
...
* add MMLU, GPQADiamond check for llama-4 models
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add nomotron cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add online quant test cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove trt flow cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* adjust parallelism strategy
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix fail
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update sanity list
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix comment
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* skip nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:10:41 +08:00
xinhe-nv
4468158be4
test: [CI] remove closed bugs ( #4046 )
...
update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:04:43 +08:00
Tracin
b0dd581e6b
Fix TP8 for NVFP4 kv dupilcation. ( #4143 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-08 17:30:02 +08:00
Kaiyu Xie
d1fa80dee3
doc: TRTLLM-4797 Update perf-analysis.md ( #4100 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-08 17:24:44 +08:00
Yiqing Yan
ce8832e80f
[Infra] Waive L0 flaky test ( #4148 )
...
Waive L0 test
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-08 17:23:45 +08:00
yuanjingx87
6e1d2a1320
feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI ( #4019 )
...
* Add slurm support with RTXPro6000 PostMerge Tests
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
* remove H100 post merge test from testing
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
---------
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-08 15:15:36 +08:00
Zhanrui Sun
179efd45d4
infra: WAR for Argument list too long of globalVars[CACHED_CHANGED_FILE_LIST] ( #4131 )
...
* infra: WAR for Argument list too long of globalVars[CACHED_CHANGED_FILE_LIST]
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Fix init value
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
* Update jenkins/L0_MergeRequest.groovy
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
---------
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-08 14:36:18 +08:00
Enwei Zhu
dae6781494
test: Waive disagg accuracy test ( #4124 )
...
* waive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* waive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-08 13:39:07 +08:00
Enwei Zhu
19eeef1ddd
test: Waive test_llm cases ( #4136 )
...
* waive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Apply suggestions from code review
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-08 11:53:16 +08:00
Yan Chunwei
389614ca99
chore: remove data stage in serve example on slurm ( #4138 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-08 11:18:56 +08:00
Yanchao Lu
7175392206
[Infra] - Update code ownership rules for public APIs ( #4122 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-08 11:04:31 +08:00
Ivy Zhang
d7c51c953b
test: add INTEGRATION_TEST env var to speed up integration test ( #3618 )
...
add INTEGRATION_TEST env var
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-08 10:44:50 +08:00
zihaok
81cc60a0fd
[feat/] enable attention DP in Llama4 maverick model - part 1 ( #4065 )
...
* add feature
cosmetic changes
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
address precommit fix
cosmetic
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* add feature
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* fix bug
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* address comments
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* remove WAR
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* fix format precommit
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* Update tensorrt_llm/_torch/models/modeling_llama.py
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Signed-off-by: zihaok <161090975+zihaok@users.noreply.github.com>
---------
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
Signed-off-by: zihaok <161090975+zihaok@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-05-08 05:06:40 +08:00
hlu1
26a2679217
[Deepseek] Refactor Deepseek Decoder layer ( #4016 )
...
Refactor Deepseek Decoder layer
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-08 01:43:10 +08:00
Simeng Liu
bb766eca0a
feat: Reduce branch overhead in groupRMSNorm kernels ( #4067 )
...
* feat: Reduce branch overhead in groupRMSNorm kernels
* Fix race condition with sm < 90 and avoid all threads in one warp writing to the same shared memory.
Signed-off-by: Simeng Liu <simengl@nvidia.com>
---------
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-08 00:55:27 +08:00
Venky
a7c50cc426
enh: Update docker Makefile to use only the visible GPUs of machine ( #4097 )
...
Update Makefile
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-07 18:01:32 +08:00