Yukun He
cbca6505ff
[nvbugs/5268808][fix] Fix the list-out-of-range access issue of AllReduce workspace on multi-node. ( #4159 )
...
This issue was found with tp=ep=8 on multi-node machines and is caused by inconsistent PP sizes across nodes.
* Rework the workspace allocation implementation to avoid list-out-of-range accesses. (A sketch follows this entry.)
* Disable min_latency_mode in multi-node scenarios to avoid illegal memory accesses.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-13 17:17:25 +08:00
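A minimal sketch of the allocation pattern this fix implies, with all names invented for illustration: keying workspaces by the communication group, rather than indexing a fixed-length list, rules out out-of-range access when group shapes differ across nodes, and the min-latency path is gated to single-node runs.

```python
import torch

# Workspaces keyed by the rank group that uses them (invented layout).
_workspaces: dict[tuple[int, ...], torch.Tensor] = {}

def get_allreduce_workspace(group_ranks: tuple[int, ...],
                            size_bytes: int) -> torch.Tensor:
    """Return the workspace for this rank group, creating it on first use."""
    ws = _workspaces.get(group_ranks)
    if ws is None or ws.numel() < size_bytes:
        # Keyed allocation: there is no fixed-length list to index past.
        ws = torch.empty(size_bytes, dtype=torch.uint8, device="cuda")
        _workspaces[group_ranks] = ws
    return ws

def min_latency_mode_enabled(num_nodes: int) -> bool:
    # Per the commit, min_latency_mode is disabled across nodes to avoid
    # illegal memory accesses.
    return num_nodes == 1
```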
Perkz Zheng
e8d7834c50
fix: [ https://nvbugspro.nvidia.com/bug/5238626 ] illegal memory address when running llama 4 with cuda graph enabled ( #4101 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-13 14:58:54 +08:00
v-shobhit
1770dd96d8
Fix Pipeline Parallelism in Llama4 ( #4106 )
...
Signed-off-by: Shobhit Verma <shobhitv@nvidia.com>
2025-05-12 22:54:37 -07:00
nvpohanh
13c8e5a8a8
feat: Prefetch safetensors files before loading them ( #4140 )
...
Prefetch safetensors files so that they get stored in the system file
cache. This significantly speeds up model weight loading on the very
first run after entering the docker container.
This helps because model weight loading is done layer-by-layer, which
reads the safetensors files chunk-by-chunk; when the files are stored
on a network drive, such small reads utilize the network bandwidth
poorly. Reading the whole files in bulk achieves much higher bandwidth
utilization. (A sketch follows this entry.)
When running with world_size>1, all ranks prefetch these files
collaboratively.
In theory, heuristics should decide whether to prefetch the files or
not, but that is beyond the scope of this commit. For example, when
CPU memory is small, prefetching may cause file cache thrashing and
slow down weight loading instead.
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-05-13 13:35:30 +08:00
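A minimal sketch of the prefetch idea, with invented names rather than the commit's actual code: each rank bulk-reads a disjoint shard of the safetensors files so the whole checkpoint lands in the OS page cache before the layer-by-layer load begins.

```python
import glob
import os

def prefetch_safetensors(checkpoint_dir: str, rank: int, world_size: int,
                         chunk_bytes: int = 64 << 20) -> None:
    """Warm the OS page cache by bulk-reading safetensors files."""
    files = sorted(glob.glob(os.path.join(checkpoint_dir, "*.safetensors")))
    # Shard files across ranks so all ranks prefetch collaboratively.
    for path in files[rank::world_size]:
        with open(path, "rb") as f:
            # Large sequential reads use network bandwidth far better
            # than the small chunked reads done during weight loading.
            while f.read(chunk_bytes):
                pass
```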
pcastonguay
9643be5f20
[TRTLLM-5050][feat] Enable per-request stats with PyT backend ( #4156 )
...
* feat: Add per-request stats support with PyT backend
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding unit test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing stats unit test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing test with overlap
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-12 21:35:15 -04:00
Simeng Liu
286a789549
feat: Add heuristic for GroupRMSNorm kernel selection. ( #4047 )
...
* feat: Add heuristic for GroupRMSNorm kernel selection.
Implements a logistic regression model to dynamically select between:
- GroupRMSNormBaseKernel: allocates warps proportional to the sum of
dimensions (better SM occupancy in most cases)
- GroupRMSNormLargeBatch: allocates warps proportional to the max
dimension (better block scheduling in large-batch scenarios)
The selection heuristic considers batch size, allocated warps, and
scheduling efficiency on the current GPU architecture. Models for
Compute Capability 9.x and 10.x are trained on nsys kernel runtime
data. (A sketch follows this entry.)
The base kernel is the default selection.
The python operator group_rms_norm uses the heuristic by default;
users can also explicitly pick the base or large-batch kernel.
Signed-off-by: Simeng Liu <simengl@nvidia.com>
* Address the comments.
Signed-off-by: Simeng Liu <simengl@nvidia.com>
---------
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-13 08:52:53 +08:00
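A hedged sketch of how such a logistic-regression selector can work. The feature set and coefficients below are invented placeholders, not the trained models shipped by the commit.

```python
import math

# Per-architecture coefficients, trained offline on kernel timing data
# (placeholder values, keyed by compute capability).
_COEFFS = {
    (9, 0): (0.002, -0.01, 1.5, -0.8),
    (10, 0): (0.003, -0.02, 1.2, -0.5),
}

def use_large_batch_kernel(batch_size: int, num_warps: int,
                           wave_efficiency: float,
                           compute_capability: tuple[int, int]) -> bool:
    """True selects GroupRMSNormLargeBatch; False keeps the base kernel."""
    coeffs = _COEFFS.get(compute_capability)
    if coeffs is None:
        return False  # unknown architecture: keep the default base kernel
    b_batch, b_warps, b_eff, bias = coeffs
    z = (b_batch * batch_size + b_warps * num_warps
         + b_eff * wave_efficiency + bias)
    return 1.0 / (1.0 + math.exp(-z)) > 0.5  # sigmoid decision boundary
```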
Erin
4becf32360
fix: reshape token_ids for lp in torch backend ( #4239 )
...
reshape token_ids
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-13 08:43:47 +08:00
wili
eba3623a54
Feat: Variable-Beam-Width-Search (VBWS) part4 ( #3979 )
...
* feat/vbws-part4-v1.8: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* feat/vbws-part4-v1.9: fix incorrect output when using short output length
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.1: remove useless variables
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.2: fix incorrect output when using short output length
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.3: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.4: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.5: remove API change
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-12 22:32:29 +02:00
yuxianq
a4c3359513
fix: Reset planned states to avoid memory leak in TrtllmAttentionWrapper ( #4227 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-12 23:25:54 +08:00
Fridah-nv
3dbb087292
[TRTLLM-5188] fix: [AutoDeploy] update output shape of prepare_fused_mha_metadata_fake ( #4199 )
...
* update output shape of fake kernel prepare_fused_mha_metadata_fake
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
* minor
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
---------
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-05-12 11:11:40 -04:00
yuxianq
b35f9a67f9
refactor: Allow models to override apply_qk_norm. ( #4078 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-12 19:38:24 +08:00
Zheng Duan
c9e2a963e0
feat: add kv cache aware router ( #3831 )
...
* kv cache aware router
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* add tests
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* router config
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* add test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction detect in worker test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* move worker tests to single gpu
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* reduce memory fraction
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* fix partial block
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
---------
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding ( #4066 )
...
* finish
Signed-off-by: Ubospica <ubospica@gmail.com>
* update
Signed-off-by: Ubospica <ubospica@gmail.com>
* update
Signed-off-by: Ubospica <ubospica@gmail.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* exc overlap scheduler
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix api ref
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
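An illustrative use of a structural tag through the LLM API. The field name GuidedDecodingParams.structural_tag and the xgrammar-style JSON layout are assumptions for this sketch, not confirmed by this log.

```python
import json
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.sampling_params import GuidedDecodingParams

# Constrain only the text between the begin/end tags to a JSON schema
# (xgrammar-style structural-tag spec; layout assumed).
tag_spec = json.dumps({
    "structures": [{
        "begin": "<function=get_weather>",
        "schema": {"type": "object",
                   "properties": {"city": {"type": "string"}}},
        "end": "</function>",
    }],
    "triggers": ["<function="],
})

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
output = llm.generate(
    "What's the weather in Paris?",
    SamplingParams(
        guided_decoding=GuidedDecodingParams(structural_tag=tag_spec)),
)
```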
Zhenhuan Chen
9212e9a740
[TRTLLM-4911] feat(scaffolding): make sampling_params only settable by controller ( #4151 )
...
feat(scaffolding): make sampling_params only settable by controller
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-05-12 15:29:09 +08:00
Chuang Zhu
1333f4f5d5
remove cache_transceiver_prealloc_size ( #4153 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-12 11:53:53 +08:00
Frank
0dcf47f1c2
[TRTLLM-4717][perf] Set CUDA graph max batch size and padding in throughput benchmark. ( #3875 )
...
* Set cuda graph max batch size.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Set padding.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-05-09 23:20:52 +08:00
Mike Iovine
4b8ba7ad61
[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue ( #4069 )
...
[fix] Fix llama 4 test lists
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-09 22:45:14 +08:00
chenfeiz0326
ffc13bd325
Cherry-pick: Use multi-threading to load MoE expert weights ( #4137 )
...
* Use multi-threading to load MoE expert weights
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
* Update code formatting
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Update code formatting
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
---------
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Po-Han Huang <pohanh@nvidia.com>
2025-05-09 17:29:24 +08:00
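A generic sketch of the multi-threaded loading technique named in the commit above, assuming a hypothetical per-expert tensor naming scheme; threads overlap the I/O-bound reads of expert weights.

```python
from concurrent.futures import ThreadPoolExecutor
from safetensors import safe_open

def load_expert(path: str, expert_id: int):
    # Each thread opens its own handle; reads release the GIL, so the
    # I/O overlaps. The tensor name below is a hypothetical layout.
    with safe_open(path, framework="pt") as f:
        return f.get_tensor(f"experts.{expert_id}.weight")

def load_all_experts(path: str, num_experts: int, num_threads: int = 16):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda e: load_expert(path, e),
                             range(num_experts)))
```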
WeiHaocheng
0f01826dde
feat: support task collection for collecting information ( #3328 ) ( #3824 )
...
Signed-off-by: fredw (generated by with_the_same_user script) <20514172+WeiHaocheng@users.noreply.github.com>
2025-05-09 17:09:01 +08:00
Fanrong Li
0cf0fce5d3
[fix] Fix add_dummy_requests for spec decoding cases ( #4084 )
...
* fix add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* add max_seq_len to eagle3 test and fix add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix prompt_len in add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* add prepare_resource condition in add_dummy_requests.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* add some description of token_nums to add_dummy_requests and fix token_nums in torch compile warmup.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix available_tokens.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-09 16:52:51 +08:00
Fanrong Li
77f8e43592
[fix] Fix relaxed acceptance to support enabling it in context phase ( #4126 )
...
* fix relaxed acceptance to support enabling this feature in the context phase.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
* fix sample_and_accept_draft_tokens unit test.
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-09 14:11:14 +08:00
Yukun He
c9cac432dc
chore: Fix pipeline break caused by previous PR ( #4081 ) rebase + pipeline reuse ( #4169 )
...
Fix import break caused by rebase.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-09 12:51:02 +08:00
Mike Iovine
d80dc40135
[nvbug/5262268][fix] Fix trtllm-bench for llama 4 ( #4104 )
...
[fix] Fix trtllm-bench for llama 4
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
2025-05-08 21:27:57 -07:00
Erin
cdf5ae1547
fix: change pp broadcast pattern for LPs ( #4130 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-08 20:07:13 -07:00
Yi Zhang
91bf5e6a8e
[TRTLLM-3105][feat] Add Piecewise CUDA Graph Support ( #3804 )
...
Add Piecewise CUDA Graph Support
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-05-09 11:04:01 +08:00
Yukun He
5b61486d87
chore: Clean up the legacy DeepseekAllreduceFusionOp. ( #4081 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-09 10:20:41 +08:00
bhsueh_NV
700d09ab65
[TRTLLM-5147][Qwen3] fix: fix bug of attention dp on qwen3_moe model ( #4141 )
...
* fix bug of attention dp on qwen3
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix pre-commit changes
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug of attention dp 8
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-09 09:29:39 +08:00
pcastonguay
836c142e1b
[feat] Allow overriding cli args with yaml file in trtllm-serve ( #4164 )
...
feat: Allow overriding cli args with yaml file in trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-08 21:19:05 -04:00
dongxuy04
7147efb2e8
fix: alltoall padding for chunked MoE ( #4157 )
...
fix alltoall padding for chunked MoE
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-05-09 09:01:35 +08:00
forrestl
9477661f4c
Support RingAttention in the BertAttention plugin and the DiT model ( #3661 )
...
support ring attn for bert_attention plugin and dit model
Signed-off-by: ChunhuanLin <lch_xdu@163.com>
2025-05-09 08:06:54 +08:00
Mike Iovine
9afe510367
[fix] Fix llama4 + eagle3 ( #3998 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-08 19:20:27 -04:00
Frank
57afbf6b79
Fix incorrect conversion. ( #4112 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-05-09 06:34:52 +08:00
Lucas Liebenwein
48ed38a2ac
[fix] [AutoDeploy] flashinfer usage on H100 ( #4162 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-09 06:00:57 +08:00
chenfeiz0326
7f5716ef83
Cherry-pick trtllm-gen from feat/llama4 to main ( #4086 )
...
* feat: TRT-LLM Gen FP8 MoE Llama4
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* feat: TRT-LLM Gen llama4 MoE Top1 routing
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
* feat: add per tensor FP8 TRT-LLM Gen GEMMs
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
* Update
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Update
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add license for cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/gemmCubins
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Add guard for routingIndicesClusterKernel
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
* Guard sm90+ for routingkernels
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
---------
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
Signed-off-by: Jiqun Tu <jtu@nvidia.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Nikita Korobov <nkorobov@nvidia.com>
Co-authored-by: Jiqun Tu <jtu@nvidia.com>
2025-05-08 14:13:01 -07:00
Yukun He
bb7bcc75c2
feat: Fallback to NCCL for various patterns when input size is large. ( #4080 )
...
* Fall back to NCCL for various patterns when the input size is large. (A sketch follows this entry.)
Move the previous implementation to the cpp side.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Revising.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
---------
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-08 11:13:13 -07:00
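A sketch of the size-based dispatch this describes; the threshold and the custom-kernel stub are illustrative, not the tuned values or kernels from the commit.

```python
import torch
import torch.distributed as dist

FALLBACK_BYTES = 16 << 20  # e.g. 16 MiB; the real cutoff is tuned per pattern

def custom_fused_all_reduce(t: torch.Tensor) -> torch.Tensor:
    # Placeholder for the latency-optimized custom kernel.
    dist.all_reduce(t)
    return t

def all_reduce(tensor: torch.Tensor) -> torch.Tensor:
    msg_bytes = tensor.numel() * tensor.element_size()
    if msg_bytes > FALLBACK_BYTES:
        dist.all_reduce(tensor)  # NCCL path: better for large inputs
        return tensor
    return custom_fused_all_reduce(tensor)  # custom path: small inputs
```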
shaharmor98
7d94c9561f
feat: support multi lora adapters and TP ( #3885 )
...
* support multi lora, tp
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-08 23:45:45 +08:00
Yuan Tong
5b93273156
feat: adopt new logprob definition in PyTorch flow ( #4057 )
...
feat: align logprob definition of PyTorch flow
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
2025-05-08 20:16:40 +08:00
Enwei Zhu
74df12bbaa
[TRTLLM-4480][doc] Documentation for new accuracy test suite and trtllm-eval ( #3946 )
...
* fix formula
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update doc
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* 1st version
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* polish
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-08 19:35:23 +08:00
Tracin
b0dd581e6b
Fix TP8 for NVFP4 kv duplication. ( #4143 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-08 17:30:02 +08:00
zihaok
81cc60a0fd
[feat/] enable attention DP in Llama4 maverick model - part 1 ( #4065 )
...
* add feature
cosmetic changes
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
address precommit fix
cosmetic
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* add feature
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* fix bug
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* address comments
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* remove WAR
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* fix format precommit
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
* Update tensorrt_llm/_torch/models/modeling_llama.py
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Signed-off-by: zihaok <161090975+zihaok@users.noreply.github.com>
---------
Signed-off-by: Zihao Kong <zihaok@nvidia.com>
Signed-off-by: zihaok <161090975+zihaok@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-05-08 05:06:40 +08:00
hlu1
26a2679217
[Deepseek] Refactor Deepseek Decoder layer ( #4016 )
...
Refactor Deepseek Decoder layer
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-05-08 01:43:10 +08:00
Pengyun Lin
721f84a0ac
fix: Align default setting & remove unnecessary check for chat and completion ( #3888 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-07 14:42:53 +08:00
Yan Chunwei
0c26059703
chore: Cleanup deprecated APIs from LLM-API (part 1/2) ( #3732 )
...
* beam_width and max_new_token
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove beam_width
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove min_length
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove return_num_sequences
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-07 13:20:25 +08:00
rakib-hasan
bf9ac96de3
Adding option to specify a set of token ids for multimodal tokens ( #4107 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-05-07 12:15:41 +08:00
bhsueh_NV
f670a036df
[Qwen3] chore: fix bug of fused_moe on tp > 1 ( #4093 )
...
* fix bug of fused_moe on tp > 1
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* refine codes
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-07 11:06:37 +08:00
Enwei Zhu
c28b90984f
[TRTLLM-3925, https://nvbugs/5245262] [fix] Normalize LLM.generate API ( #3985 )
...
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-07 11:06:23 +08:00
Kaiyu Xie
52d4302dda
bench: TRTLLM-4936 Port benchmark_serving.py ( #4011 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-05-07 09:45:14 +08:00
milesial
001e666fc5
fix: Pass local dir to processor creation ( #4018 )
...
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-06 12:25:04 -07:00
Erin
cba1793cda
cleanup logprob params ( #4039 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-07 00:50:16 +08:00
Daniel Cámpora
c56a2aca46
fix: Properly get decoding mode according to the same logic as cpp. ( #4026 )
...
* Properly get decoding mode according to the same logic as cpp.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Cross reference getDecodingMode implementations in pytorch - cpp.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Better bindings for DecodingMode.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Revert to version in main.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Fix.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Revert configuration.py.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-05-06 21:53:17 +08:00