Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-23 12:12:39 +08:00)
* Add TRT-LLM Gen MOE to Deepseek

  - fix fused moe rebase bug
  - Fix atol in test_fp4_gemm_quantize.py
  - fix fused moe rebase bug
  - Fix FusedMoe
  - Disable 2nd routing kernel preexit
  - Bump routing reduction to fp32
  - Disable PDL for fc1
  - [DEBUG] Lift token limit to 16k
  - [Bugfix] Token limit to 16k + fp32 routing + tanh
  - Make fp8 tileN 8
  - Fix FP8 MoE + remove redundant temp output for FP4
  - [FP8-only] Avoid wasting CTAs for activation kernel
  - fix: unblock FP8 weight loading with trtllm-gen
  - Remove max_token limit for trtllm-gen path
  - perf: avoid type-conversion and fill_ from aten
  - Minor fix

  Signed-off-by: Hao Lu <haolu@nvidia.com>

* Fix rebase issues

  Signed-off-by: Hao Lu <haolu@nvidia.com>

* Fix compile issue

  Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

* CI clean

  Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

---------

Signed-off-by: Hao Lu <haolu@nvidia.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
48 lines · 1.2 KiB · YAML
methods:
  __init__:
    parameters:
      clamp_val:
        annotation: Optional[List[float]]
        default: null
      exclude_modules:
        annotation: Optional[List[str]]
        default: null
      group_size:
        annotation: int
        default: 128
      has_zero_point:
        annotation: bool
        default: false
      kv_cache_quant_algo:
        annotation: Optional[tensorrt_llm.quantization.mode.QuantAlgo]
        default: null
      pre_quant_scale:
        annotation: bool
        default: false
      quant_algo:
        annotation: Optional[tensorrt_llm.quantization.mode.QuantAlgo]
        default: null
      smoothquant_val:
        annotation: float
        default: 0.5
      use_meta_recipe:
        annotation: bool
        default: false
    return_annotation: None
  from_dict:
    parameters:
      config:
        annotation: dict
        default: inspect._empty
    return_annotation: tensorrt_llm.models.modeling_utils.QuantConfig
  to_dict:
    parameters: {}
    return_annotation: dict
  is_module_excluded_from_quantization:
    parameters:
      name:
        annotation: str
        default: inspect._empty
    return_annotation: bool
properties: {}
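For orientation, a minimal Python sketch of how the surface documented above might be exercised. It assumes the import paths given in the annotations (tensorrt_llm.models.modeling_utils.QuantConfig and tensorrt_llm.quantization.mode.QuantAlgo); the specific QuantAlgo member and the "lm_head" exclusion entry are illustrative choices, not part of the reference.

# Hedged usage sketch; only the names listed in the reference above are relied on.
from tensorrt_llm.models.modeling_utils import QuantConfig
from tensorrt_llm.quantization.mode import QuantAlgo

# Construct with the defaults recorded in the reference, overriding a few fields.
# QuantAlgo.W4A16_AWQ and "lm_head" are assumed, illustrative values.
cfg = QuantConfig(
    quant_algo=QuantAlgo.W4A16_AWQ,
    group_size=128,               # default: 128 per the reference
    exclude_modules=["lm_head"],  # modules listed here are skipped during quantization
)

# Round-trip through the dict form documented by to_dict / from_dict.
restored = QuantConfig.from_dict(cfg.to_dict())

# is_module_excluded_from_quantization(name: str) -> bool, per the reference.
print(restored.is_module_excluded_from_quantization("lm_head"))  # expected: True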