TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-14 15:03:48 +08:00

Yuening Li 1f8ae2b2db [TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow (#6629) Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>	2025-08-15 17:15:49 -04:00
tests_lora_modules	added loraOp into lora layer + test for mlp and comparison to lora plugin (#3455)	2025-04-17 12:48:27 +08:00
test_fused_moe.py	[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow (#6629)	2025-08-15 17:15:49 -04:00
test_moe_host_sharer.py	feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4) (#4818)	2025-06-08 10:25:18 +08:00
test_moe_load_balancer.py	[None][perf] Improve the performance of online EPLB on Hopper by better overlapping (#6624)	2025-08-12 09:25:13 +08:00
test_moe_routing.py	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
test_triton_linear.py	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00