TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-07 03:31:58 +08:00

DylanChen-NV 5ca2b9bb15 [TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>		2025-07-07 18:04:57 +08:00
_torch	[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615)	2025-07-07 18:04:57 +08:00
api_stability	[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751)	2025-07-07 17:05:14 +08:00
bindings	[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI (#5497)	2025-06-27 17:03:05 +02:00
disaggregated	feat: Dynamically remove servers in PD (#5270)	2025-06-25 09:50:04 +08:00
llmapi	fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)	2025-07-07 14:58:32 +08:00
others	test: reorganize tests folder hierarchy (#2996)	2025-03-27 12:07:53 +08:00
scaffolding	[TRTLLM-4638] feat(scaffolding): update Reward Controller to PRM specific controller with step split (#4337)	2025-05-19 17:53:41 +08:00
tools	Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)	2025-06-15 18:54:04 +03:00
trt	[feat] Support torch compile for attention dp (#5086)	2025-07-01 13:48:52 -04:00
utils	feat: W4A16 GEMM (#4232)	2025-07-01 10:36:05 +03:00
conftest.py	[feat][test] reuse MPI pool executor across tests (#5566)	2025-06-29 17:23:12 +03:00
dump_checkpoint_stats.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
gc_utils.py	[nvbug 5273941] fix: broken cyclic reference detect (#5417)	2025-07-01 20:12:55 +08:00
profile_utils.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
pytest.ini	[ci] small multigpu speedups (#5643)	2025-07-03 08:06:10 -04:00
test_model_runner_cpp.py	Update TensorRT-LLM (#2936)	2025-03-18 21:25:19 +08:00
test_pip_install.py	relax the limitation of setuptools (#2992)	2025-03-24 13:36:10 +08:00