TensorRT-LLM/examples
Latest commit 76c5e1a12f by Yiqing Yan (yiqingy@nvidia.com), 2025-09-10 16:06:54 +08:00: [None][infra] Bump version to 1.1.0rc5 (#7668)
| Name | Last commit | Commit date |
|---|---|---|
| apps | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00 |
| auto_deploy | [#6120][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221) | 2025-09-05 22:10:48 -04:00 |
| bindings/executor | | |
| cpp/executor | [TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766) | 2025-08-13 17:39:27 +08:00 |
| cpp_library | | |
| disaggregated | [None] [feat] Use numa to bind CPU (#7304) | 2025-08-28 06:27:11 -04:00 |
| dora | | |
| draft_target_model | Fix: draft target README and set exclude_input_in_output to False (#4882) | 2025-06-03 23:45:02 -07:00 |
| eagle | doc: remove the outdated features which marked as Experimental (#5995) | 2025-08-06 22:01:42 -04:00 |
| infinitebench | | |
| language_adapter | | |
| llm-api | [TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349) | 2025-09-09 14:51:36 -04:00 |
| llm-eval/lm-eval-harness | chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680) | 2025-07-04 15:39:15 +09:00 |
| lookahead | | |
| medusa | | |
| models | [https://nvbugs/5453709][fix] Remove transformers version limit in Qwen2VL (#7152) | 2025-09-09 10:38:20 +08:00 |
| ngram | [chore] Clean up quickstart_advanced.py (#6021) | 2025-07-21 15:00:59 -04:00 |
| openai_triton | | |
| python_plugin | | |
| quantization | [#6530][fix] Fix script when using calibration tensors from modelopt (#6803) | 2025-08-12 20:41:10 -07:00 |
| redrafter | ReDrafter support for Qwen (#4875) | 2025-06-28 02:33:10 +08:00 |
| sample_weight_stripping | doc: remove the outdated features which marked as Experimental (#5995) | 2025-08-06 22:01:42 -04:00 |
| scaffolding | [#3325][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490) | 2025-09-04 19:46:49 -07:00 |
| serve | [None][chore] Enhance trtllm-serve example test (#6604) | 2025-08-06 20:30:35 +08:00 |
| trtllm-eval | test: Add LLGuidance test and refine guided decoding (#5348) | 2025-06-25 14:12:56 +08:00 |
| wide_ep | [TRTLLM-5930][doc] 1.0 Documentation. (#6696) | 2025-09-09 12:16:03 +08:00 |
| constraints.txt | [None][infra] Bump version to 1.1.0rc5 (#7668) | 2025-09-10 16:06:54 +08:00 |
| eval_long_context.py | | |
| generate_checkpoint_config.py | | |
| generate_xgrammar_tokenizer_info.py | | |
| hf_lora_convert.py | | |
| mmlu.py | feat: run mmlu and summarize without engine_dir. (#4056) | 2025-05-05 19:35:07 +08:00 |
| run.py | [nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974) | 2025-07-25 18:10:40 -04:00 |
| summarize.py | [refactor] Unify name of NGram speculative decoding (#5937) | 2025-07-19 12:59:57 +08:00 |
| utils.py | [refactor] Unify name of NGram speculative decoding (#5937) | 2025-07-19 12:59:57 +08:00 |
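For orientation, the llm-api directory above covers the high-level Python LLM API, which is the default entry point now that the PyTorch backend is the default (#5312). Below is a minimal sketch of that flow, not a maintained example: it assumes the quickstart-style `LLM`/`SamplingParams` interface from `tensorrt_llm`, and the Hugging Face model name is purely an illustrative placeholder.

```python
# Minimal LLM API sketch; see examples/llm-api for the maintained, tested versions.
# Assumptions: tensorrt_llm is installed with the PyTorch backend (the default
# since #5312), and the model name below is only a placeholder.
from tensorrt_llm import LLM, SamplingParams


def main():
    # Load a Hugging Face checkpoint; the PyTorch backend sets up the runtime directly.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() returns one output object per prompt with the generated text.
    for output in llm.generate(prompts, sampling_params):
        print(f"{output.prompt!r} -> {output.outputs[0].text!r}")


if __name__ == "__main__":
    main()
```

The serve and disaggregated directories exercise the same backend through the trtllm-serve CLI rather than the in-process API (see their entries in the table above).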