TensorRT-LLM/examples
Latest commit 76c5e1a12f by Yiqing Yan (yiqingy@nvidia.com), 2025-09-10 16:06:54 +08:00: [None][infra] Bump version to 1.1.0rc5 (#7668)
| Name | Last commit | Commit date |
|---|---|---|
| apps | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00 |
| auto_deploy | [#6120][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221) | 2025-09-05 22:10:48 -04:00 |
| bindings/executor | | |
| cpp/executor | [TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766) | 2025-08-13 17:39:27 +08:00 |
| cpp_library | | |
| disaggregated | [None] [feat] Use numa to bind CPU (#7304) | 2025-08-28 06:27:11 -04:00 |
| dora | | |
| draft_target_model | Fix: draft target README and set exclude_input_in_output to False (#4882) | 2025-06-03 23:45:02 -07:00 |
| eagle | doc: remove the outdated features which marked as Experimental (#5995) | 2025-08-06 22:01:42 -04:00 |
| infinitebench | | |
| language_adapter | | |
| llm-api | [TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349) | 2025-09-09 14:51:36 -04:00 |
| llm-eval/lm-eval-harness | chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680) | 2025-07-04 15:39:15 +09:00 |
| lookahead | | |
| medusa | | |
| models | [https://nvbugs/5453709][fix] Remove transformers version limit in Qwen2VL (#7152) | 2025-09-09 10:38:20 +08:00 |
| ngram | [chore] Clean up quickstart_advanced.py (#6021) | 2025-07-21 15:00:59 -04:00 |
| openai_triton | | |
| python_plugin | | |
| quantization | [#6530][fix] Fix script when using calibration tensors from modelopt (#6803) | 2025-08-12 20:41:10 -07:00 |
| redrafter | ReDrafter support for Qwen (#4875) | 2025-06-28 02:33:10 +08:00 |
| sample_weight_stripping | doc: remove the outdated features which marked as Experimental (#5995) | 2025-08-06 22:01:42 -04:00 |
| scaffolding | [#3325][feat] Add MCTS and TOT tree-based inference controllers to Scaffolding (#7490) | 2025-09-04 19:46:49 -07:00 |
| serve | [None][chore] Enhance trtllm-serve example test (#6604) | 2025-08-06 20:30:35 +08:00 |
| trtllm-eval | test: Add LLGuidance test and refine guided decoding (#5348) | 2025-06-25 14:12:56 +08:00 |
| wide_ep | [TRTLLM-5930][doc] 1.0 Documentation. (#6696) | 2025-09-09 12:16:03 +08:00 |
| constraints.txt | [None][infra] Bump version to 1.1.0rc5 (#7668) | 2025-09-10 16:06:54 +08:00 |
| eval_long_context.py | | |
| generate_checkpoint_config.py | | |
| generate_xgrammar_tokenizer_info.py | | |
| hf_lora_convert.py | | |
| mmlu.py | feat: run mmlu and summarize without engine_dir. (#4056) | 2025-05-05 19:35:07 +08:00 |
| run.py | [nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974) | 2025-07-25 18:10:40 -04:00 |
| summarize.py | [refactor] Unify name of NGram speculative decoding (#5937) | 2025-07-19 12:59:57 +08:00 |
| utils.py | [refactor] Unify name of NGram speculative decoding (#5937) | 2025-07-19 12:59:57 +08:00 |
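For orientation, the llm-api directory above covers the high-level Python LLM API, which is the default entry point now that the PyTorch backend is the default (#5312). Below is a minimal sketch of that flow, not a maintained example: it assumes the quickstart-style `LLM`/`SamplingParams` interface from `tensorrt_llm`, and the Hugging Face model name is purely an illustrative placeholder.

```python
# Minimal LLM API sketch; see examples/llm-api for the maintained, tested versions.
# Assumptions: tensorrt_llm is installed with the PyTorch backend (the default
# since #5312), and the model name below is only a placeholder.
from tensorrt_llm import LLM, SamplingParams


def main():
    # Load a Hugging Face checkpoint; the PyTorch backend sets up the runtime directly.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() returns one output object per prompt with the generated text.
    for output in llm.generate(prompts, sampling_params):
        print(f"{output.prompt!r} -> {output.outputs[0].text!r}")


if __name__ == "__main__":
    main()
```

The serve and disaggregated directories exercise the same backend through the trtllm-serve CLI rather than the in-process API (see their entries in the table above).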