TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-26 21:53:30 +08:00

Author	SHA1	Message	Date
Bo Deng	d289d85bff	[TRTLLM-6675][infra] Nixl test completion (#6623 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-08 10:15:54 +08:00
brb-nv	4adde41632	[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-07 13:14:04 -04:00
Yan Chunwei	5eae3184fa	[None][chore] add missing tests to test list (#6590 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-06 22:12:27 +08:00
yunruis	3ff4f503ad	[None][opt] ADP schedule balance optimization (#6061 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-06 09:38:02 +08:00
ixlmar	1ebceb790d	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-08-05 18:27:43 +01:00
Venky	61da2daeb4	[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 07:14:24 -07:00
Pengyun Lin	a15e33351d	[None][fix] Revert commit `48ddc3d` & add test for disagg server with different max_num_tokens (#6259 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-04 15:09:51 +08:00
Leslie Fang	a60190836c	[None][infra] Enable accuracy test for eagle3 and chunked prefill (#6386 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-04 01:45:24 -04:00
Yechan Kim	ee6ab5be96	chore: add EXAONE4 accuracy test (#6397 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-04 10:14:16 +08:00
Jhao-Ting Chen	4da5cfc511	[None][infra] add eagle3 one model accuracy tests (#6264 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-08-02 16:07:46 -07:00
Lizhi Zhou	6f34f3489b	[TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-08-01 13:33:34 -04:00
liji-nv	1daa8c3232	[https://nvbugs/5340941 ][https://nvbugs/5375785 ] - fix: Wrap attentio… (#6355 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-01 07:38:06 -04:00
brb-nv	2eca0d5925	fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-31 17:18:23 -07:00
Ziyi Xiong	8062e0fe7c	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-31 15:31:39 -04:00
bhsueh_NV	ae3a5fc918	[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-31 09:37:23 +08:00
brb-nv	0e16d1f070	test: Add time logging for lora tests (#6466 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 14:02:43 -07:00
pcastonguay	e7ae5e2824	feat: Add support for disaggregation with pp with pytorch backend (#6369 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-07-30 09:42:13 -04:00
Yechan Kim	d6eb8e2366	fix: support mixture of text & multimodal prompts (#6345 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 08:52:31 +08:00
2ez4bz	60e4d3a9d4	[test] Add accuracy regression test for Mistral3.1 (#6322 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 09:41:44 -07:00
nv-guomingz	b8d4cb8beb	feat: Support JSON Schema in OpenAI-Compatible API (#6321 ) Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>	2025-07-25 12:55:56 -04:00
xiaoqi	a0aecf0476	[feat]: support logit_bias (#5354 ) Signed-off-by: xq25478 <xq25478@qq.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-25 09:37:41 +00:00
John Calderon	b7c8a672da	[Issue 6193] Fix gemma3vl weight loader (#6233 ) Signed-off-by: John Calderon <johncalesp@gmail.com>	2025-07-22 10:32:18 -07:00
Yi Zhang	eb7d0f84b5	[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Yan Chunwei	f194b65f3e	fix [nvbug/5351244]: address remote mpi session submit (#5664 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-22 12:48:00 +08:00
Linda	3efad2e58c	feat: nanobind bindings (#6185 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-21 08:56:57 +01:00
Ziyi Xiong	66030ef815	[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133 ) Signed-off-by: ziyixiong-nv <fxiong@nvidia.com> Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-19 13:17:15 +08:00
wili	82d3587bb8	[refactor] Unify name of NGram speculative decoding (#5937 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-19 12:59:57 +08:00
xiaoqi	28858c8711	feat(eagle3):support qwen3 dense model (#5879 ) Signed-off-by: xq25478 <xq25478@qq.com>	2025-07-19 01:24:32 +08:00
Bo Deng	2c6fa145ee	[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-19 00:48:44 +08:00
Iman Tabrizian	b75e53ab69	Revert "feat: nanobind bindings (#5961 )" (#6160 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-18 10:12:54 +08:00
2ez4bz	8480c120b1	[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-17 11:04:17 -07:00
Linda	5bff317abf	feat: nanobind bindings (#5961 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-17 22:42:52 +08:00
Chuang Zhu	44c70c88f9	chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-07-17 17:42:07 +08:00
Iman Tabrizian	d4d21a106e	[fix] Release slots with spec decode + disagg (#5975 ) (#6032 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-17 12:58:18 +08:00
chenfeiz0326	fe070a0168	test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-07-17 09:41:18 +08:00
Wanli Jiang	2d2b8bae32	feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-17 06:30:58 +08:00
qixiang-99	e09e409dfb	Fix: Enhance ModelConfig for kv cache size calculations (#5868 ) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-07-16 14:41:31 -07:00
peaceh-nv	f5f31beee1	feat: Add deepseek-lite tests for RTX pro 6000 (#5903 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-16 15:51:45 +08:00
brb-nv	9214ac662a	test: Add regression tests for Gemma3 VLM (#6033 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-15 11:37:56 -07:00
Fanrong Li	7a1af1c738	Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 (#5989 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-07-16 01:33:12 +09:00
brb-nv	1a2d96919c	feat: Update Gemma3 Vision Encoder (#5973 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:38:10 +08:00
Iman Tabrizian	c8874a7f94	[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yi Zhang	e5e87ecf34	test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-14 17:17:30 +08:00
Yan Chunwei	9c673e9707	[TRTLLM-6160] chore: add sampling examples for pytorch (#5951 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 15:28:32 +09:00
Yan Chunwei	c30eead09f	[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch (#5956 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-14 14:09:39 +08:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
2ez4bz	c19840235d	[fix] Fix mistral unit tests due to transformers upgrade (#5904 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-10 10:45:27 -07:00
Anthony Chang	7d21b55b5a	[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-07-10 14:06:50 +08:00
peaceh-nv	76c3a12bcb	[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-10 09:20:30 +08:00
DylanChen-NV	74dca0aa7b	[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-09 23:16:42 +08:00

1 2 3 4 5

242 Commits