TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Wanli Jiang	9632dba02e	feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-30 09:20:16 -07:00
pcastonguay	0f083b9daf	fix: Unwaive triton cpp test [nvbug 5401088] (#6412 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-30 11:25:18 -04:00
nv-guomingz	03e38c9087	chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 11:11:06 -04:00
Chang Liu	b4065d8ca6	[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-07-30 10:00:15 -04:00
pcastonguay	e7ae5e2824	feat: Add support for disaggregation with pp with pytorch backend (#6369 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-07-30 09:42:13 -04:00
tomeras91	a2514d93fc	[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-30 07:22:32 -04:00
Yechan Kim	22b29df38c	[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 17:29:55 +08:00
xinhe-nv	d9ab3fd35e	tests: add TestNemotronH cuda graph tests (#6390 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 18:45:58 +10:00
nv-guomingz	a5540acfce	chore: add trtllm-serve json schema example into doc. (#6418 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 04:33:08 -04:00
2ez4bz	d6eed1b624	[fix] Switch placement of image placeholder for mistral 3.1 (#6435 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-30 14:10:36 +08:00
xinhe-nv	c00d6763b2	test: [CI] Add failed cases into waives.txt (#6457 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-30 12:36:58 +10:00
Venky	ab40369053	[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 10:53:43 +10:00
Yechan Kim	d6eb8e2366	fix: support mixture of text & multimodal prompts (#6345 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 08:52:31 +08:00
Yan Chunwei	ad662ddcdd	chore: disallow arbitrary in llm_args.Configs (#6367 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-29 16:16:52 -04:00
Yan Chunwei	1a6930986a	chore: remove unused kv_cache_dtype in api reference (#6444 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-29 14:57:20 -04:00
Michal Guzek	7efe3cb0cd	[fix] Add detokenization-based stop word logic to LLM API (#5948 ) Signed-off-by: moraxu <mguzek@nvidia.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-07-29 10:16:59 -07:00
xinhe-nv	f1086e7d4f	test: [CI] remove closed bugs (#6381 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-29 19:01:23 +10:00
xinhe-nv	4fbb344caf	test: [CI] Add failed cases into waives.txt (#6423 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-29 19:00:30 +10:00
Yukun He	0eee2e2850	[5385981] fix: Update the usage of VisionAttention init API. (#6413 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-29 16:41:48 +08:00
ruodil	e11255e9d0	test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-29 15:52:45 +10:00
Michal Guzek	2573bb729d	feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-28 14:02:14 -07:00
Aurelien Chartier	738ab61593	[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Hopper GPUs (#6339 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-28 12:36:44 -07:00
2ez4bz	cdca541148	[test] Unwaive mistral3.1 small E2E test (#6352 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 14:37:42 -04:00
2ez4bz	60e4d3a9d4	[test] Add accuracy regression test for Mistral3.1 (#6322 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-28 09:41:44 -07:00
ruodil	03632a679f	test: organize perf cases and add missing perflab cases in qa test list (#6283 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-07-28 20:33:32 +10:00
xinhe-nv	971be1fe86	test: waive failed cases (#6394 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-28 20:31:43 +10:00
Yan Chunwei	45d441e60c	[TRTLLM-5061] chore: add status tags to LLM API reference (#5707 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-28 15:57:07 +08:00
Ivy Zhang	2945817cae	[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-28 15:33:30 +08:00
Emma Qiao	b3ca159787	[Infa] - waive failed cases and fix a typo (#6384 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-28 02:06:57 -04:00
Chang Liu	dc757799e1	[nvbugs/5401156][fix] Avoid import all models when import trtllm._common (#6266 )	2025-07-27 23:29:21 -04:00
Yan Chunwei	908f49a4ad	[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-28 09:01:10 +08:00
Michal Guzek	08d57123f9	[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974 ) Signed-off-by: moraxu <mguzek@nvidia.com>	2025-07-25 18:10:40 -04:00
Iman Tabrizian	c35c78ff58	[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-25 12:47:01 -07:00
nv-guomingz	b8d4cb8beb	feat: Support JSON Schema in OpenAI-Compatible API (#6321 ) Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>	2025-07-25 12:55:56 -04:00
pcastonguay	3805976e90	fix: Fixing kv_cache_events unit tests [nvbug 5362412] (#6265 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-25 08:55:44 -04:00
xiaoqi	a0aecf0476	[feat]: support logit_bias (#5354 ) Signed-off-by: xq25478 <xq25478@qq.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-07-25 09:37:41 +00:00
xinhe-nv	470544cf17	test: [CI] Add failed cases into waives.txt (#6333 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-25 17:18:06 +10:00
xinhe-nv	6268a60ab3	tests: add test_chunked_prefill for llama4 (#5549 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 23:02:00 -04:00
xinhe-nv	2dcfa90e99	test: skip llama3.3 70b test on cg4 (#6293 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-24 19:29:56 -07:00
Mike Iovine	0f2f11f90b	[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-07-24 21:50:11 -04:00
Shiyu Li	375f74ecb2	[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237 ) Signed-off-by: Shiyu Li <shili@nvidia.com>	2025-07-25 08:01:40 +08:00
Stefan Niebler	0df758ec9f	[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-24 18:04:41 +02:00
bhsueh_NV	7b6aadc800	[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-24 21:47:37 +08:00
Emma Qiao	0cc1f8c03d	[Infra] - Wiave failed tests in post-merge (#6331 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-24 21:18:06 +08:00
Ivy Zhang	f290108cd8	tests: only get timeout value from pytest marker (#6287 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-24 20:51:02 +08:00
liji-nv	14d94a3856	feat: Add non UB AR + Residual + Norm + Quant fusion (#6320 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-24 05:51:43 -04:00
Iman Tabrizian	5fceaa6153	Revert "tests: add timeout_manager to tensorrt flow test cases (#5942 )" (#6309 )	2025-07-23 23:58:10 -04:00
Emma Qiao	82d03ca979	[Infra] - Increase unittest execution time since some test exceeds 1600 (#6277 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-24 10:02:28 +08:00
Iman Tabrizian	7740bfa31d	Waive tests (#6312 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-07-23 18:15:07 -07:00
Lucas Liebenwein	cf4f4e8d73	[AutoDeploy] disable flaky MoE nvfp4 test (#6302 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-07-23 13:13:01 -04:00


1091 Commits