Iman Tabrizian | bc2fb29c5e | 2025-07-23 05:27:16 +08:00
[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

Lucas Liebenwein | 41fb8aa8b1 | 2025-07-23 05:11:04 +08:00
[AutoDeploy] merge feat/ad-2025-07-07 (#6196)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>

Raayan Dhar | 5234502717 | 2025-07-22 11:28:23 -07:00
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222)
Signed-off-by: raayandhar <rdhar@nvidia.com>

yuanjingx87 | ef4878db05 | 2025-07-22 11:27:54 -07:00
set NVIDIA_IMEX_CHANNELS for dlcluster slurm job only (#6234)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

2ez4bz | ab7434ac62 | 2025-07-22 11:06:41 -07:00
[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

John Calderon | b7c8a672da | 2025-07-22 10:32:18 -07:00
[Issue 6193] Fix gemma3vl weight loader (#6233)
Signed-off-by: John Calderon <johncalesp@gmail.com>

danielafrimi | ff9963978a | 2025-07-22 16:59:55 +03:00
Add register_fake for finegrained_mixed_dtype_gemm torch_op (#6255)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

Linda | 60073731ca | 2025-07-22 14:51:43 +01:00
fix: bindings unit tests for nanobind (#6221)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

Stanley Sun | 04f2d4b2eb | 2025-07-22 18:55:24 +08:00
test: update test list for RTX6KD (#6213)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

Lizhi Zhou | 3e1a0fbac4 | 2025-07-22 16:57:06 +08:00
[TRTLLM-6537][infra] extend multi-gpu tests related file list (#6139)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

Yiqing Yan | 3e18ee5fe1 | 2025-07-22 16:24:28 +08:00
chore: bump version to 1.0.0rc5 (#6252)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

Yechan Kim | b85ab139f9 | 2025-07-22 14:32:41 +08:00
doc: add supported data modality and types on multimodal serve (#5988)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

Pengyun Lin | 48ddc3d4b9 | 2025-07-22 12:48:00 +08:00
[fix]: Revert commit 388b491 (#6143)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

bhsueh_NV | 24ce6b9517 | 2025-07-22 12:48:00 +08:00
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

pcastonguay | 310bdd9830 | 2025-07-22 12:48:00 +08:00
fix: Fix triton backend build [nvbug 5396469] (#6098)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

QI JUN | a03c680581 | 2025-07-22 12:48:00 +08:00
add release notes for 0.21 release (#6049)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

nv-guomingz | 34dd071bd6 | 2025-07-22 12:48:00 +08:00
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation. (#6039)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yi Zhang | eb7d0f84b5 | 2025-07-22 12:48:00 +08:00
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Fanrong Li | c66941036f | 2025-07-22 12:48:00 +08:00
fix: fix index out of bounds error in spec decoding (#5954)

Nikita Korobov | 9d26b7891a | 2025-07-22 12:48:00 +08:00
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>

Yan Chunwei | f194b65f3e | 2025-07-22 12:48:00 +08:00
fix [nvbug/5351244]: address remote mpi session submit (#5664)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

amirkl94 | f4f2176cd5 | 2025-07-22 12:48:00 +08:00
chore: Port leftover 0.20 (#5907)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>

Bo Li | 537757e669 | 2025-07-22 12:48:00 +08:00
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. (#5896)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

Bo Li | db77d83a2a | 2025-07-22 12:28:38 +08:00
bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

2ez4bz | 37d0b68442 | 2025-07-22 11:55:28 +08:00
[fix] Fix flaky mistral E2E test (#6230)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

WeiHaocheng | fddb7f1141 | 2025-07-22 10:42:46 +08:00
feat: moe prepare support topk % 4 != 0 (#5742)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>

Ivy Zhang | eb5cb5b642 | 2025-07-22 10:23:41 +08:00
tests: add timeout_manager to tensorrt flow test cases (#5942)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Shunkangz | ee45e0c63f | 2025-07-22 09:16:28 +08:00
feat: Refactor the fetching request logic (#5786)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>

Chang Liu | 7381f1dba7 | 2025-07-21 16:11:58 -07:00
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)
Only supports Qwen in this PR.

Simeng Liu | 4a0951f85c | 2025-07-21 15:46:37 -07:00
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] (#5859)
Signed-off-by: Simeng Liu <simengl@nvidia.com>

Mike Iovine | 9645814bdf | 2025-07-21 15:00:59 -04:00
[chore] Clean up quickstart_advanced.py (#6021)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

Ziyi Xiong | d7f0b0ab68 | 2025-07-21 11:38:59 -04:00
[fix] Correct the returned value of has_spec_drafter (#6178)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

Yi Zhang | f9b0a911fb | 2025-07-21 22:17:13 +08:00
test: Enable GB200 torch compile multi gpu tests (#6145)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Pengyun Lin | 9832bef07d | 2025-07-21 21:09:43 +08:00
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

Emma Qiao | e41507a253 | 2025-07-21 21:00:18 +08:00
[Infra] - Waive failed cases on recent post-merge (#6212)
Signed-off-by: qqiao <qqiao@nvidia.com>

liji-nv | 3e0fb60e50 | 2025-07-21 19:10:22 +08:00
[TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

QI JUN | aea91b2541 | 2025-07-21 18:47:22 +08:00
doc: add Deprecation Policy section (#5784)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Zhanrui Sun | 3cbc23f783 | 2025-07-21 16:06:43 +08:00
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) (#4656)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

Linda | 3efad2e58c | 2025-07-21 08:56:57 +01:00
feat: nanobind bindings (#6185)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

xinhe-nv | b46fd41026 | 2025-07-21 15:40:30 +08:00
test: [CI] remove closed bugs (#6201)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

|
Yuening Li
|
e8c068b4b1
|
[TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
Co-authored-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
|
2025-07-21 15:17:35 +08:00 |
|
Jinyang Yuan
|
88076eecd0
|
[fix] Fix can_use_alltoall in fused_moe_wide_ep.py (#6173)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-07-21 10:53:07 +08:00 |
|
nv-guomingz
|
b4c7e8c9a5
|
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-21 10:49:29 +08:00 |
|
brb-nv
|
ca9bc5727e
|
fix: Flush stale PlanParams with custom attention mask (#6163)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-21 09:55:09 +08:00 |
|
ruodil
|
6a3c9f8061
|
test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-21 11:29:19 +10:00 |
|
brb-nv
|
a433ebad2b
|
enh: Lift expectation of single image per sample in Gemma3 VLM (#6195)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-21 08:43:07 +08:00 |
|
danielafrimi
|
5300a99bd8
|
W4A8 GEMM (#6005)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-20 17:34:57 +03:00 |
|
amitz-nv
|
98428f330e
|
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-07-20 08:00:14 +03:00 |
|
Martin Marciniszyn Mehringer
|
943fd418dd
|
fix: Ensure mlx5 library is installed for deep_ep and remove deprecated python bindings (#6189)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-07-20 10:38:51 +08:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|