TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yanchao Lu	20c15fc04f	Fix invalid testcase name (#4626) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-05-24 00:40:00 +08:00
Yao Yao	ef763b0ddc	fix: rename some terms (#4534) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2025-05-23 23:23:49 +08:00
Robin Kobus	7b2818a47b	refactor: CreateNewDecoderRequests (#4452 ) * refactor: CreateNewDecoderRequests Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Consolidate request generation in CreateNewDecoderRequests - Removed the GenerateRequestOptions class and integrated its functionality into CreateNewDecoderRequests. - Updated the constructor of CreateNewDecoderRequests to accept parameters for speculative decoding and normalization options. - Modified the operator() method to handle request generation directly, improving code organization and reducing redundancy. - Cleaned up associated includes and references throughout the codebase. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Simplify request handling in CreateNewDecoderRequests - Removed the generateRequestOptions method and integrated its logic directly into the operator() method. - Updated the request generation process to improve clarity and reduce redundancy. - Adjusted the return type to streamline the handling of batch slots, decoder requests, and sampling configurations. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Enhance createDecoderRequests method in CreateNewDecoderRequests - Updated the createDecoderRequests method to include additional parameters for decoder state and CUDA streams, improving flexibility in request handling. - Removed redundant request generation logic from the operator() method, streamlining the process. - Adjusted the newRequest method to utilize the updated decoder request structure, enhancing clarity and maintainability. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Use MedusaBuffers instead of RuntimeBuffers in CreateNewDecoderRequests - Updated references from RuntimeBuffers to MedusaBuffers across the CreateNewDecoderRequests class and its methods, enhancing clarity in buffer management. 
- Adjusted method signatures and internal logic to accommodate the new MedusaBuffers type, ensuring compatibility with existing functionality. - Cleaned up unnecessary includes and improved code organization for better maintainability. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Update CreateNewDecoderRequests to use DecoderState and CudaStream parameters - Modified method signatures in CreateNewDecoderRequests to replace GptDecoderBatched with runtime::decoder::DecoderState and added a separate CudaStream for the decoder. - Adjusted the implementation of the operator() method to accommodate the new parameters, enhancing flexibility in request handling. - Updated associated bindings in the pybind11 interface to reflect the changes in method signatures, ensuring consistency across the codebase. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Update TRTLLMSampler to use refactored create_new_decoder_requests - Updated the sampler.py to reflect changes in the request handling logic, replacing generate_request_options with create_new_decoder_requests for improved clarity and consistency. - Updated bindings and method signatures for decoder stream handling. Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Update gptDecoderBatchedTest to use CreateNewDecoderRequests::newRequest Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-23 22:54:37 +08:00
dominicshanshan	ca3eaf4070	[nvbug/5028235][fix]pytest bindings tokens logtis comparison. (#4424 ) * fix bug 5028235. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * fix bug 5028235 and update comments. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Update tests/unittest/bindings/test_executor_bindings.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com> * Remove redundant code. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Update based on review comments. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-05-23 20:41:00 +08:00
juney-nvidia	7b2bb67491	Update CODEOWNERS for PyTorch backend - runtime component (#4620) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-05-23 20:40:44 +08:00
Robin Kobus	15a59e57f6	[nvbugs/5301492] ci: waive test_workers_kv_cache_aware_router (#4617) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-23 20:14:28 +08:00
zhhuang-nv	8452775db8	[TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535 ) * optimize kv cache reuse workflow for MLA write kv cache first and only call up-projection GEMM once relax contiguous requirements of k/v for setting paged kv cache return two contiguous tensors when loading MLA KV Cache Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * support fp8 kv cache for MLA kv cache reuse Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> * resolve comments Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com> --------- Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-05-23 19:47:50 +08:00
Anthony Chang	bbea2647b1	Qwen3 supports TRTLLM FP4 MoE backend (#4530 ) * MoE TRTLLM backend for Qwen3 Signed-off-by: Anthony Chang <anchengc@nvidia.com> * add extra moe_backend to test Signed-off-by: Anthony Chang <anchengc@nvidia.com> * address comments Signed-off-by: Anthony Chang <anchengc@nvidia.com> * conditionally compile kernels on newer archs Signed-off-by: Anthony Chang <anchengc@nvidia.com> * missing positional arg Signed-off-by: Anthony Chang <anchengc@nvidia.com> * Update the routing kernels Signed-off-by: Christina Zhang <christinaz@nvidia.com> * Revise usage of TLLM_LOG_ERROR Signed-off-by: Christina Zhang <christinaz@nvidia.com> * Add unit test for Qwen3 moe (trtllm_gen backend) Signed-off-by: Christina Zhang <christinaz@nvidia.com> * improve weight processing speed of moe_backend=TRTLLM; roughly 2x Signed-off-by: Anthony Chang <anchengc@nvidia.com> * tidy and minor fix Signed-off-by: Anthony Chang <anchengc@nvidia.com> * temporarily disable accuracy test that has known issue Signed-off-by: Anthony Chang <anchengc@nvidia.com> --------- Signed-off-by: Anthony Chang <anchengc@nvidia.com> Signed-off-by: Christina Zhang <christinaz@nvidia.com> Co-authored-by: Christina Zhang <christinaz@nvidia.com>	2025-05-23 18:31:08 +08:00
juney-nvidia	419151f358	Update the GH main page to expose tech blogs (#4610 ) * Update the main page to expose the tech blogs Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * fix formating issue Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> * fixing format issue Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> --------- Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-05-23 17:03:56 +08:00
Yiqing Yan	3ca05330f9	Waive L0 test (#4609) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-05-23 15:54:11 +08:00
Bo Li	9ae705af1b	perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482 ) * Add Julien's origina kernel. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Get rid of UpdateKVCache functionality. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Add kernels. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Add torch OP. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update cmake. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Torch OP must use double as argument dtype. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Add unittest. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Add unittest. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix misaligned access when head_dim=64. In this case, numElemsPerThread=2, numVecPerThread=0. But the store code incorrectly perform vectorized store, some threads (e.g., lane1) issue store to address that is not aligned to 64 bit. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Remove unroll (compiler can do that). Cleanup code. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Add switch for interleave. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Refactor vectorized load/store. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Implement is_neox. Result not correct yet. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix is_neox=True. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Add q_weight and k_weight. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> --------- Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-05-23 15:31:04 +08:00
bhsueh_NV	6527c055cf	chore: fix bug of llama lora test (#4566 ) * fix bug of llama lora test Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * Update test_llm.py fix bug detected by pre-commit Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-23 14:06:40 +08:00
Fanrong Li	862bde99b6	draft[doc]: add mtp tech blog (#4580 ) * add mtp tech blog. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * update figure size. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * update the figure caption style and add some code/pr links. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix figure captions. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix figure size and perf data. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> * fix based on comments Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> * fix figure links. Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> --------- Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com> Co-authored-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-05-23 13:54:21 +08:00
bhsueh_NV	d69c662215	[Fix][Qwen3] fix bug of qwen3 fp4 workflow with EP (#4575 ) * fix bug of qwen3 fp4 workflow with EP Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> * fix bug of qwen3_moe with ep Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> --------- Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-23 13:34:05 +08:00
coldwaterq	1cf0e672e7	fix: [nvbugs/5066257] serialization improvments (#3869 ) * added a restricted pcikler and depickler in a sepparate serialization function. Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com> * updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects that made debugging harder, added tests. Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com> * removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list. Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com> * cleaned up a couple files to reduce conflicts with main. Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com> * fix unit tests Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * reorder BASE_ZMQ_CLASSES list alphabetically Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * fix tests and move LogitsProcessor registration to base class Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * revert changes to import log of tensorrt_llm._torch.models Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com> * fix tests and move LogitsProcessor registration to base class Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * additional comments for multiprocess approved list sync Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> * add dataclass from tests Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> --------- Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com> Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com> 
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com> Co-authored-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-05-23 13:06:29 +08:00
djns99	87f734b563	[https://nvbugs/5297775] fix: Correct memory guard for large MOE tests to account for TP space (#4553) fix: Correct memory guard for large MOE tests to account for TP space Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>	2025-05-23 14:57:49 +12:00
Yuxian Qiu	38241b2346	fix: Fix moe_ep_groups/moe_cluster_groups in Mapping. (#4555) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-05-23 10:41:49 +08:00
CarstyYou	ef280e687e	[feat] support fp8 blockscale gemm on sm89 (#4481 ) * [feat] integrate ada blockwise gemm Signed-off-by: CarstyYou <xiy@nvidia.com> * [fix] align scale M Signed-off-by: CarstyYou <xiy@nvidia.com> * [feat] swizzle mma output Signed-off-by: CarstyYou <xiy@nvidia.com> * [test] add ut for sm89 Signed-off-by: CarstyYou <xiy@nvidia.com> * [delete] remove useless comments Signed-off-by: CarstyYou <xiy@nvidia.com> * [chore] codestyle Signed-off-by: CarstyYou <xiy@nvidia.com> * [fix] fix review comments Signed-off-by: CarstyYou <xiy@nvidia.com> * [chore] fix license Signed-off-by: CarstyYou <xiy@nvidia.com> * [chore] fix license Signed-off-by: CarstyYou <xiy@nvidia.com> --------- Signed-off-by: CarstyYou <xiy@nvidia.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-05-23 10:39:10 +08:00
Enwei Zhu	d7443b6068	[https://nvbugspro.nvidia.com/bug/5181262] [test] Unwaive Mistral Nemo test (#4515) unwaive Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-23 10:14:00 +08:00
nv-guomingz	e3a534d0ee	chore: guardword clean for header file. (#4540) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-05-23 10:08:14 +08:00
pcastonguay	d7d455e7ea	[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243 ) * feat: Enabling dis serving with TRT backend with Python runtime Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing formatting Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg mtp test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-05-22 22:01:06 -04:00
Kunyao Wu	60a6c20174	Scaffoldingllm supports MCP (#4410 ) * support mcp # Conflicts: # tensorrt_llm/scaffolding/worker.py Signed-off-by: wu1du2 <wu1du2@gmail.com> * move all into contrib/mcp # Conflicts: # examples/scaffolding/contrib/mcp/mcptest.py # tensorrt_llm/scaffolding/__init__.py # tensorrt_llm/scaffolding/contrib/__init__.py # tensorrt_llm/scaffolding/contrib/mcp/__init__.py # tensorrt_llm/scaffolding/contrib/mcp/mcp_controller.py # tensorrt_llm/scaffolding/task.py # tensorrt_llm/scaffolding/worker.py Signed-off-by: wu1du2 <wu1du2@gmail.com> * support sandbox, websearch # Conflicts: # examples/scaffolding/contrib/mcp/mcptest.py # examples/scaffolding/contrib/mcp/weather/weather.py # tensorrt_llm/scaffolding/contrib/mcp/mcp_controller.py # tensorrt_llm/scaffolding/contrib/mcp/mcp_utils.py # tensorrt_llm/scaffolding/contrib/mcp/mcp_worker.py # tensorrt_llm/scaffolding/worker.py Signed-off-by: wu1du2 <wu1du2@gmail.com> * remove pics Signed-off-by: wu1du2 <wu1du2@gmail.com> * pre-commit fix # Conflicts: # tensorrt_llm/scaffolding/contrib/mcp/__init__.py # tensorrt_llm/scaffolding/contrib/mcp/mcp_utils.py # tensorrt_llm/scaffolding/contrib/mcp/mcp_worker.py Signed-off-by: wu1du2 <wu1du2@gmail.com> * fix spell Signed-off-by: wu1du2 <wu1du2@gmail.com> * rebase Signed-off-by: wu1du2 <wu1du2@gmail.com> --------- Signed-off-by: wu1du2 <wu1du2@gmail.com>	2025-05-23 01:54:49 +00:00
dongxuy04	338744fba6	fix[nvbug-5295425]: [TRTLLM-5385] fix race condition in MoeLoadBalancer (#4573 ) fix moe possible race cond and add bypass worker thread for no updates Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-05-23 09:24:23 +08:00
QI JUN	1e55d616da	Chore: clean up _gather_dp_requests_num method of PyExecutor (#4571 ) clean up _gather_dp_requests_num method of PyExecutor Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-23 08:37:39 +08:00
nv-guomingz	3549b68c1c	chroe:clean useless flag (#4567 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-05-23 07:05:15 +08:00
Mike Iovine	9c0de251db	[feat] Integrate Hopper chunked attention kernels (#4330 ) * Integrate chunked attention kernels Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> * Fix cache key Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> * Fix lint Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> --------- Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-22 17:10:57 -04:00
Mike Iovine	14fc48ada7	[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler (#4402 ) [fix] Fix chunked prefill + overlap scheduler Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-23 04:38:22 +08:00
Venky	c713eb5799	test(perf): Add `Llama-3_1-Nemotron-Ultra-253B-v1` perf tests (cpp) (#4446 ) ultra Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-05-22 13:07:33 -07:00
Robin Kobus	e5c90883a9	fix: Move cv2 import to load_video function (#4541) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-22 17:56:07 +02:00
Chuang Zhu	558eaecf16	fix sequence data race (#4565) stash for debug broken promise Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-22 23:13:48 +08:00
QI JUN	1e5d526db4	Chore: clean up _merge_dummy_request method of PyExecutor (#4438 ) * clean up _merge_dummy_request method of PyExecutor Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update comment Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-05-22 18:19:07 +08:00
xinhe-nv	22c01d5b21	test: [CI] Add failed cases into waives.txt (#4549 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * fix test issues Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-05-22 17:18:53 +08:00
ruodil	1a45890dae	test: waive hanging cases for perf test (#4562) waive hanging cases Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>	2025-05-22 15:50:05 +08:00
Chuang Zhu	3410508020	cache_transceiver_config (#4556) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-22 13:59:51 +08:00
Iman Tabrizian	e741d2b8d0	Add tritonrelease container (#4455 ) * Add tritonrelease container Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Review comments Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Update docker/Makefile Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-05-21 23:47:50 -04:00
Kaiyu Xie	2898d268f9	feat: add health_generate route to openai serving (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856 ) (#4349 ) Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856 Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Dhruv Singal <dhruvsingalabc@gmail.com>	2025-05-22 11:46:06 +08:00
HuiGao-NV	bc9f1dbede	fix[nvbug-5228840]: Remove test cases of feature not supported anymore (#3972 ) * Remove waived cases * Remove test cases of not supported feature Signed-off-by: Hui Gao <huig@nvidia.com>	2025-05-22 11:18:58 +08:00
Aurelien Chartier	f491244c84	feat: add dataset support for benchmark_core_model with LLMAPI (#4457 ) * feat: add dataset support for benchmark_core_model with LLMAPI Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-05-21 19:18:43 -07:00
Kaiyu Xie	099cd3ce07	chore: Add all_reduce.py benchmark script to test (#4537 ) Add all_reduce.py script to test Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-05-22 10:13:27 +08:00
Michal Guzek	9033dd987d	[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct (#4415 ) Add phi-4-mini CLI acc test Signed-off-by: moraxu <mguzek@nvidia.com>	2025-05-22 09:56:48 +08:00
Yan Chunwei	4798d088d9	chore: Partition LlmArgs into TorchLlmArgs and TrtLlmArgs (#3823 ) * partition LlmArgs Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * update backend Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-22 09:40:56 +08:00
Chuang Zhu	44cfd757b2	Agent interface impl for NIXL (#4125 ) * agentConnection Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> recv Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> agentState Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> NIXL interfaces Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> update cmakelists Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> nixl improve Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> remove cppzmq Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> transferAgent remove register Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> work for cache Test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> reduce sleep time Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> intergarte Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> nixl env Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> fix rebase error Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> cpp test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> stash for send metaData Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> loadRemoteMD after fetchRemoteMD Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> workaround for mixed gen and context Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> test_env Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> avoid port conflict in test Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * format Signed-off-by: Chuang Zhu 
<111838961+chuangz0@users.noreply.github.com> * use std::string Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * typo Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> * fix transferAgentTest Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> --------- Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-05-22 09:09:41 +08:00
Aurelien Chartier	1681e9fd1e	chore: remove extra PYTHONPATH (#4453) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-05-21 17:38:01 -07:00
Nikita Korobov	e1b42be3d1	fix: TRT-LLM Gen dtype declaration (#4503) Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>	2025-05-21 23:56:37 +02:00
Dom Brown	1cffa99792	test: Split test_simple into mpi_utils and cache transceiver tests for DGX (#4451) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-22 04:26:21 +08:00
Zongfei Jing	dbaddb3a29	Adding two-shot allreduce kernel and mnnvl multicasting buffer (#4216 ) * Adding two-shot allreduce kernel and mnnvl multicasting buffergit gffe Signed-off-by: Shiyu Li <shili@nvidia.com> Adding comments Signed-off-by: Shiyu Li <shili@nvidia.com> Add unittest of the twoshot kernel. Signed-off-by: Shiyu Li <shili@nvidia.com> Update dispatch logic Signed-off-by: Shiyu Li <shili@nvidia.com> Use cpu barrier instead of GPU at init Signed-off-by: Shiyu Li <shili@nvidia.com> Merge dispatch logic fix Signed-off-by: Shiyu Li <shili@nvidia.com> Update the kernel to use GPU-managed buffer Signed-off-by: Shiyu Li <shili@nvidia.com> * Refine Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Clean code Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Fix compile error Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Fix issue Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Clean up Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Simplify AllReduce interface Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Rename Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Fix warning Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Tidy code Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Rename Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Fix compile error Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Refine Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Skip ut for no_fusion Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Refine Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Shiyu Li <shili@nvidia.com> Signed-off-by: Zongfei Jing 
<20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Shiyu Li <shili@nvidia.com>	2025-05-22 03:42:36 +08:00
Venky	0a8461d54c	test(perf): Pt.2 Add `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (cpp) (#4499 ) add low concurrency perf tests Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-05-21 10:46:48 -07:00
Kevin Chen	b80b78f87c	Add pytorch backend team (#4405 ) * Add pytorch backend team Signed-off-by: Kevin Chen * Update .github/CODEOWNERS Co-authored-by: Yanchao Lu Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com> --------- Signed-off-by: Kevin Chen Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com> Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com> Co-authored-by: Yanchao Lu	2025-05-21 21:10:35 +08:00
nv-guomingz	3b12e460e7	chore: clean ucx and nixl mirror. (#4531) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-05-21 19:45:20 +08:00
Robin Kobus	cd0c826417	refactor: DisaggExecutorTest (#4398 ) * chore: Improve formatting of DisaggExecutorTest Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Typed InstanceRole param in DisaggExecutorTest Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * refactor: Skip DisaggExecutorTest based on device count Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-05-21 18:01:45 +08:00

1044 Commits