coldwaterq
1cf0e672e7
fix: [nvbugs/5066257] serialization improvements ( #3869 )
...
* added a restricted pickler and unpickler in a separate serialization function.
Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
* updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects, which made debugging harder, added tests.
Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
* removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list.
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
* cleaned up a couple files to reduce conflicts with main.
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
* fix unit tests
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* reorder BASE_ZMQ_CLASSES list alphabetically
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* fix tests and move LogitsProcessor registration to base class
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* revert changes to import log of tensorrt_llm._torch.models
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
* fix tests and move LogitsProcessor registration to base class
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* additional comments for multiprocess approved list sync
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
* add dataclass from tests
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
---------
Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Co-authored-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-05-23 13:06:29 +08:00
djns99
87f734b563
[ https://nvbugs/5297775 ] fix: Correct memory guard for large MOE tests to account for TP space ( #4553 )
...
fix: Correct memory guard for large MOE tests to account for TP space
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-05-23 14:57:49 +12:00
Yuxian Qiu
38241b2346
fix: Fix moe_ep_groups/moe_cluster_groups in Mapping. ( #4555 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-23 10:41:49 +08:00
CarstyYou
ef280e687e
[feat] support fp8 blockscale gemm on sm89 ( #4481 )
...
* [feat] integrate ada blockwise gemm
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [fix] align scale M
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [feat] swizzle mma output
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [test] add ut for sm89
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [delete] remove useless comments
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [chore] codestyle
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [fix] fix review comments
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [chore] fix license
Signed-off-by: CarstyYou <xiy@nvidia.com>
* [chore] fix license
Signed-off-by: CarstyYou <xiy@nvidia.com>
---------
Signed-off-by: CarstyYou <xiy@nvidia.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-05-23 10:39:10 +08:00
Enwei Zhu
d7443b6068
[ https://nvbugspro.nvidia.com/bug/5181262 ] [test] Unwaive Mistral Nemo test ( #4515 )
...
unwaive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-23 10:14:00 +08:00
nv-guomingz
e3a534d0ee
chore: guardword clean for header file. ( #4540 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-23 10:08:14 +08:00
pcastonguay
d7d455e7ea
[feat][TRTLLM-5018] Dis serving python runtime trt backend ( #4243 )
...
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-22 22:01:06 -04:00
Kunyao Wu
60a6c20174
Scaffoldingllm supports MCP ( #4410 )
...
* support mcp
# Conflicts:
# tensorrt_llm/scaffolding/worker.py
Signed-off-by: wu1du2 <wu1du2@gmail.com>
* move all into contrib/mcp
# Conflicts:
# examples/scaffolding/contrib/mcp/mcptest.py
# tensorrt_llm/scaffolding/__init__.py
# tensorrt_llm/scaffolding/contrib/__init__.py
# tensorrt_llm/scaffolding/contrib/mcp/__init__.py
# tensorrt_llm/scaffolding/contrib/mcp/mcp_controller.py
# tensorrt_llm/scaffolding/task.py
# tensorrt_llm/scaffolding/worker.py
Signed-off-by: wu1du2 <wu1du2@gmail.com>
* support sandbox, websearch
# Conflicts:
# examples/scaffolding/contrib/mcp/mcptest.py
# examples/scaffolding/contrib/mcp/weather/weather.py
# tensorrt_llm/scaffolding/contrib/mcp/mcp_controller.py
# tensorrt_llm/scaffolding/contrib/mcp/mcp_utils.py
# tensorrt_llm/scaffolding/contrib/mcp/mcp_worker.py
# tensorrt_llm/scaffolding/worker.py
Signed-off-by: wu1du2 <wu1du2@gmail.com>
* remove pics
Signed-off-by: wu1du2 <wu1du2@gmail.com>
* pre-commit fix
# Conflicts:
# tensorrt_llm/scaffolding/contrib/mcp/__init__.py
# tensorrt_llm/scaffolding/contrib/mcp/mcp_utils.py
# tensorrt_llm/scaffolding/contrib/mcp/mcp_worker.py
Signed-off-by: wu1du2 <wu1du2@gmail.com>
* fix spell
Signed-off-by: wu1du2 <wu1du2@gmail.com>
* rebase
Signed-off-by: wu1du2 <wu1du2@gmail.com>
---------
Signed-off-by: wu1du2 <wu1du2@gmail.com>
2025-05-23 01:54:49 +00:00
dongxuy04
338744fba6
fix[nvbug-5295425]: [TRTLLM-5385] fix race condition in MoeLoadBalancer ( #4573 )
...
fix moe possible race cond and add bypass worker thread for no updates
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-05-23 09:24:23 +08:00
QI JUN
1e55d616da
Chore: clean up _gather_dp_requests_num method of PyExecutor ( #4571 )
...
clean up _gather_dp_requests_num method of PyExecutor
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-23 08:37:39 +08:00
nv-guomingz
3549b68c1c
chore: clean useless flag ( #4567 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-23 07:05:15 +08:00
Mike Iovine
9c0de251db
[feat] Integrate Hopper chunked attention kernels ( #4330 )
...
* Integrate chunked attention kernels
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
* Fix cache key
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
* Fix lint
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
---------
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-22 17:10:57 -04:00
Mike Iovine
14fc48ada7
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler ( #4402 )
...
[fix] Fix chunked prefill + overlap scheduler
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-23 04:38:22 +08:00
Venky
c713eb5799
test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) ( #4446 )
...
ultra
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-22 13:07:33 -07:00
Robin Kobus
e5c90883a9
fix: Move cv2 import to load_video function ( #4541 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-22 17:56:07 +02:00
Chuang Zhu
558eaecf16
fix sequence data race ( #4565 )
...
stash for debug broken promise
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 23:13:48 +08:00
QI JUN
1e5d526db4
Chore: clean up _merge_dummy_request method of PyExecutor ( #4438 )
...
* clean up _merge_dummy_request method of PyExecutor
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* update comment
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-22 18:19:07 +08:00
xinhe-nv
22c01d5b21
test: [CI] Add failed cases into waives.txt ( #4549 )
...
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* fix test issues
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-22 17:18:53 +08:00
ruodil
1a45890dae
test: waive hanging cases for perf test ( #4562 )
...
waive hanging cases
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 15:50:05 +08:00
Chuang Zhu
3410508020
cache_transceiver_config ( #4556 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 13:59:51 +08:00
Iman Tabrizian
e741d2b8d0
Add tritonrelease container ( #4455 )
...
* Add tritonrelease container
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Review comments
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Update docker/Makefile
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-21 23:47:50 -04:00
Kaiyu Xie
2898d268f9
feat: add health_generate route to openai serving (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856 ) ( #4349 )
...
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Dhruv Singal <dhruvsingalabc@gmail.com>
2025-05-22 11:46:06 +08:00
HuiGao-NV
bc9f1dbede
fix[nvbug-5228840]: Remove test cases of feature not supported anymore ( #3972 )
...
* Remove waived cases
* Remove test cases of not supported feature
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-22 11:18:58 +08:00
Aurelien Chartier
f491244c84
feat: add dataset support for benchmark_core_model with LLMAPI ( #4457 )
...
* feat: add dataset support for benchmark_core_model with LLMAPI
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-21 19:18:43 -07:00
Kaiyu Xie
099cd3ce07
chore: Add all_reduce.py benchmark script to test ( #4537 )
...
Add all_reduce.py script to test
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-22 10:13:27 +08:00
Michal Guzek
9033dd987d
[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct ( #4415 )
...
Add phi-4-mini CLI acc test
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-22 09:56:48 +08:00
Yan Chunwei
4798d088d9
chore: Partition LlmArgs into TorchLlmArgs and TrtLlmArgs ( #3823 )
...
* partition LlmArgs
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* update backend
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-22 09:40:56 +08:00
Chuang Zhu
44cfd757b2
Agent interface impl for NIXL ( #4125 )
...
* agentConnection
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
recv
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
agentState
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
NIXL interfaces
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
update cmakelists
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
nixl improve
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
remove cppzmq
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
transferAgent remove register
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
work for cache Test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
reduce sleep time
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
integrate
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
nixl env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix rebase error
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
cpp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
stash for send metaData
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
loadRemoteMD after fetchRemoteMD
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
workaround for mixed gen and context
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
test_env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
avoid port conflict in test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* format
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use std::string
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* typo
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* fix transferAgentTest
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 09:09:41 +08:00
Aurelien Chartier
1681e9fd1e
chore: remove extra PYTHONPATH ( #4453 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-21 17:38:01 -07:00
Nikita Korobov
e1b42be3d1
fix: TRT-LLM Gen dtype declaration ( #4503 )
...
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-05-21 23:56:37 +02:00
Dom Brown
1cffa99792
test: Split test_simple into mpi_utils and cache transceiver tests for DGX ( #4451 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-22 04:26:21 +08:00
Zongfei Jing
dbaddb3a29
Adding two-shot allreduce kernel and mnnvl multicasting buffer ( #4216 )
...
* Adding two-shot allreduce kernel and mnnvl multicasting buffer
Signed-off-by: Shiyu Li <shili@nvidia.com>
Adding comments
Signed-off-by: Shiyu Li <shili@nvidia.com>
Add unittest of the twoshot kernel.
Signed-off-by: Shiyu Li <shili@nvidia.com>
Update dispatch logic
Signed-off-by: Shiyu Li <shili@nvidia.com>
Use cpu barrier instead of GPU at init
Signed-off-by: Shiyu Li <shili@nvidia.com>
Merge dispatch logic fix
Signed-off-by: Shiyu Li <shili@nvidia.com>
Update the kernel to use GPU-managed buffer
Signed-off-by: Shiyu Li <shili@nvidia.com>
* Refine
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Clean code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix issue
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Clean up
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Simplify AllReduce interface
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Rename
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix warning
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Tidy code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Rename
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Refine
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Skip ut for no_fusion
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Refine
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Shiyu Li <shili@nvidia.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Shiyu Li <shili@nvidia.com>
2025-05-22 03:42:36 +08:00
Venky
0a8461d54c
test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) ( #4499 )
...
add low concurrency perf tests
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-21 10:46:48 -07:00
Kevin Chen
b80b78f87c
Add pytorch backend team ( #4405 )
...
* Add pytorch backend team
Signed-off-by: Kevin Chen
* Update .github/CODEOWNERS
Co-authored-by: Yanchao Lu
Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
---------
Signed-off-by: Kevin Chen
Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu
2025-05-21 21:10:35 +08:00
nv-guomingz
3b12e460e7
chore: clean ucx and nixl mirror. ( #4531 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-21 19:45:20 +08:00
Robin Kobus
cd0c826417
refactor: DisaggExecutorTest ( #4398 )
...
* chore: Improve formatting of DisaggExecutorTest
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Typed InstanceRole param in DisaggExecutorTest
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Skip DisaggExecutorTest based on device count
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-21 18:01:45 +08:00
dongxuy04
4018806742
feat: large-scale EP(part 3 - refactor: FusedMoe for redundant expert) ( #4495 )
...
refactor fused_moe for redundant expert
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-05-21 17:17:49 +08:00
xinhe-nv
407ef08662
tests: add qwene fp4 tests into QA test list & update sanity test list ( #4478 )
...
* update sanity test list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update test list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 16:52:02 +08:00
ruodil
83f1933f0c
test: add failed case in waive list and fix some test script issue for perf test ( #4527 )
...
add failed case in waive list and fix some test script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-21 16:37:25 +08:00
WeiHaocheng
a201ce9d53
docs: update the introduction for scaffolding ( #4360 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-05-21 14:54:01 +08:00
ruodil
3d9a2b5eb7
test: remove enable_overlap_schedule in pytorch config and set enable_chunked_prefill to be true for isl>2048 cases ( #4285 )
...
1. remove enable_overlap_schedule in pytorch config
2. rename model_yaml_config.py to pytorch_model_config.py and set enable_chunked_prefill to be true for cases with isl>2048
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 14:26:56 +08:00
QI JUN
15317ece5a
CI: waive test_fp8_block_scales_4gpus of deepseek v3 lite ( #4520 )
...
waive test_fp8_block_scales_4gpus of deepseek v3 lite
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 13:19:43 +08:00
xinhe-nv
750f412b8f
tests: add llama 3.3 70b 2 nodes tests ( #4391 )
...
* add llama 3.3 70b 2 nodes tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* remove enable_overlap_scheduler parameter
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-21 12:42:45 +08:00
Perkz Zheng
6a35c599ef
Clean: fmha codes ( #4496 )
...
clean codes
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-21 11:45:47 +08:00
Chuang Zhu
ab5bea957d
unwaive some disagg test ( #4476 )
...
* unwaive some disagg test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* pytest.mark.skip_less_device(4)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-21 11:45:11 +08:00
Ruoqian Guo
db7446fda7
Feat: add deep_gemm swapab Kernel ( #4430 )
...
* feat: add deepgemm_swapab
feat: add fp8_gemm_kernel_swapab
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
feat: set threshold for deepgemm and deepgemmswapab
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* docs: update README.md
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* fix: std::runtime_error needs #include <stdexcept>
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* chores: remove the redundant code
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* feat: support for dense deep_gemm swapab
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
* chores: remove redundant code
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
---------
Signed-off-by: Ruoqian Guo <ruoqiang@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-05-21 10:48:43 +08:00
QI JUN
2372589689
Chore: waive torch compile test cases of deepseek v3 lite ( #4508 )
...
waive torch compile test cases of deepseek v3 lite
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 10:43:31 +08:00
Shi Xiaowei
3d62727303
test: NIXL single process test ( #4486 )
2025-05-21 10:41:46 +08:00
Thor Johnsen
5d438be59a
[TRTLLM-5000][feat] Pytorch implementation of ngram drafter ( #3936 )
...
* v1.5
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
v1.5.4 Add back draft_overhead to spec dec stats
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v1.5.5: fix CI error
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.6: fix CI error 8196 > 8192
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* precommit run
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v2.0: Address reviewer concerns
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v2.1: add fix from wili
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Revert changes that require use of TypeAlias because that requires python version >= 3.10
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
---------
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-21 10:40:00 +08:00
Yan Chunwei
9199793848
fix: llmapi-launch add trtllm-bench test with engine building ( #4091 )
...
* add trtllm-bench mgmn test
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-21 10:18:01 +08:00