Yuxian Qiu
2fb16ad328
[None][fix] fix log_once usage ( #7210 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-26 19:13:03 +08:00
HuiGao-NV
df80b1e128
[ https://nvbugs/5473789 ][bug] install cuda-toolkit to fix sanity check ( #7159 )
Signed-off-by: Hui Gao <huig@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-26 18:51:21 +08:00
Linda
b394c51c99
[ https://nvbugs/5409416 ][fix] test_openai_multi_chat_example ( #7174 )
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-26 11:33:58 +02:00
Ivy Zhang
1f7a1645d6
[None][fix] update skip case ( #7193 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 12:31:48 +08:00
ruodil
ebbbacf81c
[None][test] add kv cache size in bench metric and fix failed cases ( #7211 )
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-26 10:09:22 +08:00
pcastonguay
9d2b181e7d
[ https://nvbugs/5470840 ][fix] Disaggregated unit test MPI Init handling ( #7139 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-25 19:48:38 -04:00
Shi Xiaowei
d010b2043a
[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands ( #7191 )
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-25 20:21:43 +08:00
Yan Chunwei
5d165186d5
[None][doc] fix tensorrt legacy quickstart page ( #7190 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-25 19:30:50 +08:00
Wanli Jiang
b76c987913
[ https://nvbugs/5467232 ][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value ( #7168 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-25 15:37:57 +08:00
Guoming Zhang
01c5f2f233
[None][fix] Switch llm api quickstart example location per workflow. ( #7182 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-24 22:17:20 -04:00
peaceh-nv
030598a497
[ https://nvbugs/5448426 ][fix] Fix illegal memory access in cuda graph ( #7127 )
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-25 10:04:34 +08:00
Wanli Jiang
036c3dd0ea
[TRTLLM-6825][fix] Update lora for phi4-mm ( #7149 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-23 20:57:00 +08:00
Dom Brown
3f2eb4d2e8
[ https://nvbugs/5461712 ] [fix] Disable deep_gemm for Qwen3 due to accuracy issues ( #7170 )
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-08-23 05:26:12 -04:00
Michal Guzek
7ea53ff516
[ https://nvbugs/5433545 ][fix] TestPhi4MiniInstruct::test_auto_dtype - Use max_seq_len=4096 to fallback to the short RoPE factor ( #6895 )
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-08-22 10:28:09 -07:00
Dimitrios Bariamis
4b6cca0662
[ https://nvbugs/5474037 ][fix] Fix building tritonbuild/tritonrelease images ( #7157 )
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
2025-08-22 16:47:38 +02:00
Shi Xiaowei
3ee8523829
[ https://nvbugs/5450074 ][fix] Reduce the device memory requirements for testing ( #6990 )
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-22 17:33:30 +08:00
milesial
c1eefa8735
[ https://nvbugs/5467062 ][fix] pass logitsPostProcessorBatched by reference ( #7110 )
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
2025-08-22 04:42:06 -04:00
HuiGao-NV
253af9f9af
[ https://nvbugs/5410391 ][bug] Support to share device buffers in attention meta ( #6557 )
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-08-22 13:19:27 +08:00
brb-nv
79f1e6c867
[ https://nvbugs/5449032 ][fix] Add more llm-args to llm_mgmn_trtllm_bench.sh ( #7144 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-22 12:24:35 +08:00
Pamela Peng
1e5a6be55d
[ https://nvbugs/5448442 ][fix] Skip trtllm moe backend for sm120 ( #7010 )
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-08-21 13:34:07 -04:00
Emma Qiao
441edf1eeb
[None][infra] Skip failed tests for release branch ( #7130 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:09:51 +08:00
Venky
9eac744d72
[ https://nvbugs/5464088 ] [fix] dequantize fp8 activation input to lora forward; update perf test config ( #7014 )
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-21 08:28:54 -04:00
ChristinaZ
a875e50321
[ https://nvbugs/5392414 ] [fix] For release 1.0 cherry pick. Add customized default routing method ( #7068 )
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-21 20:06:50 +08:00
Yan Chunwei
caf73f5bab
[ https://nvbugs/5383702 ][fix] test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus ( #6889 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:56:42 +08:00
Yan Chunwei
e77ec061db
[ https://nvbugs/5451296 ][fix] zmq nonblock bug with retry ( #7019 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:34:46 +08:00
chenfeiz0326
5acf213a15
[ https://nvbugs/5440241 ][fix] Fix 70B GSM8K Accuracy drop ( #7075 )
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-20 18:11:00 -04:00
Erin
9a8f9b338f
[None][doc] update v1.0 doc for trtllm-serve ( #7056 )
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-08-20 10:57:21 -07:00
Dimitrios Bariamis
a343e8535e
[None][fix] Fix build of tritonbuild/tritonrelease image ( #7003 )
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
2025-08-20 16:57:39 +02:00
amitz-nv
3efe1d918a
[TRTLLM-7263][fix] Prevent recreation of cublas handles in lora_grouped_gemm every call ( #7053 )
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-20 06:41:20 -04:00
yifeizhang-c
5959d72d74
[ https://nvbugs/5394392 ][fix] Enlarge scheduler capacity under disagg bs == 1 ( #6975 )
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-20 16:32:27 +08:00
Jin Li
69846c6586
[ https://nvbugs/5427801 ][fix] Torch compile support for Llama4 and Ea… ( #6978 )
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-20 15:06:56 +08:00
Bo Deng
df00c81aea
[ https://nvbugs/5448437 ][fix] fix some nixl tests ( #6940 )
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-20 14:19:48 +08:00
Yan Chunwei
fae43e7b46
[None][doc] add status labels to LLM class's api reference ( #6899 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-19 21:50:04 -04:00
Emma Qiao
c4535e6c3a
[None][infra] Waive failed tests for release branch ( #7036 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-19 20:42:47 +08:00
Iman Tabrizian
520117ece0
[ https://nvbugs/5451296 ][bug] Fix a thread leak in test_llm_args.py ( #7017 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-19 07:19:53 -04:00
brb-nv
da91256503
[None][chore] Waive E2E GB200 tests for Gemma3 27B ( #6916 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-19 05:19:34 -04:00
Yechan Kim
d6c2a6a81f
[ https://nvbugs/5448579 ][fix] EXAONE-4.0 accuracy test bugfix ( #6888 )
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-19 09:29:32 +02:00
Nave Assaf
d4dd5b4f4d
[ https://nvbugs/5451028 ][fix] Constrain NemotronSuper test parameters… ( #6987 )
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-19 09:19:50 +02:00
Perkz Zheng
20f7df25ac
[ https://nvbugs/5394685 ][fix] proper fix for the accuracy issue in 2CTA MLA kernels (release 1.0) ( #6946 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-19 03:10:29 -04:00
QI JUN
cd1b809d6e
[ https://nvbugs/5374016 ][fix] improve error message ( #6893 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-19 10:29:08 +08:00
Aurelien Chartier
fef2f1f55d
[ https://nvbugs/5449155 ][fix] Fix DeepSeek R1 weight loading for TP16 ( #6913 )
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-19 10:25:43 +08:00
William Zhang
790a105563
[ https://nvbugs/5462007 ][ci] Unwaive Mistral Small 3.1 FP8 test ( #7008 )
The error was fixed by #6909.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 19:50:03 -04:00
Yanchao Lu
6fda8ddac9
[None][infra] Cherry-pick #6836 from main branch and improve SSH connection ( #6971 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-19 01:11:11 +08:00
Yiqing Yan
28c30e1bf8
[None][chore] Remove duplicate test waives ( #6999 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 22:04:43 +08:00
Emma Qiao
2992e9cd58
[None][infra] Waive failed tests for release branch 0818 ( #6993 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:31:50 +08:00
peaceh-nv
28526fe2b1
[ https://nvbugs/5449218 ][fix] Fix KvCacheConfig error in test_perf ( #6937 )
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-18 15:58:53 +08:00
Ivy Zhang
055fdd9e31
[None][fix] update skip config ( #6891 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-18 13:50:46 +08:00
Guoming Zhang
96bda14fbd
[ https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 ( #6847 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-17 23:22:01 -04:00
William Zhang
c16aff5e3f
[ https://nvbugs/5448525 ][fix] Mistral Small 3.1 accuracy tests ( #6909 )
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 11:17:37 +08:00
Liao Lanyu
d9b9b5d053
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights ( #6927 )
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2025-08-18 10:20:09 +08:00