Yiqing Yan
1ce23545fc
[None][chore] Remove duplicate test waives ( #6998 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 21:15:49 +08:00
Emma Qiao
69ff32f9b1
[None][infra] Waive failed tests on main 0818 ( #6992 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:34:52 +08:00
Shi Xiaowei
5ec15b98f0
[TRTLLM-7030][fix] uppercase def value in pd-config ( #6981 )
...
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-18 02:33:23 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 ( #6945 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Naveassaf
d6322f70b7
[ https://nvbugs/5451028 ][fix] Constrain NemotronSuper test parameters to prevent OOMs ( #6970 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-17 13:38:36 -04:00
amitz-nv
3a49b47081
[ https://nvbugs/5390853 ][fix] Fix _test_openai_lora.py - disable cuda graph ( #6965 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-17 16:56:16 +03:00
Emma Qiao
cc6d763824
[None][infra]Waive failed cases in main branch ( #6951 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-17 14:27:59 +03:00
bhsueh_NV
85cbd0263b
[None][feat] Support Yarn on Qwen3 ( #6785 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-17 07:21:29 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
brb-nv
9505727d31
[ https://nvbugs/5401114 ][fix] Unwaive Gemma3 tests ( #6952 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-15 16:35:02 -07:00
Yuening Li
1f8ae2b2db
[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow ( #6629 )
...
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-08-15 17:15:49 -04:00
dongfengy
0ad0b967bb
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) ( #6722 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-15 16:58:42 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy ( #6764 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
yifeizhang-c
4127d77678
[ https://nvbugs/5394392 ][fix] Enlarge scheduler capacity under disagg bs == 1 ( #6537 )
...
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
liji-nv
18ccd053d3
[ https://nvbugs/5427801 ][fix] Torch compile support for Llama4 and Ea… ( #6858 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-15 11:14:20 -04:00
peaceh-nv
1c1d5d2495
[ https://nvbugs/5451373 ][fix] : Fix the accuracy issue when using FP8 context MLA ( #6881 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-15 16:53:56 +08:00
xinhe-nv
b23fdfc62f
[None][chore] Add failed cases into waives.txt ( #6914 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-15 14:00:16 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures ( #6836 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Bo Deng
e54ba75dac
[None][fix] Update tests to use standardized uppercase backend identifiers ( #6921 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-15 11:14:15 +08:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. ( #6800 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation ( #6840 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
Raayan Dhar
8b237b943b
[ https://nvbugs/5441714 ][chore] remove skip on disagg n-gram test ( #6872 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Bo Li
26f413ad90
[ https://nvbugs/5450262 ][fix] Fix unsupported alltoall use case ( #6882 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Matthias Jouanneaux
69574ad730
[TRTLLM-5966][feat] Helix: extend mapping to support different CP types ( #6816 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-08-14 09:00:02 -07:00
Emma Qiao
96339c69a9
[None][infra] Waive failed cases on main ( #6902 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-14 23:59:44 +08:00
Pengbo Wang @ NVIDIA
ffc976ceaf
[ https://nvbugs/5445466 ][fix] fix deepseek r1 hang by not enabling mnnvl by default ( #6860 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-14 22:36:56 +08:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands ( #6323 )
2025-08-14 03:48:57 -04:00
chenfeiz0326
5cd8c0f6cc
[None][test] Add perf-sweep scripts ( #6738 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-14 14:04:47 +08:00
NVJiangShao
a700646132
[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE ( #6784 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-14 13:35:37 +08:00
Yan Chunwei
0132c1db84
[ https://nvbugs/5427043 ][fix] request length exceeds max_num_tokens ( #6821 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-14 13:31:12 +08:00
Bo Deng
d8acca495b
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 ( #6735 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-14 04:36:38 +00:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill ( #6655 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Izzy Putterman
ef53de8eef
[None][feat] Add test for speculative rejection sampler (2-model) ( #6542 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-13 22:09:35 -04:00
Mike Iovine
7cba883932
[ https://nvbugs/5410399 ][chore] Unwaive mtp llmapi test ( #6833 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 17:38:45 -04:00
Emma Qiao
c7e6145409
[None][infra] Waive failed cases on main ( #6863 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-13 09:50:14 -04:00
Anthony Chang
2198587b35
[ https://nvbugs/5378031 ] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend ( #6200 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-08-13 21:24:40 +08:00
Yukun He
bc5f766e0e
[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic generalization. ( #6545 )
...
* Generalize the definition of tactics so that users can implement more customizable tactic types, making the configurations clearer for each kernel run.
* Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function.
* Other code refactoring.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-13 16:25:22 +08:00
Mike Iovine
f68e03e646
[ https://nvbugs/5452167 ][fix] Fix ngram padding issue ( #6837 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 11:23:16 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support ( #6622 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
rakib-hasan
2923eb88a1
[None][fix] Refactoring input prep to allow out-of-tree models ( #6497 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-12 20:29:10 -04:00
xinhe-nv
e35fca4272
[TRTQA-2920][chore] improve hang tests ( #6781 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-12 18:26:51 +08:00
Sergey Klevtsov
27fc35175e
[None][feat] CUTLASS MoE FC2+Finalize fusion ( #3294 )
...
Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
2025-08-12 15:56:48 +08:00
Fridah-nv
0dc4b4e699
[ #4403 ][autodeploy] Refactor: Move more transformations to new inf optimizer, Add quantization_source to factory interface ( #6760 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-11 22:02:46 -07:00
Enwei Zhu
7c686ba8de
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill ( #6774 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-12 09:30:06 +08:00
Ziyi Xiong
b4fcd5f592
[ https://nvbugs/5441438 ][fix] Set correct draft length for the cuda graph dummy request ( #6701 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-12 09:28:47 +08:00
Jinyang Yuan
ead89a0e40
[None][perf] Improve the performance of online EPLB on Hopper by better overlapping ( #6624 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-08-12 09:25:13 +08:00
Chang Liu
be9dd4713c
[ https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version ( #6673 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
Aurelien Chartier
56bfc3a6d2
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically ( #6763 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-11 15:18:19 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
Emma Qiao
5145e9d40e
[None][infra] Unwaive an updated case to test ( #6791 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 06:47:33 -04:00