Zhenhua Wang
8416d7fea8
[ https://nvbugs/5412885 ][doc] Add the workaround doc for H200 OOM ( #6853 )
...
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
2025-08-13 19:51:38 +08:00
Perkz Zheng
0fad6029f7
[TRTLLM-7093][fix] the perf regression to cvt_fp4 kernels ( #6851 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 19:13:40 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving ( #6766 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Yukun He
bc5f766e0e
[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic generalization. ( #6545 )
...
* Generalize the definition of tactics so that users can implement more customizable tactic types, making the configurations clearer for each kernel run.
* Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function.
* Other code refactoring.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-13 16:25:22 +08:00
Void
1d80df0955
[None][feat] DeepEP LL combine FP4 ( #6822 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-13 04:20:21 -04:00
Zhou Yuxin
50e5e725e9
[ https://nvbugs/5412456 ][fix] Fix "an illegal instruction was encountered" error ( #6776 )
...
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-13 15:45:59 +08:00
Aurelien Chartier
2e0081b53e
[ #6530 ][fix] Fix script when using calibration tensors from modelopt ( #6803 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-12 20:41:10 -07:00
Mike Iovine
f68e03e646
[ https://nvbugs/5452167 ][fix] Fix ngram padding issue ( #6837 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 11:23:16 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support ( #6622 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
Fanrong Li
1bbc0e323b
[None][fix] Pre-allocate workspaces for DeepGEMM MoE to avoid frequent cudaFree/cudaMalloc ( #6811 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-13 10:27:57 +08:00
Kaiyu Xie
47806f09d9
feat: Support custom repo_dir for SLURM script ( #6546 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
2025-08-12 22:06:59 -04:00
rakib-hasan
2923eb88a1
[None][fix] Refactoring input prep to allow out-of-tree models ( #6497 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-12 20:29:10 -04:00
dongxuy04
bd9a6dd9ab
[TRTLLM-7008][fix] fix wideEP weights loading and args ( #6789 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-12 19:14:20 -04:00
Robin Kobus
45c7518032
[None][refactor] Simplify decoder state initialization ( #6559 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:44:41 +02:00
Robin Kobus
dd11e08d26
[ #6187 ][feat] add LayerNorm module ( #6625 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:43:30 +02:00
nvchenghaoz
81f0ded1c4
[None][feat] Add GPT OSS support for AutoDeploy ( #6641 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-08-12 14:03:22 -04:00
Jhao-Ting Chen
a060e12041
[ https://nvbugs/5438869 ][fix] Set nvfp4 expert w1 w3 weight scale to the same value if they're not ( #6656 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-12 20:47:10 +08:00
xinhe-nv
e35fca4272
[TRTQA-2920][chore] improve hang tests ( #6781 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-12 18:26:51 +08:00
QI JUN
8845e0f065
[None][fix] fix ci ( #6814 )
2025-08-12 02:21:50 -07:00
Shunkangz
ab0d768acf
[None][fix] Fix attention dp log ( #6570 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-12 04:53:09 -04:00
Liao Lanyu
f7c13a4aa7
[TRTLLM-6906][chore] Using pybind to bind functions in thop/attentionOp ( #6745 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-12 16:45:16 +08:00
Sergey Klevtsov
27fc35175e
[None][feat] CUTLASS MoE FC2+Finalize fusion ( #3294 )
...
Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
2025-08-12 15:56:48 +08:00
Fridah-nv
0dc4b4e699
[ #4403 ][autodeploy] Refactor: Move more transformations to new inf optimizer, Add quantization_source to factory interface ( #6760 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-11 22:02:46 -07:00
Enwei Zhu
7c686ba8de
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill ( #6774 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-12 09:30:06 +08:00
Ziyi Xiong
b4fcd5f592
[ https://nvbugs/5441438 ][fix] Set correct draft length for the cuda graph dummy request ( #6701 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-12 09:28:47 +08:00
Jinyang Yuan
ead89a0e40
[None][perf] Improve the performance of online EPLB on Hopper by better overlapping ( #6624 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-08-12 09:25:13 +08:00
Chang Liu
be9dd4713c
[ https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version ( #6673 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
Aurelien Chartier
56bfc3a6d2
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically ( #6763 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-11 15:18:19 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
Venky
c9fe07ede6
[TRTLLM-6812][feat] Add standardized GitHub issue templates and disable blank issues ( #6494 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-11 13:08:48 -04:00
Zhenhua Wang
7e33ed6d61
[None][chore] always try-catch when clear build folder in build_wheel.py ( #6748 )
...
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-11 14:02:17 +02:00
Emma Qiao
5145e9d40e
[None][infra] Unwaive an updated case to test ( #6791 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 06:47:33 -04:00
Liao Lanyu
a2e9153cb0
[None][doc] Add K2 tool calling examples ( #6667 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-11 16:25:41 +08:00
bhsueh_NV
83dbc6c75d
[TRTLLM-5532][feat] store the block of context request into kv cache ( #6683 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-11 16:14:52 +08:00
Martin Marciniszyn Mehringer
9a8195ef88
fix: Ensure that Python stub generation works against libnvidia-ml stubs ( #6188 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-08-11 09:18:17 +02:00
Emma Qiao
d6ad4a9d5b
[None][infra] Waive failed tests on main 0811 ( #6778 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:16:25 -04:00
xinhe-nv
9c358c26e4
[None][chore] remove closed bugs ( #6772 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-11 14:39:58 +08:00
Yiqing Yan
62d6c98d68
[TRTLLM-5633][infra] Force set changed file diff to empty string for post-merge CI ( #6777 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-11 02:38:05 -04:00
Eran Geva
b3e8fa2960
[None][test] Test trtllm-bench AD vs. PT BEs on H100 single gpu ( #6487 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-08-11 08:33:13 +03:00
Tracin
49bcaa4e95
Add gpt-oss GSM8K test. ( #6732 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-08-10 22:45:43 -04:00
Chuang Zhu
c566a8d2a2
[None][fix] fix same pp disagg ( #6730 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-10 22:45:15 -04:00
Bo Deng
767879ef85
[ https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper ( #6736 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 10:05:10 +08:00
Zero Zeng
4b4b91ab51
[None][feat] improve dataloading for benchmark_dataset by using batch… ( #6548 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-08-11 09:50:41 +08:00
Yechan Kim
60073a7ad9
[None][feat] Support SharedTensor on MultimodalParams ( #6254 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-10 17:48:24 -07:00
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs ( #6592 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
pcastonguay
4142320e53
[ https://nvbugs/5444937 ][fix] Fixing kv_cache_event unit test ( #6753 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-10 16:45:38 -07:00
shaharmor98
14b36e07d7
[TRTLLM-6174][feat] Enable FP32 mamba ssm cache ( #6574 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 16:27:51 -04:00
Yueh-Ting (eop) Chen
199f306984
[None][chore][kv cache manager] Dead code elimination, we no longer record/fetch through WindowBlockManager::mContextBlocksByHash ( #6249 )
...
No functional change is intended in this MR.
`WindowBlockManager::mCachedBlocksRoot` is now responsible
for the bookkeeping of the `KVCacheBlock`, and `mNextBlocks` is
now the actual hash map used to fetch the block.
The `mEnableHashKey` knob and related hashing is removed.
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-08-10 09:10:10 -04:00
Gal Hubara-Agam
3c5aec19c2
[ #5048 ][enhance] AutoDeploy: Optimize prepare_inputs ( #6634 )
...
Optimize prepare_inputs routine in AutoDeploy, as part of the effort to reduce the performance gap compared to the default backend.
This PR includes two major fixes and some other minor tweaks:
1. Avoid back-and-forth data copies.
2. Optimize the position ids update by separating the implementation for generation mode and context mode.
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-10 13:55:04 +03:00
Emma Qiao
ee19ca5e58
[None][infra] Waive test main 0808 ( #6751 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-09 23:54:07 -04:00