xinhe-nv
|
2c86cee38c
|
[None][chore] Remove closed bugs (#6969)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-19 16:01:33 +08:00 |
|
Ivy Zhang
|
bff5fdf6df
|
[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-08-19 13:59:14 +08:00 |
|
fredricz-20070104
|
e90280a84d
|
[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
2025-08-19 00:13:04 -04:00 |
|
Fanrong Li
|
816a120af6
|
[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-08-19 00:03:03 -04:00 |
|
Lizhi Zhou
|
71e28eab36
|
[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-08-19 09:58:22 +08:00 |
|
Leslie Fang
|
e76e5c640f
|
[None][infra] Enable accuracy test for mtp and chunked prefill (#6314)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-19 07:42:52 +08:00 |
|
Shi Xiaowei
|
5ec15b98f0
|
[TRTLLM-7030][fix] uppercase def value in pd-config (#6981)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-08-18 02:33:23 -04:00 |
|
Leslie Fang
|
ce0b13ea02
|
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-18 09:18:17 +08:00 |
|
Naveassaf
|
d6322f70b7
|
[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters to prevent OOMs (#6970)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
|
2025-08-17 13:38:36 -04:00 |
|
Daniel Cámpora
|
53312eeebd
|
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-08-16 00:27:24 -04:00 |
|
brb-nv
|
9505727d31
|
[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6952)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-08-15 16:35:02 -07:00 |
|
dongfengy
|
0ad0b967bb
|
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) (#6722)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
|
2025-08-15 16:58:42 -04:00 |
|
ajrasane
|
4162d2d746
|
[None][test] Add accuracy evaluation for AutoDeploy (#6764)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-08-15 13:46:09 -04:00 |
|
yifeizhang-c
|
4127d77678
|
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
|
2025-08-15 09:52:06 -07:00 |
|
liji-nv
|
18ccd053d3
|
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6858)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-08-15 11:14:20 -04:00 |
|
Bo Deng
|
e54ba75dac
|
[None][fix] Update tests to use standardized uppercase backend identifiers (#6921)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-08-15 11:14:15 +08:00 |
|
Frank
|
2cc59aacb3
|
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-08-14 22:46:20 -04:00 |
|
Aurelien Chartier
|
b13a5a99b2
|
[None][chore] Add tests for non-existent and completed request cancellation (#6840)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-08-14 15:57:01 -07:00 |
|
Raayan Dhar
|
8b237b943b
|
[https://nvbugs/5441714][chore] remove skip on disagg n-gram test (#6872)
Signed-off-by: raayandhar <rdhar@nvidia.com>
|
2025-08-14 15:45:00 -07:00 |
|
Shi Xiaowei
|
1095dfd03c
|
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#6323)
|
2025-08-14 03:48:57 -04:00 |
|
Bo Deng
|
d8acca495b
|
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-08-14 04:36:38 +00:00 |
|
jmydurant
|
4200fa46d1
|
[None][feat] Add support for Hopper MLA chunked prefill (#6655)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2025-08-14 10:39:26 +08:00 |
|
Yechan Kim
|
12102e2d48
|
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-12 19:34:02 -07:00 |
|
xinhe-nv
|
e35fca4272
|
[TRTQA-2920][chore] improve hang tests (#6781)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-08-12 18:26:51 +08:00 |
|
Enwei Zhu
|
7c686ba8de
|
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill (#6774)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-12 09:30:06 +08:00 |
|
Ziyi Xiong
|
b4fcd5f592
|
[https://nvbugs/5441438][fix] Set correct draft length for the cuda graph dummy request (#6701)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-08-12 09:28:47 +08:00 |
|
Aurelien Chartier
|
56bfc3a6d2
|
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically (#6763)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-08-11 15:18:19 -07:00 |
|
Tracin
|
49bcaa4e95
|
Add gpt-oss GSM8K test. (#6732)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-08-10 22:45:43 -04:00 |
|
Chuang Zhu
|
c566a8d2a2
|
[None][fix] fix same pp disagg (#6730)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-08-10 22:45:15 -04:00 |
|
Bo Deng
|
767879ef85
|
[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6736)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-08-11 10:05:10 +08:00 |
|
Ye Zhang
|
bcf5ec0c9a
|
[None][feat] Core Metrics Implementation (#5785)
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-08-09 02:48:53 -04:00 |
|
Leslie Fang
|
294e0d3dab
|
[https://nvbugs/5436461][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI (#6631)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-08-08 15:30:47 +08:00 |
|
Li Min
|
d913955952
|
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell (#6616)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
|
2025-08-08 15:03:48 +08:00 |
|
Enwei Zhu
|
aee828d98a
|
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-08 12:10:36 +08:00 |
|
ruodil
|
22f45a0e19
|
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-07 22:57:04 -04:00 |
|
Daniel Cámpora
|
efca359b66
|
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-08-07 22:19:37 -04:00 |
|
Yuan Tong
|
db8dc97b7b
|
[None][fix] Migrate to new cuda binding package name (#6700)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-08-07 16:29:55 -04:00 |
|
Raayan Dhar
|
4055b764db
|
[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
|
2025-08-07 11:18:02 -04:00 |
|
Enwei Zhu
|
1b9781e8e7
|
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) (#6300)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-08-07 05:53:48 -04:00 |
|
xinhe-nv
|
0a467b00cc
|
[https://nvbugs/5409414][fix] fix Not registered specs (#6660)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-07 17:55:53 +10:00 |
|
hlu1
|
8207d5fd39
|
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-08-07 03:04:18 -04:00 |
|
ruodil
|
f30398470d
|
[None][chore] update readme for perf release test (#6664)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-08-07 10:00:45 +10:00 |
|
Yechan Kim
|
1aed7511fe
|
[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-06 06:58:58 -07:00 |
|
ruodil
|
907c180eb2
|
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-08-06 02:25:57 -04:00 |
|
yunruis
|
3ff4f503ad
|
[None][opt] ADP schedule balance optimization (#6061)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
|
2025-08-06 09:38:02 +08:00 |
|
Yechan Kim
|
c17f4984e2
|
[None][feat] Refactor Llava-Next (#6478)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-08-05 17:53:53 -07:00 |
|
ixlmar
|
1ebceb790d
|
[TRTLLM-5508][feat] check input tokens + improve error handling (#5170)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-08-05 18:27:43 +01:00 |
|
liji-nv
|
dcbfa7e509
|
[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-08-05 10:31:29 -04:00 |
|
Venky
|
61da2daeb4
|
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2025-08-05 07:14:24 -07:00 |
|
Pengbo Wang @ NVIDIA
|
c289880afb
|
[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-08-05 18:05:33 +08:00 |
|