brb-nv
|
0e16d1f070
|
test: Add time logging for lora tests (#6466)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-30 14:02:43 -07:00 |
|
Bo Deng
|
24e7f4eece
|
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-31 00:41:37 +08:00 |
|
Wanli Jiang
|
9632dba02e
|
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-07-30 09:20:16 -07:00 |
|
nv-guomingz
|
03e38c9087
|
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-30 11:11:06 -04:00 |
|
pcastonguay
|
e7ae5e2824
|
feat: Add support for disaggregation with pp with pytorch backend (#6369)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-07-30 09:42:13 -04:00 |
|
Yechan Kim
|
22b29df38c
|
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 17:29:55 +08:00 |
|
xinhe-nv
|
d9ab3fd35e
|
tests: add TestNemotronH cuda graph tests (#6390)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 18:45:58 +10:00 |
|
2ez4bz
|
d6eed1b624
|
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-30 14:10:36 +08:00 |
|
Venky
|
ab40369053
|
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests (#6463)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-30 10:53:43 +10:00 |
|
Yechan Kim
|
d6eb8e2366
|
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-30 08:52:31 +08:00 |
|
xinhe-nv
|
4fbb344caf
|
test: [CI] Add failed cases into waives.txt (#6423)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-29 19:00:30 +10:00 |
|
ruodil
|
e11255e9d0
|
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases (#6430)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-07-29 15:52:45 +10:00 |
|
Michal Guzek
|
2573bb729d
|
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-07-28 14:02:14 -07:00 |
|
2ez4bz
|
60e4d3a9d4
|
[test] Add accuracy regression test for Mistral3.1 (#6322)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-28 09:41:44 -07:00 |
|
ruodil
|
03632a679f
|
test: organize perf cases and add missing perflab cases in qa test list (#6283)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-07-28 20:33:32 +10:00 |
|
xinhe-nv
|
971be1fe86
|
test: waive failed cases (#6394)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-28 20:31:43 +10:00 |
|
Ivy Zhang
|
2945817cae
|
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#6292)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-28 15:33:30 +08:00 |
|
Yan Chunwei
|
908f49a4ad
|
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-28 09:01:10 +08:00 |
|
Iman Tabrizian
|
c35c78ff58
|
[fix][nvbugs/5390810] Improve the check for disaggregated serving test (#6301)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-25 12:47:01 -07:00 |
|
nv-guomingz
|
b8d4cb8beb
|
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
|
2025-07-25 12:55:56 -04:00 |
|
xinhe-nv
|
470544cf17
|
test: [CI] Add failed cases into waives.txt (#6333)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-25 17:18:06 +10:00 |
|
xinhe-nv
|
6268a60ab3
|
tests: add test_chunked_prefill for llama4 (#5549)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-24 23:02:00 -04:00 |
|
xinhe-nv
|
2dcfa90e99
|
test: skip llama3.3 70b test on cg4 (#6293)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-07-24 19:29:56 -07:00 |
|
bhsueh_NV
|
7b6aadc800
|
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-24 21:47:37 +08:00 |
|
Ivy Zhang
|
f290108cd8
|
tests: only get timeout value from pytest marker (#6287)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-24 20:51:02 +08:00 |
|
Iman Tabrizian
|
5fceaa6153
|
Revert "tests: add timeout_manager to tensorrt flow test cases (#5942)" (#6309)
|
2025-07-23 23:58:10 -04:00 |
|
Emma Qiao
|
82d03ca979
|
[Infra] - Increase unittest execution time since some test exceeds 1600 (#6277)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-24 10:02:28 +08:00 |
|
Yechan Kim
|
83c3ed128b
|
chore: set default device to cpu on Multimodal models (#5994)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-07-22 21:45:31 -07:00 |
|
pcastonguay
|
310bdd9830
|
fix: Fix triton backend build [nvbug 5396469] (#6098)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
Bo Li
|
537757e669
|
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. (#5896)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-07-22 12:48:00 +08:00 |
|
2ez4bz
|
37d0b68442
|
[fix] Fix flaky mistral E2E test (#6230)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-22 11:55:28 +08:00 |
|
Ivy Zhang
|
eb5cb5b642
|
tests: add timeout_manager to tensorrt flow test cases (#5942)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-07-22 10:23:41 +08:00 |
|
Simeng Liu
|
4a0951f85c
|
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] (#5859)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-07-21 15:46:37 -07:00 |
|
Mike Iovine
|
9645814bdf
|
[chore] Clean up quickstart_advanced.py (#6021)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-07-21 15:00:59 -04:00 |
|
Yi Zhang
|
f9b0a911fb
|
test: Enable GB200 torch compile multi gpu tests (#6145)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-21 22:17:13 +08:00 |
|
Pengyun Lin
|
9832bef07d
|
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-07-21 21:09:43 +08:00 |
|
liji-nv
|
3e0fb60e50
|
[TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-21 19:10:22 +08:00 |
|
ruodil
|
6a3c9f8061
|
test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test (#5826)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-07-21 11:29:19 +10:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|
wili
|
82d3587bb8
|
[refactor] Unify name of NGram speculative decoding (#5937)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-19 12:59:57 +08:00 |
|
xiaoqi
|
28858c8711
|
feat(eagle3):support qwen3 dense model (#5879)
Signed-off-by: xq25478 <xq25478@qq.com>
|
2025-07-19 01:24:32 +08:00 |
|
Bo Deng
|
2c6fa145ee
|
[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-19 00:48:44 +08:00 |
|
Erin
|
9522cde464
|
fix: NVBug 5385576 py_batch_idx issue (#6153)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-18 22:36:43 +08:00 |
|
Chuang Zhu
|
c0e416535e
|
fix single_disagg_test (#6166)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-18 13:18:37 +08:00 |
|
2ez4bz
|
8480c120b1
|
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2025-07-17 11:04:17 -07:00 |
|
Stanley Sun
|
9518e14f69
|
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
|
2025-07-17 20:55:04 +10:00 |
|
Yi Zhang
|
a718486900
|
fix: Fix DeepSeek R1 CI (#6129)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-17 18:24:49 +08:00 |
|
Erin
|
de60ae47e3
|
chores: unwaive a few tests for v1.0 (#6107)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-17 17:59:51 +08:00 |
|
Enwei Zhu
|
21efb50068
|
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-17 17:46:10 +08:00 |
|
Chuang Zhu
|
44c70c88f9
|
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-17 17:42:07 +08:00 |
|