Commit Graph

32 Commits

Author SHA1 Message Date
JadoTu
0582e54b61
[None][fix] modify qwen3-next sampling stop_tokens (#9331)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-23 21:10:09 +08:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs (#7206)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
mpikulski
8298e93bd8
[TRTLLM-8414][chore] BREAKING CHANGE: refine sampling strategy selection (#8132)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-08 15:46:50 +02:00
Yueh-Ting (eop) Chen
c5012423f5
[None][chore] Remove developer name in comment (#7981)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-09-25 06:43:38 -07:00
Venky
6ff0fad75e
[TRTLLM-7015] [feat] Enable prompt_logprobs in pytorch backend (#7580)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-09-23 18:48:10 -07:00
Pengbo Wang
5792464d37
[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:47:03 +08:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
Enwei Zhu
4b299cb77e
feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 (#6408)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-31 09:53:52 +08:00
xiaoqi
a0aecf0476
[feat]: support logit_bias (#5354)
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-25 09:37:41 +00:00
Enwei Zhu
055c4a9fe6
[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-10 16:30:00 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
2ez4bz
dc52b67492
linting(python): Enable ruff on more files (wave 1/N) (#5140)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-06-14 19:19:34 +08:00
Yibin Li
b79eb34bfe
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-06-13 11:37:50 +08:00
Enwei Zhu
0087bd27ba
[fix] Fix SamplingParams check on n and best_of (#4655)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-01 09:11:55 +08:00
coldwaterq
1cf0e672e7
fix: [nvbugs/5066257] serialization improvments (#3869)
* added a restricted pcikler and depickler in a sepparate serialization function.

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>

* updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects that made debugging harder, added tests.

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>

* removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list.

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* cleaned up a couple files to reduce conflicts with main.

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* fix unit tests

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* reorder BASE_ZMQ_CLASSES list alphabetically

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* fix tests and move LogitsProcessor registration to base class

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* revert changes to import log of tensorrt_llm._torch.models

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* fix tests and move LogitsProcessor registration to base class

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* additional comments for multiprocess approved list sync

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* add dataclass from tests

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

---------

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Co-authored-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-05-23 13:06:29 +08:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding (#4066)
* finish

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* exc overlap scheduler

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix api ref

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
Yuan Tong
5b93273156
feat: adopt new logprob definition in PyTorch flow (#4057)
feat: align logprob definition of PyTorch flow

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
2025-05-08 20:16:40 +08:00
Yan Chunwei
0c26059703
chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)
* beam_width and max_new_token

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove beam_width

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove min_length

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove return_num_sequences

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-07 13:20:25 +08:00
Erin
cba1793cda
cleanup logprob params (#4039)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-07 00:50:16 +08:00
Erin
8fe7bdeacf
feat: LogitsProcessor in PyTorch backend (#3145)
* support lp in pytorch backend

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

* fix tp

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

---------

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-01 14:15:30 -07:00
Erin
83f37614ef
feat: Support Top-K logprobs and prompt_logprobs in LLMAPI (#3388)
* support return logprob in llmapi

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

update and add test

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

stability test

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

* revert removal of old flag

Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

---------

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>
2025-05-01 12:47:14 -04:00
wili
3e035f2219
v1.2 (#3082)
Signed-off-by: wili <wili@nvidia.com>
2025-03-26 23:31:29 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820) 2025-02-25 21:21:49 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8 Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
be17881062
Update TensorRT-LLM (#2582) 2024-12-16 21:50:47 -08:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM (#2562)
* Update TensorRT-LLM

---------

Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM (#2532)
* blossom-ci.yml: run vulnerability scan on blossom

* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec

---------

Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00