Commit Graph

20 Commits

Author SHA1 Message Date
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
2ez4bz
dc52b67492
linting(python): Enable ruff on more files (wave 1/N) (#5140)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-06-14 19:19:34 +08:00
Yibin Li
b79eb34bfe
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-06-13 11:37:50 +08:00
Enwei Zhu
0087bd27ba
[fix] Fix SamplingParams check on n and best_of (#4655)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-01 09:11:55 +08:00
coldwaterq
1cf0e672e7
fix: [nvbugs/5066257] serialization improvments (#3869)
* added a restricted pcikler and depickler in a sepparate serialization function.

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>

* updated IPC to remove approved classes, removed the serialization function because it didn't work for all objects that made debugging harder, added tests.

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>

* removed LLM arg and moved class registration to a serialization module function. Also added missing classes to approved list.

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* cleaned up a couple files to reduce conflicts with main.

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* fix unit tests

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* reorder BASE_ZMQ_CLASSES list alphabetically

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* fix tests and move LogitsProcessor registration to base class

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* revert changes to import log of tensorrt_llm._torch.models

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* added comments to explain why BASE_ZMQ_CLASSES has to be passed into spawned child processes

Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>

* fix tests and move LogitsProcessor registration to base class

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* additional comments for multiprocess approved list sync

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

* add dataclass from tests

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

---------

Signed-off-by: coldwaterq@users.noreply.github.com <coldwaterq@users.noreply.github.com>
Signed-off-by: coldwaterq <coldwaterq@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Co-authored-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-05-23 13:06:29 +08:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding (#4066)
* finish

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* update

Signed-off-by: Ubospica <ubospica@gmail.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* exc overlap scheduler

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add test

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix api ref

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
Yuan Tong
5b93273156
feat: adopt new logprob definition in PyTorch flow (#4057)
feat: align logprob definition of PyTorch flow

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
2025-05-08 20:16:40 +08:00
Yan Chunwei
0c26059703
chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)
* beam_width and max_new_token

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove beam_width

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove min_length

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove return_num_sequences

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-07 13:20:25 +08:00
Erin
cba1793cda
cleanup logprob params (#4039)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-07 00:50:16 +08:00
Erin
8fe7bdeacf
feat: LogitsProcessor in PyTorch backend (#3145)
* support lp in pytorch backend

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

* fix tp

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

---------

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-01 14:15:30 -07:00
Erin
83f37614ef
feat: Support Top-K logprobs and prompt_logprobs in LLMAPI (#3388)
* support return logprob in llmapi

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

update and add test

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

stability test

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

* revert removal of old flag

Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

---------

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>
2025-05-01 12:47:14 -04:00
wili
3e035f2219
v1.2 (#3082)
Signed-off-by: wili <wili@nvidia.com>
2025-03-26 23:31:29 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820) 2025-02-25 21:21:49 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8 Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
be17881062
Update TensorRT-LLM (#2582) 2024-12-16 21:50:47 -08:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM (#2562)
* Update TensorRT-LLM

---------

Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM (#2532)
* blossom-ci.yml: run vulnerability scan on blossom

* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec

---------

Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00