Commit Graph

9 Commits

Author SHA1 Message Date
Robin Kobus
4cd8543d8c
[TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic from postProcessRequest (#5489)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-02 10:13:31 +02:00
Robin Kobus
5f77d212ef
test: Reduce number of C++ test cases (#5437)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-01 09:40:49 +02:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092)
* chore: Remove GptSession/V1 from TRT workflow

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove stateful decoders

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession buffers

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession utils

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession kernels

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove V1 GPT models from tests

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove gptSessionBenchmark from scripts and docs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove gptSession IO classes

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession from test lists

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove GptSession from docs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove useless encoder test

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove mActualBatchSize from DecoderState

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove static batching from ExecutorTest

- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
Robin Kobus
ccff86068e
fix: request termination in pipeline parallelism (#3892)
* feat: Implement synchronous request termination in batch manager

- Added `terminateRequestSync` method to `TrtEncoderModel` and `TrtGptModelInflightBatching` for handling request termination in the next `forwardSync` call.
- Updated existing request termination logic to utilize the new synchronous method, ensuring generated tokens are cleared appropriately.
- Enhanced logging for clarity in token management during request processing.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: MockedModelCancelRequest

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: terminate with timeout

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* docs: Update doc string for allottedTimeMs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-05 21:51:41 +08:00
Robin Kobus
403370af62
refactor: Move ModelSpec to core library (#3980)
* refactor: Move ModelSpec from tests to core library

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Move ModelSpec from runtime to separatedir

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Use new bindings path and clean up

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Updated licenses

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove script_dir from path

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-04 01:39:09 +08:00
Dom Brown
b40f351b7a
[TRTLLM-4460] test: Use Llama 3.2 1B for Llama C++ tests (#3206)
* Squash of dev commits

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Add timer + waive test with suspected GptSession bug

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Respond to reviewer comments

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

---------

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-01 05:31:08 +08:00
QI JUN
991939a0f4
chore: increase A30 for cpp test (#3811)
* increase A30 for cpp test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* enable parallel run test for gpt_executor

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* decrease freeGpuMemoryFraction of cpp tests

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:34:39 -07:00
Dom Brown
60d4dacc47
Port multi GPU changes to GitHub (#3027)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-03-27 05:55:03 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00