Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-25 13:12:45 +08:00)
A seq_len of 4096 causes an unknown CUDA illegal memory access error when run consecutively with certain other tests. Apply a saturated upper bound to any sequence length larger than this (a minimal sketch of such clamping follows the file listing below).
| File |
|---|
| __init__.py |
| cpp_custom_ops.py |
| flashinfer_custom_ops.py |
| torch_custom_ops.py |
| trtllm_gen_custom_ops.py |
| userbuffers_custom_ops.py |
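
The workaround described in the commit message amounts to saturating (clamping) the requested sequence length at a fixed upper bound before it reaches the failing code path. A minimal sketch under that reading is below; `clamp_seq_len` and `MAX_SAFE_SEQ_LEN` are hypothetical names for illustration, and the exact bound value is an assumption, not taken from the repository.

```python
# Minimal sketch of a saturated upper bound on sequence length.
# MAX_SAFE_SEQ_LEN and clamp_seq_len are hypothetical names; the actual
# bound used by the tests is an assumption, not confirmed by the repo.
MAX_SAFE_SEQ_LEN = 4096


def clamp_seq_len(seq_len: int, upper_bound: int = MAX_SAFE_SEQ_LEN) -> int:
    """Saturate seq_len at upper_bound so oversized values are never passed on."""
    return min(seq_len, upper_bound)


if __name__ == "__main__":
    print(clamp_seq_len(1024))   # 1024: unchanged, below the bound
    print(clamp_seq_len(16384))  # 4096: saturated at the upper bound
```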