I tested the Qwen2.5-VL-3B model with TRTLLM version 0.19.0, following the PyTorch workflow example. Running with the use_cuda_graph parameter enabled produced only a few generated tokens. Removing the ...