This paper details an extensible OpenCL framework that enables Stan to use heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other matrix algebra primitives, and some commonly used likelihoods, with more additions planned for the near future. Stan users can now benefit from GPU speedups with little effort and without changing their existing Stan code. We demonstrate the practical utility of our work with two examples: logistic regression and Gaussian process regression.