Research on high-performance AES on GPUs has mainly reported table-based implementations, in which the lookup tables are stored in shared memory. Such implementations, however, are subject to timing attacks, owing to the latency of table accesses in shared memory. As the number of registers available on GPUs has grown year by year, memory-intensive applications such as the bitsliced AES algorithm have become easy to implement on GPUs. Nevertheless, implementations of bitsliced AES on GPUs have not yet been studied sufficiently with respect to their implementation parameters. In this paper, we therefore present an implementation of bitsliced AES encryption on a CUDA-enabled GPU that explores several parameters, focusing in particular on three parallel processing granularities. In our experiments, bitsliced AES-ECB encryption with the Bs64 granularity achieves a throughput of 605.9 Gbps on an Nvidia Tesla P100-PCIe, an improvement of 8.0% over the table-based implementation.
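
To make the bitsliced representation concrete, the following is a minimal sketch of the general technique, written in Python for readability rather than in CUDA. It is not the implementation evaluated in this paper. All names (`to_bitsliced`, `from_bitsliced`, `broadcast_key`, `add_round_key`, `LANES`) are hypothetical, and the choice of 64 blocks per slice word is an assumption made for illustration; it is not a definition of the Bs64 granularity. The sketch shows only AddRoundKey, the simplest AES step, to illustrate how a single logical operation on a slice word processes many blocks at once without any table lookup.

```python
# Illustrative bitslicing sketch; helper names are hypothetical, not from the paper.
BLOCK_BITS = 128  # AES block size in bits
LANES = 64        # blocks packed per slice word (assumption for illustration)


def to_bitsliced(blocks):
    """Transpose up to LANES 128-bit blocks (as ints) into 128 slice words.

    Bit j of slices[i] holds bit i of blocks[j].
    """
    assert len(blocks) <= LANES
    slices = [0] * BLOCK_BITS
    for j, block in enumerate(blocks):
        for i in range(BLOCK_BITS):
            slices[i] |= ((block >> i) & 1) << j
    return slices


def from_bitsliced(slices, count):
    """Inverse transposition: recover the first `count` blocks from the slices."""
    blocks = [0] * count
    for i, word in enumerate(slices):
        for j in range(count):
            blocks[j] |= ((word >> j) & 1) << i
    return blocks


def broadcast_key(round_key):
    """Expand one 128-bit round key into slice form.

    Slice i is all ones if key bit i is 1, otherwise zero, so every lane
    sees the same key.
    """
    all_ones = (1 << LANES) - 1
    return [-((round_key >> i) & 1) & all_ones for i in range(BLOCK_BITS)]


def add_round_key(slices, key_slices):
    """Bitsliced AddRoundKey: one XOR per slice word covers all lanes."""
    return [s ^ k for s, k in zip(slices, key_slices)]


# Usage: XOR 64 blocks with one round key and compare with the per-block result.
blocks = [(0x0123456789ABCDEF << 64) | j for j in range(LANES)]
key = 0x000102030405060708090A0B0C0D0E0F
sliced = add_round_key(to_bitsliced(blocks), broadcast_key(key))
assert from_bitsliced(sliced, LANES) == [b ^ key for b in blocks]
```

In a full bitsliced cipher, the S-box is likewise expressed as a fixed sequence of AND, XOR, and NOT operations on slice words, so no memory address depends on the data being encrypted. Python itself gives no constant-time guarantee; the sketch illustrates the data layout only.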