For leveraging multiple GPUs in a cluster system, it is necessary to assign application tasks to multiple GPUs and execute those tasks with appropriately using communication primitives to handle data transfer among GPUs. In current GPU programming models, communication primitives such as MPI functions cannot be used within GPU kernels. Instead, such functions should be used in the CPU code. Therefore, programmer must handle both GPU kernel and CPU code for data communications. This makes GPU programming and its optimization very difficult. In this paper, we propose a programming framework named FLAT which enables programmers to use MPI functions within GPU kernels. Our framework automatically transforms MPI functions written in a GPU kernel into runtime routines executed on the CPU. The execution model and the implementation of FLAT are described, and the applicability of FLAT in terms of scalability and programmability is discussed. We also evaluate the performance of FLAT. The result shows that FLAT achieves good scalability for intended applications.