Estimating accurate depth from an RGB image in any environment is challenging task in computer vision. Recent learning based method using deep Convolutional Neural Networks (CNNs) have driven plausible appearance, but these conventional methods are not good at estimating scenes that have a pure rotation of camera, such as in-plane rolling. This movement imposes perturbations on learning-based methods because gravity direction is considered to be strong prior to CNN depth estimation (i.e., the top region of an image has a relatively large depth, whereas bottom region tends to have a small depth). To overcome this crucial weakness in depth estimation with CNN, we propose a simple but effective refining method that incorporates in-plane roll alignment using camera poses of monocular Simultaneous Localization and Mapping (SLAM). For the experiment, we used public datasets and also created our own dataset composed of mostly in-plane roll camera movements. Evaluation results on these datasets show the effectiveness of our approach.