In order to overcome performance degradation caused by performance disparity between processor and main memory, there have been proposed several new VLSI architectures which have software controlled on-chip memory in addition to the conventional cache. However, users must specify data allocation/replacement on software controlled on-chip memory and data transfer between the on-chip and off-chip memories to achieve higher performance by utilizing on-chip memory. Because such properties are automatically controlled by hardware in conventional caches, a cost of optimization for a program becomes a matter that should be considered. In this paper, we propose an data movement optimization technique for software-controlled on-chip memory. We evaluated the proposed method using two applications. The results reveal that the proposed technique can drastically reduce memory stall cycles and achieve high performance.