In this study, we address the spatial join count problem, in which the number of particles around each halo is counted only once for a given simulation result. Accelerating this count requires an efficient spatial index; therefore, we propose a CPU-optimized sort-tile-recursive (STR) R-tree that employs a parallel radix sort and node packing implemented with a thread pool and single-instruction, multiple-data (SIMD) instructions. In an experiment conducted with astronomical data, the proposed method achieves a 26.8-fold performance improvement over a conventional CPU-optimized R-tree. We also propose a partial materialization approach to handle data that exceed the capacity of main memory. To accelerate this approach, we propose a construct-search-destruct pipeline that exploits a thread pool to hide the latency of index construction and destruction. The pipelined method achieves a 27.5-fold performance improvement over a conventional CPU-optimized R-tree. All our code is available on GitHub.
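As an illustration only, the following minimal Python sketch shows what a spatial join count computes and how STR-style tiling can group particles into leaf nodes. It is not the proposed implementation: it is single-threaded, uses a comparison sort rather than a parallel radix sort, uses no SIMD, and filters with a single level of leaf bounding boxes instead of a full R-tree. The function names (`str_leaves`, `join_count`), the leaf capacity, and the representation of a halo as a (center, radius) pair are assumptions made for this example.

```python
import math

def str_leaves(points, capacity, dim=0):
    """Group points into leaves of at most `capacity` points by STR-style tiling:
    sort along one axis, cut into slabs, then recurse on the next axis."""
    if not points:
        return []
    ndim = len(points[0])
    pts = sorted(points, key=lambda p: p[dim])
    if dim == ndim - 1:
        # Last axis: cut the sorted run directly into leaves.
        return [pts[i:i + capacity] for i in range(0, len(pts), capacity)]
    n_leaves = math.ceil(len(pts) / capacity)
    n_slabs = math.ceil(n_leaves ** (1.0 / (ndim - dim)))
    # Keep each slab a multiple of the leaf capacity so leaves stay well filled.
    slab_size = capacity * math.ceil(n_leaves / n_slabs)
    leaves = []
    for i in range(0, len(pts), slab_size):
        leaves.extend(str_leaves(pts[i:i + slab_size], capacity, dim + 1))
    return leaves

def bounding_box(leaf):
    """Axis-aligned minimum bounding box of a leaf as (lower corner, upper corner)."""
    lo = tuple(min(c) for c in zip(*leaf))
    hi = tuple(max(c) for c in zip(*leaf))
    return lo, hi

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def box_dist2(center, lo, hi):
    """Squared distance from a point to the nearest point of a box (0 if inside)."""
    return sum(max(l - c, 0.0, c - h) ** 2 for c, l, h in zip(center, lo, hi))

def join_count(halos, particles, capacity=16):
    """For each halo (center, radius), count the particles within the radius."""
    leaves = [(bounding_box(leaf), leaf) for leaf in str_leaves(particles, capacity)]
    counts = []
    for center, radius in halos:
        r2 = radius * radius
        n = 0
        for (lo, hi), leaf in leaves:
            # Scan a leaf only if its bounding box intersects the search sphere.
            if box_dist2(center, lo, hi) <= r2:
                n += sum(1 for p in leaf if dist2(p, center) <= r2)
        counts.append(n)
    return counts
```

In this sketch the index is built once inside `join_count` and discarded after a single pass over the halos, which mirrors the setting in which each simulation result is counted only once and construction cost therefore matters as much as search cost.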