The configuration data transfer time of a dynamically reconfigurable processor often becomes the bottleneck in hardware context switching and degrades computation performance. To reduce the time needed to transfer data from a central memory to the hardware context memory modules in all Processing Elements (PEs) and Switching Elements (SEs), a multicasting mechanism called RoMultiC (Row-Multicast Configuration) was proposed. However, because the original RoMultiC used a whole PE or SE as the unit of multicast, the reduction in transfers was limited. Here, the trade-off between multicast granularity and the increase in hardware is evaluated, and the best way to construct the multicast bit-map is explored. Evaluation results show that transfer time is reduced by up to 42% compared with the original RoMultiC, with only 2% hardware overhead.