In recent years there has been increased interest in studies that explore integrative learning of language and other modalities by using neural network models. However, for practical application to human-robot interaction, the acquired semantic structure between language and meaning has to be available immediately and repeatably whenever necessary, just as in everyday communication. As a solution to this problem, this study proposes a method in which a recurrent neural network self-organizes cyclic attractors that reflect semantic structure and represent interaction flows in its internal dynamics. To evaluate this method we design a simple task in which a human verbally directs a robot, which responds appropriately. Training the network with training data that represent the interaction series, the cyclic attractors that reflect the semantic structure is self-organized. The network first receives a verbal direction, and its internal state moves according to the first half of the cyclic attractors with branch structures corresponding to semantics. After that, the internal state reaches a potential to generate appropriate behavior. Finally, the internal state moves to the second half and converges on the initial point of the cycle while generating the appropriate behavior. By self-organizing such an internal structure in its forward dynamics, the model achieves immediate and repeatable response to linguistic directions. Furthermore, the network self-organizes a fixed-point attractor, and so able to wait for directions. It can thus repeat the interaction flexibly without explicit turn-taking signs.