Providing route directions is a complicated interaction. Utterances are combined with gestures and pronounced with appropriate timing. This study proposes a model for a robot that generates route directions by integrating three important crucial elements: Utterances, gestures, and timing. Two research questions must be answered in this modeling process. First, is it useful to let robot perform gesture even though the information conveyed by the gesture is given by utterance as well? Second, is it useful to implement the timing at which humans speaks? Many previous studies about the natural behavior of computers and robots have learned from human speakers, such as gestures and speech timing. However, our approach is different from such previous studies. We emphasized the listener's perspective. Gestures were designed based on the usefulness, although we were influenced by the basic structure of human gestures. Timing was not based on how humans speak, but modeled from how they listen. The experimental result demonstrated the effectiveness of our approach, not only for task efficiency but also for perceived naturalness.