This paper proposes a speech generation system named Linta-III, which generates utterances that depend on the real-world situation. To generate such situated utterances, Linta-III has a joint attention mechanism that develops joint attention between a person and a robot. The mechanism employs eye contact and an attention expression. Eye contact fosters the relationship between the person and the robot. The attention expression indicates the relevant sensor information through a physical expression. Together, eye contact and the attention expression allow the mechanism to draw the person's attention to the same sensor information that the robot is attending to. Once joint attention is established, Linta-III can omit from an utterance the words that are obvious in the situation. We also conducted a psychological experiment on the development of joint attention. The results indicated that eye contact and the attention expression are significant factors in the development of joint attention.