Voice dialogue interfaces are a common method of interaction between a user and a machine. Such interfaces are believed to be easy for everyone to use because auditory information requires no additional knowledge from users. However, auditory instructions alone sometimes lead users to misinterpret spatial information such as locations and directions. This risk is greater for older people, because the mental ability to manipulate images or patterns declines with age. We support older people's learning of spatial information with an attachable gesture robot. The robot has human-like eyes and arms and is attached to the object. It uses gestures during voice interaction to support older people's mental manipulation of space. We designed and implemented both the hardware and the software on a vacuum robot and evaluated the method by training older people to learn the vacuum's features. We compared two instructional methods for explaining eight features of the vacuum: one using voice and gestures, and one using voice only. The results show that the subjects were more likely to remember two features when trained with gestures. We also found that their motivation to learn increased when instruction combined voice and gestures.