This paper proposes a machine learning method for mapping object-manipulation verbs to sensory inputs and motor outputs that are grounded in the real world. The method learns motion concepts demonstrated by a user and generates sequences of motions using reference-point-dependent probability models. Four components needed to learn object-manipulation verbs are estimated from camera images: (1) the trajector and landmark, which are the objects of transitive verbs; (2) a reference point; (3) an intrinsic coordinate system; and (4) the parameters of the motion's probabilistic model. The motion concepts are learned using hidden Markov models (HMMs). In the motion generation phase, our method combines the learned HMMs to generate trajectories that accomplish goal-oriented tasks. We present results from simulation experiments in which the method generates motion by combining learned motion primitives.
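To make the generation phase concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of the idea of reference-point-dependent motion generation: each motion primitive is modeled as a left-to-right HMM whose state means live in an intrinsic coordinate frame centered on the reference point, and primitives are chained so that one trajectory accomplishes a multi-step task. The class name `MotionHMM`, the deterministic per-state interpolation, and the `dwell` parameter are all illustrative simplifications.

```python
class MotionHMM:
    """Toy left-to-right HMM for one motion primitive (e.g. "move-over").

    Each state stores a mean position expressed in an intrinsic coordinate
    frame whose origin is the reference point, a simplified stand-in for
    the reference-point-dependent probability models described above.
    """

    def __init__(self, state_means, dwell=2):
        self.state_means = state_means  # per-state mean (x, y) in intrinsic frame
        self.dwell = dwell              # steps spent approaching each state mean

    def generate(self, start, reference_point):
        """Generate a trajector path from `start` (world frame).

        Each state mean is mapped into the world frame by translating it
        by the reference point; the path interpolates linearly toward the
        mapped mean over `dwell` steps (a deterministic mean trajectory,
        with no sampling noise).
        """
        rx, ry = reference_point
        path = [start]
        for sx, sy in self.state_means:
            tx, ty = rx + sx, ry + sy   # state mean in world coordinates
            px, py = path[-1]
            for k in range(1, self.dwell + 1):
                a = k / self.dwell
                path.append((px + a * (tx - px), py + a * (ty - py)))
        return path


def concatenate(hmms, start, reference_point):
    """Chain motion primitives: each HMM starts where the previous ended,
    mirroring the combination of learned HMMs for goal-oriented tasks."""
    path = [start]
    for hmm in hmms:
        segment = hmm.generate(path[-1], reference_point)
        path.extend(segment[1:])  # drop the duplicated start point
    return path
```

For example, a "raise-then-place" behavior could be expressed as two such primitives sharing one landmark-centered reference point, with the second primitive's final state mean at the origin of the intrinsic frame so the trajector ends on the reference point.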