This study targets grounded language understanding for domestic service robots (DSRs). In particular, we focus on understanding short instructions in which the verb is missing. This task is critical for building communicative DSRs because manipulation is essential to them. Existing instruction understanding methods usually estimate missing information from non-grounded knowledge alone; therefore, it has been unclear whether the predicted action is physically executable. In this paper, we present a grounded instruction understanding method that estimates appropriate objects given an instruction and a situation. We extend generative adversarial nets (GANs) and build a GAN-based classifier that uses latent representations. To evaluate the proposed method quantitatively, we developed a data set based on a standard data set used for visual question answering (VQA). Experimental results show that the proposed method outperforms the baseline methods.