Contextual reference is the mention of matters or topics that have appeared in the conversation, whereas situational reference is the mention of objects or events present in the participants' surroundings. Conventional utterance processing handles either contextual or situational reference in a dialogue, but not both. However, to achieve meaningful communication between people and a system in the real world, the system must address the Mixed Reference Interpretation (MRI) problem, that is, handling both types of reference in an integrated manner. In this paper, we propose DICONS, a method that sequentially estimates the interpretation of utterances from interpretation candidates derived from both contextual and situational reference in a dialogue. In an experiment in which DICONS handled a task involving both contextual and situational references, we found that it could properly judge which type of reference had occurred. We also found that it could properly estimate the referent of the demonstrative word in each context and situation.