Translation disambiguation for cross-language information retrieval using context-based translation probability

Kazuaki Kishida, Emi Ishita

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Disambiguation among multiple translation candidates is very important in dictionary-based cross-language information retrieval. In prior work, disambiguation techniques have used term co-occurrence statistics from the collection being searched. These techniques have worked well experimentally, but they rest on heuristic assumptions. This paper proposes a theoretically grounded alternative that performs sense disambiguation using context terms within the source text. Specifically, it introduces the concept of translation probabilities that incorporate a context term, and it extends IBM Model 1 to estimate these context-based translation probabilities from a sentence-aligned bilingual corpus. In English-to-Italian bilingual searches, the context-based translation probabilities yielded a significant performance improvement over the case without any disambiguation.
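The abstract does not give the paper's estimation equations, so the following is only a minimal sketch under stated assumptions. It implements standard IBM Model 1 (EM estimation of t(f | e) from sentence-aligned pairs) and then a deliberately crude, hypothetical stand-in for a context-based probability: `context_translation_probs` simply retrains Model 1 on the sentence pairs whose English side contains the context term. This is not the authors' extension of Model 1. The function names, the toy English-Italian corpus, and the `NULL` token convention are illustrative assumptions.

```python
from collections import defaultdict

# A sentence pair is (english_tokens, italian_tokens).
Pair = tuple[list[str], list[str]]


def train_ibm_model1(corpus: list[Pair], iterations: int = 10) -> dict:
    """Standard IBM Model 1: EM estimate of t(f | e), keyed by (f, e)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in corpus:
            sources = ["NULL"] + es  # NULL lets a target word align to nothing
            for f in fs:
                # E-step: spread f's alignment mass over the source words
                norm = sum(t[(f, e)] for e in sources)
                for e in sources:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts per source word e
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


def context_translation_probs(corpus: list[Pair], context: str, iterations: int = 10) -> dict:
    """HYPOTHETICAL stand-in for p(f | e, c): Model 1 trained only on pairs
    whose English side contains the context term c. Not the paper's method."""
    subset = [(es, fs) for es, fs in corpus if context in es]
    return train_ibm_model1(subset or corpus, iterations)


def pick_translation(t: dict, e: str, candidates: list[str]) -> str:
    """Choose the candidate with the highest t(f | e); unseen pairs score 0."""
    return max(candidates, key=lambda f: t.get((f, e), 0.0))


# Toy corpus (invented): "bank" may translate as "banca" or "riva".
corpus = [
    ("bank money".split(), "banca soldi".split()),
    ("bank loan".split(), "banca prestito".split()),
    ("river bank".split(), "riva fiume".split()),
    ("river water".split(), "fiume acqua".split()),
]
candidates = ["banca", "riva"]
print(pick_translation(context_translation_probs(corpus, "river"), "bank", candidates))  # riva
print(pick_translation(context_translation_probs(corpus, "money"), "bank", candidates))  # banca
```

Filtering the corpus by context term is the simplest way to make a translation probability depend on a context word, but it discards most of the training data and scales poorly. A principled model, such as the one the paper describes, would instead estimate the context-conditioned probabilities jointly from the whole corpus.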

Original language: English
Pages (from-to): 481-495
Number of pages: 15
Journal: Journal of Information Science
Volume: 35
Issue number: 4
DOIs
Publication status: Published - 2009 Aug

Keywords

  • Cross-language information retrieval
  • Parallel corpora
  • Translation probability
  • Word sense disambiguation

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
