Translation disambiguation for cross-language information retrieval using context-based translation probability

Kazuaki Kishida, Emi Ishita

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Disambiguation between multiple translation choices is very important in dictionary-based cross-language information retrieval. In prior work, disambiguation techniques have used term co-occurrence statistics from the collection being searched. These techniques have worked well experimentally but rest on heuristic assumptions. This paper proposes a theoretically grounded alternative that performs sense disambiguation using context terms within the source text. Specifically, it introduces the concept of translation probabilities that incorporate a context term, and extends IBM Model 1 to estimate these context-based translation probabilities from a sentence-aligned bilingual corpus. Experimental results in English-to-Italian bilingual searches show a significant performance improvement for the context-based translation probabilities over the case without any disambiguation.
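As a point of reference, the sketch below shows plain IBM Model 1, the word-alignment model the abstract says is extended, trained with EM on a sentence-aligned corpus to estimate t(f | e), followed by a hypothetical helper that picks, among dictionary translation candidates, the one with the highest estimated probability. It is a minimal illustration under stated assumptions, not the authors' method: the context-term extension described above is not reproduced (its exact formulation is not given on this page), and the names `train_ibm1`, `pick_translation`, the `<null>` token, and the three-sentence English-Italian corpus are all invented for illustration.

```python
from collections import defaultdict

NULL = "<null>"  # empty source token, as in standard IBM Model 1


def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = p(f | e) with EM (standard IBM Model 1).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Initialise every co-occurring (target, source) pair uniformly.
    t = {}
    for src, tgt in pairs:
        for e in [NULL, *src]:
            for f in tgt:
                t[(f, e)] = uniform

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: spread each target word over the source words of its
        # sentence in proportion to the current t(f | e).
        for src, tgt in pairs:
            src = [NULL, *src]
            for f in tgt:
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts per source word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def pick_translation(e, candidates, t):
    """Hypothetical helper: choose the dictionary candidate f maximising t(f | e)."""
    return max(candidates, key=lambda f: t.get((f, e), 0.0))


# Toy English-Italian corpus, invented for illustration only.
pairs = [
    (["the", "house"], ["la", "casa"]),
    (["the", "car"], ["la", "macchina"]),
    (["a", "house"], ["una", "casa"]),
]
t = train_ibm1(pairs, iterations=10)
print(pick_translation("house", ["la", "casa", "una"], t))  # casa
```

Because "casa" co-occurs with "house" in two sentence pairs while "la" and "una" each co-occur with it only once, EM concentrates t(· | house) on "casa". This context-free probability is the kind of estimate that, according to the abstract, the paper refines by additionally conditioning on a context term from the source text.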

Original language: English
Pages (from-to): 481-495
Number of pages: 15
Journal: Journal of Information Science
Volume: 35
Issue number: 4
DOI: 10.1177/0165551509103599
Publication status: Published, August 2009

Keywords

  • Cross-language information retrieval
  • Parallel corpora
  • Translation probability
  • Word sense disambiguation

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Translation disambiguation for cross-language information retrieval using context-based translation probability. / Kishida, Kazuaki; Ishita, Emi.

In: Journal of Information Science, Vol. 35, No. 4, 08.2009, p. 481-495.

@article{4b8a014fe3574594a6ec6dd7d3404828,
title = "Translation disambiguation for cross-language information retrieval using context-based translation probability",
keywords = "Cross-language information retrieval, Parallel corpora, Translation probability, Word sense disambiguation",
author = "Kazuaki Kishida and Emi Ishita",
year = "2009",
month = "8",
doi = "10.1177/0165551509103599",
language = "English",
volume = "35",
pages = "481--495",
journal = "Journal of Information Science",
issn = "0165-5515",
publisher = "SAGE Publications Ltd",
number = "4",

}

TY - JOUR

T1 - Translation disambiguation for cross-language information retrieval using context-based translation probability

AU - Kishida, Kazuaki

AU - Ishita, Emi

PY - 2009/8

Y1 - 2009/8

KW - Cross-language information retrieval

KW - Parallel corpora

KW - Translation probability

KW - Word sense disambiguation

UR - http://www.scopus.com/inward/record.url?scp=68249147415&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=68249147415&partnerID=8YFLogxK

U2 - 10.1177/0165551509103599

DO - 10.1177/0165551509103599

M3 - Article

AN - SCOPUS:68249147415

VL - 35

SP - 481

EP - 495

JO - Journal of Information Science

JF - Journal of Information Science

SN - 0165-5515

IS - 4

ER -