Prediction of performance of cross-language information retrieval using automatic evaluation of translation

Research output: Article

12 Citations (Scopus)

Abstract

This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.
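The abstract names the two explanatory factors but not the exact functional form of the model. The Python sketch below is one plausible reading, not the author's code. It assumes a linear, additive model fitted by ordinary least squares over per-query scores, with CLIR average precision as the response and two predictors: monolingual average precision and a per-query translation-quality score (BLEU or WAMU). All function and variable names are hypothetical. The toy numbers are made up and are not data from the study. WAMU itself is not implemented because the abstract does not define it. Under this reading, "explains about 60% of the variation" corresponds to a coefficient of determination R² of roughly 0.6.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query (assumed standard definition).

    Adds precision@k at every rank k that holds a relevant document, then divides
    by the total number of relevant documents, retrieved or not.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_clir_model(
    mono_ap: Sequence[float], trans_score: Sequence[float], clir_ap: Sequence[float]
) -> Tuple[List[float], float]:
    """Hypothetical OLS fit of clir_ap ~ b0 + b1*mono_ap + b2*trans_score.

    Returns the coefficients [b0, b1, b2] and R^2 (share of variance explained).
    """
    rows = [[1.0, mo, t] for mo, t in zip(mono_ap, trans_score)]  # design matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, clir_ap)) for i in range(3)]
    coefs = solve(xtx, xty)  # normal equations: (X^T X) b = X^T y
    preds = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(clir_ap) / len(clir_ap)
    ss_res = sum((y - p) ** 2 for y, p in zip(clir_ap, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in clir_ap)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return coefs, r2


if __name__ == "__main__":
    # Made-up per-query values for illustration only (not results from the study).
    mono = [0.80, 0.55, 0.30, 0.65, 0.20]
    trans = [0.90, 0.40, 0.70, 0.20, 0.50]
    clir = [0.05 + 0.6 * mo + 0.3 * t for mo, t in zip(mono, trans)]
    print(fit_clir_model(mono, trans, clir))
```

The normal equations are solved by hand only to keep the sketch dependency-free. With NumPy available, `numpy.linalg.lstsq` or an OLS routine from statsmodels would be the more robust choice, especially with many queries or strongly correlated predictors.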

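Original language: English
Pages (from-to): 138-144
Number of pages: 7
Journal: Library and Information Science Research
Volume: 30
Issue number: 2
DOI: 10.1016/j.lisr.2007.09.003
Publication status: Published - Jun 2008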

Fingerprint

Query languages
information retrieval
Information retrieval
Information retrieval systems
language
evaluation
performance
Gaging
Regression analysis
regression
Experiments
regression analysis
experiment

ASJC Scopus subject areas

  • Library and Information Sciences

Cite this

@article{bea838cdb2f84f1083155913fe7040f4,
title = "Prediction of performance of cross-language information retrieval using automatic evaluation of translation",
abstract = "This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60{\%} of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.",
author = "Kazuaki Kishida",
year = "2008",
month = jun,
doi = "10.1016/j.lisr.2007.09.003",
language = "English",
volume = "30",
pages = "138--144",
journal = "Library and Information Science Research",
issn = "0740-8188",
publisher = "Elsevier BV",
number = "2",

}

TY - JOUR

T1 - Prediction of performance of cross-language information retrieval using automatic evaluation of translation

AU - Kishida, Kazuaki

PY - 2008/6

Y1 - 2008/6

N2 - This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.

AB - This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.

UR - http://www.scopus.com/inward/record.url?scp=44949231122&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44949231122&partnerID=8YFLogxK

U2 - 10.1016/j.lisr.2007.09.003

DO - 10.1016/j.lisr.2007.09.003

M3 - Article

AN - SCOPUS:44949231122

VL - 30

SP - 138

EP - 144

JO - Library and Information Science Research

JF - Library and Information Science Research

SN - 0740-8188

IS - 2

ER -