TY - JOUR

T1 - A term dependence model in information retrieval

AU - Taniguchi, Shoichi

PY - 1990/12/1

Y1 - 1990/12/1

N2 - Most information retrieval systems and models assume that the index terms assigned to the documents of a collection occur independently of each other. To improve retrieval effectiveness, dependencies between certain index term pairs need to be taken into account. Because the similarity measure between a query and a document is important in quantitative retrieval, this paper proposes two measures that directly reflect the relationships between index terms when these are given as pairwise correlations. The first measure is an extension of the cosine function model. It is based on oblique coordinates in which the angle between axes corresponds to the pairwise correlation between index terms, in contrast to the conventional cosine function measure, which is based on rectangular coordinates. The second measure is an extension of the extended Boolean model proposed by G. Salton et al. With these measures, no assumption of term independence is needed. Retrieval experiments to evaluate the proposed measures were performed on a test collection of 623 document records and 5 queries, in a weighted mode, in which the index terms assigned to each document record were weighted, and in an unweighted mode. The experiments showed the following results: 1) it is useful to incorporate term dependencies into the similarity measures; and 2) the proposed measures, however, were not much more effective than conventional ones.

AB - Most information retrieval systems and models assume that the index terms assigned to the documents of a collection occur independently of each other. To improve retrieval effectiveness, dependencies between certain index term pairs need to be taken into account. Because the similarity measure between a query and a document is important in quantitative retrieval, this paper proposes two measures that directly reflect the relationships between index terms when these are given as pairwise correlations. The first measure is an extension of the cosine function model. It is based on oblique coordinates in which the angle between axes corresponds to the pairwise correlation between index terms, in contrast to the conventional cosine function measure, which is based on rectangular coordinates. The second measure is an extension of the extended Boolean model proposed by G. Salton et al. With these measures, no assumption of term independence is needed. Retrieval experiments to evaluate the proposed measures were performed on a test collection of 623 document records and 5 queries, in a weighted mode, in which the index terms assigned to each document record were weighted, and in an unweighted mode. The experiments showed the following results: 1) it is useful to incorporate term dependencies into the similarity measures; and 2) the proposed measures, however, were not much more effective than conventional ones.

UR - http://www.scopus.com/inward/record.url?scp=53349161466&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=53349161466&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:53349161466

VL - 1990

SP - 105

EP - 119

JO - Library and Information Science

JF - Library and Information Science

SN - 0373-4447

IS - 28

ER -