A term dependence model in information retrieval

Research output: Contribution to journal › Article

Abstract

In most information retrieval systems and models, index terms assigned to the documents of a collection are assumed to occur independently of each other. To improve retrieval effectiveness, dependencies between certain index term pairs need to be taken into account. Because the similarity measure between a query and a document is important in quantitative retrieval, this paper proposes two measures that directly reflect the relationships between index terms when these are given as pairwise correlations. The first is an extension of the cosine function model. It is based on oblique coordinates in which the angle between axes corresponds to the pairwise correlation between index terms, in contrast to the conventional cosine function measure, which is based on rectangular coordinates. The second is an extension of the extended Boolean model proposed by G. Salton et al. Neither measure requires the assumption of term independence. Retrieval experiments to evaluate the proposed measures were performed on a test collection of 623 document records and 5 queries, in a weighted mode, in which the index terms assigned to each document record were weighted, and in an unweighted mode. The experiments showed the following results: 1) it is useful to incorporate term dependencies into the similarity measures; and 2) the proposed measures, however, were not much more effective than conventional ones.
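The abstract does not give the formula for the oblique-coordinate cosine measure, so the following is only a minimal sketch of one standard way to realize the idea: replace the ordinary dot product with an inner product taken through a matrix of pairwise term correlations. The function name `oblique_cosine`, the two-term vocabulary, and the correlation value 0.5 are illustrative assumptions, not taken from the paper.

```python
import math


def oblique_cosine(query, doc, corr):
    """Cosine similarity in oblique coordinates (illustrative sketch).

    `corr` is assumed to be a symmetric, positive semidefinite matrix whose
    entry corr[i][j] is the correlation between index terms i and j, i.e.
    the cosine of the angle between their axes. With the identity matrix
    this reduces to the conventional cosine measure.
    """
    def inner(x, y):
        # Inner product x^T * corr * y
        return sum(
            x[i] * corr[i][j] * y[j]
            for i in range(len(x))
            for j in range(len(y))
        )

    qq = inner(query, query)
    dd = inner(doc, doc)
    if qq <= 0 or dd <= 0:
        return 0.0  # zero-length vector or unusable correlation matrix
    return inner(query, doc) / math.sqrt(qq * dd)


# Hypothetical two-term vocabulary: the query uses only term 0,
# the document uses only term 1.
query = [1.0, 0.0]
doc = [0.0, 1.0]

independent = [[1.0, 0.0], [0.0, 1.0]]  # rectangular axes
correlated = [[1.0, 0.5], [0.5, 1.0]]   # axes 60 degrees apart

print(oblique_cosine(query, doc, independent))  # 0.0
print(oblique_cosine(query, doc, correlated))   # 0.5
```

Under the independence assumption the query and document share no terms and score zero; once the two terms are treated as correlated, the document receives a non-zero score, which is the effect the measure is meant to capture.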

Original language: English
Pages (from-to): 105-119
Number of pages: 15
Journal: Library and Information Science
Volume: 1990
Issue number: 28
Publication status: Published - 1990
Externally published: Yes

Fingerprint

information retrieval
experiment

ASJC Scopus subject areas

  • Library and Information Sciences

Cite this

A term dependence model in information retrieval. / Taniguchi, Shoichi.

In: Library and Information Science, Vol. 1990, No. 28, 1990, p. 105-119.

@article{c93d3d8dcd3e4e229407f9bebd6b713e,
title = "A term dependence model in information retrieval",
abstract = "In most information retrieval systems and models, index terms assigned to the documents of a collection are assumed to occur independently of each other. To improve retrieval effectiveness, dependencies between certain index term pairs need to be taken into account. Because the similarity measure between a query and a document is important in quantitative retrieval, this paper proposes two measures that directly reflect the relationships between index terms when these are given as pairwise correlations. The first is an extension of the cosine function model. It is based on oblique coordinates in which the angle between axes corresponds to the pairwise correlation between index terms, in contrast to the conventional cosine function measure, which is based on rectangular coordinates. The second is an extension of the extended Boolean model proposed by G. Salton et al. Neither measure requires the assumption of term independence. Retrieval experiments to evaluate the proposed measures were performed on a test collection of 623 document records and 5 queries, in a weighted mode, in which the index terms assigned to each document record were weighted, and in an unweighted mode. The experiments showed the following results: 1) it is useful to incorporate term dependencies into the similarity measures; and 2) the proposed measures, however, were not much more effective than conventional ones.",
author = "Shoichi Taniguchi",
year = "1990",
language = "English",
volume = "1990",
pages = "105--119",
journal = "Library and Information Science",
issn = "0373-4447",
publisher = "Mita Society for Library and Information Science",
number = "28",

}

TY - JOUR

T1 - A term dependence model in information retrieval

AU - Taniguchi, Shoichi

PY - 1990

Y1 - 1990

N2 - In most information retrieval systems and models, index terms assigned to the documents of a collection are assumed to occur independently of each other. To improve retrieval effectiveness, dependencies between certain index term pairs need to be taken into account. Because the similarity measure between a query and a document is important in quantitative retrieval, this paper proposes two measures that directly reflect the relationships between index terms when these are given as pairwise correlations. The first is an extension of the cosine function model. It is based on oblique coordinates in which the angle between axes corresponds to the pairwise correlation between index terms, in contrast to the conventional cosine function measure, which is based on rectangular coordinates. The second is an extension of the extended Boolean model proposed by G. Salton et al. Neither measure requires the assumption of term independence. Retrieval experiments to evaluate the proposed measures were performed on a test collection of 623 document records and 5 queries, in a weighted mode, in which the index terms assigned to each document record were weighted, and in an unweighted mode. The experiments showed the following results: 1) it is useful to incorporate term dependencies into the similarity measures; and 2) the proposed measures, however, were not much more effective than conventional ones.

UR - http://www.scopus.com/inward/record.url?scp=53349161466&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=53349161466&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:53349161466

VL - 1990

SP - 105

EP - 119

JO - Library and Information Science

JF - Library and Information Science

SN - 0373-4447

IS - 28

ER -