In most information retrieval systems or models, the assumption is normally made that index terms assigned to the documents of a collection occur independently of each other. So as to improve the retrieval effectiveness of systems, there is a need to take dependencies between certain index term pairs into account. As the similarity measure between a query and a document is important in quantitative retrieval, two measures, which reflect directly the relationships between index terms when they are given by pairwise correlations, are proposed in this paper. One of the proposed measures is an extension of the cosine function model. This measure is based on oblique coordinates whose degree of angle between axes corresponds to the pairwise correlation between index terms, in contrast to the conventional cosine function measure based on rectangular coordinates. The other measure is an extension of the extended Boolean model, which was proposed by G. Salton et al. Using these measures, we need no assumption of term independence. Retrieval experiments to evaluate the proposed measures was performed on a test collection of 623 document records and 5 queries, in a weighted mode, in which index terms assigned to the document record were weighted, and in an unweighted mode. The experiment showed following results: 1) it is useful to incorporate term dependencies into the similarity measures; and 2) the proposed measures, however, did not have much better effectiveness than conventional ones.
|Number of pages||15|
|Journal||Library and Information Science|
|Publication status||Published - 1990 Dec 1|
ASJC Scopus subject areas
- Library and Information Sciences