The synonym processing mechanism in web index

Shiori Ikuta, Motomichi Toyama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Web Index (WIX) is a hyperlink generation system that joins information resources on the web. The WIX system takes a set of keyword-URL pairs written in XML (called a WIX File) and joins it to the text content of a web page, transforming keywords into hyperlinks to a web page group of the user's choice. In the previous WIX system, synonymous expressions for the same entity had to be added directly to the WIX File for hyperlinks to be generated for them. In this paper, we propose a synonym processing mechanism that generates hyperlinks for synonyms without adding them to the WIX File directly. We collected synonym relations from the redirect function of the Japanese version of Wikipedia and constructed a synonym database for our system. We incorporated the information in the synonym database into the Aho-Corasick automaton used for the lexicographic matching process of the WIX system, and succeeded in generating hyperlinks for synonymous expressions while barely changing the size of our WIX File database.
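
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the synonym database is wired into the automaton, so this sketch simply assumes that each synonym is inserted into the Aho-Corasick trie with an output that points back to its canonical keyword. The names `build_automaton`, `find_matches`, `link_text`, `wix_entries`, and `synonyms`, the example URL, and the leftmost-longest overlap rule are all hypothetical choices made for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton.

    patterns maps a surface form (keyword or synonym) to its canonical keyword.
    """
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # out[state] -> list of (surface, canonical) ending here
    for surface, canonical in patterns.items():
        state = 0
        for ch in surface:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append((surface, canonical))
    # Breadth-first pass to compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start, surface, canonical) for every pattern occurrence."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for surface, canonical in out[state]:
            matches.append((i - len(surface) + 1, surface, canonical))
    return matches


def link_text(text, wix_entries, synonyms):
    """Turn keywords and their synonyms into hyperlinks.

    wix_entries: keyword -> URL (stands in for a WIX File; never modified).
    synonyms: synonym -> canonical keyword (stands in for the synonym database).
    """
    patterns = {kw: kw for kw in wix_entries}
    for syn, canonical in synonyms.items():
        if canonical in wix_entries:      # assumption: skip synonyms with no entry
            patterns.setdefault(syn, canonical)
    automaton = build_automaton(patterns)
    # Assumption: resolve overlaps leftmost-first, longest-first.
    matches = sorted(find_matches(text, automaton),
                     key=lambda m: (m[0], -len(m[1])))
    pieces, pos = [], 0
    for start, surface, canonical in matches:
        if start < pos:
            continue
        pieces.append(text[pos:start])
        pieces.append(f'<a href="{wix_entries[canonical]}">{surface}</a>')
        pos = start + len(surface)
    pieces.append(text[pos:])
    return "".join(pieces)


wix_entries = {"Tokyo": "https://example.org/tokyo"}           # hypothetical entry
synonyms = {"Edo": "Tokyo", "Tokyo Metropolis": "Tokyo"}       # redirect -> target
text = "Edo was renamed Tokyo; Tokyo Metropolis is its formal name."
linked = link_text(text, wix_entries, synonyms)
print(linked)
```

In this sketch the keyword-to-URL table keeps a single entry while three expressions are linked, which mirrors the claim that synonyms can be handled without growing the WIX File database; the synonym forms live only in the matching automaton.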

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Editors: Bipin C. Desai, Motomichi Toyama
Publisher: Association for Computing Machinery
Pages: 158-161
Number of pages: 4
ISBN (Electronic): 9781450334143
Publication status: Published - 2015 Jul 13
Event: 19th International Database Engineering and Applications Symposium, IDEAS 2015 - Yokohama, Japan
Duration: 2015 Jul 13 → 2015 Jul 15

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 19th International Database Engineering and Applications Symposium, IDEAS 2015
Country: Japan
City: Yokohama
Period: 15/7/13 → 15/7/15

Keywords

  • Database
  • Hyperlink
  • Synonym
  • Web
  • Wikipedia

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
