An automatic sameAs link discovery from Wikipedia

Kosuke Kagawa, Susumu Tamagawa, Takahira Yamaguchi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Spelling variants and word sense ambiguity are costly in processes such as data integration, information search, and data pre-processing for data mining. To meet these demands, it is useful to construct relations between a word or phrase and a representative name of the entity it denotes. To reduce these costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from Japanese Wikipedia. To do so, we gathered relevant features such as IDF, string similarity, and the number of hypernyms. We identified the link-based score on salient features based on SVM results with 960,000 anchor link pairs. Case studies show that our link discovery method performs well, with precision and recall rates of more than 70%.
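As a rough illustration of two of the features named in the abstract (IDF and string similarity), the sketch below computes them for anchor link pairs. Everything in it is an assumption made for illustration: the sample pairs, the function names, the choice to treat each distinct link target as a "document" for IDF, and the use of Python's difflib ratio as the similarity measure. The abstract does not give the paper's actual feature definitions, and the SVM step is not shown.

```python
import math
from difflib import SequenceMatcher

# Hypothetical (anchor text, target article title) pairs; not data from the paper.
ANCHOR_PAIRS = [
    ("NYC", "New York City"),
    ("New York", "New York City"),
    ("New York City", "New York City"),
    ("Big Apple", "New York City"),
    ("apple", "Apple Inc."),
    ("apple", "Apple (fruit)"),
]


def idf(anchor, pairs):
    """IDF of an anchor text, treating each distinct target as one 'document'.

    An anchor that points to many different targets is ambiguous and gets a
    low score; an anchor that points to a single target gets a high score.
    """
    targets = {target for _, target in pairs}
    linked = {target for a, target in pairs if a == anchor}
    if not linked:
        # Anchor never seen: return 0.0 rather than divide by zero.
        return 0.0
    return math.log(len(targets) / len(linked))


def string_similarity(a, b):
    """Case-insensitive similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(anchor, target, pairs):
    """Feature dictionary for one anchor link pair."""
    return {
        "idf": idf(anchor, pairs),
        "string_similarity": string_similarity(anchor, target),
    }


if __name__ == "__main__":
    for anchor, target in ANCHOR_PAIRS:
        print(anchor, "->", target, features(anchor, target, ANCHOR_PAIRS))
```

In a pipeline like the one the abstract outlines, feature vectors of this kind would be passed to a classifier such as an SVM to decide whether a pair should receive a "sameAs" link.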

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 399-413
Number of pages: 15
Volume: 8388 LNCS
ISBN (Print): 9783319068251
DOI: https://doi.org/10.1007/978-3-319-06826-8_29
Publication status: Published - 2014
Event: 3rd Joint International Semantic Technology Conference, JIST 2013 - Seoul, Korea, Republic of
Duration: 2013 Nov 28 to 2013 Nov 30

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8388 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd Joint International Semantic Technology Conference, JIST 2013
Country: Korea, Republic of
City: Seoul
Period: 13/11/28 to 13/11/30

Keywords

  • Disambiguation
  • Ontology
  • SameAs link
  • Spelling variants
  • Synonym
  • Wikipedia

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Kagawa, K., Tamagawa, S., & Yamaguchi, T. (2014). An automatic sameAs link discovery from Wikipedia. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8388 LNCS, pp. 399-413). Springer Verlag. https://doi.org/10.1007/978-3-319-06826-8_29

@inproceedings{fa78b2471fa9430386441e11e0257c00,
title = "An automatic sameAs link discovery from Wikipedia",
abstract = "Spelling variants and word sense ambiguity are costly in processes such as data integration, information search, and data pre-processing for data mining. To meet these demands, it is useful to construct relations between a word or phrase and a representative name of the entity it denotes. To reduce these costs, this paper discusses how to automatically discover {"}sameAs{"} and {"}meaningOf{"} links from Japanese Wikipedia. To do so, we gathered relevant features such as IDF, string similarity, and the number of hypernyms. We identified the link-based score on salient features based on SVM results with 960,000 anchor link pairs. Case studies show that our link discovery method performs well, with precision and recall rates of more than 70{\%}.",
keywords = "Disambiguation, Ontology, SameAs link, Spelling variants, Synonym, Wikipedia",
author = "Kosuke Kagawa and Susumu Tamagawa and Takahira Yamaguchi",
year = "2014",
doi = "10.1007/978-3-319-06826-8_29",
language = "English",
isbn = "9783319068251",
volume = "8388 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "399--413",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - An automatic sameAs link discovery from Wikipedia

AU - Kagawa, Kosuke

AU - Tamagawa, Susumu

AU - Yamaguchi, Takahira

PY - 2014

Y1 - 2014

N2 - Spelling variants and word sense ambiguity are costly in processes such as data integration, information search, and data pre-processing for data mining. To meet these demands, it is useful to construct relations between a word or phrase and a representative name of the entity it denotes. To reduce these costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from Japanese Wikipedia. To do so, we gathered relevant features such as IDF, string similarity, and the number of hypernyms. We identified the link-based score on salient features based on SVM results with 960,000 anchor link pairs. Case studies show that our link discovery method performs well, with precision and recall rates of more than 70%.

KW - Disambiguation

KW - Ontology

KW - SameAs link

KW - Spelling variants

KW - Synonym

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=84902586651&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902586651&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-06826-8_29

DO - 10.1007/978-3-319-06826-8_29

M3 - Conference contribution

AN - SCOPUS:84902586651

SN - 9783319068251

VL - 8388 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 399

EP - 413

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -