Robust toponym resolution based on surface statistics

Tomohisa Sano, Shiho Hoshi Nobesawa, Hiroyuki Okamoto, Hiroya Susuki, Masaki Matsubara, Hiroaki Saito

Research output: Article

1 citation (Scopus)

Abstract

Toponyms and other named entities are a major issue in unknown-word processing. Our purpose is to salvage unknown toponyms, not only to avoid noise but also to provide information on the candidate areas to which they may belong. Most previous toponym resolution methods targeted disambiguation among area candidates, which is needed when the same toponym exists in more than one place. These approaches were mostly based on gazetteers and contexts. For documents that may contain toponyms from anywhere in the world, such as newspaper articles, toponym resolution is not just ambiguity resolution but the selection of area candidates from all the areas on Earth. We therefore propose an automatic toponym resolution method that identifies the area candidates of a toponym based only on its surface statistics, in place of dictionary-lookup approaches. Our method combines two modules, area candidate reduction and area candidate examination that uses block-unit data, to obtain high accuracy without reducing the recall rate. In our experiments the method achieved, on average, a precision of 85.54%, a recall of 91.92%, and an F-measure of 0.89. The method is a flexible and robust approach to toponym resolution over an unrestricted number of areas.
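
The three figures reported in the abstract can be checked against one another. The following is a minimal sketch, assuming the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which weighting was used, and the function name `f_measure` is introduced here only for illustration. Because the abstract reports averages, the F-measure computed from the averaged precision and recall need not equal the average of per-area F-measures exactly.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Average precision and recall as reported in the abstract.
precision = 0.8554
recall = 0.9192

# Assumption: balanced F1 (beta = 1), which the abstract does not confirm.
f1 = f_measure(precision, recall)
print(f"F1 = {f1:.4f}")
```

Under that assumption the result rounds to 0.89 at two decimal places, which agrees with the value given in the abstract.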

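Original language: English
Pages (from-to): 2313-2320
Number of pages: 8
Journal: IEICE Transactions on Information and Systems
Volume: E92-D
Issue number: 12
DOI: https://doi.org/10.1587/transinf.E92.D.2313
Publication status: Published - 2009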

Fingerprint

Word processing
Salvaging
Glossaries
Earth (planet)
Statistics

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Robust toponym resolution based on surface statistics. / Sano, Tomohisa; Nobesawa, Shiho Hoshi; Okamoto, Hiroyuki; Susuki, Hiroya; Matsubara, Masaki; Saito, Hiroaki.

In: IEICE Transactions on Information and Systems, Vol. E92-D, No. 12, 2009, p. 2313-2320.

Sano, T, Nobesawa, SH, Okamoto, H, Susuki, H, Matsubara, M & Saito, H 2009, 'Robust toponym resolution based on surface statistics', IEICE Transactions on Information and Systems, vol. E92-D, no. 12, pp. 2313-2320. https://doi.org/10.1587/transinf.E92.D.2313
Sano, Tomohisa ; Nobesawa, Shiho Hoshi ; Okamoto, Hiroyuki ; Susuki, Hiroya ; Matsubara, Masaki ; Saito, Hiroaki. / Robust toponym resolution based on surface statistics. In: IEICE Transactions on Information and Systems. 2009 ; Vol. E92-D, No. 12. pp. 2313-2320.
@article{35d57e91b2584393a892da9e026a023d,
title = "Robust toponym resolution based on surface statistics",
keywords = "Area identification, Natural language processing, Statistical information, Toponym resolution",
author = "Tomohisa Sano and Nobesawa, {Shiho Hoshi} and Hiroyuki Okamoto and Hiroya Susuki and Masaki Matsubara and Hiroaki Saito",
year = "2009",
doi = "10.1587/transinf.E92.D.2313",
language = "English",
volume = "E92-D",
pages = "2313--2320",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "12",

}

TY - JOUR

T1 - Robust toponym resolution based on surface statistics

AU - Sano, Tomohisa

AU - Nobesawa, Shiho Hoshi

AU - Okamoto, Hiroyuki

AU - Susuki, Hiroya

AU - Matsubara, Masaki

AU - Saito, Hiroaki

PY - 2009

Y1 - 2009

KW - Area identification

KW - Natural language processing

KW - Statistical information

KW - Toponym resolution

UR - http://www.scopus.com/inward/record.url?scp=77950248580&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950248580&partnerID=8YFLogxK

U2 - 10.1587/transinf.E92.D.2313

DO - 10.1587/transinf.E92.D.2313

M3 - Article

AN - SCOPUS:77950248580

VL - E92-D

SP - 2313

EP - 2320

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 12

ER -