Technical term recognition with semi-supervised learning using hierarchical Bayesian language models

Ryo Fujii, Akito Sakurai

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

To recognize technical terms, term dictionaries or tagged corpora are required, but compiling them is costly. Moreover, a term may have several surface representations and new terms keep appearing, which complicates the problem further: simply building a dictionary cannot solve it. In this research, to reduce the cost of creating dictionaries, we aimed to build a system that learns to recognize terminology from a small tagged corpus using semi-supervised learning. We addressed the problem by combining a tag-level language model and a character-level language model based on HPYLM (the hierarchical Pitman-Yor language model). We performed experiments on the recognition of biomedical terms. In supervised learning, we achieved an F-measure of 65%, which is 8 percentage points behind the best existing system, a system that relies on many domain-specific heuristics. In semi-supervised learning, our method maintained its accuracy better than existing methods as the amount of supervised data was reduced.
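The abstract names HPYLM as the basis of both language models but gives no formulas. As a rough illustration of what an HPYLM computes, here is a minimal sketch of the standard predictive probability P(w | u) = (c_uw - d·t_uw + (θ + d·t_u)·P(w | π(u))) / (θ + c_u), where π(u) drops the most distant symbol of context u. Several simplifying assumptions are made that the source does not state: one discount d and strength θ shared across all levels, a uniform base distribution, and a "one table per word type" seating approximation in place of Gibbs sampling (which makes it behave close to interpolated Kneser-Ney smoothing). The class names, parameters, and the tag sequence are hypothetical; this is not the authors' implementation, and the abstract does not say how the tag-level and character-level models are combined.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Context = Tuple[str, ...]


class Restaurant:
    """Hypothetical seating counts for one context u: c_uw customers, t_uw tables."""

    def __init__(self) -> None:
        self.customers: Dict[str, int] = defaultdict(int)
        self.tables: Dict[str, int] = defaultdict(int)

    def total_customers(self) -> int:
        return sum(self.customers.values())

    def total_tables(self) -> int:
        return sum(self.tables.values())


class HPYLMSketch:
    """Hypothetical n-gram HPYLM with one shared discount d and strength theta."""

    def __init__(self, order: int, vocab_size: int, discount: float, strength: float) -> None:
        self.order = order
        self.vocab_size = vocab_size
        self.d = discount
        self.theta = strength
        self.restaurants: Dict[Context, Restaurant] = defaultdict(Restaurant)

    def _context(self, context: Sequence[str]) -> Context:
        # Keep only the last (order - 1) symbols as the context u.
        ctx = tuple(context)
        n = self.order - 1
        return ctx[max(len(ctx) - n, 0):] if n > 0 else ()

    def add(self, context: Sequence[str], word: str) -> None:
        """Seat one observation, assuming one table per word type (an approximation)."""
        u = self._context(context)
        while True:
            r = self.restaurants[u]
            r.customers[word] += 1
            if r.tables.get(word, 0) > 0:
                break  # joined an existing table: parent counts stay unchanged
            r.tables[word] = 1  # new table: one customer is sent to the parent pi(u)
            if not u:
                break  # the empty context's parent is the uniform base distribution
            u = u[1:]  # pi(u) drops the most distant symbol

    def prob(self, context: Sequence[str], word: str) -> float:
        return self._prob(self._context(context), word)

    def _prob(self, u: Context, word: str) -> float:
        # Parent probability: recurse toward the empty context, then the uniform base.
        parent = self._prob(u[1:], word) if u else 1.0 / self.vocab_size
        r = self.restaurants.get(u)  # .get does not create a new restaurant
        if r is None:
            return parent  # unseen context: back off entirely to the parent
        c_u, t_u = r.total_customers(), r.total_tables()
        c_uw, t_uw = r.customers.get(word, 0), r.tables.get(word, 0)
        return (c_uw - self.d * t_uw + (self.theta + self.d * t_u) * parent) / (self.theta + c_u)


# Made-up BIO-style tag sequence, used as a toy tag-level bigram model.
tags = ["O", "B", "I", "O", "O", "B", "I", "I", "O"]
lm = HPYLMSketch(order=2, vocab_size=3, discount=0.5, strength=1.0)
for prev, cur in zip(tags, tags[1:]):
    lm.add([prev], cur)
print(lm.prob(["B"], "I"))  # about 0.694 (25/36)
```

Because the discounted counts and the back-off mass together sum to θ + c_u, the probabilities in each context sum to one over the vocabulary. A character-level model could, in principle, stand in for the uniform base to score unseen terms, but that substitution is an assumption of this sketch, not something the abstract describes.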

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 327-332
Number of pages: 6
Volume: 7337 LNCS
DOI: 10.1007/978-3-642-31178-9_42
Publication status: Published - 2012
Event: 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012 - Groningen, Netherlands
Duration: 2012 Jun 26 to 2012 Jun 28

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7337 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012
Country: Netherlands
City: Groningen
Period: 12/6/26 to 12/6/28

Fingerprint

Semi-supervised Learning
Language Model
Supervised learning
Bayesian Model
Glossaries
Term
Terminology
Costs
Supervised Learning
Heuristics
Model-based
Experiments
Experiment
Dictionary

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Fujii, R., & Sakurai, A. (2012). Technical term recognition with semi-supervised learning using hierarchical bayesian language models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7337 LNCS, pp. 327-332). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7337 LNCS). https://doi.org/10.1007/978-3-642-31178-9_42

Technical term recognition with semi-supervised learning using hierarchical bayesian language models. / Fujii, Ryo; Sakurai, Akito.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7337 LNCS 2012. p. 327-332 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7337 LNCS).


Fujii, R & Sakurai, A 2012, Technical term recognition with semi-supervised learning using hierarchical bayesian language models. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 7337 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7337 LNCS, pp. 327-332, 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012, Groningen, Netherlands, 12/6/26. https://doi.org/10.1007/978-3-642-31178-9_42
Fujii R, Sakurai A. Technical term recognition with semi-supervised learning using hierarchical bayesian language models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7337 LNCS. 2012. p. 327-332. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-31178-9_42
Fujii, Ryo ; Sakurai, Akito. / Technical term recognition with semi-supervised learning using hierarchical bayesian language models. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7337 LNCS 2012. pp. 327-332 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{c144ae041c9e4d618a6d9dc9ff9bf519,
title = "Technical term recognition with semi-supervised learning using hierarchical bayesian language models",
abstract = "To recognize technical term, term dictionaries or tagged corpora are required, but it will take much cost to compile them. Moreover, the terms may have several representations and new terms may be developed, which complicates the problem further, that is, a simple dictionary building can't solve the problem. In this research, to reduce the cost of creating dictionaries, we aimed at building a system that learns to recognize terminology from small tagged corpus using semi-supervised learning. We solved the problem by combining a tag level language model and a character level language model based on HPYLM. We performed experiments on recognition of biomedical terms. In supervised learning, we achieved 65{\%} F-measure which is 8{\%} points behind the best existing system that utilizes many domain specific heuristics. In semi-supervised learning, we could keep the accuracy against reduction of supervised data better than existing methods.",
author = "Ryo Fujii and Akito Sakurai",
year = "2012",
doi = "10.1007/978-3-642-31178-9_42",
language = "English",
isbn = "9783642311772",
volume = "7337 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "327--332",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Technical term recognition with semi-supervised learning using hierarchical bayesian language models

AU - Fujii, Ryo

AU - Sakurai, Akito

PY - 2012

Y1 - 2012

N2 - To recognize technical term, term dictionaries or tagged corpora are required, but it will take much cost to compile them. Moreover, the terms may have several representations and new terms may be developed, which complicates the problem further, that is, a simple dictionary building can't solve the problem. In this research, to reduce the cost of creating dictionaries, we aimed at building a system that learns to recognize terminology from small tagged corpus using semi-supervised learning. We solved the problem by combining a tag level language model and a character level language model based on HPYLM. We performed experiments on recognition of biomedical terms. In supervised learning, we achieved 65% F-measure which is 8% points behind the best existing system that utilizes many domain specific heuristics. In semi-supervised learning, we could keep the accuracy against reduction of supervised data better than existing methods.

AB - To recognize technical term, term dictionaries or tagged corpora are required, but it will take much cost to compile them. Moreover, the terms may have several representations and new terms may be developed, which complicates the problem further, that is, a simple dictionary building can't solve the problem. In this research, to reduce the cost of creating dictionaries, we aimed at building a system that learns to recognize terminology from small tagged corpus using semi-supervised learning. We solved the problem by combining a tag level language model and a character level language model based on HPYLM. We performed experiments on recognition of biomedical terms. In supervised learning, we achieved 65% F-measure which is 8% points behind the best existing system that utilizes many domain specific heuristics. In semi-supervised learning, we could keep the accuracy against reduction of supervised data better than existing methods.

UR - http://www.scopus.com/inward/record.url?scp=84863997707&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863997707&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-31178-9_42

DO - 10.1007/978-3-642-31178-9_42

M3 - Conference contribution

SN - 9783642311772

VL - 7337 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 327

EP - 332

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -