Browser GUI for generating web data extraction rules in ducky

Kei Kanaoka, Motomichi Toyama

Research output: Conference contribution

2 Citations (Scopus)

Abstract

To benefit from the invaluable data on the World Wide Web, manual extraction or the creation of web scraping programs may be necessary. However, these processes can be tedious and complicated. To address these problems, we have proposed Ducky, a Web data extraction system that includes a web wrapper, which extracts data from web sources and translates them into structured data based on user-defined data extraction rules. Ducky can flexibly extract data from various structured web pages, remove noise from the extracted data, and integrate data distributed across multiple pages from different sites. In this paper, we propose a browser GUI for Ducky. Instead of manually writing a configuration file, users can simply click on or point the cursor at (mouse over) the target elements. The users' actions are then automatically converted into data extraction rules and saved in a configuration file. Thus, we help users extract data through intuitive operations and reduce the burden of writing the configuration file.

Original language: English
Title of host publication: 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings
Publisher: Association for Computing Machinery, Inc
ISBN (Print): 9781450334914
DOI
Publication status: Published - 2015-12-11
Event: 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Brussels, Belgium
Duration: 2015-12-11 to 2015-12-13

Other

Other: 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015
Country/Territory: Belgium
City: Brussels
Period: 15/12/11 to 15/12/13

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Computer Science Applications
