Abstract
To benefit from the invaluable data on the World Wide Web, users may need to extract data manually or write web scraping programs. However, both processes can be tedious and complicated. To address these problems, we have proposed Ducky, a Web data extraction system that includes a web wrapper, which extracts data from web sources and translates them into structured data based on user-defined data extraction rules. Ducky can flexibly extract data from various structured web pages, remove noise from the extracted data, and integrate data distributed across multiple pages from different sites. In this paper, we propose a browser GUI for Ducky. Instead of manually writing a configuration file, users can simply click on, or hover the cursor over (mouse over), the target elements. The users' actions are then automatically converted into data extraction rules and saved in a configuration file. Thus, we help users extract data through intuitive operations and reduce the burden of writing the configuration file.
Original language | English |
---|---|
Title of host publication | 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings |
Publisher | Association for Computing Machinery, Inc |
ISBN (Print) | 9781450334914 |
DOI | |
Publication status | Published - 11 Dec 2015 |
Event | 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Brussels, Belgium Duration: 11 Dec 2015 → 13 Dec 2015 |
Other
Other | 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 |
---|---|
Country/Territory | Belgium |
City | Brussels |
Period | 11 Dec 2015 → 13 Dec 2015 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Computer Science Applications