Information extraction from semi-structured web page based on DOM tree
10 July 2009
Wei-Dong Li, Yi-bing Dong
Proceedings Volume 7490, PIAGENG 2009: Intelligent Information, Control, and Communication Technology for Agricultural Engineering; 749015 (2009) https://doi.org/10.1117/12.837215
Event: International Conference on Photonics and Image in Agriculture Engineering (PIAGENG 2009), 2009, Zhangjiajie, China
Abstract
To extract information automatically from semi-structured web pages, this paper proposes IESS, a method that discovers the record model using the DOM tree and the Maximal Similar Sub Tree. The method identifies records automatically and correctly even when records of the same type differ in how they are expressed. The resulting system automatically extracts information from the result pages of paper-search websites. Experiments on several common paper-search websites show that the system achieves high efficiency and accuracy.
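The abstract does not spell out how IESS works, so the following is only a minimal sketch of the general idea of finding repeated, similar sibling subtrees in a DOM tree and treating them as records. It is not the authors' algorithm. The similarity measure (Jaccard overlap of tag paths), the 0.6 threshold, the sample HTML, and every name in the code (`Node`, `TreeBuilder`, `parse_html`, `similarity`, `find_records`, `subtree_text`) are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A simplified DOM node: tag name, children, and directly contained text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.texts = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.texts.append(text)


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tag_paths(node, prefix=""):
    """Set of tag paths (e.g. '/li/a') found in the subtree rooted at node."""
    path = prefix + "/" + node.tag
    paths = {path}
    for child in node.children:
        paths |= tag_paths(child, path)
    return paths


def similarity(a, b):
    """Assumed measure: Jaccard overlap of the two subtrees' tag-path sets."""
    pa, pb = tag_paths(a), tag_paths(b)
    return len(pa & pb) / len(pa | pb)


def find_records(root, threshold=0.6):
    """Return the largest group of sibling subtrees that resemble each other."""
    best = []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        groups = []  # each group is compared through its first member
        for child in node.children:
            for group in groups:
                if similarity(group[0], child) >= threshold:
                    group.append(child)
                    break
            else:
                groups.append([child])
        for group in groups:
            if len(group) > len(best):
                best = group
    return best


def subtree_text(node):
    """Collect the text fragments of a subtree in document order."""
    parts = list(node.texts)
    for child in node.children:
        parts.extend(subtree_text(child))
    return parts


# Hypothetical result page: the second record has an extra <em> field.
SAMPLE_HTML = """
<div><p>Results</p></div>
<ul>
  <li><a>Paper A</a><span>2008</span></li>
  <li><a>Paper B</a><span>2009</span><em>new</em></li>
  <li><a>Paper C</a><span>2007</span></li>
</ul>
"""

records = [subtree_text(r) for r in find_records(parse_html(SAMPLE_HTML))]
```

Because the threshold is below 1.0, the second `<li>` is still grouped with its siblings despite the extra `<em>` element, which mirrors the abstract's point about records of the same type that differ in how they are expressed.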
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wei-Dong Li and Yi-bing Dong "Information extraction from semi-structured web page based on DOM tree", Proc. SPIE 7490, PIAGENG 2009: Intelligent Information, Control, and Communication Technology for Agricultural Engineering, 749015 (10 July 2009); https://doi.org/10.1117/12.837215
KEYWORDS
Databases
Data modeling
Computer programming
Computing systems
Data acquisition
Detection and tracking algorithms
Information fusion