Harvesting Relational Tables from Lists on the Web

Elmeleegy, Hazem; Madhavan, Jayant; Halevy, Alon

A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more semantically meaningful tasks. However, harvesting relational tables from such lists is challenging. The lists are manually generated and hence need not follow well-defined templates: they have inconsistent delimiters (if any) and often have missing information.

