pdf2table: A Method to Extract Table Information from PDF files

	Conference Paper
Author	Burcu Yildiz, Katharina Kaiser, Silvia Miksch
Abstract	Tables are a common structuring element in many documents, such as PDF files. To reuse such tables, appropriate methods need to be developed that capture both their structure and their content. We have developed several heuristics which together recognize and decompose tables in PDF files and store the extracted data in a structured data format (XML) for easier reuse. Additionally, we implemented a prototype that lets the user make adjustments to the extracted data. Our work shows that purely heuristic-based approaches can achieve good results, especially for lucid tables.
Year of Publication	2005
Conference Name	2nd Indian International Conference on Artificial Intelligence
URL	http://ieg.ifs.tuwien.ac.at/pub/yildiz_iicai_2005.pdf
reposiTUm Handle	20.500.12708/51218