Machine Learning for Intelligent Document Processing: The WISDOM System

Floriana Esposito, Donato Malerba, Francesca A. Lisi

Dipartimento di Informatica, Università degli Studi di Bari

via Orabona 4, 70125 Bari, Italy

{esposito | malerba | lisi}

Abstract. WISDOM is an intelligent document processing system that transforms printed information into a symbolic representation. Its distinguishing feature is the use of a rule base that is automatically built from a set of training documents using two inductive learning techniques: decision tree learning for block classification, and first-order rule induction for document classification and understanding. This paper illustrates the advances made with respect to previous studies in this application domain and reports a complete set of experimental results.