https://www.ebooknetworking.net/books_detail-B000RR9142.html

Exploiting structural information for semi-structured document categorization [An article from: Information Processing and Management]

Name: Exploiting structural information for semi-structured document categorization [An article from: Information Processing and Management]
Author: A. Bratko, B. Filipic
ASIN: B000RR9142

Publisher: Elsevier

Description

This digital document is a journal article from Information Processing and Management, published by Elsevier in 2006.

This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes, specifically tailored to structured documents. We combine these methods with three different text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.
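
The stacking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three base algorithms or the meta classifier, so this example assumes a small multinomial naive Bayes model for each field and a nearest-centroid rule at the meta level. The field names (`subject`, `body`), the class names, and the toy documents are all hypothetical. One base model is trained per field, one more on the flat text of all fields, and the meta level is trained on leave-one-out predictions of those models.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, add-one smoothing."""
    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.counts[y].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, text):
        scores = []
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab) + 1
            s = math.log(self.prior[c])
            for w in text.lower().split():
                s += math.log((self.counts[c][w] + 1) / total)
            scores.append(s)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        return [e / sum(exps) for e in exps]

def views(doc, fields):
    """One text per structural component, plus the flat text of all fields."""
    parts = [doc.get(f, "") for f in fields]
    return parts + [" ".join(parts)]

def fit_base(docs, labels, fields):
    """Train one base model per view (each field, then the flat text)."""
    columns = zip(*(views(d, fields) for d in docs))
    return [NaiveBayes().fit(col, labels) for col in columns]

def meta_features(models, doc, fields):
    """Concatenate the class probabilities predicted from every view."""
    feats = []
    for model, text in zip(models, views(doc, fields)):
        feats.extend(model.predict_proba(text))
    return feats

class StackedClassifier:
    def fit(self, docs, labels, fields):
        self.fields = fields
        # Leave-one-out predictions: the meta level never sees base-model
        # output for a document the base models were trained on.
        # Assumes every class has at least two training documents.
        rows = []
        for i in range(len(docs)):
            rest = fit_base(docs[:i] + docs[i + 1:],
                            labels[:i] + labels[i + 1:], fields)
            rows.append(meta_features(rest, docs[i], fields))
        # Meta classifier (an assumption of this sketch): nearest centroid.
        self.centroids = {}
        for c in sorted(set(labels)):
            members = [r for r, y in zip(rows, labels) if y == c]
            self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        self.models = fit_base(docs, labels, fields)
        return self

    def predict(self, doc):
        x = meta_features(self.models, doc, self.fields)
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist)

# Hypothetical toy data: two fields, two classes.
FIELDS = ["subject", "body"]
train_docs = [
    {"subject": "cheap pills offer", "body": "buy cheap pills now"},
    {"subject": "win money offer", "body": "claim your free money now"},
    {"subject": "free prize offer", "body": "buy now and win a prize"},
    {"subject": "meeting agenda", "body": "please review the project agenda"},
    {"subject": "project report", "body": "the report for the meeting is attached"},
    {"subject": "lunch meeting", "body": "please join the team for lunch"},
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

clf = StackedClassifier().fit(train_docs, train_labels, FIELDS)
print(clf.predict({"subject": "free offer", "body": "buy cheap pills and win money"}))
```

Dropping the last element returned by `views` gives stacking over the structural components alone. Keeping it corresponds to the variant in which the flat text model also contributes to the final prediction.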