https://www.ebooknetworking.net/books_detail-B000RR919W.html

Information extraction from research papers using conditional random fields [An article from: Information Processing and Management]

Author: F. Peng, A. McCallum
ISBN: 978B000RR9197

Publisher: Elsevier

Book Details

ISBN / ASIN: B000RR919W

ISBN-13: 978B000RR9197

Availability: Available for download now

Sales Rank: 10,042,942

Marketplace: United States 🇺🇸

Description

This digital document is a journal article from Information Processing and Management, published by Elsevier in 2006. The article is delivered in HTML format and is available in your Amazon.com Media Library immediately after purchase. You can view it with any web browser.

Description:
With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers. CRFs provide a principled way of incorporating various local features, external lexicon features, and global layout features. The basic theory of CRFs is becoming well understood, but best practices for applying them to real-world data require additional exploration. We make an empirical exploration of several factors, including variations on Gaussian, Laplace, and hyperbolic-L1 priors for improved regularization, and several classes of features. Based on CRFs, we further present a novel approach for constraint co-reference information extraction; i.e., improving extraction performance given that we know some citations refer to the same publication. On a standard benchmark dataset, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs. On four co-reference IE datasets, our system significantly improves extraction performance, with an error rate reduction of 6-14%.
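
The percentages quoted in the abstract are error reductions, which are usually read as relative figures: the share of the baseline's remaining error that the new system removes, not a gain in absolute F1 points. The following is a minimal sketch of that arithmetic, assuming error is defined as 1 minus the score. The function name and the F1 values are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Return the fraction of the baseline's error (1 - score) that the new system removes."""
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical scores for illustration only; NOT the figures reported in the article.
baseline_f1 = 0.900   # assumed baseline average F1
new_f1 = 0.936        # assumed improved average F1

reduction = relative_error_reduction(baseline_f1, new_f1)
print(f"Relative error reduction: {reduction:.0%}")  # prints "Relative error reduction: 36%"
```

Under these assumed numbers, a 36% error reduction corresponds to a gain of only 3.6 absolute F1 points, which is why the two ways of reporting an improvement should not be confused.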