Advancing Information Access (Synthesis Lectures on Information Concepts, Retrieval, and S)
Book Details
Author(s): Eugene Agichtein
Publisher: Morgan & Claypool
ISBN / ASIN: 1608454851
ISBN-13: 9781608454853
Marketplace: France 🇫🇷
Description
The proliferation of the Internet and ubiquitous access to the Web enable millions of Web users to collaborate online on a variety of activities. Many of these activities result in the construction of large repositories of knowledge, either as their primary aim (e.g., Wikipedia) or as a by-product (e.g., Yahoo! Answers). The unprecedented amounts of information in collaboratively generated content (CGC) enable new, knowledge-rich approaches to information access, which are significantly more powerful than conventional word-based methods. Considerable progress has been made in this direction over the last few years. This lecture reviews influential examples of this line of research, including explicit manipulation of human-defined concepts and their use to augment the bag of words, using large-scale taxonomies of topics from Wikipedia or the Open Directory Project to construct additional class-based features, and identifying newly available word senses and examples of their usage for better word disambiguation. However, the quality and comprehensiveness of collaboratively created content vary drastically, and a significant amount of preprocessing, filtering, and organization is often necessary. Thus, not only can the content repositories be used to improve IR methods, but the reverse pollination is also possible: better information extraction methods can be used to automatically collect more knowledge or to verify the contributed content. This natural connection between modeling the generation process of CGC and effectively using the accumulated knowledge is further explored in this lecture.
