Supervised categorization of JavaScript^T^M using program analysis features [An article from: Information Processing and Management]
Book Details
Author(s)W. Lu, M.Y. Kan
PublisherElsevier
ISBN / ASINB000PBZV76
ISBN-13978B000PBZV71
AvailabilityAvailable for download now
MarketplaceUnited States 🇺🇸
Description
This digital document is a journal article from Information Processing and Management, published by Elsevier in 2007. The article is delivered in HTML format and is available in your Amazon.com Media Library immediately after purchase. You can view it with any web browser.
Description:
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then view understanding embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with the features distilled from the domain knowledge of JavaScript and software analysis to improve classification performance. We perform experiments on the standard WT10G web page corpus, and show that our techniques eliminate over 50% of errors over a standard text classification baseline.
Description:
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then view understanding embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with the features distilled from the domain knowledge of JavaScript and software analysis to improve classification performance. We perform experiments on the standard WT10G web page corpus, and show that our techniques eliminate over 50% of errors over a standard text classification baseline.
