Data mining techniques are used to power intelligent software, both on and off the Internet. Data Mining: Practical Machine Learning Tools explains the magic behind information extraction in a book that succeeds at bringing the latest in computer science research to any IS manager or developer. In addition, this book provides an opportunity for the authors to showcase their powerful reusable Java class library for building custom data mining software.
This text is remarkable for its comprehensive review of recent research on machine learning, all told in a very approachable style. (While there is plenty of math in some sections, the authors' explanations are always clear.) In terms comprehensible to managers and developers alike, the book tours the nature of machine learning and shows how it can be used to find predictive patterns in data. The authors use sample datasets (on such topics as weather, contact lens prescriptions, and flowers) to illustrate key concepts.
After explaining the types of machine learning models (like decision trees and classification rules), the book surveys the algorithms used to implement them, plus strategies for improving performance and the reliability of results. Later, the book turns to the authors' downloadable Weka (rhymes with "Mecca") Java class library, which lets you experiment with data mining hands-on and gets you started using this technology in custom applications. The final sections look at the bright prospects for data mining and machine learning on the Internet (for example, in Web search engines).
Precise but never pedantic, this admirably clear title delivers a real-world perspective on the advantages of data mining and machine learning. More than a programming how-to, it can be read profitably by any manager or developer who wants to see what leading-edge machine learning techniques can do for their software. --Richard Dragan
Topics covered: Data mining and machine learning basics, sample datasets and applications for data mining, machine learning vs. statistics, the ethics of data mining, generalization, concepts, attributes, missing values, decision tables and trees, classification rules, association rules, exceptions, numeric prediction, clustering, algorithms and implementations in Java, inferring rules, statistical modeling, covering algorithms, linear models, support vector machines, instance-based learning, credibility, cross-validation, probability, costs (lift charts and ROC curves), selecting attributes, data cleansing, combining multiple models (bagging, boosting, and stacking), Weka (reusable Java classes for machine learning), customizing Weka, visualizing machine learning, working with massive datasets, text mining, and e-mail and the Internet.