Get a hands-on introduction to building and using decision trees and random forests. Tree-based machine learning algorithms learn from data whose outcomes are already known so that they can predict outcomes in new situations. You will learn not only how to use decision trees and random forests for classification and regression, along with some of their limitations, but also how the algorithms that build them work. Each chapter introduces a new data concern and then walks you through modifying the code, building the engine just in time. Along the way you will gain experience making decision trees and random forests work for you. This book uses Python, an easy-to-read programming language, as the medium for teaching how these algorithms work, but it isn't about teaching you Python or about using pre-built machine learning libraries specific to Python. Instead, it teaches you how some of the algorithms inside those kinds of libraries work and why you might use them, and it gives you hands-on experience that you can take back to your favorite programming environment.
Table of Contents:
- A brief introduction to decision trees
- Chapter 1: Branching - uses a greedy algorithm to build a decision tree from data that can be partitioned on a single attribute.
- Chapter 2: Multiple Branches - examines several ways to partition data in order to generate multi-level decision trees.
- Chapter 3: Continuous Attributes - adds the ability to partition numeric attributes using greater-than.
- Chapter 4: Pruning - explores ways of reducing the amount of error encoded in the tree.
- Chapter 5: Random Forests - introduces ensemble learning and feature engineering.
- Chapter 6: Regression Trees - investigates numeric predictions, like age, price, and miles per gallon.
- Chapter 7: Boosting - adjusts the voting power of the randomly selected decision trees in the random forest in order to improve the forest's ability to predict outcomes.
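To give a flavor of the greedy, single-attribute branching described for Chapter 1, here is a minimal sketch. It is not the book's code. It assumes each row is a list with the outcome stored in a known column, and it scores each candidate attribute by how many rows would be classified correctly if every branch simply predicted its most common outcome. That is one simple scoring rule among several; entropy and Gini impurity are common alternatives. The function name `best_single_attribute` and the sample weather data are hypothetical.

```python
from collections import Counter, defaultdict


def best_single_attribute(rows, outcome_index):
    """Greedily pick the one attribute whose branches best predict the outcome.

    Assumption (hypothetical scoring rule): each branch predicts its majority
    outcome, and an attribute's score is the number of rows predicted correctly.
    Returns (attribute_index, {attribute_value: predicted_outcome}).
    """
    best = None  # (correct_count, attribute_index, rules)
    for attr in range(len(rows[0])):
        if attr == outcome_index:
            continue
        # Partition the rows by this attribute's value, counting outcomes per branch.
        branches = defaultdict(Counter)
        for row in rows:
            branches[row[attr]][row[outcome_index]] += 1
        # Each branch predicts its most common outcome; count the rows that gets right.
        correct = sum(counts.most_common(1)[0][1] for counts in branches.values())
        if best is None or correct > best[0]:  # ties keep the earlier attribute
            rules = {value: counts.most_common(1)[0][0] for value, counts in branches.items()}
            best = (correct, attr, rules)
    if best is None:
        raise ValueError("rows need at least one attribute besides the outcome")
    return best[1], best[2]


# Usage: columns are (outlook, temperature, play?) and the outcome is column 2.
weather = [
    ["sunny", "hot", "no"],
    ["sunny", "mild", "no"],
    ["rainy", "hot", "yes"],
    ["rainy", "mild", "yes"],
    ["overcast", "mild", "yes"],
]
attr, rules = best_single_attribute(weather, outcome_index=2)
print(attr, rules)  # 0 {'sunny': 'no', 'rainy': 'yes', 'overcast': 'yes'}
```

Splitting on outlook classifies all five rows correctly, while temperature gets only three right, so the greedy choice is column 0. A multi-level tree would repeat this choice inside each branch, which is roughly the direction the later chapters take.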