- Apply distributed systems patterns to build scalable and reliable machine learning projects
- Build ML pipelines with data ingestion, distributed training, model serving, and more
- Automate ML tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
- Make trade-offs between different patterns and approaches
- Manage and monitor machine learning workloads at scale
Inside Distributed Machine Learning Patterns, you'll learn to apply established distributed systems patterns to machine learning projects, and explore cutting-edge new patterns created specifically for machine learning. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based in TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Hands-on projects and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Deploying a machine learning application on a modern distributed system puts the spotlight on reliability, performance, security, and other operational concerns. In this in-depth guide, Yuan Tang, project lead of Argo and Kubeflow, shares patterns, examples, and hard-won insights on taking an ML model from a single device to a distributed cluster.
About the book
Distributed Machine Learning Patterns provides dozens of techniques for designing and deploying distributed machine learning systems. In it, you'll learn patterns for distributed model training, managing unexpected failures, and dynamic model serving. You'll appreciate the practical examples that accompany each pattern along with a full-scale project that implements distributed model training and inference with autoscaling on Kubernetes.
What's inside
- Data ingestion, distributed training, model serving, and more
- Automating Kubernetes and TensorFlow with Kubeflow and Argo Workflows
- Managing and monitoring workloads at scale
1 Introduction to distributed machine learning systems
PART 2 PATTERNS OF DISTRIBUTED MACHINE LEARNING SYSTEMS
2 Data ingestion patterns
3 Distributed training patterns
4 Model serving patterns
5 Workflow patterns
6 Operation patterns
PART 3 BUILDING A DISTRIBUTED MACHINE LEARNING WORKFLOW
7 Project overview and system architecture
8 Overview of relevant technologies
9 A complete implementation