Bin Yu and Rebecca Barter employ the innovative Predictability, Computability, and Stability (PCS) framework to assess the trustworthiness and relevance of data-driven results relative to three sources of uncertainty that arise throughout the data science life cycle: the human decisions and judgment calls made during data collection, data cleaning, and modeling. Through case studies built on real-world data, intuitive explanations of common statistical and machine learning techniques, and supplementary R and Python code, Veridical Data Science offers a clear and actionable guide for conducting responsible data science. Requiring little background knowledge, this lucid, self-contained textbook provides a solid foundation and principled framework for future study of advanced methods in machine learning, statistics, and data science.
- Presents the Predictability, Computability, and Stability (PCS) methodology for producing trustworthy data-driven results
- Teaches how a data science project should be conducted from beginning to end, including extensive discussion of the data scientist's decision-making process
- Cultivates critical thinking throughout the entire data science life cycle
- Provides practical examples and illuminating case studies of real-world data analysis problems with associated code, exercises, and solutions
- Suits advanced undergraduate and graduate students, domain scientists, and practitioners