This text covers all of the data science, machine learning, and deep learning topics relevant to materials science and engineering, accompanied by numerous examples and applications. Almost all methods and algorithms introduced are implemented "from scratch" using Python and NumPy.
The book starts with an introduction to statistics and probability, explaining important concepts such as random variables and probability distributions, Bayes' theorem and correlations, sampling techniques, and exploratory data analysis, and puts them in the context of materials science and engineering. As such, it serves as a valuable primer for undergraduate and graduate students, and as a review for research scientists and practicing engineers.
The second part provides an in-depth introduction to (statistical) machine learning. It begins by outlining fundamental concepts and proceeds to explore a variety of supervised learning techniques for regression and classification, including advanced methods such as kernel regression and support vector machines. The section on unsupervised learning emphasizes principal component analysis and also covers manifold learning (t-SNE and UMAP) and clustering techniques. Additionally, feature engineering, feature importance, and cross-validation are introduced.
The final part, on neural networks and deep learning, aims to promote an understanding of these methods and to dispel the misconception that they are a "black box". The complexity gradually increases until fully connected networks can be implemented. Advanced techniques and network architectures, including GANs, are implemented "from scratch" using Python and NumPy, which facilitates a comprehensive understanding of all the details and enables readers to conduct their own experiments in deep learning.