I recently gave a presentation at Dr. Haixuan Xu’s group meeting at JIAM about how machine learning can be used with materials science data. The talk targeted an audience that, for the most part, had not seen what Python has to offer for data science. I focused on some of the essential tools but was clear that I am not the most knowledgeable on neural networks.
The entire set of materials is available in my GitHub repository costrouc/mse-machinelearning-notebooks. Additionally, I have started using Binder so that anyone can interact with the code examples in a notebook, with nothing required but a web browser. See this link to get started with the material. I hope that Binder is used for many more science talks; I found it much more powerful than a simple presentation.
In this workshop/course I covered the general steps of machine learning:
- gather the data (web scraping, experiments, HPC simulations)
- explore the data. This includes visualizing data with matplotlib and transforming and analyzing it with pandas
- transform/sanitize the data. Some common procedures include removing rows with missing values from the dataset, replacing missing data with the mean value, and converting categorical variables to continuous variables
- apply machine learning algorithms using scikit-learn, PyTorch, and PyMC3
- validate the model
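As a rough illustration of the sanitizing procedures in the list above, here is a minimal pandas sketch. The toy dataset, its column names, and its values are invented for this example and are not taken from the talk's notebooks; one-hot encoding with `pd.get_dummies` is only one common way to turn a categorical column into numeric ones.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "density": [2.70, 7.87, np.nan, 8.96],
    "structure": ["fcc", "bcc", "hcp", "fcc"],
    "melting_point": [933.0, 1811.0, 1941.0, np.nan],
})

# Option 1: remove every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace missing numeric values with the mean of their column.
numeric_cols = ["density", "melting_point"]
filled = df.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())

# Convert the categorical column into numeric indicator (one-hot) columns.
encoded = pd.get_dummies(filled, columns=["structure"])

print(dropped)
print(encoded)
```

Dropping rows is the simplest choice but discards data, while filling with the mean keeps every row at the cost of slightly distorting the distribution. Which is appropriate depends on the dataset.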
While this talk was broad and tried not to focus on specific techniques, I tried to show how scikit-learn's common API greatly simplifies trying out new techniques.
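To make the point about the common API concrete, here is a small sketch that is not taken from the talk's notebooks. It assumes synthetic regression data from `make_regression` and an arbitrary choice of three estimators; the point is only that each one is cross-validated, fitted, and used for prediction through exactly the same calls.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data standing in for a real materials dataset (an assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Three unrelated algorithms that all expose the same estimator interface.
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "k_neighbors": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Validation: 5-fold cross-validation using the estimator's default
    # score, which is R^2 for regressors.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

    # The same fit/predict calls work regardless of the algorithm.
    model.fit(X, y)
    predictions = model.predict(X[:3])
    print(f"{name}: mean CV R^2 = {scores[name]:.3f}, first predictions = {predictions}")
```

Swapping in a different technique only requires adding another entry to the `models` dictionary; the loop itself does not change.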
I really enjoyed giving the talk and I hope that some of this material may be of use to others.