Hands-on Introduction to Data & Machine Learning in Science and Engineering
2020.08.12 Ale Strachan Purdue University
Table of Contents available below.
This video is part of the nanoHUB Series “Materials Science Education Champion Seminar Series” found on nanoHUB.org at https://nanohub.org/resources/33975
This hands-on tutorial will introduce participants to modern tools to manage, organize, and visualize data, as well as to machine learning techniques for extracting information from it. We will discuss introductory activities designed to bring undergraduate students into data science, as well as more advanced topics. Participants will use APIs to query online repositories, organize and process the resulting data, and use it to build predictive models. The activities will include building artificial neural networks and random forests, training them on the acquired data, and using these models to make decisions. We will show how active learning can reduce the number of experiments required to reach a desired design goal. All simulations will be performed in Jupyter notebooks via nanoHUB and will use several online data repositories.
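The active-learning idea above (train a model on the experiments done so far, then let it choose the next experiment) can be illustrated with a minimal pure-Python sketch. This is not the code behind the nanoHUB Sequential Learning tool. The two-component "composition" grid, the made-up `toy_conductivity` response standing in for a real measurement, the helper names (`build_tree`, `fit_forest`, `sequential_learning`), and the selection rule (predicted mean plus `kappa` times the spread across trees, an upper-confidence-bound choice) are all hypothetical choices for illustration. The forest's uncertainty here is simply the disagreement among trees grown on bootstrap samples.

```python
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean (the split criterion)."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, min_leaf, rng, max_features):
    """Grow a regression tree as nested tuples.

    Leaf: ("leaf", mean_of_y). Internal node: ("node", feature, threshold, left, right).
    Nodes at depth 0, or holding min_leaf samples or fewer, become leaves.
    """
    if depth == 0 or len(y) <= min_leaf or len(set(y)) == 1:
        return ("leaf", statistics.fmean(y))
    # Random forests consider only a random subset of features at each split.
    features = rng.sample(range(len(X[0])), min(max_features, len(X[0])))
    best = None  # (sse, feature, threshold)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # the sampled features cannot separate these samples
        return ("leaf", statistics.fmean(y))
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, min_leaf, rng, max_features)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, min_leaf, rng, max_features)
    return ("node", f, t, left, right)


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean."""
    while node[0] == "node":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]


def fit_forest(X, y, n_trees=30, depth=4, min_leaf=1, seed=0):
    """Bagged regression trees with random feature choice at each split."""
    rng = random.Random(seed)
    max_features = max(1, len(X[0]) // 2)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                depth, min_leaf, rng, max_features))
    return trees


def predict_forest(trees, x):
    """Return (mean, spread) of the per-tree predictions."""
    preds = [predict_tree(t, x) for t in trees]
    return statistics.fmean(preds), statistics.pstdev(preds)


def sequential_learning(candidates, measure, n_initial=4, budget=15, kappa=1.0, seed=0):
    """Measure a few random candidates, then repeatedly refit the forest and
    measure the untested candidate with the highest mean + kappa * spread."""
    rng = random.Random(seed)
    results = {i: measure(candidates[i]) for i in rng.sample(range(len(candidates)), n_initial)}
    while len(results) < budget:
        untested = [i for i in range(len(candidates)) if i not in results]
        if not untested:
            break
        trees = fit_forest([candidates[i] for i in results], list(results.values()),
                           seed=rng.randrange(10**6))
        scores = {}
        for i in untested:
            mean, std = predict_forest(trees, candidates[i])
            scores[i] = mean + kappa * std
        nxt = max(scores, key=scores.get)
        results[nxt] = measure(candidates[nxt])  # "run" the next experiment
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Hypothetical 2-D composition grid and a made-up response to maximize.
    candidates = [(a / 10, b / 10) for a in range(11) for b in range(11)]

    def toy_conductivity(x):
        return -((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2)

    best, results = sequential_learning(candidates, toy_conductivity, budget=20)
    print(f"best of {len(results)} experiments:", candidates[best], results[best])
```

Raising `kappa` favors exploring candidates where the trees disagree; setting it to 0 simply picks the highest predicted value. For real work, a library implementation such as scikit-learn's `RandomForestRegressor` would replace the hand-written trees.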
Tools highlighted in this presentation can be found on nanoHUB at:
Nanomaterial Mechanics Explorer – https://nanohub.org/resources/nanomatmech
DFT Material Properties Simulator – https://nanohub.org/tools/dftmatprop
Polymer Modeler – https://nanohub.org/resources/polymod
Citrine Tools for Materials Informatics – https://nanohub.org/tools/citrinetools
Table of contents:
00:00 Data Science for Engineering and Science
01:41 Untitled: Slide 2
03:44 Chaired by Dave McDowell
04:38 Data
05:21 Computation
06:03 Data Science & Machine Learning in Science & Engineering
07:40 nanoHUB: a community-driven resource
09:07 Apps connected to powerful research codes
11:01 Jupyter: end to end scientific workflows
12:51 Impact on education
15:37 Publish for reproducibility and discoverability
16:30 Data Science & Machine Learning in Science & Engineering
26:41 Visualizing data, finding correlations
30:57 Linear Regression – a materials example
31:00 Regression
34:20 Neural Networks 101
34:31 Neural Networks 101: activation functions & training
34:33 Prediction of Young’s moduli
34:36 Overfitting and underfitting
34:37 Data Science & Machine Learning in Science & Engineering
34:42 Data science for design of experiments
36:25 Maximizing Li+ conductivity in solid oxides
37:03 Find the best conductor with the fewest experiments
38:06 Decision trees and random forests
42:27 Random forests
42:29 Sequential Learning tool in nanoHUB
42:30 Finding best conductor with the fewest experiments
43:53 Learn more
44:03 Data
45:03 Thanks – Questions?