Course description

This course focuses on analyzing messy, real-life data to make predictions using statistical and machine learning methods. The material integrates the five key facets of a data-driven investigation: data collection (data wrangling, cleaning, and sampling to get a suitable data set); data management (accessing data quickly and reliably); exploratory data analysis (generating hypotheses and building intuition); prediction, or statistical learning; and communication (summarizing results through visualization, stories, and interpretable summaries). Students who have previously completed CSCI E-107 or CSCI E-109 cannot count CSCI E-109a or CSCI E-109b toward a degree or certificate.


  • Scientific Program Director, Institute for Applied Computational Science, John A. Paulson School of Engineering and Applied Sciences, Harvard University
  • Senior Preceptor in Statistics, Harvard University