Course description

One of the broad goals of data science is to examine raw data in order to identify their structure and trends, and to derive conclusions and hypotheses from them. In a modern world awash with data, data analytics is more important than ever to fields ranging from biomedical research, space and weather science, finance, business operations, and production to marketing and social media applications. This course provides an intensive introduction to a range of statistical learning methods. It also introduces the R programming language, a popular and powerful platform for scientific and statistical analysis and visualization, which is used throughout the course. We discuss the fundamentals of statistical testing and learning, and cover linear and non-linear regression, regularization, unsupervised methods (principal component analysis [PCA] and clustering), and supervised classification, including support vector machines, random forests, and neural nets, using datasets drawn from diverse domains. The course is geared less toward theory (although some is presented, mostly qualitatively) and more toward developing intuition and the right way of thinking about statistical problems, as well as building practical skills through multiple incremental assignments and extensive experimentation.

Instructors
