Course description

Data mining seeks to find valuable insights and relationships in large, complex data sets. Applications of data mining include web search, interactions in social networks, finding relationships in large internet-of-things (IoT) sensor networks, and finding interactions between drugs. This course surveys a range of algorithms used for key applications of data mining. The emphasis is on unsupervised and semi-supervised learning, and the course also includes discussion of supervised learning and graph algorithms. Scaling and computational efficiency of data mining algorithms are also discussed.

The course consists of readings and lectures on theory, along with hands-on exercises and projects in which students apply the theory. For the hands-on component of the course, students use a variety of libraries in the Python language. Examples include Scikit-Learn, Spark ML, NLTK, Surprise, and GraphX. Students give a short presentation on their projects during the last class meeting(s). Students enrolled for graduate credit are required to submit an independent project that demonstrates mastery of the methods covered in the course as applied to a suitable real-world data set.

Students who complete the course are able to apply the theory of common data mining algorithms to a variety of real-world applications, with an understanding of the limitations of each; apply exploratory data mining methods to find valuable relationships in large, complex data sets; use common Python data mining libraries to create basic data mining solutions; understand and apply the methods required to scale data mining algorithms; and understand the ethical and privacy issues inherent in some data mining applications.


Associated Schools

  • Harvard Division of Continuing Education
