Course description

Growth in the availability of text data and computational power, together with the development of powerful new techniques, has led to a revolution in the field of natural language processing (NLP) in the past few years. This course provides an application-focused overview of the field, from tokenization to state-of-the-art models such as recurrent neural networks and transformers. The first half of the course provides theoretical background and introduces Python code for implementation. The second half focuses on how to scope and execute NLP projects for supervised and unsupervised use cases. Student progress is assessed through three coding assignments, a midterm, and a final project using real-world data and pre-trained models.


  • Associate Director of Data Sciences, Massachusetts Institute of Technology Sloan School of Management

Associated Schools

  • Harvard Division of Continuing Education

  • Harvard Extension School
