Course description

This course introduces students to the tools, techniques, and opportunities for performing text analytics. We examine options such as the Natural Language Toolkit (NLTK), Babelfy, Scikit-learn, and the WordNet dictionary, along with fully featured applications such as IBM's Watson Explorer analytics platform. If time permits, Stanford University's DeepDive tool may be explored as well. Course work involves using the selected tools to analyze groups of texts for insights such as sentiment (for example, how a consumer or client feels about a product or experience); metadata (whether we can reliably identify phone numbers, credit card numbers, model numbers, or other specific elements); and named entity recognition (searching for personal names, locations, and other specific entities). Significant time is spent early in the course discussing basic linguistic concepts such as the various -nym forms (for example, meronyms, mesonyms, troponyms, and synonyms), stemming, lemmatization, parts of speech, word sense disambiguation, and other areas relevant to search systems and text analysis. A solid understanding of the Python language is required; no remedial instruction in language fundamentals is offered. Students are expected to have the skills necessary to deliver assignments via the course Linux server; submissions cannot be made using Jupyter notebooks or other means.
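To illustrate what identifying "metadata" such as phone numbers and credit card numbers can involve, here is a minimal sketch using only Python's standard `re` module. It is not course material. The patterns, the function names `luhn_valid` and `find_metadata`, and the assumption of North American phone formats are all hypothetical choices made for illustration. Candidate card numbers are filtered with the Luhn checksum to cut down on false positives.

```python
import re

# Hypothetical patterns for illustration only: North American phone numbers
# and 13-16 digit card numbers with optional space or dash separators.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walking from the rightmost digit, double every second digit;
    # a doubled value above 9 has 9 subtracted (same as summing its digits).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_metadata(text: str) -> dict[str, list[str]]:
    """Collect phone-number and Luhn-valid card-number candidates from text."""
    phones = [m.group(0) for m in PHONE_RE.finditer(text)]
    # The card pattern may capture a trailing separator, so strip it.
    cards = [
        m.group(0).strip()
        for m in CARD_RE.finditer(text)
        if luhn_valid(m.group(0))
    ]
    return {"phone_numbers": phones, "credit_cards": cards}


if __name__ == "__main__":
    sample = "Call (617) 555-0123 or use card 4111 1111 1111 1111."
    print(find_metadata(sample))
```

Pattern matching of this kind is only a starting point; real collections usually need broader, locale-aware rules, and model numbers in particular tend to require patterns tailored to a specific vendor's format.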


Associated Schools

  • Harvard Extension School

  • Harvard Division of Continuing Education
