What lies beyond the Jupyter notebook? How can we elevate code from concept to production? What happens when scikit-learn isn't enough? Will that last script die as a one-off, or will it perform just as well for the next 10,000 inputs? The last decade has seen an amazing commoditization of cloud computing and scientific development tools, making this a truly glorious time to be a data scientist. Yet this growing ease of use can paradoxically hinder the development of more sophisticated tools if the scientist relies too heavily on magic and never opens the hood to explore how things really work. In this course, we explore the next level of fundamentals that make a difference for data science teams working with complex data in real organizations. Key topics include formal collaboration techniques; testing; continuous integration and deployment; repeatable and intuitive workflows built on directed graphs; recurring themes in practical algorithms; meta-programming and glue; performance optimization; and an emphasis on practical integration with tools in the broader data science ecosystem, such as GitHub, Docker, Amazon Web Services, and Hadoop.
Harvard Division of Continuing Education