The modern data analysis pipeline involves collection, preprocessing, storage, analysis, and interactive visualization of data.
The goal of this course, part of the Analytics: Essential Tools and Methods MicroMasters program, is for you to learn how to build these components and connect them using modern tools and techniques.
In the course, you’ll see how computing and mathematics come together. For instance, “under the hood” of modern data analysis lie numerical linear algebra, numerical optimization, and elementary data processing algorithms and data structures. Together, they form the foundations of numerical and data-intensive computing.
The hands-on component of this course will develop your proficiency with modern analytical tools. You will learn how to mash up Python, R, and SQL through Jupyter notebooks, among other tools. Furthermore, you will apply these tools to a variety of real-world datasets, thereby strengthening your ability to translate principles into practice.
The Big Data Capstone project will give you the chance to demonstrate practically what you have learned in the Big Data MicroMasters program, including:
- How to evaluate, select and apply data science techniques, principles and theory;
- How to plan and execute a project;
- How to work autonomously using your own initiative;
- How to identify social and ethical concerns around your project;
- How to develop communication skills using online collaborative technologies.
Richard (Rich) Vuduc is a Professor at the Georgia Institute of Technology (“Georgia Tech”), in the School of Computational Science and Engineering, a department devoted to the study of computer-based modeling and simulation of natural and engineered systems. His research lab, The HPC Garage (@hpcgarage), is interested in high-performance computing, with an emphasis on algorithms, performance analysis, and performance engineering. Rich is a recipient of a DARPA Computer Science Study Group grant; an NSF CAREER award; a collaborative Gordon Bell Prize in 2010; the Lockheed-Martin Aeronautics Company Dean’s Award for Teaching Excellence (2013); and Best Paper Awards at the SIAM Conference on Data Mining (SDM, 2012) and the IEEE Parallel and Distributed Processing Symposium (IPDPS, 2015), among others. He has also served as his department’s Associate Chair and Director of its graduate programs. External to Georgia Tech, he currently serves as Chair of the SIAM Activity Group on Supercomputing (2018-2020); co-chaired the Technical Papers Program of the “Supercomputing” (SC) Conference in 2016; and serves as an associate editor of both the International Journal of High-Performance Computing Applications and IEEE Transactions on Parallel and Distributed Systems. Rich received his Ph.D. in Computer Science from the University of California, Berkeley, and was a postdoctoral scholar in the Center for Advanced Scientific Computing at the Lawrence Livermore National Laboratory.