Course: Data Science: Wrangling

Payment
Free to study
Certificate
Paid certification
Duration
5 months
About the course

In this course, part of our Professional Certificate Program in Data Science, we cover several standard steps of the data wrangling process, such as importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.

Very rarely is data easily accessible in a data science project. It is more likely to sit in a file or a database, or to need extracting from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse. The steps that convert data from its raw form to the tidy form are called data wrangling.
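
As a rough illustration of what "raw form to tidy form" means, the sketch below reshapes a small wide table (one column per year) into tidy rows with one observation each. The course itself works in R with the tidyverse; this is only a language-neutral analogue written with the Python standard library, and the data, the column names, and the `tidy` helper are all made up for the example.

```python
import csv
import io

# Hypothetical raw data in "wide" form: one column per year.
RAW_CSV = """country,2019,2020
Norway,10,12
Chile,7,9
"""


def tidy(raw_text):
    """Reshape wide rows into tidy rows: one observation per row."""
    reader = csv.DictReader(io.StringIO(raw_text))
    tidy_rows = []
    for row in reader:
        for column, value in row.items():
            if column == "country":
                continue  # identifier column, not a measurement
            tidy_rows.append(
                {"country": row["country"], "year": int(column), "value": int(value)}
            )
    return tidy_rows


rows = tidy(RAW_CSV)
for r in rows:
    print(r)
```

Each year column becomes a value in a `year` field, so every row describes exactly one country-year observation, which is the shape most analysis steps expect.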

This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise stay hidden.

Program
Data Science: Wrangling
Learn to process and convert raw data into formats needed for analysis.
What will you learn?
  • Importing data into R from different file formats
  • Web scraping
  • How to tidy data using the tidyverse to better facilitate analysis
  • String processing with regular expressions (regex)
  • Wrangling data using dplyr
  • How to work with dates and times as file formats
  • Text mining
Instructors
Rafael Irizarry
Professor of Biostatistics, Harvard University
Rafael Irizarry is a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health and a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute. For the past 15 years, Dr. Irizarry’s research has focused on the analysis of genomics data. During this time, he has also taught several classes, all related to applied statistics. Dr. Irizarry is one of the founders of the Bioconductor Project, an open-source and open-development software project for the analysis of genomic data. His publications on these topics have been highly cited, and his software implementations have been widely downloaded.
Platform
EdX
This platform offers all of its courses for free. The courses are authored by top universities and corporations, which strive to maintain quality standards. Students lose points for missed deadlines and unfinished homework. As on other platforms, lecture videos alternate with practical assignments. Courses are taught in English, Chinese, Spanish, French, and Hindi.