The goal of this course is to learn how to use the open source KNIME Analytics Platform for data analysis and for building machine learning predictive models on real data sets.
The course was created in KNIME Analytics Platform version 3.x (a few nodes may differ slightly in version 4.x).
The course has two main sections:
1. PRE-PROCESSING DATA: MODELING AND VISUALISING DATA FRAMES IN GENERAL
In this part we will cover the operations used to model, transform, prepare and visualise data frames, mainly:
- table transformation (merging data, table information, transpose, group by, pivoting etc.)
- row operations (e.g. filtering)
- column operations (filtering, splitting, adding columns, date information, missing values, binning, changing data types, basic math operations etc.)
- data visualisation (column chart, line plot, pie chart, scatter plot, box plot)
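KNIME performs these steps with visual nodes rather than code. Purely as an analogy, the following minimal Python sketch shows what a row filter, a group-by aggregation and a pivot do to a small table. The sample rows, the column names (`region`, `product`, `sales`) and the helper function names are made up for illustration and are not taken from the course data sets or from KNIME.

```python
from collections import defaultdict

# Hypothetical sample table: one dict per row (not course data).
rows = [
    {"region": "North", "product": "A", "sales": 10},
    {"region": "North", "product": "B", "sales": 5},
    {"region": "South", "product": "A", "sales": 7},
    {"region": "South", "product": "B", "sales": 3},
    {"region": "South", "product": "A", "sales": 2},
]


def filter_rows(table, min_sales):
    """Row filter: keep only rows whose sales are at least min_sales."""
    return [r for r in table if r["sales"] >= min_sales]


def group_by_sum(table, key, value):
    """Group by: sum the `value` column for each distinct `key`."""
    totals = defaultdict(int)
    for r in table:
        totals[r[key]] += r[value]
    return dict(totals)


def pivot_sum(table, row_key, col_key, value):
    """Pivot: one output row per row_key, one column per col_key, values summed."""
    result = defaultdict(lambda: defaultdict(int))
    for r in table:
        result[r[row_key]][r[col_key]] += r[value]
    return {rk: dict(cols) for rk, cols in result.items()}


filtered = filter_rows(rows, 5)
by_region = group_by_sum(rows, "region", "sales")
pivoted = pivot_sum(rows, "region", "product", "sales")
```

In a KNIME workflow, each of these functions would roughly correspond to one node connected to the next, with the table flowing between them.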
2. MACHINE LEARNING - REGRESSION AND CLASSIFICATION: We will create machine learning models following the standard machine learning process, which consists of:
- acquiring data by loading it into KNIME with reader nodes (the data sets are available for download in this course)
- pre-processing and transforming data to get a well-prepared data frame for prediction
- visualizing data with KNIME visual nodes (we will create basic plots and charts to get a clear picture of our data)
- creating machine learning predictive models and evaluating them:
1. Decision Tree Classification
2. Simple Linear Regression
3. Decision Tree Regression
4. Random Forest Regression
5. Random Forest Classification
6. Polynomial Regression (plus notes on multiple linear regression, which uses the same nodes in KNIME)
7. Naive Bayes
8. K-Nearest Neighbors
9. Gradient Boosting Regression
10. Gradient Boosting Classification
Models 3 to 10 were added at the end of 2019.
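The process above (acquire, prepare, train, evaluate) can be sketched in a few lines of Python for the simplest model in the list, simple linear regression. This is only an illustration of the idea, not how KNIME implements it: the data are synthetic, the train/test split is a plain slice, and all function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Apply a fitted (slope, intercept) model to new x values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# 1. Acquire data (synthetic and noise-free here: y = 2x + 1).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

# 2. Prepare data: hold out the last 3 rows for testing.
train_x, test_x = xs[:7], xs[7:]
train_y, test_y = ys[:7], ys[7:]

# 3. Train the model on the training rows only.
model = fit_simple_linear_regression(train_x, train_y)

# 4. Evaluate on the held-out rows.
predictions = predict(model, test_x)
error = mean_absolute_error(test_y, predictions)
```

Holding out rows that the model never saw during training is what makes the evaluation step meaningful. The same split, train and evaluate pattern applies to every model in the list.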
I will also explain the KNIME Analytics Platform environment, guide you through the installation, and show you where to find help and hints.
- access to a computer or laptop with Windows (32-bit or 64-bit), Linux (64-bit) or Mac (64-bit) and permission to install software (if you do not have it, ask your administrator to install KNIME for you; this is common on company computers)
- no prior knowledge required
- basic data analysis experience in other tools, such as MS Excel, SQL or Python, is an added advantage
