An End to End Applied Machine Learning Recipe in R: Binary Classification using Bagging, Boosting and Neural Networks
Learn by Coding
In this Learn Data Science by Coding recipe, you will learn:
- How to organise a Predictive Modelling Machine Learning project.
- What are the different steps in Predictive Modelling and Applied Machine Learning.
- How to summarise and present feature variables in Predictive Modelling (Descriptive statistics).
- How to visualise features through histogram, density plot, box plot and scatter matrix.
- How to find correlations among feature variables.
- How to visualise target variables.
- How to do data analysis for feature and target variables.
- How to use the caret package in R.
- How to implement Bagging, Boosting, Neural Networks for Binary Classification in R.
- How to tune parameters: manual tuning and automatic tuning in R.
- How to compare algorithms by Accuracy and Kappa using the caret package in R.
- How to save a trained model in R.
- How to connect to a MySQL database to query the prediction dataset.
- How to prepare the prediction dataset and load a pre-trained model in R.
- How to make predictions using the trained model and report the results.
What is Machine Learning?
Machine learning is the science of getting computers to act without being explicitly programmed. It is a subset of Artificial Intelligence (AI). Predictive modelling is a branch of machine learning that deals mainly with tabular data, with the aim of finding patterns and insights in the data available.
Types of Machine Learning Problems
There are common classes of problems in Machine Learning. The problem types discussed below cover most ML-based predictive modelling problems.
- Classification (Supervised Learning): Data are labelled, meaning that they are assigned to classes, for example spam/non-spam or fraud/non-fraud. The decision being modelled is to assign labels to new, unlabelled pieces of data. Classification can be binary (two classes) or multi-class (more than two classes).
- Regression (Supervised Learning): Data are labelled with a real value (think of a real number) rather than a label/class. Examples that are easy to understand are time series data like the price of a stock over time, monthly sales volume of a store etc. The decision being modelled is what value to predict for new, unseen data.
- Clustering (Unsupervised Learning): Data are not labelled, but can be divided into groups based on similarity and other measures of natural structure in the data.
Steps to setup a Predictive Modelling project
The first step in a predictive modelling project is to define and formalise the problem. A data scientist (or machine learning engineer or developer) should investigate and characterise the problem to better understand the objectives of the project, i.e. whether it is a classification, regression or clustering problem.
A data scientist should apply well-understood descriptive statistics and visualisation techniques to the data available. This exploratory data analysis helps to better understand the structure of the data.
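As a minimal sketch of this step, the snippet below computes a few descriptive statistics per feature and the Pearson correlation between two features. It is written in plain Python for illustration (the recipe itself uses R), and the feature names and values are made-up toy data, not the recipe's dataset.

```python
import math
import statistics

# Hypothetical toy feature columns (assumption: not taken from the recipe).
features = {
    "age": [25.0, 32.0, 47.0, 51.0, 62.0],
    "glucose": [85.0, 90.0, 130.0, 140.0, 155.0],
}


def summarise(values):
    """Return basic descriptive statistics for one numeric feature."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


summary = {name: summarise(col) for name, col in features.items()}
r = pearson(features["age"], features["glucose"])

for name, stats in summary.items():
    print(name, stats)
print("correlation(age, glucose):", round(r, 3))
```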
A data scientist should apply data transformations, missing value treatment etc. in order to better expose the structure of the prediction problem to the modelling algorithms.
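Two common treatments of this kind are mean imputation of missing values and standardisation of a numeric feature. The sketch below shows both in plain Python; the helper names `impute_mean` and `standardise` and the sample values are illustrative assumptions, not part of the recipe.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def standardise(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


raw = [2.0, None, 4.0, 6.0]   # hypothetical feature with one missing value
filled = impute_mean(raw)     # the None is replaced by the observed mean
scaled = standardise(filled)
print(filled)
print(scaled)
```

In a real project the imputation mean and the scaling parameters are usually estimated on the training data only and then reused on test and prediction data, so that no information leaks from unseen data.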
A data scientist should choose out-of-the-box predictive modelling machine learning algorithms to fit to the data available. The data must be split into training and test sets so that the performance of each algorithm tested can be reported on data it has not seen.
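The train/test split can be sketched as follows, in plain Python. The function name `train_test_split`, the 80/20 ratio and the seed are assumptions chosen for illustration; the rows are made-up `(feature, label)` pairs.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: ten (feature, label) rows.
rows = [(i, i % 2) for i in range(10)]
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the split reproducible, so every algorithm being compared is trained and evaluated on the same rows.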
A data scientist should evaluate the model and report its performance using well-understood evaluation techniques, such as a confusion matrix for classification or RMSE for regression.
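The metrics mentioned here can be computed by hand. The sketch below, in plain Python, builds a binary confusion matrix and derives Accuracy and Cohen's Kappa from it, and computes RMSE for a regression case. The labels and values are made up for illustration, and the function names are assumptions.

```python
import math


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(c):
    """Share of predictions that match the actual label."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def cohen_kappa(c):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = sum(c.values())
    observed = (c["tp"] + c["tn"]) / n
    expected = ((c["tp"] + c["fp"]) * (c["tp"] + c["fn"])
                + (c["tn"] + c["fn"]) * (c["tn"] + c["fp"])) / n ** 2
    return (observed - expected) / (1 - expected)


def rmse(actual, predicted):
    """Root mean squared error for real-valued predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels and predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
cm = confusion_counts(actual, predicted)
print(cm, accuracy(cm), cohen_kappa(cm))
print(rmse([2.0, 4.0], [2.0, 2.0]))
```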
A data scientist should use algorithm tuning to get the most out of the best-performing algorithm on the data available.
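One simple form of tuning is a grid search: try each candidate value of a parameter, score it on held-out data, and keep the best. The sketch below illustrates the idea in plain Python with a tiny one-feature k-nearest-neighbours classifier standing in for the model; the classifier, the data and the grid are all assumptions for illustration and are not the algorithms used in the recipe.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def holdout_accuracy(train, valid, k):
    """Accuracy of k-NN with a given k on a held-out validation set."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, grid):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: holdout_accuracy(train, valid, k) for k in grid}
    best_k = max(scores, key=scores.get)  # first candidate wins on ties
    return best_k, scores


# Hypothetical (feature, label) rows; (4.2, 1) is a deliberately noisy point.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (4.2, 1),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
valid = [(2.5, 0), (4.5, 0), (5.5, 1), (7.5, 1)]

best_k, scores = grid_search(train, valid, grid=[1, 3, 5])
print("scores per k:", scores)
print("best k:", best_k)
```

With a single noisy training point, `k = 1` follows the noise while larger values of `k` smooth it out, which is the kind of effect tuning is meant to expose.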
Finalisation and prediction
Finally, the tuned model needs to be finalised for making predictions on unseen data, and the outcomes of the model need to be presented.
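Finalising a model usually means saving it to disk, loading it again later, and using it to predict on new data. The sketch below shows that round trip in plain Python with `pickle`. The "model" here is a hypothetical threshold rule stored as a dictionary, and the file name is an assumption.

```python
import os
import pickle
import tempfile


def predict(model, values):
    """Label a value 1 if it reaches the model's threshold, else 0."""
    return [1 if v >= model["threshold"] else 0 for v in values]


def save_model(model, path):
    """Serialise the model object to a file."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Read a previously saved model back from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical "trained" model: a single tuned threshold.
model = {"name": "threshold-classifier", "threshold": 0.5}

model_path = os.path.join(tempfile.mkdtemp(), "final_model.pkl")
save_model(model, model_path)

restored = load_model(model_path)
predictions = predict(restored, [0.2, 0.5, 0.9])  # unseen data
print(predictions)
```

Pickle files should only be loaded from sources you trust, because loading one can execute arbitrary code.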
Looking for Data Science Recipe?
All Data Science Recipes (Python and R code) can be found at https://setscholars.com/DataScience/productcat/data-science-recipes/
Or simply search on Google:
a) Data Science Recipe + “name of dataset” e.g. data science recipe + iris dataset
b) Data Science Recipe + “name of dataset” + “Algorithm” e.g. data science recipe + pima diabetes dataset + random forest