## Predictive Analytics with R Training

## Introduction

Predictive analytics is the practice of extracting information from existing data sets to identify patterns and predict future outcomes and trends. It does not tell you what will happen in the future; it forecasts what might happen with an acceptable level of reliability, and includes what-if scenarios and risk assessment.

## Prerequisites

A basic understanding of R (matrices, data frames, functions, etc.) is required. Some familiarity with regression techniques is helpful.

## What You Will Learn

This course provides an overview of using R for supervised learning (also known as machine learning, pattern recognition, predictive analytics, etc.). The session steps through the process of building, visualizing, testing, and comparing models that are focused on prediction. The goal of the course is to provide a thorough workflow in R that can be used with many different modeling techniques. A case study is used to illustrate functionality.

## Course Contents

**1. Setting Up GNU R for Predictive Analytics**

- Installing GNU R
- The R graphical user interface
- The menu bar of the R console
- A quick look at the File menu
- A quick look at the Misc menu

- Packages
- Installing packages in R
- Loading packages in R

**2. Visualizing and Manipulating Data Using R**

- The roulette case
- Histograms and bar plots
- Scatterplots
- Boxplots
- Line plots
- Application – Outlier detection
- Formatting plots

**3. Data Visualization with Lattice**

- Loading and discovering the lattice package
- Discovering multipanel conditioning with xyplot()
- Discovering other lattice plots
- Histograms
- Stacked bars
- Dotplots
- Displaying data points as text

- Updating graphics
- Case study – exploring cancer-related deaths in the US
- Discovering the dataset
- Integrating supplementary external data

**4. Cluster Analysis**

- Distance measures
- Learning by doing – partition clustering with kmeans()
- Setting the centroids
- Computing distances to centroids
- Computing the closest cluster for each case
- Tasks performed by the main function
- Internal validation

- Using k-means with public datasets
- Understanding the data with the all.us.city.crime.1970 dataset
- Finding the best number of clusters in the life.expectancy.1971 dataset
- External validation

**5. Agglomerative Clustering Using hclust()**

- The inner workings of agglomerative clustering
- Agglomerative clustering with hclust()
- Exploring the results of votes in Switzerland
- The use of hierarchical clustering on binary attributes

**6. Dimensionality Reduction with Principal Component Analysis**

- The inner workings of Principal Component Analysis
- Learning PCA in R
- Dealing with missing values
- Selecting how many components are relevant
- Naming the components using the loadings
- PCA scores
- Accessing the PCA scores

- PCA scores for analysis
- PCA diagnostics

**7. Exploring Association Rules with Apriori**

- Apriori – basic concepts
- Association rules
- Itemsets
- Support
- Confidence
- Lift

- The inner workings of apriori
- Generating itemsets with support-based pruning
- Generating rules by using confidence-based pruning

- Analyzing data with apriori in R
- Using apriori for basic analysis
- Detailed analysis with apriori
- Preparing the data
- Analyzing the data
- Coercing association rules to a data frame
- Visualizing association rules

**8. Probability Distributions, Covariance, and Correlation**

- Probability distributions
- Introducing probability distributions
- Discrete uniform distribution

- The normal distribution
- The Student’s t-distribution
- The binomial distribution
- The importance of distributions

- Covariance and correlation
- Covariance
- Correlation
- Pearson’s correlation
- Spearman’s correlation

**9. Linear Regression**

- Understanding simple regression
- Computing the intercept and slope coefficient
- Obtaining the residuals
- Computing the significance of the coefficient

- Working with multiple regression
- Analyzing data in R: correlation and regression
- First steps in the data analysis
- Performing the regression
- Checking for the normality of residuals
- Checking for variance inflation
- Examining potential mediations and comparing models
- Predicting new data

- Robust regression
- Bootstrapping

**10. Multilevel Analyses**

- Nested data
- Multilevel regression
- Random intercepts and fixed slopes
- Random intercepts and random slopes

- Multilevel modeling in R
- The null model
- Random intercepts and fixed slopes
- Random intercepts and random slopes

- Predictions using multilevel models
- Using the predict() function
- Assessing prediction quality

**11. Text Analytics with R**

- An introduction to text analytics
- Loading the corpus
- Data preparation
- Preprocessing and inspecting the corpus
- Computing new attributes

- Creating the training and testing data frames
- Classification of the reviews
- Document classification with k-NN
- Document classification with Naïve Bayes
- Classification using logistic regression
- Document classification with support vector machines

- Mining the news with R
- A successful document classification
- Extracting the topics of the articles
- Collecting news articles in R from the New York Times article search API
