Chapter 5 Visualization

5.1 Introduction

Data visualization translates data and processes into decisions the business can act on, and it helps explain key metrics. Plotting data before analysis can reveal key insights about variables and their distributions.

5.2 Exploratory Visualization

Exploratory visualization involves learning descriptive details about the data before any modeling effort. Early results from visualizing distributions can lead to more informed choices in variable transformation, error distribution selection, and parameter tuning.
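As a minimal sketch of how a distribution plot can inform a transformation choice, the base R code below compares the paid claim amounts from the AutoClaims data (used in the examples later in this chapter) on the raw scale and on the log scale. The object name claims_pos and the filter on positive amounts are assumptions made for illustration only; they guard the log transform and are not part of the original analysis.

```r
library(insuranceData)
data("AutoClaims")

# Assumption: keep only strictly positive amounts so that log() is defined
claims_pos <- subset(AutoClaims, PAID > 0)

# Raw scale: shows how skewed the claim amounts are
hist(claims_pos$PAID, breaks = 50,
     main = "Paid amount", xlab = "PAID")

# Log scale: a candidate transformation to inspect before modeling
hist(log(claims_pos$PAID), breaks = 50,
     main = "Log of paid amount", xlab = "log(PAID)")
```

If the log-scale histogram looks roughly symmetric, that may suggest a log transformation or a skewed error distribution as a starting point for modeling.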

5.2.1 ggplot2

ggplot2: An implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface to map data to aesthetic attributes.

5.2.1.1 Examples

library(ggplot2)
library(insuranceData)
data("AutoClaims")
head(AutoClaims)
##      STATE CLASS GENDER AGE    PAID
## 1 STATE 14   C6       M  97 1134.44
## 2 STATE 15   C6       M  96 3761.24
## 3 STATE 15   C11      M  95 7842.31
## 4 STATE 15   F6       F  95 2384.67
## 5 STATE 15   F6       M  95  650.00
## 6 STATE 15   F6       M  95  391.12
# Bar chart of claim counts by age, with one panel per gender
g <- ggplot(AutoClaims, aes(x = AGE)) +
            geom_bar() +
            facet_grid(. ~ GENDER)
g


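library(ggplot2)
library(insuranceData)
data("AutoClaims")

# Scatter plot of paid amount against age, colored by gender,
# with each point labeled by its state code
g2 <- ggplot(AutoClaims, aes(x = AGE, y = PAID, color = GENDER)) +
            geom_point() +
            geom_text(aes(label = STATE)) +
            theme_classic()
g2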
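The description above mentions building a plot step by step from multiple data sources. A minimal sketch of that idea follows: one layer draws the individual claims and a second layer draws a summary computed separately. The names mean_by_age and g3, the transparency setting, and the log scale are illustrative choices rather than part of the original examples.

```r
library(ggplot2)
library(insuranceData)
data("AutoClaims")

# Second data source: mean paid amount for each age (hypothetical helper table)
mean_by_age <- aggregate(PAID ~ AGE, data = AutoClaims, FUN = mean)

g3 <- ggplot(AutoClaims, aes(x = AGE, y = PAID)) +
  geom_point(alpha = 0.2) +                         # layer 1: individual claims
  geom_line(data = mean_by_age, colour = "red") +   # layer 2: summary table
  scale_y_log10() +
  theme_classic()
g3
```

The line layer inherits the x and y mappings from the ggplot() call, so the summary table only needs columns with the same names (AGE and PAID).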

5.3 Interactive Visualization

5.3.1 plotly

plotly: Easily translate ‘ggplot2’ graphs to an interactive web-based version and/or create custom web-based visualizations directly from R. Once uploaded to a ‘plotly’ account, ‘plotly’ graphs (and the data behind them) can be viewed and modified in a web browser.

5.3.1.1 Examples

suppressPackageStartupMessages(library(plotly))
library(insuranceData)
data("AutoCollision")
head(AutoCollision)
##   Age Vehicle_Use Severity Claim_Count
## 1   A    Pleasure   250.48          21
## 2   A  DriveShort   274.78          40
## 3   A   DriveLong   244.52          23
## 4   A    Business   797.80           5
## 5   B    Pleasure   213.71          63
## 6   B  DriveShort   298.60         171
# plotly (>= 4.0) expects column references as one-sided formulas (~column)
plot_ly(AutoCollision, x = ~Severity, y = ~Claim_Count,
        type = "scatter", mode = "markers",
        color = ~Severity, size = ~Severity)
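The package description also mentions translating ggplot2 graphs into interactive versions. A minimal sketch of that route, using plotly's ggplotly() function, is shown below. The choice of Vehicle_Use for the color mapping and the object names p and p_interactive are assumptions made for illustration.

```r
suppressPackageStartupMessages(library(plotly))
library(ggplot2)
library(insuranceData)
data("AutoCollision")

# Build a static ggplot2 scatter plot first
p <- ggplot(AutoCollision,
            aes(x = Severity, y = Claim_Count, color = Vehicle_Use)) +
  geom_point()

# Convert it to an interactive plotly widget
p_interactive <- ggplotly(p)
p_interactive
```

This approach may be convenient when a static ggplot2 version of the chart already exists and only hover, zoom, and pan interactions are needed.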

5.4 Other visualization packages