Data Munging of the Titanic

---
title: "Data Munging of the Titanic"
output:
  html_document:
    theme: cerulean
---

_________________________________________________________________________________________________

### Assign the dataset

```{r echo=TRUE, message=FALSE, warning=FALSE}
train <- read.csv("../input/train.csv", stringsAsFactors = FALSE)

# Check for the NA data
library(Amelia)
missmap(train, col = c("yellow", "blue"), legend = FALSE, main = "The Train Data")
```

### Pclass and Sex, the Most Important Factors

```{r echo=TRUE, message=FALSE, warning=FALSE}
total <- train

# Turn Pclass into a labelled factor and Survived into a factor for plotting
total$Pclass <- factor(total$Pclass)
levels(total$Pclass) <- c("FirstClass", "SecondClass", "ThirdClass")
total$Survived <- factor(total$Survived)

library(ggplot2)
ggplot(total, aes(Pclass)) +
  geom_bar(aes(fill = Survived)) +
  facet_grid(~Sex) +
  ggtitle("Pclass and Sex as the Survival Factors")
```

### Age: the Different Fates of Juniors and Seniors

```{r echo=TRUE, message=FALSE, warning=FALSE}
# geom_histogram() is used here because current ggplot2 releases
# do not accept binwidth in geom_bar()
ggplot(na.omit(total), aes(Age)) +
  geom_histogram(aes(fill = Survived), binwidth = 2) +
  facet_wrap(~Sex + Pclass, nrow = 2, scales = "free_y") +
  ggtitle("Age as the Survival Factor")

# Add the variable of age…
```
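The missingness map and the faceted bar chart above can be backed up with plain numbers. The following is a minimal sketch in base R, not part of the original notebook. It assumes a small made-up data frame, `toy`, that only mimics the `Survived`, `Pclass`, `Sex`, and `Age` columns of `train.csv`; the values are invented for illustration and say nothing about the real data.

```r
# Hypothetical toy data standing in for train.csv (all values are made up).
toy <- data.frame(
  Survived = c(1, 0, 1, 0, 0, 1, 0, 1, 0, 1),
  Pclass   = c(1, 1, 2, 2, 3, 3, 3, 1, 3, 1),
  Sex      = c("female", "male", "female", "male", "male",
               "female", "male", "female", "female", "male"),
  Age      = c(29, NA, 35, 54, NA, 4, 22, 38, 18, 45),
  stringsAsFactors = FALSE
)

# Count missing values per column, a numeric complement to missmap().
na_counts <- colSums(is.na(toy))

# Survival rate for each Pclass/Sex combination.
# The mean of a 0/1 vector is the share of survivors in that group.
rates <- aggregate(Survived ~ Pclass + Sex, data = toy, FUN = mean)

print(na_counts)
print(rates)
```

Replacing `toy` with the `train` data frame read earlier should give the same two summaries for the real data, provided `Survived` is still numeric at that point (that is, before it is converted to a factor in `total`).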

