Data Science using Scala with Spark on Azure

This topic shows how to use Scala for supervised machine learning tasks with Spark's scalable machine learning library (MLlib) and the SparkML packages on an Azure HDInsight Spark cluster. It walks you through the tasks that constitute the Data Science Process: data ingestion and exploration, visualization, feature engineering, modeling, and model consumption. The models built include logistic and linear regression, random forests, and gradient boosted trees.

We address two common supervised machine learning tasks:

- Regression problem: prediction of the tip amount ($)
- Binary classification: prediction of tip or no-tip (1/0) for a taxi trip

The modeling process requires training the models and evaluating them on a test data set with relevant accuracy metrics. We also show how to store these models in Azure blob storage (WASB) and how to score and evaluate their…
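
As a rough illustration of the binary classification task described above, the following is a minimal sketch using the Spark ML API from Scala. It is not the code from the full article: the column names (trip_distance, passenger_count, fare_amount, tip_amount), the toy rows, the TipSketch object, and the rule that a trip counts as "tipped" when tip_amount > 0 are all assumptions made for illustration. It also runs Spark in local mode, whereas on an HDInsight cluster a session would normally already be available.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical sketch: names and data are illustrative assumptions.
object TipSketch {

  // Trains on a few toy rows and returns the area under the ROC curve
  // measured on a separate set of toy test rows.
  def run(): Double = {
    // Local mode is assumed here so the sketch can run outside a cluster.
    val spark = SparkSession.builder()
      .appName("TipSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Rows are (trip_distance, passenger_count, fare_amount, tip_amount).
    def toTrips(rows: Seq[(Double, Int, Double, Double)]): DataFrame =
      rows.toDF("trip_distance", "passenger_count", "fare_amount", "tip_amount")
        // Assumed label rule: 1.0 if any tip was paid, else 0.0.
        .withColumn("tipped", when(col("tip_amount") > 0, 1.0).otherwise(0.0))

    val train = toTrips(Seq(
      (1.2, 1, 6.5, 0.0), (3.4, 2, 13.0, 2.5), (0.8, 1, 5.0, 0.0),
      (7.9, 1, 27.5, 5.0), (2.1, 3, 9.0, 0.0), (5.5, 1, 19.0, 3.8)
    ))
    val test = toTrips(Seq(
      (1.0, 1, 5.5, 0.0), (6.2, 2, 22.0, 4.0),
      (2.5, 1, 10.0, 0.0), (4.8, 1, 17.5, 3.0)
    ))

    // Spark ML estimators expect the features packed into one vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("trip_distance", "passenger_count", "fare_amount"))
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("tipped")
      .setFeaturesCol("features")
      .setMaxIter(10)
      .setRegParam(0.1)

    val model = lr.fit(assembler.transform(train))

    // Score the held-out rows and compute area under the ROC curve.
    val auc = new BinaryClassificationEvaluator()
      .setLabelCol("tipped")
      .setMetricName("areaUnderROC")
      .evaluate(model.transform(assembler.transform(test)))

    spark.stop()
    auc
  }

  def main(args: Array[String]): Unit =
    println(s"Area under ROC on toy test rows: ${run()}")
}
```

With real trip data, the hand-written rows would be replaced by a DataFrame read from storage and split into training and test sets (for example with randomSplit), and the regression task would follow the same pattern with a regression estimator and tip_amount as the label.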
