A scalable data science platform with Microsoft R Server and Spark

If you want to train a statistical model on very large amounts of data, you’ll need three things: a storage platform capable of holding all of the training data, a computational platform capable of efficiently performing the heavy-duty mathematical computations required, and a statistical computing language with algorithms that can take advantage of that storage and computational power. Microsoft R Server, running on HDInsight with Apache Spark, provides all three. As Mario Inchiosa and Roni Burd demonstrate in this recorded webinar, Microsoft R Server can now run within HDInsight Hadoop nodes on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data sets. And if your data grows or you just need more…

