Spark in Action Book Review & Interview

In the book Spark in Action, authors Petar Zecevic and Marko Bonaci discuss the Apache Spark framework for data processing, covering both batch and streaming use cases. They introduce Spark's architecture and core concepts such as Resilient Distributed Datasets (RDDs) and DataFrames. Using code examples, the authors show how to process data with Spark libraries like Spark SQL and Spark Streaming, and how to apply machine learning algorithms with the Spark MLlib and ML libraries. They also cover graph data analytics with the Spark GraphX library. The book's Spark Operations section explains how to run Spark on its own standalone cluster as well as on clusters managed by other frameworks such as YARN and Mesos. InfoQ spoke with the authors about the Apache Spark framework, developer tools, and the…
