Apache Spark 2.0 Released

Apache Spark 2.0 has been released with improved SQL support, Structured Streaming and better performance. Apache Spark is an open source data processing engine that has become very popular since its initial release. According to Apache, it outperforms Hadoop MapReduce, running programs up to 100 times faster in memory and 10 times faster on disk. The graph below, based on Apache's figures, compares the running time of logistic regression in Hadoop and Spark.

The new version improves support for standard SQL, adding a new ANSI SQL parser and support for subqueries. The parser handles both ANSI SQL and HiveQL. Subquery support covers uncorrelated and correlated scalar subqueries, IN and NOT IN predicate subqueries, and EXISTS and NOT EXISTS predicate subqueries. The support for SQL:2003 means Spark 2.0 can run all the…
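
To make the subquery forms listed above concrete, here is a minimal sketch using only standard SQL. It runs on Python's built-in sqlite3 module rather than on Spark, so it illustrates what each subquery form means, not Spark's own behavior. The tables `employees` and `depts`, their columns and their data are hypothetical. In a Spark 2.0 application the same query strings would presumably be passed to `SparkSession.sql()`, but Spark places its own restrictions on correlated subqueries, so check the Spark SQL documentation before relying on any particular query.

```python
import sqlite3

# Hypothetical example data: (name, dept, salary) and (dept_name, location).
EMPLOYEES = [
    ("ana", "eng", 120),
    ("bo", "eng", 100),
    ("cy", "ops", 90),
    ("di", "ops", 95),
    ("ed", "sales", 70),
]
DEPTS = [
    ("eng", "Berlin"),
    ("ops", "Paris"),
    ("sales", "Berlin"),
    ("hr", "Paris"),
]

# One standard-SQL query per subquery form; ORDER BY keeps results deterministic.
QUERIES = {
    # Uncorrelated scalar subquery: the inner query runs once, independent of the outer row.
    "uncorrelated_scalar": """
        SELECT name FROM employees
        WHERE salary > (SELECT AVG(salary) FROM employees)
        ORDER BY name""",
    # Correlated scalar subquery: the inner query refers to the outer row (e.dept).
    "correlated_scalar": """
        SELECT e.name FROM employees e
        WHERE e.salary = (SELECT MAX(e2.salary) FROM employees e2
                          WHERE e2.dept = e.dept)
        ORDER BY e.name""",
    # IN predicate subquery.
    "in_predicate": """
        SELECT name FROM employees
        WHERE dept IN (SELECT dept_name FROM depts WHERE location = 'Berlin')
        ORDER BY name""",
    # NOT IN predicate subquery (no NULLs in this data, so NOT IN is unambiguous).
    "not_in_predicate": """
        SELECT name FROM employees
        WHERE dept NOT IN (SELECT dept_name FROM depts WHERE location = 'Berlin')
        ORDER BY name""",
    # EXISTS predicate subquery: departments that have at least one employee.
    "exists": """
        SELECT d.dept_name FROM depts d
        WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept = d.dept_name)
        ORDER BY d.dept_name""",
    # NOT EXISTS predicate subquery: departments with no employees.
    "not_exists": """
        SELECT d.dept_name FROM depts d
        WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.dept = d.dept_name)
        ORDER BY d.dept_name""",
}


def run_all():
    """Load the sample tables into an in-memory database and run every query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.execute("CREATE TABLE depts (dept_name TEXT, location TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO depts VALUES (?, ?)", DEPTS)
    # Each query returns a single column; collect it as a list per query label.
    results = {
        label: [row[0] for row in conn.execute(sql)]
        for label, sql in QUERIES.items()
    }
    conn.close()
    return results


if __name__ == "__main__":
    for label, values in run_all().items():
        print(f"{label}: {values}")
```

With this data the average salary is 95, so the uncorrelated scalar query returns `['ana', 'bo']`, while the correlated query returns the top earner in each department: `['ana', 'di', 'ed']`.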

