Apache Spark 1.6 with Dataset API

Data analysis tool in a new version

January 11, 2016, Michael Thomas, #Big Data

After a preview version was published at the end of November 2015, the final version of Apache Spark 1.6 is finally ready for download. The update contains over 1,000 changes; release highlights include a variety of performance improvements, the new Dataset API, and expanded data science functions.

Performance improvements

Since Parquet is among the most commonly used data formats in Spark, and scan performance can have considerable influence on large applications, a new Parquet reader has been implemented in Spark 1.6 that promises performance improvements of up to 50%: in benchmarks, the scan throughput for 5 columns could therefore be increased…

