Apache Spark: what it is, what it does, and why it matters

Apache Spark comes up in many conversations about big data. One of the newest (and perhaps most talked-about) entrants to the big data landscape, it is viewed by many as a more powerful and accessible alternative to Hadoop. Others see Spark as a powerful complement to Hadoop, with its own set of strengths, quirks and limitations. So what is the reality? Who uses Spark, and how does it differ from other data processing engines?

What is Spark?

Spark is a general-purpose data processing engine that can be used in a wide variety of circumstances. Application developers and data scientists can incorporate Spark into their applications to quickly query, analyse, and transform data at scale. From the start, Spark was optimised to run in memory, allowing it to process data…


Link to Full Article: Apache Spark: what it is, what it does, and why it matters