Structured Streaming in Spark

July 28th, 2016

Editor’s note: Andrew recently spoke at StampedeCon on this very topic. Find more information, and his slides, here.

Spark 2.0 (just released yesterday) has many new features, one of the most important being structured streaming. Structured streaming allows you to work with streams of data just like any other DataFrame. This has the potential to vastly simplify streaming application development, just as the transition from RDDs to DataFrames did for batch. Because batch and streaming use the same interface, code can also be reused between them. Finally, since structured streaming uses the Catalyst SQL optimizer and other DataFrame optimizations like code generation, it has the potential to increase performance substantially. In this post, I’ll look at how to get started with structured streaming, starting with the “word…


Link to Full Article: Structured Streaming in Spark