How to Install PySpark and Integrate with IPython Notebook

At Dataquest, we’ve released an interactive course on Spark, with a focus on PySpark. We explore the fundamentals of Map-Reduce and how to use PySpark to clean, transform, and munge data. In this post, we’ll dive into how to install PySpark locally on your own computer and how to integrate it into the IPython Notebook workflow. Some familiarity with the command line will be necessary to install PySpark.

Overview

At a high level, these are the steps to install PySpark and integrate it with IPython Notebook:

1. Install the required packages below
2. Download and build Spark
3. Set your environment variables
4. Create an IPython profile for PySpark

Required packages

- Java SE Development Kit
- Scala Build Tool
- Spark 1.5.1 (at the time of writing)
- Python 2.6 or higher (we prefer to use Python 3.4+)

…
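As a rough sketch of what the environment-variable and profile steps involve, the commands might look like the following. The Spark install path and the Python interpreter name are assumptions for illustration; adjust them to wherever you actually unpacked and built Spark:

```shell
# Illustrative environment variables; the Spark path below is an
# assumption -- point SPARK_HOME at your own Spark 1.5.1 build.
export SPARK_HOME="$HOME/spark-1.5.1"
export PATH="$SPARK_HOME/bin:$PATH"

# Tell PySpark which Python interpreter to use (assumed: python3)
export PYSPARK_PYTHON="python3"

# Create a dedicated IPython profile for PySpark
# (skipped here if ipython is not yet on the PATH)
if command -v ipython >/dev/null 2>&1; then
    ipython profile create pyspark
fi
```

Putting the `export` lines in your shell startup file (e.g. `~/.bashrc`) makes them persist across sessions; the profile then gives you a place to hook PySpark initialization into IPython Notebook.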


Link to Full Article: How to Install PySpark and Integrate with IPython Notebook