Run Spark on IPython Notebook (Jupyter)
I have written many posts on this theme in the past, but I think this is the easiest method.
apache-spark is easy to install with Homebrew. (Installing Homebrew itself is omitted here.)
brew install apache-spark
Create a dedicated Python environment with virtualenv (mkvirtualenv comes from virtualenvwrapper). I named it spark.
mkvirtualenv spark
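mkvirtualenv is only available if virtualenvwrapper is installed. If you do not use it, here is a rough equivalent that uses Python 3's built-in venv module. The location ~/.virtualenvs/spark is an assumption chosen to mirror virtualenvwrapper's default WORKON_HOME.

```shell
# Assumed alternative to mkvirtualenv: create an environment named "spark"
# in virtualenvwrapper's default location using the standard-library venv module.
python3 -m venv "$HOME/.virtualenvs/spark"

# Activate it in the current shell (mkvirtualenv does this step automatically).
. "$HOME/.virtualenvs/spark/bin/activate"
```

Once the environment is active, the pip install step installs into it. In a later session, run the activate line again (or use workon spark if virtualenvwrapper is installed).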
Install the required modules. numpy is installed automatically as a dependency of pandas, and jupyter pulls in ipython. Add scipy if you need it.
pip install jupyter pandas matplotlib
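Launch pyspark with two environment variables. PYSPARK_DRIVER_PYTHON sets the Python executable used for the driver (here ipython), and PYSPARK_DRIVER_PYTHON_OPTS holds the arguments passed to it (here notebook), so pyspark starts a notebook server instead of the plain interactive shell. --master local[*] runs Spark locally with as many worker threads as your machine has logical cores.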
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark --master 'local[*]'
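Since Jupyter 4, the notebook server is started with jupyter notebook, and the ipython notebook subcommand is deprecated. For that reason, the Spark programming guide sets PYSPARK_DRIVER_PYTHON=jupyter. The sketch below wraps that variant in a shell function so you don't have to retype the variables. They are set only for this one launch, because exporting them would make every later pyspark call in the shell open a notebook too. The function name pyspark_notebook is hypothetical, and a bash or zsh shell is assumed.

```shell
# Hypothetical helper: start PySpark with a Jupyter notebook as the driver front end.
# The two variables apply only to this pyspark call; they are not exported to the shell.
pyspark_notebook() {
  PYSPARK_DRIVER_PYTHON=jupyter \
  PYSPARK_DRIVER_PYTHON_OPTS=notebook \
    pyspark --master 'local[*]' "$@"  # quoted so zsh does not try to glob-expand [*]
}
```

Put the function in your shell startup file and run pyspark_notebook. Any extra arguments are passed through to pyspark.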
Execution example