Just run the following command:
PYSPARK_PYTHON=/Users/username/.pyenv/shims/python PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ~/somewhere/spark-1.5.2/bin/pyspark --master local[4]
- This assumes Jupyter notebook and Apache Spark are already installed.
- Basically, just start pyspark as described in https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell.
- Set the environment variables ${PYSPARK_DRIVER_PYTHON} and ${PYSPARK_DRIVER_PYTHON_OPTS} appropriately and start pyspark.
- This time, the --master local[4] option is added so the job runs locally on the machine at hand, with four worker threads.
- Also, to make sure the master and the workers use the same version of Python, the Python path is specified in ${PYSPARK_PYTHON}; see the version-check sketch after this list.
- You can find the path to Python in your environment by running which python.
Putting all of the above together, the full command is:
PYSPARK_PYTHON=/Users/username/.pyenv/shims/python PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ~/somewhere/spark-1.5.2/bin/pyspark --master local[4]
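Once the notebook opens in the browser, a minimal smoke test along the following lines (again assuming the automatically created `sc` SparkContext) confirms that jobs actually run on the local[4] master:

```python
# Minimal sanity check to run in a new notebook cell.
rdd = sc.parallelize(range(1000), 4)              # 4 partitions, matching --master local[4]
print(rdd.sum())                                  # expected: 499500
print(rdd.filter(lambda x: x % 2 == 0).count())   # expected: 500
```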