unsolved
In the Dockerfile:
FROM jupyter/pyspark-notebook:~~~~~~~~~
Environment:
Python 3.7.6
pyspark 2.4.5

Loading PySpark as follows fails:
from pyspark.sql import SparkSession
/usr/local/spark/python/pyspark/__init__.py in <module>
49
50 from pyspark.conf import SparkConf
---> 51 from pyspark.context import SparkContext
52 from pyspark.rdd import RDD, RDDBarrier
53 from pyspark.files import SparkFiles
/usr/local/spark/python/pyspark/context.py in <module>
27 from tempfile import NamedTemporaryFile
28
---> 29 from py4j.protocol import Py4JError
30
31 from pyspark import accumulators
ModuleNotFoundError: No module named 'py4j'
For now, I worked around it by running the following in the Jupyter notebook:
!pip install py4j
However, this produces the error below, so I would like to know how to deal with the problem without triggering this error.
ERROR: pyspark 2.4.5 has requirement py4j==0.10.7, but you'll have py4j 0.10.9.1 which is incompatible.
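One way to avoid installing a mismatched py4j at all is to reuse the py4j that ships inside the Spark distribution itself (Spark 2.4.5 bundles its pinned py4j as a zip under `$SPARK_HOME/python/lib`). This is a minimal sketch, assuming `SPARK_HOME` is `/usr/local/spark` as in the jupyter/pyspark-notebook image; the paths are assumptions, not verified against your container:

```python
import glob
import os
import sys

# Assumption: SPARK_HOME points at the Spark install
# (/usr/local/spark in the jupyter/pyspark-notebook image).
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
spark_python = os.path.join(spark_home, "python")

# Spark bundles the py4j version it was built against as a zip
# under python/lib (e.g. py4j-0.10.7-src.zip for Spark 2.4.5).
py4j_zips = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))

# Put Spark's python dir and the bundled py4j zip on sys.path
# before importing pyspark, instead of pip-installing py4j.
for p in [spark_python] + py4j_zips:
    if p not in sys.path:
        sys.path.insert(0, p)

# After this, `from pyspark.sql import SparkSession` should resolve
# py4j from the bundled zip, with the exact version pyspark expects.
```

Alternatively, pinning the version in the notebook (`!pip install py4j==0.10.7`) should satisfy pyspark 2.4.5's requirement and silence the incompatibility error.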