python – environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON
By the way, if you use PyCharm, you can add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON as environment variables in the run/debug configuration.
You should set the following environment variables in $SPARK_HOME/conf/spark-env.sh:
export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/bin/python
If spark-env.sh doesn't exist, you can rename spark-env.sh.template to spark-env.sh.
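Note that /usr/bin/python is not present on every system, so it may be worth checking the path before relying on it. A minimal sketch of such a check follows; check_interpreter is a hypothetical helper name, not something Spark provides, and the fallback to the current interpreter is an assumption made only for illustration:

```python
import os
import shutil
import sys


def check_interpreter(path):
    """Return the interpreter path if it is executable, otherwise None.

    Hypothetical helper for illustration; not part of Spark.
    """
    # shutil.which accepts a full path (or a bare command name looked up
    # on PATH) and returns it only if it points to an executable file.
    return shutil.which(path)


# Fall back to the current interpreter when PYSPARK_PYTHON is not set.
configured = os.environ.get("PYSPARK_PYTHON", sys.executable)
print(configured, "->", check_interpreter(configured))
```

If the second value printed is None, the configured path does not point to an executable and should be corrected in spark-env.sh.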
This may also happen if you're working within a virtual environment. In this case, it may be harder to find the correct path to the Python executable (and anyway, I think it's not a good idea to hardcode the path if you want to share your code with others).
If you run the following lines at the beginning of your script/notebook (at least before you create the SparkSession/SparkContext), the problem is solved:
import os
import sys
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
The os module lets you set environment variables for the current process through os.environ; sys.executable is a string with the absolute path of the executable binary for the running Python interpreter.
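The same idea can be wrapped in a small function so the ordering is explicit. This is only a sketch: pin_pyspark_python is a hypothetical name, and it assumes the interpreter running the script is also available at the same path wherever the workers run (true in local mode, not guaranteed on a cluster):

```python
import os
import sys


def pin_pyspark_python():
    """Point both PySpark variables at the interpreter running this script.

    Hypothetical helper for illustration; not part of PySpark.
    """
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        os.environ[name] = sys.executable
    return sys.executable


interpreter = pin_pyspark_python()
print("PySpark will use:", interpreter)

# Create the session only after the variables are set, for example:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```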