python – environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON

By the way, if you use PyCharm, you can add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to the environment variables of your run/debug configuration.

You should set the following environment variables in $SPARK_HOME/conf/spark-env.sh:

export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/bin/python

If spark-env.sh doesn't exist, you can create it by copying (or renaming) $SPARK_HOME/conf/spark-env.sh.template to spark-env.sh.
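A hardcoded path such as /usr/bin/python may not be the Python you expect: on some systems it is Python 2, and on others it does not exist at all. Before putting a path in spark-env.sh, it can help to check it. The sketch below is an illustrative helper (interpreter_version is a hypothetical name, not part of Spark) that runs a candidate interpreter and returns its major/minor version, so you can compare it with the interpreter you actually use.

```python
import subprocess
import sys


def interpreter_version(path):
    """Return the (major, minor) version of the Python interpreter at `path`.

    Raises OSError (e.g. FileNotFoundError) if `path` cannot be executed and
    subprocess.CalledProcessError if the interpreter exits with an error.
    """
    # Ask the candidate interpreter itself to print its version as "X.Y".
    result = subprocess.run(
        [path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        capture_output=True,
        text=True,
        check=True,
    )
    major, minor = result.stdout.strip().split(".")
    return int(major), int(minor)


if __name__ == "__main__":
    # Hypothetical candidate: replace it with the path you plan to use.
    candidate = "/usr/bin/python"
    try:
        print(candidate, "->", interpreter_version(candidate))
    except (OSError, subprocess.CalledProcessError) as exc:
        print(candidate, "is not a usable Python interpreter:", exc)
    # For comparison: the interpreter running this script.
    print(sys.executable, "->", interpreter_version(sys.executable))
```

If the two versions differ, the path in spark-env.sh is probably not the interpreter you meant to use.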

This may also happen if you're working within a virtual environment (e.g. virtualenv or conda). In this case, it may be harder to find the correct path to the Python executable (and anyway I don't think it's a good idea to hardcode the path if you want to share the script with others).

If you run the following lines at the beginning of your script/notebook (at least before you create the SparkSession/SparkContext), both variables point to the interpreter that is running your script, and the problem is solved:

import os
import sys

# Use the interpreter running this script for both the workers and the driver
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

The os module lets you set environment variables for the current process (child processes started afterwards inherit them), and sys.executable is a string holding the absolute path of the Python interpreter that is currently running.
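If you share the script but still want others to be able to choose a different interpreter from their shell, a variant that only fills in missing values could look like this. It is a minimal sketch: configure_pyspark_python is a hypothetical helper, not a PySpark API, and it relies on os.environ.setdefault leaving already-set variables untouched.

```python
import os
import sys

# The two variables discussed above.
PYSPARK_VARS = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")


def configure_pyspark_python(executable=None):
    """Point both PySpark variables at one interpreter (hypothetical helper).

    Defaults to the interpreter running this script. Variables that are
    already set in the environment are not overwritten.
    """
    executable = executable or sys.executable
    for name in PYSPARK_VARS:
        # setdefault only writes the variable if it is not already present.
        os.environ.setdefault(name, executable)
    return {name: os.environ[name] for name in PYSPARK_VARS}


# Call this before creating the SparkSession/SparkContext.
configure_pyspark_python()
```

The trade-off is that a stale value exported in the shell wins over sys.executable; if you always want the current interpreter, the plain assignments shown earlier are simpler.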
