Apache Spark – importing PySpark in the Python shell
Assuming one of the following:
- Spark is downloaded on your system and you have an environment variable SPARK_HOME pointing to it, or
- you have run pip install pyspark

here is a simple method (if you don't care about how it works):
Use findspark:
- Install findspark (pip install findspark), then go to your Python shell and initialize it:

  import findspark
  findspark.init()

- Import the necessary modules:

  from pyspark import SparkContext
  from pyspark import SparkConf

- Done! (A short end-to-end sketch follows below.)
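For reference, here is the whole findspark route in one place as a minimal sketch. It assumes findspark and pyspark are already installed and that SPARK_HOME points at a valid Spark installation; the app name and the local[*] master are just example values.

# Minimal sketch: findspark makes the pyspark modules importable, then we
# build a small local SparkContext and run a trivial job as a sanity check.
import findspark
findspark.init()  # puts Spark's python/ directories onto sys.path

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("findspark-check").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Sum the integers 0..99 on the local context; prints 4950 if everything works.
print(sc.parallelize(range(100)).sum())
sc.stop()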
If you see an error like this:

ImportError: No module named py4j.java_gateway

add $SPARK_HOME/python/build to your PYTHONPATH:
export SPARK_HOME=/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
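If you would rather not edit the shell environment, the same paths can be added from inside the Python shell. This is just a sketch: the fallback SPARK_HOME below is the example path from the exports above, so adjust it to your own installation.

import os
import sys

# Use SPARK_HOME from the environment if it is set; the fallback is only an example path.
spark_home = os.environ.get("SPARK_HOME", "/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4")
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "build"))

from pyspark import SparkContext  # should now import without the py4j error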
It turns out that the pyspark launcher loads Python and automatically sets up the correct library paths. Take a look at $SPARK_HOME/bin/pyspark:
export SPARK_HOME=/some/path/to/apache-spark
# Add the PySpark classes to the Python path:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
I added these lines to my .bashrc file and the modules are now found correctly!
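As a quick sanity check, assuming you have opened a new shell (or run source ~/.bashrc), the following confirms that the paths were picked up and that pyspark resolves from inside the Spark installation:

import os
import sys

# PYTHONPATH should now mention $SPARK_HOME/python, and the import should succeed.
print(os.environ.get("PYTHONPATH"))
print([p for p in sys.path if "spark" in p.lower()])

import pyspark
print(pyspark.__file__)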