Saturday, March 23, 2019

Configure Anaconda, Jupyter Notebook and Spark 2.4 on MacOS

Configuration Files

  • Spark configuration file: /Users/donghua/spark-2.4.0-bin-hadoop2.7/sbin/spark-config.sh
# symlink and absolute path should rely on SPARK_HOME to resolve
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}/conf"}"
# Add the PySpark classes to the PYTHONPATH:
if [ -z "${PYSPARK_PYTHONPATH_SET}" ]; then
  export PYTHONPATH="${SPARK_HOME}/python:${PYTHONPATH}"
  export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
  export PYSPARK_PYTHONPATH_SET=1
fi


# added by Anaconda3 5.0.1 installer
export PATH="/Users/donghua/anaconda3/bin:$PATH"

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"
  • Anaconda Jupyter Configuration File: /Users/donghua/anaconda3/share/jupyter/kernels/pyspark2/kernel.json
    {
      "argv": [
        "python3.6",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
      ],
      "display_name": "Python3.6+ Pyspark(Spark 2.4.0)",
      "language": "python",
      "env": {
        "PYSPARK_PYTHON": "python",
        "SPARK_HOME": "/Users/donghua/spark-2.4.0-bin-hadoop2.7",
        "SPARK_CONF_DIR": "/Users/donghua/spark-2.4.0-bin-hadoop2.7/conf",
        "PYTHONPATH": "/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip:/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/:",
        "PYTHONSTARTUP": "/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/pyspark/shell.py",
        "PYSPARK_SUBMIT_ARGS": "--master spark://Donghuas-MacBook-Air.local:7077 --name PySparkShell pyspark-shell"
      }
    }
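
With this kernel selected in Jupyter, pyspark/shell.py (run through PYTHONSTARTUP above) creates the spark session and sc context automatically, so the first notebook cell can use them straight away. As a quick sanity check (not part of the original configuration, and assuming the cluster and notebook are started with the commands below), a cell such as the following confirms the notebook is really attached to the standalone master:

# Cell run with the "Python3.6+ Pyspark(Spark 2.4.0)" kernel selected;
# `spark` and `sc` already exist because PYTHONSTARTUP ran pyspark/shell.py.
print(spark.version)   # expect 2.4.0
print(sc.master)       # expect spark://Donghuas-MacBook-Air.local:7077

# A tiny distributed job to confirm the executors accept Python tasks
spark.range(1, 11).selectExpr("sum(id) AS total").show()   # total = 55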

Stop/start Spark local cluster

cd /Users/donghua/spark-2.4.0-bin-hadoop2.7;./sbin/stop-all.sh
cd /Users/donghua/spark-2.4.0-bin-hadoop2.7;./sbin/start-all.sh
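
Once the cluster has been restarted with the configuration above, a short PySpark script can verify that both the driver and the workers picked up python3 rather than the system Python. This is only a sanity-check sketch (the script name and app name are arbitrary); the master URL is the one used throughout this post:

# check_python.py -- sanity check, not part of the original post
import sys
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://Donghuas-MacBook-Air.local:7077")
         .appName("python-version-check")
         .getOrCreate())
sc = spark.sparkContext

print("driver :", sys.version.split()[0])   # should be the Anaconda python3
print("workers:", sc.pythonVer)             # Python major.minor used on the executors
print(sc.parallelize(range(10)).map(lambda x: x * x).sum())   # trivial job; prints 285

spark.stop()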

Commands:

  • Jupyter:
cd /Users/donghua/spark-2.4.0-bin-hadoop2.7;jupyter-notebook --ip=Donghuas-MacBook-Air.local --port 9999
  • Pyspark:
cd /Users/donghua/spark-2.4.0-bin-hadoop2.7;/Users/donghua/spark-2.4.0-bin-hadoop2.7/bin/pyspark --master spark://Donghuas-MacBook-Air.local:7077
  • Spark-submit:
/Users/donghua/spark-2.4.0-bin-hadoop2.7/bin/spark-submit --master spark://Donghuas-MacBook-Air.local:7077 NameList.py file:///Users/donghua/spark-2.4.0-bin-hadoop2.7/data/data/people.json file:///tmp/nameList
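
NameList.py itself is not included in this post. A minimal sketch consistent with the arguments above (read people.json, write the name column to /tmp/nameList) might look like this; the actual script may differ:

# NameList.py -- hypothetical sketch; the original script is not shown in this post.
# Reads a JSON file of people (first argument) and writes their names to the
# output directory (second argument).
import sys
from pyspark.sql import SparkSession

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: NameList.py <input_json> <output_dir>", file=sys.stderr)
        sys.exit(1)

    # --master is supplied on the spark-submit command line, so it is not set here
    spark = SparkSession.builder.appName("NameList").getOrCreate()

    people = spark.read.json(sys.argv[1])                             # input: the people.json path passed above
    people.select("name").write.mode("overwrite").text(sys.argv[2])   # output: file:///tmp/nameList

    spark.stop()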

URLs

  • Jupyter Notebook: http://Donghuas-MacBook-Air.local:9999
  • Spark Master web UI: http://Donghuas-MacBook-Air.local:8080 (standalone master default port)
