Friday, March 15, 2019

A working Jupyter notebook and Spark 2 configuration on macOS

File: /Users/donghua/anaconda3/share/jupyter/kernels/pyspark2/kernel.json
    {
      "argv": [
        "python3.6",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
      ],
      "display_name": "Python3.6 + Pyspark(Spark 2.4.0)",
      "language": "python",
      "env": {
        "PYSPARK_PYTHON": "python",
        "SPARK_HOME": "/Users/donghua/spark-2.4.0-bin-hadoop2.7",
        "SPARK_CONF_DIR": "/Users/donghua/spark-2.4.0-bin-hadoop2.7/conf",
        "PYTHONPATH": "/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip:/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/:",
        "PYTHONSTARTUP": "/Users/donghua/spark-2.4.0-bin-hadoop2.7/python/pyspark/shell.py",
        "PYSPARK_SUBMIT_ARGS": "--master spark://Donghuas-MacBook-Air.local:7077 --name PySparkShell pyspark-shell"
      }
    }
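
After saving this kernel spec, running "jupyter kernelspec list" should show pyspark2 alongside the default kernels. The cell below is a minimal sanity check, assuming the kernel starts cleanly: because PYTHONSTARTUP points at pyspark/shell.py, the spark (SparkSession) and sc (SparkContext) objects are created during kernel startup, so no imports are needed.

# Run in a notebook cell under the "Python3.6 + Pyspark(Spark 2.4.0)" kernel.
# spark and sc already exist, courtesy of pyspark/shell.py.
print(spark.version)   # expected: 2.4.0
print(sc.master)       # expected: spark://Donghuas-MacBook-Air.local:7077

# A trivial job to confirm the standalone cluster accepts work.
print(sc.parallelize(range(100)).sum())   # expected: 4950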


File: /Users/donghua/spark-2.4.0-bin-hadoop2.7/sbin/spark-config.sh

export PATH="/Users/donghua/anaconda3/bin:$PATH"

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"
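
Note: since PYSPARK_SUBMIT_ARGS points the kernel at spark://Donghuas-MacBook-Air.local:7077, the standalone master and at least one worker must already be running before the notebook is launched; in Spark 2.4 they can be started with $SPARK_HOME/sbin/start-master.sh and $SPARK_HOME/sbin/start-slave.sh spark://Donghuas-MacBook-Air.local:7077.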


Enable the Spark cluster to connect to HDFS/Hive in a non-secure CDH cluster

Copy the following four files from the CDH cluster into the $SPARK_HOME/conf folder:
- core-site.xml
- hadoop-env.sh
- hive-site.xml
- hive-env.sh

-rwxr-xr-x@ 1 donghua  staff  3860 Mar 16 11:54 /Users/donghua/spark-2.4.0-bin-hadoop2.7/conf/core-site.xml
-rwxr-xr-x@ 1 donghua  staff   557 Mar 16 11:54 /Users/donghua/spark-2.4.0-bin-hadoop2.7/conf/hadoop-env.sh
-rwxr-xr-x@ 1 donghua  staff  1132 Mar 16 11:54 /Users/donghua/spark-2.4.0-bin-hadoop2.7/conf/hive-env.sh
-rwxr-xr-x@ 1 donghua  staff  5399 Mar 16 11:54 /Users/donghua/spark-2.4.0-bin-hadoop2.7/conf/hive-site.xml
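
Once the files are in place, restart the PySpark kernel so the new configuration is picked up. The cell below is a quick connectivity check, assuming the copied hive-site.xml points at the CDH Hive metastore; the HDFS path is a made-up example and should be replaced with a file that actually exists on the cluster.

# Run in a notebook cell after restarting the PySpark kernel.
# With hive-site.xml in SPARK_CONF_DIR, spark.sql() should resolve
# databases through the CDH Hive metastore instead of a local Derby one.
spark.sql("SHOW DATABASES").show()

# HDFS check: /tmp/test.txt is a hypothetical path, replace with a real file.
df = spark.read.text("hdfs:///tmp/test.txt")
print(df.count())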
