Sunday, November 19, 2017

Installing or Upgrading Cloudera Distribution of Apache Spark 2

Reference URLs:

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_addon_services.html

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html


[root@cdh-vm csd]# wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar -O /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar

[root@cdh-vm csd]# chown cloudera-scm:cloudera-scm  /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@cdh-vm csd]# chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@cdh-vm csd]# ls -l /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 17240 Jul 13 10:17 /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
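
The ownership and mode shown by `ls -l` above can also be checked from a script. The following is a minimal sketch, not part of the Cloudera tooling: the function names `describe_file` and `csd_is_ready` are hypothetical, and the expected values (`644`, `cloudera-scm`) are simply the ones set by the `chown` and `chmod` commands above.

```python
import os
import pwd
import stat


def describe_file(path):
    """Return (permission bits as an octal string, owner name) for path."""
    info = os.stat(path)
    mode = "{:o}".format(stat.S_IMODE(info.st_mode))  # e.g. "644"
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(info.st_uid)
    return mode, owner


def csd_is_ready(path, expected_mode="644", expected_owner="cloudera-scm"):
    """Hypothetical check: True if the file has the expected mode and owner."""
    mode, owner = describe_file(path)
    return mode == expected_mode and owner == expected_owner
```

On the host above, `csd_is_ready("/opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar")` would be the call to make before restarting the server.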

[root@cdh-vm csd]# /etc/init.d/cloudera-scm-server restart
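
After the restart, the Cloudera Manager server can take a while before it accepts connections again. A rough way to wait for it is to poll its TCP port. The sketch below uses only the Python standard library; `wait_for_port` is a hypothetical helper, and the port to poll is an assumption (7180 is the usual default for the Cloudera Manager admin console, but a given installation may differ).

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except socket.error:
            # Connection refused or timed out: retry until the deadline.
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

For the VM used here, the call would look like `wait_for_port("cdh-vm", 7180)`. An open port only shows that the server is listening; it does not guarantee that the new CSD has been loaded.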

- Download, distribute, and activate the SPARK2 parcel.

- Continue by adding the Spark 2 service.

[donghua@cdh-vm ~]$ pyspark2
Python 2.7.5 (default, Aug  4 2017, 00:39:18)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera1
      /_/

Using Python version 2.7.5 (default, Aug  4 2017 00:39:18)
SparkSession available as 'spark'.

In [1]: t=spark.read.text('/user/donghua/hadoopsecurity.txt')
17/11/19 08:35:44 WARN streaming.FileStreamSink: Error while looking for metadata directory.

In [2]: t.collect()
Out[2]:
[Row(value=u'Kerberos Principals and Keytabs'),
  Row(value=u'Why Use Cloudera Manager to Implement Hadoop Security?'),
  Row(value=u'Enabling Kerberos Authentication Using Cloudera Manager'),
  Row(value=u'Viewing and Regenerating Kerberos Principals'),
  Row(value=u'Configuring LDAP Group Mappings'),
  Row(value=u'Mapping Kerberos Principals to Short Names'),
  Row(value=u'Troubleshooting Kerberos Security Issues'),
  Row(value=u'Known Kerberos Issues in Cloudera Manager'),
  Row(value=u'Appendix A - Manually Configuring Kerberos Using Cloudera Manager'),
  Row(value=u'Appendix B - Set up a Cluster-dedicated MIT KDC and Default Domain for the Hadoop Cluster'),
  Row(value=u'Appendix C - Hadoop Users in Cloudera Manager')]
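
`collect()` is fine for a small file like this one, but it pulls every row back to the driver. For larger files it is usually safer to filter and count on the cluster and fetch only a few rows. The sketch below shows one way to do that with the same `spark` session; `preview_matches` is a hypothetical helper name, and it relies on `spark.read.text` producing a single string column named `value`, as the output above shows.

```python
def preview_matches(spark, path, keyword, limit=5):
    """Count lines containing keyword and return up to `limit` of them.

    `spark` is the SparkSession that the pyspark2 shell exposes as 'spark'.
    """
    lines = spark.read.text(path)  # one row per line, in column "value"
    matches = lines.filter(lines.value.contains(keyword))
    # count() runs on the cluster; take() brings back at most `limit` rows.
    return matches.count(), [row.value for row in matches.take(limit)]
```

In the shell above it could be called as `preview_matches(spark, '/user/donghua/hadoopsecurity.txt', 'Kerberos')`, which returns the number of matching lines together with a short sample of them.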