Sunday, November 19, 2017

Installing or Upgrading Cloudera Distribution of Apache Spark 2

Reference URLs:

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_addon_services.html

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html


[root@cdh-vm csd]# wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar -O /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar

[root@cdh-vm csd]# chown cloudera-scm:cloudera-scm  /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@cdh-vm csd]# chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@cdh-vm csd]# ls -l /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
-rw-r--r-- 1 cloudera-scm cloudera-scm 17240 Jul 13 10:17 /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar

[root@cdh-vm csd]# /etc/init.d/cloudera-scm-server restart
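
The ownership and mode shown by `ls -l` above can also be checked from a script before restarting the server. This is a minimal sketch, assuming Python 3 (the host in this post ships Python 2.7); the helper name `csd_file_status` is made up for illustration and is not part of any Cloudera tooling.

```python
import os
import pwd
import stat


def csd_file_status(path):
    """Return (owner_name, octal_mode_string) for a regular file."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        raise ValueError("%s is not a regular file" % path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(st.st_uid)
    # Keep only the permission bits and print them the way chmod expects.
    mode = format(stat.S_IMODE(st.st_mode), "03o")
    return owner, mode
```

After the `chown` and `chmod` above, calling `csd_file_status("/opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar")` should report the owner `cloudera-scm` and the mode `644`, matching the `ls -l` line.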


- Download, distribute, and activate the SPARK2 parcel.


- Continue to add the Spark2 Service


[donghua@cdh-vm ~]$ pyspark2
Python 2.7.5 (default, Aug  4 2017, 00:39:18)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera1
      /_/

Using Python version 2.7.5 (default, Aug  4 2017 00:39:18)
SparkSession available as 'spark'.

In [1]: t=spark.read.text('/user/donghua/hadoopsecurity.txt')
17/11/19 08:35:44 WARN streaming.FileStreamSink: Error while looking for metadata directory.

In [2]: t.collect()
Out[2]:
[Row(value=u'Kerberos Principals and Keytabs'),
  Row(value=u'Why Use Cloudera Manager to Implement Hadoop Security?'),
  Row(value=u'Enabling Kerberos Authentication Using Cloudera Manager'),
  Row(value=u'Viewing and Regenerating Kerberos Principals'),
  Row(value=u'Configuring LDAP Group Mappings'),
  Row(value=u'Mapping Kerberos Principals to Short Names'),
  Row(value=u'Troubleshooting Kerberos Security Issues'),
  Row(value=u'Known Kerberos Issues in Cloudera Manager'),
  Row(value=u'Appendix A - Manually Configuring Kerberos Using Cloudera Manager'),
  Row(value=u'Appendix B - Set up a Cluster-dedicated MIT KDC and Default Domain for the Hadoop Cluster'),
  Row(value=u'Appendix C - Hadoop Users in Cloudera Manager')]

Saturday, November 18, 2017

Install and configure Jupyter for pyspark

[root@cdh-vm ~]# pip install jupyter


[donghua@cdh-vm ~]$ grep PYSPARK  ~/.bash_profile
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip 192.168.56.10 --port 3333 --no-mathjax"

[donghua@cdh-vm ~]$ pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 13:34:34.137 NotebookApp] Serving notebooks from local directory: /home/donghua
[I 13:34:34.137 NotebookApp] 0 active kernels
[I 13:34:34.137 NotebookApp] The Jupyter Notebook is running at:
[I 13:34:34.137 NotebookApp] http://192.168.56.10:3333/?token=3a127150007c4f0816871644a97feb3b2c1ad721411e9576
[I 13:34:34.137 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 13:34:34.138 NotebookApp] No web browser found: could not locate runnable browser.
[C 13:34:34.138 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://192.168.56.10:3333/?token=3a127150007c4f0816871644a97feb3b2c1ad721411e9576
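
The deprecation warning in the log suggests moving from `ipython notebook` to `jupyter notebook`. One way to do that is to point PYSPARK_DRIVER_PYTHON at `jupyter` and keep `notebook ...` in the options. The sketch below only builds the two variables; the helper names `jupyter_driver_env` and `as_export_lines` are hypothetical, and the IP, port, and `--no-mathjax` flag are copied from the profile above.

```python
def jupyter_driver_env(ip, port, extra_opts=()):
    """Build the two environment variables that make pyspark start Jupyter."""
    opts = ["notebook", "--ip", str(ip), "--port", str(port)]
    opts.extend(extra_opts)
    return {
        "PYSPARK_DRIVER_PYTHON": "jupyter",
        "PYSPARK_DRIVER_PYTHON_OPTS": " ".join(opts),
    }


def as_export_lines(env):
    """Render the variables as lines suitable for ~/.bash_profile."""
    return ['export %s="%s"' % (key, value) for key, value in sorted(env.items())]
```

Printing `as_export_lines(jupyter_driver_env("192.168.56.10", 3333, ["--no-mathjax"]))` gives two `export` lines that can replace the ones in `~/.bash_profile`.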


================================================

[root@cdh-vm ~]# pip install jupyter
Collecting jupyter
  Downloading jupyter-1.0.0-py2.py3-none-any.whl
Collecting nbconvert (from jupyter)
  Downloading nbconvert-5.3.1-py2.py3-none-any.whl (387kB)
    100% |████████████████████████████████| 389kB 723kB/s
Collecting ipywidgets (from jupyter)
  Downloading ipywidgets-7.0.5-py2.py3-none-any.whl (68kB)
    100% |████████████████████████████████| 71kB 3.2MB/s
Collecting notebook (from jupyter)
  Downloading notebook-5.2.1-py2.py3-none-any.whl (8.0MB)
    100% |████████████████████████████████| 8.0MB 126kB/s
Collecting qtconsole (from jupyter)
  Downloading qtconsole-4.3.1-py2.py3-none-any.whl (108kB)
    100% |████████████████████████████████| 112kB 4.3MB/s
Collecting jupyter-console (from jupyter)
  Downloading jupyter_console-5.2.0-py2.py3-none-any.whl
Collecting ipykernel (from jupyter)
  Downloading ipykernel-4.6.1-py2-none-any.whl (104kB)
    100% |████████████████████████████████| 112kB 5.6MB/s
Collecting pandocfilters>=1.4.1 (from nbconvert->jupyter)
  Downloading pandocfilters-1.4.2.tar.gz
Collecting entrypoints>=0.2.2 (from nbconvert->jupyter)
  Downloading entrypoints-0.2.3-py2.py3-none-any.whl
Collecting jinja2 (from nbconvert->jupyter)
  Downloading Jinja2-2.10-py2.py3-none-any.whl (126kB)
    100% |████████████████████████████████| 133kB 3.6MB/s
Collecting testpath (from nbconvert->jupyter)
  Downloading testpath-0.3.1-py2.py3-none-any.whl (161kB)
    100% |████████████████████████████████| 163kB 2.6MB/s
Collecting mistune>=0.7.4 (from nbconvert->jupyter)
  Downloading mistune-0.8.1-py2.py3-none-any.whl
Collecting nbformat>=4.4 (from nbconvert->jupyter)
  Downloading nbformat-4.4.0-py2.py3-none-any.whl (155kB)
    100% |████████████████████████████████| 163kB 2.9MB/s
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from nbconvert->jupyter)
Collecting bleach (from nbconvert->jupyter)
  Downloading bleach-2.1.1-py2.py3-none-any.whl
Requirement already satisfied: traitlets>=4.2 in /usr/lib/python2.7/site-packages (from nbconvert->jupyter)
Collecting jupyter-core (from nbconvert->jupyter)
  Downloading jupyter_core-4.4.0-py2.py3-none-any.whl (126kB)
    100% |████████████████████████████████| 133kB 3.5MB/s
Requirement already satisfied: ipython<6.0.0,>=4.0.0; python_version < "3.3" in /usr/lib/python2.7/site-packages (from ipywidgets->jupyter)
Collecting widgetsnbextension~=3.0.0 (from ipywidgets->jupyter)
  Downloading widgetsnbextension-3.0.8-py2.py3-none-any.whl (2.2MB)
    100% |████████████████████████████████| 2.2MB 347kB/s
Requirement already satisfied: ipython-genutils in /usr/lib/python2.7/site-packages (from notebook->jupyter)
Collecting jupyter-client (from notebook->jupyter)
  Downloading jupyter_client-5.1.0-py2.py3-none-any.whl (84kB)
    100% |████████████████████████████████| 92kB 2.8MB/s
Collecting tornado>=4 (from notebook->jupyter)
  Downloading tornado-4.5.2.tar.gz (483kB)
    100% |████████████████████████████████| 491kB 1.8MB/s
Collecting terminado>=0.3.3; sys_platform != "win32" (from notebook->jupyter)
  Downloading terminado-0.7-py2.py3-none-any.whl
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.0 in /usr/lib/python2.7/site-packages (from jupyter-console->jupyter)
Collecting configparser>=3.5; python_version == "2.7" (from entrypoints>=0.2.2->nbconvert->jupyter)
  Downloading configparser-3.5.0.tar.gz
Collecting MarkupSafe>=0.23 (from jinja2->nbconvert->jupyter)
  Downloading MarkupSafe-1.0.tar.gz
Collecting jsonschema!=2.5.0,>=2.4 (from nbformat>=4.4->nbconvert->jupyter)
  Downloading jsonschema-2.6.0-py2.py3-none-any.whl
Collecting html5lib!=1.0b1,!=1.0b2,!=1.0b3,!=1.0b4,!=1.0b5,!=1.0b6,!=1.0b7,!=1.0b8,>=0.99999999pre (from bleach->nbconvert->jupyter)
  Downloading html5lib-1.0b10-py2.py3-none-any.whl (112kB)
    100% |████████████████████████████████| 112kB 3.7MB/s
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from bleach->nbconvert->jupyter)
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from traitlets>=4.2->nbconvert->jupyter)
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->nbconvert->jupyter)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Requirement already satisfied: setuptools>=18.5 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Requirement already satisfied: pathlib2; python_version == "2.7" or python_version == "3.3" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Requirement already satisfied: simplegeneric>0.8 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Requirement already satisfied: pickleshare in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Collecting pyzmq>=13 (from jupyter-client->notebook->jupyter)
  Downloading pyzmq-16.0.3-cp27-cp27mu-manylinux1_x86_64.whl (3.0MB)
    100% |████████████████████████████████| 3.0MB 299kB/s
Collecting python-dateutil>=2.1 (from jupyter-client->notebook->jupyter)
  Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    100% |████████████████████████████████| 194kB 2.6MB/s
Requirement already satisfied: backports.ssl_match_hostname in /usr/lib/python2.7/site-packages (from tornado>=4->notebook->jupyter)
Collecting singledispatch (from tornado>=4->notebook->jupyter)
  Downloading singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting certifi (from tornado>=4->notebook->jupyter)
  Downloading certifi-2017.11.5-py2.py3-none-any.whl (330kB)
    100% |████████████████████████████████| 337kB 596kB/s
Collecting backports_abc>=0.4 (from tornado>=4->notebook->jupyter)
  Downloading backports_abc-0.5-py2.py3-none-any.whl
Requirement already satisfied: ptyprocess in /usr/lib/python2.7/site-packages (from terminado>=0.3.3; sys_platform != "win32"->notebook->jupyter)
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.0->jupyter-console->jupyter)
Collecting functools32; python_version == "2.7" (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert->jupyter)
  Downloading functools32-3.2.3-2.zip
Collecting webencodings (from html5lib!=1.0b1,!=1.0b2,!=1.0b3,!=1.0b4,!=1.0b5,!=1.0b6,!=1.0b7,!=1.0b8,>=0.99999999pre->bleach->nbconvert->jupyter)
  Downloading webencodings-0.5.1-py2.py3-none-any.whl
Requirement already satisfied: scandir; python_version < "3.5" in /usr/lib64/python2.7/site-packages (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=4.0.0; python_version < "3.3"->ipywidgets->jupyter)
Installing collected packages: pandocfilters, configparser, entrypoints, MarkupSafe, jinja2, testpath, mistune, functools32, jsonschema, jupyter-core, nbformat, webencodings, html5lib, bleach, nbconvert, pyzmq, python-dateutil, jupyter-client, singledispatch, certifi, backports-abc, tornado, terminado, ipykernel, notebook, widgetsnbextension, ipywidgets, qtconsole, jupyter-console, jupyter
  Running setup.py install for pandocfilters ... done
  Running setup.py install for configparser ... done
  Running setup.py install for MarkupSafe ... done
  Running setup.py install for functools32 ... done
  Running setup.py install for tornado ... done
Successfully installed MarkupSafe-1.0 backports-abc-0.5 bleach-2.1.1 certifi-2017.11.5 configparser-3.5.0 entrypoints-0.2.3 functools32-3.2.3.post2 html5lib-1.0b10 ipykernel-4.6.1 ipywidgets-7.0.5 jinja2-2.10 jsonschema-2.6.0 jupyter-1.0.0 jupyter-client-5.1.0 jupyter-console-5.2.0 jupyter-core-4.4.0 mistune-0.8.1 nbconvert-5.3.1 nbformat-4.4.0 notebook-5.2.1 pandocfilters-1.4.2 python-dateutil-2.6.1 pyzmq-16.0.3 qtconsole-4.3.1 singledispatch-3.4.0.3 terminado-0.7 testpath-0.3.1 tornado-4.5.2 webencodings-0.5.1 widgetsnbextension-3.0.8
[root@cdh-vm ~]#

Friday, November 17, 2017

Using ipython in pyspark

Here is the link for ipython installation: http://www.dbaglobe.com/2017/11/install-ipython-on-centos7-redhat-el-7.html


If you use Spark < 1.2, you can simply run bin/pyspark with the environment variable IPYTHON=1.

IPYTHON=1 /usr/bin/pyspark


or

export IPYTHON=1
/usr/bin/pyspark

While the above still works on Spark 1.2 and later 1.x releases (Spark 2.0 removed the IPYTHON variable), the recommended way to set the Python environment for these versions is PYSPARK_DRIVER_PYTHON:


PYSPARK_DRIVER_PYTHON=ipython /usr/bin/pyspark

or

export PYSPARK_DRIVER_PYTHON=ipython
/usr/bin/pyspark
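
The version rule above can be written down as a small helper. This is only a sketch of the rule as stated in this post; the function name `ipython_driver_env` is hypothetical, and it looks only at the major and minor parts of the version string.

```python
def ipython_driver_env(spark_version):
    """Pick the environment variable that makes pyspark use IPython."""
    # "2.2.0.cloudera1" -> (2, 2); only major and minor matter here.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) < (1, 2):
        # Before Spark 1.2 only the IPYTHON switch exists.
        return {"IPYTHON": "1"}
    # From Spark 1.2 on, PYSPARK_DRIVER_PYTHON is the recommended variable.
    return {"PYSPARK_DRIVER_PYTHON": "ipython"}
```

For example, `ipython_driver_env("1.1.0")` selects `IPYTHON=1`, while `ipython_driver_env("1.6.0")` selects `PYSPARK_DRIVER_PYTHON=ipython`.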


RDD Lineage and Persistence

Example 1: Without Persistence

>>> hs=sc.textFile('hdfs://cdh-vm/user/donghua/hadoopsecurity.txt')
>>> rdd1=hs.map(lambda line:line.upper())
>>> rdd2=rdd1.filter(lambda line:line.startswith('E'))
>>> rdd2.collect()
[u'ENABLING KERBEROS AUTHENTICATION USING CLOUDERA MANAGER']

>>> print rdd2.toDebugString()
(2) PythonRDD[15] at collect at <stdin>:1 []
  |  hdfs://cdh-vm/user/donghua/hadoopsecurity.txt MapPartitionsRDD[14] at textFile at NativeMethodAccessorImpl.java:-2 []
  |  hdfs://cdh-vm/user/donghua/hadoopsecurity.txt HadoopRDD[13] at textFile at NativeMethodAccessorImpl.java:-2 []

Example 2: With default Persistence for RDD1

>>> hs=sc.textFile('hdfs://cdh-vm/user/donghua/hadoopsecurity.txt')
>>> rdd1=hs.map(lambda line:line.upper())
>>> rdd1.persist()
PythonRDD[18] at RDD at PythonRDD.scala:43
>>> rdd2=rdd1.filter(lambda line:line.startswith('E'))
>>> rdd2.collect()
[u'ENABLING KERBEROS AUTHENTICATION USING CLOUDERA MANAGER']


>>> print rdd2.toDebugString()
(2) PythonRDD[19] at collect at <stdin>:1 []
  |  PythonRDD[18] at RDD at PythonRDD.scala:43 []
  |      CachedPartitions: 2; MemorySize: 577.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
  |  hdfs://cdh-vm/user/donghua/hadoopsecurity.txt MapPartitionsRDD[17] at textFile at NativeMethodAccessorImpl.java:-2 []
  |  hdfs://cdh-vm/user/donghua/hadoopsecurity.txt HadoopRDD[16] at textFile at NativeMethodAccessorImpl.java:-2 []

Example 3: With default Persistence for HS and RDD1, and MEMORY_AND_DISK Persistence for RDD2

>>> from pyspark import StorageLevel
>>> hs=sc.textFile('hdfs://cdh-vm/user/donghua/hadoopsecurity.txt')
>>> hs.persist()
hdfs://cdh-vm/user/donghua/hadoopsecurity.txt MapPartitionsRDD[25] at textFile at NativeMethodAccessorImpl.java:-2
>>> rdd1=hs.map(lambda line:line.upper())
>>> rdd1.persist()
PythonRDD[26] at RDD at PythonRDD.scala:43
>>> rdd2=rdd1.filter(lambda line:line.startswith('E'))
>>> rdd2.persist(StorageLevel.MEMORY_AND_DISK)
PythonRDD[27] at RDD at PythonRDD.scala:43

>>> rdd2.collect()
[u'ENABLING KERBEROS AUTHENTICATION USING CLOUDERA MANAGER']
>>> print rdd2.toDebugString()
(2) PythonRDD[27] at RDD at PythonRDD.scala:43 [Disk Memory Deserialized 1x Replicated]
  |       CachedPartitions: 2; MemorySize: 128.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
  |  PythonRDD[26] at RDD at PythonRDD.scala:43 [Disk Memory Deserialized 1x Replicated]
  |      CachedPartitions: 2; MemorySize: 577.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
  |  hdfs://cdh-vm/user/donghua/hadoopsecurity.txt MapPartitionsRDD[25] at textFile at NativeMethodAccessorImpl.java:-2 [Disk Memory Deserialized 1x Replicated]
  |      CachedPartitions: 2; MemorySize: 491.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
  |  hdfs://cdh-vm/user/donghua/hadoopsecurity.txt HadoopRDD[24] at textFile at NativeMethodAccessorImpl.java:-2 [Disk Memory Deserialized 1x Replicated]
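
Each example above calls collect() only once, so the benefit of persist() does not show up in the output; it appears when a second action runs against the same lineage. The toy model below is plain Python, not Spark, and `ToyRDD` is a made-up class. It only illustrates the idea that an unpersisted parent is recomputed for every action while a persisted one is computed once.

```python
class ToyRDD(object):
    """A toy stand-in for an RDD: it knows its parent and how to compute itself."""

    def __init__(self, compute, parent=None, name="source"):
        self._compute = compute   # function: parent rows (or None) -> rows
        self.parent = parent
        self.name = name
        self._persist = False
        self._cache = None
        self.computations = 0     # times this node was actually evaluated

    def map(self, f):
        return ToyRDD(lambda rows: [f(r) for r in rows], self, "map")

    def filter(self, f):
        return ToyRDD(lambda rows: [r for r in rows if f(r)], self, "filter")

    def persist(self):
        self._persist = True
        return self

    def collect(self):
        # A cached node answers directly and never touches its parents.
        if self._cache is not None:
            return list(self._cache)
        parent_rows = self.parent.collect() if self.parent is not None else None
        rows = self._compute(parent_rows)
        self.computations += 1
        if self._persist:
            self._cache = rows
        return list(rows)

    def lineage(self):
        """List this node and its ancestors, marking the ones holding a cache."""
        node, names = self, []
        while node is not None:
            cached = " [cached]" if node._cache is not None else ""
            names.append(node.name + cached)
            node = node.parent
        return names
```

Building `hs -> map -> filter` with this class and calling collect() twice on the filter evaluates the map step twice. Calling persist() on the map step first drops that to a single evaluation, and lineage() then marks the map node as cached, loosely mirroring the CachedPartitions lines in toDebugString().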

Saturday, November 11, 2017

Install Cloudera Manager and CDH using local repository

Step 1: Install prerequisite packages:
yum install yum-utils createrepo httpd
Step 2: prepare httpd server
>>>> /etc/httpd/conf.d/cloudera.conf
Alias "/cm" "/repo/cm"
<Directory "/repo/cm">
    Options Indexes FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>
Alias "/cdh" "/repo/cdh"
<Directory "/repo/cdh">
    Options Indexes FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

Alternatively, use Python's built-in SimpleHTTPServer module:
[root@cdh-vm repo]# cd /repo/
[root@cdh-vm repo]# python -m SimpleHTTPServer 80
Serving HTTP on 0.0.0.0 port 80 ...
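
The command above is for Python 2. In Python 3 the module lives in `http.server`. A minimal sketch for serving the repository directory from a script follows, assuming Python 3.7 or later; the function name `serve_directory` is hypothetical.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=80, bind="0.0.0.0"):
    """Start a background HTTP server for `directory` and return the server."""
    # Bind the handler to the given directory instead of the current one.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((bind, port), handler)
    # serve_forever() blocks, so run it in a daemon thread.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `serve_directory("/repo")` would expose /repo/cm and /repo/cdh under the same /cm and /cdh paths as the httpd aliases; binding to port 80 requires root. Call `shutdown()` on the returned server to stop it.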

Step 3: Download CM and the parcels into their respective folders.
https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.13.0/RPMS/
https://archive.cloudera.com/cdh5/parcels/5.13/
Step 4: Setup local repo
[root@cdh-vm ~]# createrepo /repo/cm/
>>>>>>> /etc/yum.repos.d/cloudera-local.repo
[cloudera-local]
name=cloudera-local
baseurl=http://cdh-vm/cm/
gpgcheck=0
enabled=1
Step 5: Install Cloudera Manager
[root@cdh-vm cm]# yum install cloudera-manager-server cloudera-manager-daemons
Step 6: Configure CM using mysql
[root@cdh-vm ~]# /usr/share/cmf/schema/scm_prepare_database.sh mysql cmserver cmserver password
Step 7: Log in to http://cdh-vm.dbaglobe.com:7180 to set up the cluster

Thursday, November 9, 2017

Configure Cloudera Kerberos Authentication using MIT KDC

- Create a Kerberos admin user who has privileges to add other principals
[root@cdh-vm krb5kdc]# kadmin.local  -q "addprinc cloudera-scm/admin"

- Before proceeding, verify that the KDC works:

ktutil:  add_entry -password -p cloudera-scm/admin -k 1 -e aes256-cts-hmac-sha1-96
Password for cloudera-scm/admin@DBAGLOBE.COM:

[root@cdh-vm log]#  klist -e
Ticket cache: KEYRING:persistent:0:krb_ccache_r0tnzhY
Default principal: cloudera-scm/admin@DBAGLOBE.COM

Valid starting       Expires              Service principal
11/09/2017 23:09:42  11/10/2017 23:09:42  krbtgt/DBAGLOBE.COM@DBAGLOBE.COM
        Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96


Encountered errors:

2017-11-09 23:34:18,010 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8022: readAndProcess from client 192.168.56.10 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]]
2017-11-09 23:34:18,655 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30002 milliseconds
2017-11-09 23:34:18,656 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2017-11-09 23:34:22,804 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8022: readAndProcess from client 192.168.56.10 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not support
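
The message means the JVM rejected the AES-256 encryption type used by the tickets. To confirm which encryption types are being rejected across a large log, the lines can be scanned with a regular expression. This is a sketch only; the function name `unsupported_etypes` is hypothetical, and the pattern is taken from the log lines above.

```python
import re

# Matches the part of the GSSException text that names the rejected etype.
ETYPE_ERROR = re.compile(r"Encryption type (.+?) is not supported/enabled")


def unsupported_etypes(log_lines):
    """Return the distinct rejected encryption types, in first-seen order."""
    seen = []
    for line in log_lines:
        match = ETYPE_ERROR.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Passing an open log file (any iterable of lines) to `unsupported_etypes` returns the encryption type names found in it; a line cut off before "supported/enabled" is simply skipped.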

How to Fix:

[root@cdh-vm security]# pwd
/usr/java/jdk1.8.0_144/jre/lib/security

[root@cdh-vm security]# mkdir limited
[root@cdh-vm security]# mv *.jar limited/


[root@cdh-vm security]# unzip /home/donghua/jce_policy-8.zip -d  /home/donghua/
[root@cdh-vm security]# cp /home/donghua/UnlimitedJCEPolicyJDK8/*.jar .

Monday, November 6, 2017

Install ipython on CentOS7 / Redhat EL 7

[root@cdh-vm ~]# yum install gcc python-devel python-setuptools

[root@cdh-vm ~]# rpm -ql python-setuptools.noarch |grep easy_install.py
/usr/lib/python2.7/site-packages/easy_install.py
/usr/lib/python2.7/site-packages/easy_install.pyc
/usr/lib/python2.7/site-packages/easy_install.pyo
/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py
/usr/lib/python2.7/site-packages/setuptools/command/easy_install.pyc
/usr/lib/python2.7/site-packages/setuptools/command/easy_install.pyo


[root@cdh-vm ~]# python /usr/lib/python2.7/site-packages/easy_install.py pip
Searching for pip
Reading https://pypi.python.org/simple/pip/
Best match: pip 9.0.1
Downloading https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
Processing pip-9.0.1.tar.gz
Writing /tmp/easy_install-Cjz0xZ/pip-9.0.1/setup.cfg
Running pip-9.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Cjz0xZ/pip-9.0.1/egg-dist-tmp-G6YcYa
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
warning: no previously-included files found matching '.coveragerc'
warning: no previously-included files found matching '.mailmap'
warning: no previously-included files found matching '.travis.yml'
warning: no previously-included files found matching '.landscape.yml'
warning: no previously-included files found matching 'pip/_vendor/Makefile'
warning: no previously-included files found matching 'tox.ini'
warning: no previously-included files found matching 'dev-requirements.txt'
warning: no previously-included files found matching 'appveyor.yml'
no previously-included directories found matching '.github'
no previously-included directories found matching '.travis'
no previously-included directories found matching 'docs/_build'
no previously-included directories found matching 'contrib'
no previously-included directories found matching 'tasks'
no previously-included directories found matching 'tests'
Adding pip 9.0.1 to easy-install.pth file
Installing pip script to /usr/bin
Installing pip2.7 script to /usr/bin
Installing pip2 script to /usr/bin

Installed /usr/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Processing dependencies for pip
Finished processing dependencies for pip


[root@cdh-vm ~]# pip install ipython
Collecting ipython
  Using cached ipython-5.5.0-py2-none-any.whl
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython)
Requirement already satisfied: setuptools>=18.5 in /usr/lib/python2.7/site-packages (from ipython)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython)
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython)
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython)
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython)
  Using cached pathlib2-2.3.0-py2.py3-none-any.whl
Collecting traitlets>=4.2 (from ipython)
  Using cached traitlets-4.3.2-py2.py3-none-any.whl
Collecting simplegeneric>0.8 (from ipython)
  Using cached simplegeneric-0.8.1.zip
Collecting pickleshare (from ipython)
  Using cached pickleshare-0.7.4-py2.py3-none-any.whl
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython)
Requirement already satisfied: six>=1.9.0 in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython)
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython)
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython)
  Using cached scandir-1.6.tar.gz
Collecting ipython-genutils (from traitlets>=4.2->ipython)
  Using cached ipython_genutils-0.2.0-py2.py3-none-any.whl
Collecting enum34; python_version == "2.7" (from traitlets>=4.2->ipython)
  Using cached enum34-1.1.6-py2-none-any.whl
Installing collected packages: scandir, pathlib2, ipython-genutils, enum34, traitlets, simplegeneric, pickleshare, ipython
  Running setup.py install for scandir ... done
  Running setup.py install for simplegeneric ... done
Successfully installed enum34-1.1.6 ipython-5.5.0 ipython-genutils-0.2.0 pathlib2-2.3.0 pickleshare-0.7.4 scandir-1.6 simplegeneric-0.8.1 traitlets-4.3.2

Sunday, November 5, 2017

Custom installation of CDH5 from parcels fails with a permission error

Scenario: Choose YARN and HDFS, with Spark as an optional service, during the installation

Error message:

2017-11-04 23:47:56,151 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/LOCK: Permission denied
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:181)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:245)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/LOCK: Permission denied
        at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
        at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
        at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 5 more
2017-11-04 23:47:56,176 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at cdh-vm.dbaglobe.com/192.168.56.10

************************************************************/

Permissions are incorrect for the CDH folders in /var/lib


How to fix:

chmod 755 /var/lib/accumulo
chmod 755 /var/lib/kafka
chmod 755 /var/lib/kudu
chmod 755 /var/lib/flume-ng
chmod 755 /var/lib/hadoop-hdfs
chmod 755 /var/lib/solr
chmod 755 /var/lib/zookeeper
chmod 755 /var/lib/llama
chmod 755 /var/lib/hadoop-httpfs
chmod 755 /var/lib/hadoop-mapreduce
chmod 755 /var/lib/sqoop
chmod 755 /var/lib/hadoop-kms
chmod 755 /var/lib/hive
chmod 755 /var/lib/sqoop2
chmod 755 /var/lib/oozie
chmod 755 /var/lib/hbase
chmod 755 /var/lib/sentry
chmod 755 /var/lib/impala
chmod 755 /var/lib/spark
chmod 755 /var/lib/hadoop-yarn

chown accumulo:accumulo /var/lib/accumulo
chown kafka:kafka /var/lib/kafka
chown kudu:kudu /var/lib/kudu
chown flume:flume /var/lib/flume-ng
chown hdfs:hdfs /var/lib/hadoop-hdfs
chown solr:solr /var/lib/solr
chown zookeeper:zookeeper /var/lib/zookeeper
chown llama:llama /var/lib/llama
chown httpfs:httpfs /var/lib/hadoop-httpfs
chown mapred:mapred /var/lib/hadoop-mapreduce
chown sqoop:sqoop /var/lib/sqoop
chown kms:kms /var/lib/hadoop-kms
chown hive:hive /var/lib/hive
chown sqoop2:sqoop2 /var/lib/sqoop2
chown oozie:oozie /var/lib/oozie
chown hbase:hbase /var/lib/hbase
chown sentry:sentry /var/lib/sentry
chown impala:impala /var/lib/impala
chown spark:spark /var/lib/spark
chown yarn:yarn /var/lib/hadoop-yarn
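
The forty commands above follow one pattern: mode 755 on each directory, owned by a matching user and group. They can be driven from a single table instead. This is a minimal sketch, assuming Python 3 and that each group has the same name as its user, as in the chown lines above; the function name `fix_cdh_dirs` is hypothetical.

```python
import os
import shutil

# Directory name under /var/lib -> owning user (group has the same name).
CDH_DIR_OWNERS = {
    "accumulo": "accumulo", "kafka": "kafka", "kudu": "kudu",
    "flume-ng": "flume", "hadoop-hdfs": "hdfs", "solr": "solr",
    "zookeeper": "zookeeper", "llama": "llama", "hadoop-httpfs": "httpfs",
    "hadoop-mapreduce": "mapred", "sqoop": "sqoop", "hadoop-kms": "kms",
    "hive": "hive", "sqoop2": "sqoop2", "oozie": "oozie", "hbase": "hbase",
    "sentry": "sentry", "impala": "impala", "spark": "spark",
    "hadoop-yarn": "yarn",
}


def fix_cdh_dirs(base="/var/lib", owners=CDH_DIR_OWNERS, mode=0o755,
                 change_owner=True):
    """Apply the mode (and optionally the owner) to each directory that exists."""
    fixed = []
    for name, user in sorted(owners.items()):
        path = os.path.join(base, name)
        if not os.path.isdir(path):
            continue  # service not installed on this host
        os.chmod(path, mode)
        if change_owner:
            # Needs root, and the user and group must already exist.
            shutil.chown(path, user=user, group=user)
        fixed.append(path)
    return fixed
```

Run as root, `fix_cdh_dirs()` has the same effect as the chmod and chown lists above for every directory that is present, and returns the paths it changed.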
