Sunday, March 18, 2018

Install the ntp package to fix a Cloudera Manager agent installation issue

Installing the ntp package fixes the "Failed to collect NTP metrics" error seen during Cloudera Manager agent installation (RHEL 7: yum install ntp).

======================================

[18/Mar/2018 10:40:32 +0000] 6339 MainThread agent        INFO     Active parcel list updated; recalculating component info.
[18/Mar/2018 10:40:33 +0000] 6339 MainThread throttling_logger INFO     Identified java component java8 with full version JAVA_HOME=/usr/java/default java version "1.8.0_162" Java(TM) SE Runtime Environment (build 1.8.0_162-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)  for requested version misc.
[18/Mar/2018 10:40:33 +0000] 6339 MainThread throttling_logger INFO     Identified java component java8 with full version JAVA_HOME=/usr/java/jdk1.8.0_162 java version "1.8.0_162" Java(TM) SE Runtime Environment (build 1.8.0_162-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)  for requested version 8.
[18/Mar/2018 10:40:35 +0000] 6339 Monitor-HostMonitor throttling_logger ERROR    Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.1-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 48, in collect
    self.collect_ntpd()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.1-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 66, in collect_ntpd
    result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.1-py2.7.egg/cmf/monitor/host/ntp_monitor.py", line 38, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.1-py2.7.egg/cmf/subprocess_timeout.py", line 49, in subprocess_with_timeout
    p = subprocess.Popen(**kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
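The OSError: [Errno 2] comes from subprocess.Popen: the host monitor shells out to an NTP client binary (ntpq/ntpdc, which the ntp package provides), and the binary simply isn't on the PATH. A minimal Python 3 sketch of that pre-flight check (the specific tool names are an assumption; the agent's exact command line isn't shown in the traceback):

```python
import shutil

def missing_ntp_tools(tools=("ntpq", "ntpdc")):
    """Return the NTP client binaries absent from PATH. If the monitor
    tries to Popen one of these and it is missing, the result is
    OSError: [Errno 2] No such file or directory, as in the traceback."""
    return [t for t in tools if shutil.which(t) is None]

# On a host without the ntp package this lists both tools;
# after 'yum install ntp' it should be empty.
missing = missing_ntp_tools()
```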

Sunday, March 11, 2018

regexp_extract syntax in Hive, explained with an example

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select '${hivevar:str1}';
+-----------------------+--+
|          _c0          |
+-----------------------+--+
| adfsdf1231asdf123123  |
+-----------------------+--+
1 row selected (0.115 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select regexp_extract('${hivevar:str1}','(\\d+)\\D*(\\d+)',1) part1,regexp_extract('${hivevar:str1}','(\\d+)\\D*(\\d+)',2) part2;
+--------+---------+--+
| part1  |  part2  |
+--------+---------+--+
| 1231   | 123123  |
+--------+---------+--+
1 row selected (0.067 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> desc function regexp_extract;
+----------------------------------------------------+--+
|                      tab_name                      |
+----------------------------------------------------+--+
| regexp_extract(str, regexp[, idx]) - extracts a group that matches regexp |
+----------------------------------------------------+--+
1 row selected (0.029 seconds)
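As the desc output says, regexp_extract(str, regexp, idx) returns capture group idx of the match (idx 0 is the whole match). The same extraction in Python's re module, for illustration; note that in the Hive query the backslashes are doubled ('\\d+') only because the SQL string literal is parsed first:

```python
import re

def regexp_extract(s, pattern, idx):
    """Mimic Hive's regexp_extract: return capture group idx of the
    first match; return '' on no match (a simplification for this
    sketch)."""
    m = re.search(pattern, s)
    return m.group(idx) if m else ""

s = "adfsdf1231asdf123123"
part1 = regexp_extract(s, r"(\d+)\D*(\d+)", 1)  # first run of digits
part2 = regexp_extract(s, r"(\d+)\D*(\d+)", 2)  # second run of digits
# part1 == "1231", part2 == "123123", matching the Hive output above
```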

Saturday, March 10, 2018

Substitute variables in Impala

In impala-shell, set var:xx and set hivevar:xx are equivalent.

[cdh-vm.dbaglobe.com:21000] > set var:zipcode=42031;
Variable ZIPCODE set to 42031
[cdh-vm.dbaglobe.com:21000] > select fname,lname from employees where zipcode='${var:zipcode}';
Query: select fname,lname from employees where zipcode='42031'
Query submitted at: 2018-03-10 08:27:09 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=7540c4bdaae3d532:c213eb6900000000
+-------+----------+
| fname | lname    |
+-------+----------+
| Russ  | Jennings |
| Eliza | Weller   |
+-------+----------+
Fetched 2 row(s) in 6.60s
[cdh-vm.dbaglobe.com:21000] > set hivevar:zipcode=42031;
Variable ZIPCODE set to 42031
[cdh-vm.dbaglobe.com:21000] > select fname,lname from employees where zipcode='${hivevar:zipcode}';
Query: select fname,lname from employees where zipcode='42031'
Query submitted at: 2018-03-10 08:27:31 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=444489b56fdff58e:d52cc25800000000
+-------+----------+
| fname | lname    |
+-------+----------+
| Eliza | Weller   |
| Russ  | Jennings |
+-------+----------+
Fetched 2 row(s) in 0.17s


[donghua@cdh-vm ~]$ impala-shell --var=zipcode="42031" 
Starting Impala Shell without Kerberos authentication
Connected to cdh-vm.dbaglobe.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

Run the PROFILE command after a query has finished to see a comprehensive summary
of all the performance and diagnostic information that Impala gathered for that
query. Be warned, it can be very long!
***********************************************************************************

[cdh-vm.dbaglobe.com:21000] > select ${var:zipcode};
Query: select 42031
Query submitted at: 2018-03-10 08:48:06 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=94434f0933a88c9c:15cfcf3d00000000
+-------+
| 42031 |
+-------+
| 42031 |
+-------+
Fetched 1 row(s) in 0.04s
[cdh-vm.dbaglobe.com:21000] > select ${hivevar:zipcode};
Query: select 42031
Query submitted at: 2018-03-10 08:48:17 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=9b4849b05dddb406:9da4309e00000000
+-------+
| 42031 |
+-------+
| 42031 |
+-------+
Fetched 1 row(s) in 0.04s
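Notice the Query: lines above: the server only ever sees the literal 42031. The shell expands ${var:name} / ${hivevar:name} as plain text on the client before submitting the statement. A rough Python sketch of that client-side expansion (illustrative only, not impala-shell's actual implementation):

```python
import re

def substitute_vars(sql, variables):
    """Expand ${var:name} and ${hivevar:name} references the way a
    client shell would: textual replacement before the query is sent,
    so the server never sees the placeholder."""
    def repl(m):
        name = m.group(2)
        if name not in variables:
            raise KeyError("variable %s is not set" % name)
        return variables[name]
    return re.sub(r"\$\{(var|hivevar):(\w+)\}", repl, sql)

q = substitute_vars(
    "select fname,lname from employees where zipcode='${var:zipcode}'",
    {"zipcode": "42031"},
)
# q: select fname,lname from employees where zipcode='42031'
```

Because the substitution is textual, quoting is the caller's responsibility: the single quotes around '${var:zipcode}' in the examples above are what make the expanded value a string literal.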

Substitute variables in Hive

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> set hivevar:zipcode=42031;
No rows affected (0.007 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select fname,lname from employees where zipcode='${hivevar:zipcode}';
+--------+-----------+--+
| fname  |   lname   |
+--------+-----------+--+
| Eliza  | Weller    |
| Russ   | Jennings  |
+--------+-----------+--+
2 rows selected (22.079 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> set hivevar:zipcode=95066;
No rows affected (0.011 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select fname,lname from employees where zipcode='${hivevar:zipcode}';
+--------------+-------------+--+
|    fname     |    lname    |
+--------------+-------------+--+
| Anna         | Atkins      |
| Austin       | Dickson     |
| Alex         | Magee       |
| Alan         | Millner     |
| Ashley       | Quarles     |
+--------------+-------------+--+

[donghua@cdh-vm ~]$ beeline -u jdbc:hive2://cdh-vm.dbaglobe.com:10000 -n donghua --hivevar zipcode="42031"

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> set hivevar:zipcode;
+------------------------+--+
|          set           |
+------------------------+--+
| hivevar:zipcode=42031  |
+------------------------+--+
1 row selected (0.022 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select fname,lname from employees where zipcode='${hivevar:zipcode}';
+--------+-----------+--+
| fname  |   lname   |
+--------+-----------+--+
| Eliza  | Weller    |
| Russ   | Jennings  |
+--------+-----------+--+
2 rows selected (20.247 seconds)

Wednesday, March 7, 2018

Constructing SCD Type 2 in Hive: an example

create table products(prod_id int, name string, price double, last_modified timestamp) stored as parquet;
create table orders(order_id int, prod_id int, quantity int, order_date timestamp) stored as parquet;

insert into products values(1,'Soap',1.0,'2017-01-01');
insert into products values(1,'Soap',1.1,'2017-04-01');
insert into products values(1,'Soap',1.2,'2017-07-01');
insert into products values(1,'Soap',1.3,'2017-10-01');
insert into products values(2,'Soda',0.6,'2017-01-01');
insert into products values(2,'Soda',0.8,'2017-06-01');
insert into products values(3,'Beer',4.2,'2017-07-01');

insert into orders values(1,1,1,'2017-02-10');
insert into orders values(2,2,10,'2017-07-10');
insert into orders values(3,3,5,'2017-09-10');
insert into orders values(4,1,2,'2017-12-01');

create view prod_scd2 as 
select prod_id,price,last_modified as valid_from,
lead(last_modified) over (partition by prod_id order by last_modified) as valid_end
from products;

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select * from prod_scd2 p;
+------------+----------+------------------------+------------------------+--+
| p.prod_id  | p.price  |      p.valid_from      |      p.valid_end       |
+------------+----------+------------------------+------------------------+--+
| 1          | 1.0      | 2017-01-01 00:00:00.0  | 2017-04-01 00:00:00.0  |
| 1          | 1.1      | 2017-04-01 00:00:00.0  | 2017-07-01 00:00:00.0  |
| 1          | 1.2      | 2017-07-01 00:00:00.0  | 2017-10-01 00:00:00.0  |
| 1          | 1.3      | 2017-10-01 00:00:00.0  | NULL                   |
| 2          | 0.6      | 2017-01-01 00:00:00.0  | 2017-06-01 00:00:00.0  |
| 2          | 0.8      | 2017-06-01 00:00:00.0  | NULL                   |
| 3          | 4.2      | 2017-07-01 00:00:00.0  | NULL                   |
+------------+----------+------------------------+------------------------+--+
7 rows selected (20.253 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select * from orders o;
+-------------+------------+-------------+------------------------+--+
| o.order_id  | o.prod_id  | o.quantity  |      o.order_date      |
+-------------+------------+-------------+------------------------+--+
| 1           | 1          | 1           | 2017-02-10 00:00:00.0  |
| 2           | 2          | 10          | 2017-07-10 00:00:00.0  |
| 3           | 3          | 5           | 2017-09-10 00:00:00.0  |
| 4           | 1          | 2           | 2017-12-01 00:00:00.0  |
+-------------+------------+-------------+------------------------+--+
4 rows selected (0.152 seconds)

select o.order_id,o.order_date,p.price,o.quantity*p.price as total_cost
from orders o join prod_scd2 p
on (o.prod_id=p.prod_id) 
where o.order_date between p.valid_from and if(p.valid_end is not null ,p.valid_end,cast('9999-12-31' as timestamp))
order by order_id;

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select o.order_id,o.order_date,p.price,o.quantity*p.price as total_cost
. . . . . . . . . . . . . . . . . . . . .> from orders o join prod_scd2 p
. . . . . . . . . . . . . . . . . . . . .> on (o.prod_id=p.prod_id) 
. . . . . . . . . . . . . . . . . . . . .> where o.order_date between p.valid_from and if(p.valid_end is not null ,p.valid_end,cast('9999-12-31' as timestamp))
. . . . . . . . . . . . . . . . . . . . .> order by order_id;

+-------------+------------------------+----------+-------------+--+
| o.order_id  |      o.order_date      | p.price  | total_cost  |
+-------------+------------------------+----------+-------------+--+
| 1           | 2017-02-10 00:00:00.0  | 1.0      | 1.0         |
| 2           | 2017-07-10 00:00:00.0  | 0.8      | 8.0         |
| 3           | 2017-09-10 00:00:00.0  | 4.2      | 21.0        |
| 4           | 2017-12-01 00:00:00.0  | 1.3      | 2.6         |
+-------------+------------------------+----------+-------------+--+
4 rows selected (48.862 seconds)
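The lead() trick behind prod_scd2 can be reproduced in plain Python: sort each product's versions by last_modified, and each row's valid_end is the next row's valid_from (open-ended for the latest version). A sketch with dates kept as ISO strings, which sort correctly; unlike the Hive query's inclusive BETWEEN, it uses half-open intervals so a boundary date can never match two versions:

```python
from collections import defaultdict

products = [  # (prod_id, price, last_modified)
    (1, 1.0, "2017-01-01"), (1, 1.1, "2017-04-01"),
    (1, 1.2, "2017-07-01"), (1, 1.3, "2017-10-01"),
    (2, 0.6, "2017-01-01"), (2, 0.8, "2017-06-01"),
    (3, 4.2, "2017-07-01"),
]
orders = [  # (order_id, prod_id, quantity, order_date)
    (1, 1, 1, "2017-02-10"), (2, 2, 10, "2017-07-10"),
    (3, 3, 5, "2017-09-10"), (4, 1, 2, "2017-12-01"),
]

# lead(last_modified) over (partition by prod_id order by last_modified)
by_prod = defaultdict(list)
for prod_id, price, ts in products:
    by_prod[prod_id].append((ts, price))

scd2 = defaultdict(list)  # prod_id -> [(price, valid_from, valid_end)]
for prod_id, rows in by_prod.items():
    rows.sort()
    for i, (ts, price) in enumerate(rows):
        valid_end = rows[i + 1][0] if i + 1 < len(rows) else "9999-12-31"
        scd2[prod_id].append((price, ts, valid_end))

def price_at(prod_id, order_date):
    """Point-in-time join: the version whose interval covers the date."""
    for price, valid_from, valid_end in scd2[prod_id]:
        if valid_from <= order_date < valid_end:
            return price

costs = [(oid, qty * price_at(pid, od)) for oid, pid, qty, od in orders]
# costs match the Hive result: 1.0, 8.0, 21.0, 2.6
```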


Friday, March 2, 2018

Renaming a Hive table column on the Avro data format

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> create table t1 (id1 int) stored as avro;

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> insert into t1 values(1);

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> insert into t1 values(2);

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> alter table t1 change id1 id2 int;

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> insert into t1 values(3);

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select * from t1;
+---------+--+
| t1.id2  |
+---------+--+
| NULL    |
| NULL    |
| 3       |
+---------+--+
3 rows selected (0.135 seconds)

[donghua@cdh-vm ~]$ hdfs dfs -cat /user/hive/warehouse/t1/000000_0
Objavro.schema?{"type":"record","name":"t1","namespace":"default","fields":[{"name":"id1","type":["null","int"],"default":null}]}p??&T??z???Ӌ??&T??z???Ӌ

[donghua@cdh-vm ~]$ hdfs dfs -cat /user/hive/warehouse/t1/000000_0_copy_1
Objavro.schema?{"type":"record","name":"t1","namespace":"default","fields":[{"name":"id1","type":["null","int"],"default":null}]}?0??Qs>?4?????0??Qs>?4????

[donghua@cdh-vm ~]$ hdfs dfs -cat /user/hive/warehouse/t1/000000_0_copy_2
Objavro.schema?{"type":"record","name":"t1","namespace":"default","fields":[{"name":"id2","type":["null","int"],"default":null}]}>?>?)x?Dv&?I??1>?>?)x?Dv&?I??1
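The hdfs dfs -cat output explains the NULLs: every Avro file embeds its writer schema, and the first two files still name the field id1, while after the ALTER TABLE the reader asks for id2. Avro resolves fields by name, so for the old files there is no match and the reader falls back to the null default. A pure-Python sketch of that resolution rule (simplified; real Avro also supports field aliases, which is the proper fix for a rename):

```python
def read_field(record, writer_field_names, reader_field):
    """Simplified Avro schema resolution: match the reader's field to a
    writer field by name; if the writer schema has no such field, use
    the reader field's default (here None, i.e. NULL)."""
    if reader_field["name"] in writer_field_names:
        return record[reader_field["name"]]
    return reader_field.get("default")

old_file = {"writer_fields": {"id1"}, "records": [{"id1": 1}, {"id1": 2}]}
new_file = {"writer_fields": {"id2"}, "records": [{"id2": 3}]}
reader = {"name": "id2", "default": None}  # schema after the rename

rows = [read_field(r, f["writer_fields"], reader)
        for f in (old_file, new_file) for r in f["records"]]
# rows == [None, None, 3], matching the NULL, NULL, 3 seen above
```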

Impala can't insert into Avro-format tables

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_avro.html

Impala supports using tables whose data files use the Avro file format. Impala can query Avro tables, and in Impala 1.4.0 and higher can create them, but currently cannot insert data into them. For insert operations, use Hive, then switch back to Impala to run queries.


[cdh-vm.dbaglobe.com:21000] > create table t1(id1 int) stored as avro;
Query: create table t1(id1 int) stored as avro
Fetched 0 row(s) in 1.59s

[cdh-vm.dbaglobe.com:21000] > insert into t1 values(1);
Query: insert into t1 values(1)
Query submitted at: 2018-03-02 21:45:36 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=6847cc676e330ec0:c1d2b6d300000000
WARNINGS: Writing to table format AVRO is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS to override.


[cdh-vm.dbaglobe.com:21000] > select * from t1;
Query: select * from t1
Query submitted at: 2018-03-02 21:46:00 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=974e823123b8e334:bec6f70600000000
Fetched 0 row(s) in 0.06s

[cdh-vm.dbaglobe.com:21000] > set ALLOW_UNSUPPORTED_FORMATS=true;
ALLOW_UNSUPPORTED_FORMATS set to true

[cdh-vm.dbaglobe.com:21000] > insert into t1 values(1);
Query: insert into t1 values(1)
Query submitted at: 2018-03-02 21:47:23 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=f148965b772acd2e:fa4afc8700000000
Modified 0 row(s) in 1.04s

[cdh-vm.dbaglobe.com:21000] > select * from t1;
Query: select * from t1
Query submitted at: 2018-03-02 21:47:26 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=3c4ed8a0106b5cd1:6a45ee0c00000000
+------+
| id1  |
+------+
| NULL |
+------+
Fetched 1 row(s) in 0.36s

[donghua@cdh-vm temp]$ beeline -u jdbc:hive2://cdh-vm.dbaglobe.com:10000 -n donghua
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
scan complete in 2ms
Connecting to jdbc:hive2://cdh-vm.dbaglobe.com:10000
Connected to: Apache Hive (version 1.1.0-cdh5.14.0)
Driver: Hive JDBC (version 1.1.0-cdh5.14.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.14.0 by Apache Hive

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select * from t1;
Error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Checksum failure (state=,code=0)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> truncate table t1;
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Exception while processing
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Exception while processing (state=08S01,code=1)

-- From Impala
[cdh-vm.dbaglobe.com:21000] > drop table t1;
Query: drop table t1

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> create table t1 (id1 int) stored as avro;
No rows affected (0.136 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> insert into t1 values(1);

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000> select * from t1;
+---------+--+
| t1.id1  |
+---------+--+
| 1       |
+---------+--+

Thursday, March 1, 2018

Without Sentry, Impala and Hive can break on permission issues

[cdh-vm.dbaglobe.com:21000] > alter table suppliers rename to vendors;
Query: alter table suppliers rename to vendors
ERROR: ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore: 
CAUSED BY: InvalidOperationException: Alter Table operation for default.suppliers failed to move data due to: 'Permission denied by sticky bit: user=impala, path="/user/hive/warehouse/suppliers":donghua:hive:drwxrwxrwt, parent="/user/hive/warehouse":hive:hive:drwxrwxrwt
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkStickyBit(DefaultAuthorizationProvider.java:387)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:159)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3877)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6779)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:4047)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:4017)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3982)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:814)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.rename(AuthorizationProviderProxyClientProtocol.java:271)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:590)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
' See hive log file for details.



[donghua@cdh-vm data_mgmt]$ hdfs dfs -ls -d /user/hive/warehouse/suppliers
drwxrwxrwt   - donghua hive          0 2018-03-01 21:40 /user/hive/warehouse/suppliers
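The rename fails because of the sticky bit, the trailing t in drwxrwxrwt: in a sticky directory, only the entry's owner (here donghua), the directory's owner, or the superuser may rename or delete an entry, and the metastore operation runs as impala. A simplified Python sketch of that check (the superuser name 'hdfs' is an assumption for illustration):

```python
import stat

def sticky_rename_allowed(dir_mode, entry_owner, dir_owner, acting_user):
    """Simplified POSIX/HDFS sticky-bit rule: when S_ISVTX is set on
    the parent directory, only the entry owner, the directory owner,
    or the superuser may rename/delete the entry."""
    if not (dir_mode & stat.S_ISVTX):
        return True  # no sticky bit: ordinary write permission applies
    return acting_user in (entry_owner, dir_owner, "hdfs")

mode = 0o1777  # drwxrwxrwt, as shown by hdfs dfs -ls above
# /user/hive/warehouse is owned by hive; the table dir by donghua.
blocked = sticky_rename_allowed(mode, "donghua", "hive", "impala")  # False
allowed = sticky_rename_allowed(mode, "donghua", "hive", "donghua")  # True
```

Consistent with this, possible workarounds are to chown the table directory to the user performing the rename, run the statement as the owning user, or set up Sentry, which manages warehouse ownership and permissions for you.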