Saturday, December 30, 2017

Kudu “distribute by” syntax error in Impala

The example below follows the syntax shown in various Cloudera Kudu training material (Kudu v1.0) and documentation/blog posts, and it fails with a syntax error. (Simply replacing “distribute by” with the equivalent “partition by” inside the old storage-handler syntax doesn’t help.)

[donghua@cdh-vm ~]$ impala-shell -i cdh-vm.dbaglobe.com -k
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
Connected to cdh-vm.dbaglobe.com:21000
Server version: impalad version 2.10.0-cdh5.13.1 RELEASE (build 1e4b23c4eb52dac95c5be6316f49685c41783c51)
***********************************************************************************
Welcome to the Impala shell.


(Impala Shell v2.10.0-cdh5.13.1 (1e4b23c) built on Thu Nov  9 08:29:47 PST 2017)


[cdh-vm.dbaglobe.com:21000] > create table kudu_iotdatademo3
                            > distribute by hash (eventts) into 3 buckets
                            > tblproperties (
                            > 'storage_handler'='com.cloudera.kudu.hive.KuduStorageHandler',
                            > 'kudu_tablename'='kudu_iotdatademo2',
                            > 'kudu_master_addresses'='cdh-vm.dbaglobe.com:8051',
                            > 'kudu_key_columns'='stationid,eventts')
                            > as select * from iotdatademo2;
Query: create table kudu_iotdatademo3
distribute by hash (eventts) into 3 buckets
tblproperties (
'storage_handler'='com.cloudera.kudu.hive.KuduStorageHandler',
'kudu_tablename'='kudu_iotdatademo2',
'kudu_master_addresses'='cdh-vm.dbaglobe.com:8051',
'kudu_key_columns'='stationid,eventts')
as select * from iotdatademo2
Query submitted at: 2017-12-30 07:17:43 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
ERROR: AnalysisException: Syntax error in line 2:
distribute by hash (eventts) into 3 buckets
^
Encountered: IDENTIFIER
Expected: ADD, ALTER, AS, CACHED, CHANGE, COMMENT, DROP, FROM, LIKE, LOCATION, PARTITION, PARTITIONED, PRIMARY, PURGE, RECOVER, RENAME, REPLACE, ROW, SELECT, SET, SORT, STORED, STRAIGHT_JOIN, TBLPROPERTIES, TO, UNCACHED, VALUES, WITH

CAUSED BY: Exception: Syntax error

Workaround: (tested with Kudu 1.5 + Impala 2.10 in CDH 5.13.1)


[cdh-vm.dbaglobe.com:21000] > create table kudu_iotdatademo2
                            > (stationid int,
                            > eventts timestamp,
                            > eventdate int,
                            > eventday tinyint,
                            > speed float,
                            > volume int,
                            > primary key (stationid,eventts))
                            > partition by hash partitions 3
                            > stored as kudu;
Query: create table kudu_iotdatademo2
(stationid int,
eventts timestamp,
eventdate int,
eventday tinyint,
speed float,
volume int,
primary key (stationid,eventts))
partition by hash partitions 3
stored as kudu
Fetched 0 row(s) in 1.15s

[cdh-vm.dbaglobe.com:21000] > insert into kudu_iotdatademo2
                            > select stationid,eventts,eventdate,eventday,speed,volume from iotdatademo2;
Query: insert into kudu_iotdatademo2
select stationid,eventts,eventdate,eventday,speed,volume from iotdatademo2
Query submitted at: 2017-12-30 07:18:56 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=a4acf75a7302750:8317ee4000000000
Modified 3456001 row(s), 0 row error(s) in 31.43s
[cdh-vm.dbaglobe.com:21000] >
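Since Impala 2.8 (CDH 5.10), the same partition by syntax also works in CTAS form, so the failing statement at the top can be rewritten in a single step. A minimal sketch, assuming the key columns come first in the source table:

create table kudu_iotdatademo3
primary key (stationid,eventts)
partition by hash (eventts) partitions 3
stored as kudu
as select stationid,eventts,eventdate,eventday,speed,volume from iotdatademo2;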

Apache Kudu DML example (kudu 1.5.0-cdh5.13.1)

[donghua@cdh-vm ~]$ impala-shell -i cdh-vm.dbaglobe.com -k
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
Connected to cdh-vm.dbaglobe.com:21000
Server version: impalad version 2.10.0-cdh5.13.1 RELEASE (build 1e4b23c4eb52dac95c5be6316f49685c41783c51)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.10.0-cdh5.13.1 (1e4b23c) built on Thu Nov  9 08:29:47 PST 2017)

To see a summary of a query's progress that updates in real-time, run 'set LIVE_PROGRESS=1;'.
**********************************************************************************

[cdh-vm.dbaglobe.com:21000] > create table employees(id int, name string) stored as kudu;
Query: create table employees(id int, name string) stored as kudu
ERROR: AnalysisException: A primary key is required for a Kudu table.

[cdh-vm.dbaglobe.com:21000] > create table employees(id int, name string, primary key (id)) stored as kudu;
Query: create table employees(id int, name string, primary key (id)) stored as kudu
WARNINGS: Unpartitioned Kudu tables are inefficient for large data sizes.

Fetched 0 row(s) in 0.41s
[cdh-vm.dbaglobe.com:21000] > drop table employees;
Query: drop table employees

[cdh-vm.dbaglobe.com:21000] > create table employees(id int, name string, primary key (id)) partition by hash partitions 3 stored as kudu;
Query: create table employees(id int, name string, primary key (id)) partition by hash partitions 3 stored as kudu
Fetched 0 row(s) in 0.15s

[cdh-vm.dbaglobe.com:21000] > insert into employees values (1,'donghua');
Query: insert into employees values (1,'donghua')
Query submitted at: 2017-12-30 07:22:56 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=724c4c67c59d5eb6:9c74075700000000
Modified 1 row(s), 0 row error(s) in 4.28s

[cdh-vm.dbaglobe.com:21000] > select * from employees;
Query: select * from employees
Query submitted at: 2017-12-30 07:23:12 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=bb462f142f62e12b:385e2ce900000000
+----+---------+
| id | name    |
+----+---------+
| 1  | donghua |
+----+---------+
Fetched 1 row(s) in 0.16s

[cdh-vm.dbaglobe.com:21000] > insert into employees values (2,'larry');
Query: insert into employees values (2,'larry')
Query submitted at: 2017-12-30 07:23:21 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=1a4767310c5a9b99:2c2a26b400000000
Modified 1 row(s), 0 row error(s) in 0.11s

[cdh-vm.dbaglobe.com:21000] > select * from employees;
Query: select * from employees
Query submitted at: 2017-12-30 07:23:26 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=7d4b866c73311bd9:5374e5ad00000000
+----+---------+
| id | name    |
+----+---------+
| 2  | larry   |
| 1  | donghua |
+----+---------+
Fetched 2 row(s) in 0.16s

[cdh-vm.dbaglobe.com:21000] > update employees set id=3 where id=1;
Query: update employees set id=3 where id=1
Query submitted at: 2017-12-30 07:23:44 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
ERROR: AnalysisException: Key column 'id' cannot be updated.

[cdh-vm.dbaglobe.com:21000] > update employees set name='tom' where id=2;
Query: update employees set name='tom' where id=2
Query submitted at: 2017-12-30 07:23:58 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=644fe7f97c2c5221:bc7730eb00000000
Modified 1 row(s), 0 row error(s) in 0.18s

[cdh-vm.dbaglobe.com:21000] > delete from employees where id=1;
Query: delete from employees where id=1
Query submitted at: 2017-12-30 07:24:11 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=3048cecbb9e1c886:7686422c00000000
Modified 1 row(s), 0 row error(s) in 0.13s

[cdh-vm.dbaglobe.com:21000] > select * from employees;
Query: select * from employees
Query submitted at: 2017-12-30 07:24:16 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=7244597f3717fcd0:5c81509d00000000
+----+------+
| id | name |
+----+------+
| 2  | tom  |
+----+------+
Fetched 1 row(s) in 0.14s
[cdh-vm.dbaglobe.com:21000] > exit;
Goodbye donghua
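Besides insert/update/delete, Impala also supports upsert for Kudu tables (from Impala 2.8): rows whose primary key already exists are updated in place, new keys are inserted. A sketch against the employees table above, with made-up values:

upsert into employees values (2,'jerry'), (4,'alice');
-- key 2 exists, so that row is updated; key 4 is new, so it is inserted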

Kudu & Impalad flag file configuration

[root@cdh-vm donghua]# ps -ef|egrep 'kudu-|impalad'
kudu      4466  1221  0 Dec29 ?        00:01:10 /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/kudu/sbin/kudu-tserver --tserver_master_addrs=cdh-vm.dbaglobe.com --flagfile=/run/cloudera-scm-agent/process/96-kudu-KUDU_TSERVER/gflagfile
kudu      4468  1221  0 Dec29 ?        00:01:15 /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/kudu/sbin/kudu-master --flagfile=/run/cloudera-scm-agent/process/97-kudu-KUDU_MASTER/gflagfile
impala   11401  1221  0 Dec29 ?        00:02:12 /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/110-impala-IMPALAD/impala-conf/impalad_flags

[root@cdh-vm donghua]# cat /run/cloudera-scm-agent/process/97-kudu-KUDU_MASTER/gflagfile
-default_num_replicas=3
-fs_data_dirs=/dfs/kmd
-fs_wal_dir=/dfs/kmw
-log_dir=/var/log/kudu
-log_force_fsync_all=false
-logbuflevel=0
-max_log_size=1800
-minloglevel=0
-superuser_acl
-user_acl=*
-v=0
-webserver_certificate_file
-webserver_port=8051
-webserver_private_key_file
-webserver_private_key_password_cmd

[root@cdh-vm donghua]# cat /run/cloudera-scm-agent/process/96-kudu-KUDU_TSERVER/gflagfile
-block_cache_capacity_mb=512
-fs_data_dirs=/dfs/ktd
-fs_wal_dir=/dfs/ktw
-log_dir=/var/log/kudu
-log_force_fsync_all=false
-logbuflevel=0
-maintenance_manager_num_threads=1
-max_log_size=1800
-memory_limit_hard_bytes=1073741824
-minloglevel=0
-superuser_acl
-user_acl=*
-v=0
-webserver_certificate_file
-webserver_port=8050
-webserver_private_key_file
-webserver_private_key_password_cmd


[root@cdh-vm donghua]# cat /run/cloudera-scm-agent/process/110-impala-IMPALAD/impala-conf/impalad_flags
-beeswax_port=21000
-fe_port=21000
-be_port=22000
-llama_callback_port=28000
-hs2_port=21050
-enable_webserver=true
-mem_limit=268435456
-max_log_files=10
-webserver_port=25000
-max_result_cache_size=100000
-state_store_subscriber_port=23000
-statestore_subscriber_timeout_seconds=30
-scratch_dirs=/impala/impalad
-default_query_options
-load_auth_to_local_rules=false
-kerberos_reinit_interval=60
-principal=impala/cdh-vm.dbaglobe.com@DBAGLOBE.COM
-keytab_file=/run/cloudera-scm-agent/process/110-impala-IMPALAD/impala.keytab
-log_filename=impalad
-audit_event_log_dir=/var/log/impalad/audit
-max_audit_event_log_file_size=5000
-abort_on_failed_audit_event=false
-minidump_path=/var/log/impala-minidumps
-max_minidumps=9
-lineage_event_log_dir=/var/log/impalad/lineage
-max_lineage_log_file_size=5000
-hostname=cdh-vm.dbaglobe.com
-state_store_host=cdh-vm.dbaglobe.com
-enable_rm=false
-state_store_port=24000
-catalog_service_host=cdh-vm.dbaglobe.com
-catalog_service_port=26000
-local_library_dir=/var/lib/impala/udfs
-fair_scheduler_allocation_path=/run/cloudera-scm-agent/process/110-impala-IMPALAD/impala-conf/fair-scheduler.xml
-llama_site_path=/run/cloudera-scm-agent/process/110-impala-IMPALAD/impala-conf/llama-site.xml
-disable_admission_control=false
-queue_wait_timeout_ms=60000
-disk_spill_encryption=false
-abort_on_config_error=true
-kudu_master_hosts=cdh-vm.dbaglobe.com
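The -kudu_master_hosts entry at the bottom is what allows the stored as kudu statements above to omit the master address. Without it, the address could be supplied per table via tblproperties; a sketch (table name hypothetical; 7051 is Kudu's default master RPC port, whereas the 8051 in the gflagfile is only the web UI):

create table kudu_demo (id int, primary key (id))
partition by hash partitions 3
stored as kudu
tblproperties ('kudu.master_addresses'='cdh-vm.dbaglobe.com:7051');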

Thursday, December 28, 2017

Permission issue after Sentry enabled for HDFS/HIVE/Impala/Hue

The few tables originally created under user donghua no longer appear in the “show tables” output:

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/d> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
+-----------+--+
No rows selected (0.386 seconds)

Permission denied when creating a table:

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/d> create table employee3 (id int, name string);
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
  User donghua does not have privileges for CREATETABLE
  The required privileges: Server=server1->Db=default->action=*; (state=42000,code=40000)


Quick solution:

Log in to Hive as user hive, and grant the needed privileges to donghua (refer to this URL if you need help logging in as hive after Kerberos is enabled: http://www.dbaglobe.com/2017/12/login-as-keberos-userprincipal-after.html):

create role analyst_role;
grant all on database default to role analyst_role;
grant role analyst_role to group donghua;
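To verify the grants took effect, Sentry's show statements can be run from the same admin session; a quick check (output omitted):

show grant role analyst_role;
show role grant group donghua;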

Hive Transform using a Python script example

Python Script: 

[donghua@cdh-vm ~]$ cat employees.py
import sys
for line in sys.stdin:
    (employeeid,firstname,lastname) = line.split('\t')
    # print in python2 would add an extra newline; lastname already ends with '\n'
    sys.stdout.write(employeeid+'\t'+firstname+','+lastname)
Hive Script: 

create table employees (employee_id int,first_name string,last_name string) stored as avro;
insert into employees values(1,'donghua','luo'),(2,'larry','elison'),(3,'tom','kyte');

add file /tmp/employees.py;
select transform(employee_id,first_name,last_name) using 'python employees.py' as (employee_id,full_name) from employees;

Sample output:

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/d> select transform(employee_id,first_name,last_name) using 'python employees.py' as (employee_id,full_name) from employees;
+--------------+---------------+--+
| employee_id  |   full_name   |
+--------------+---------------+--+
| 1            | donghua,luo   |
| 2            | larry,elison  |
| 3            | tom,kyte      |
+--------------+---------------+--+
3 rows selected (17.997 seconds)
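To debug the script outside Hive before running add file, it can be fed a sample row from the shell; a sketch (expected output shown on the second line):

[donghua@cdh-vm ~]$ echo -e '1\tdonghua\tluo' | python employees.py
1	donghua,luo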

Tuesday, December 26, 2017

Alternative way to disable transparent hugepage (THP) on Red Hat Enterprise Linux 7

Append “transparent_hugepage=never” to GRUB_CMDLINE_LINUX in /etc/default/grub

[root@hdp-vm ~]# vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet transparent_hugepage=never"
GRUB_DISABLE_RECOVERY="true"

[root@hdp-vm ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-693.11.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.11.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-8be2b63ef43643f786bd865127a5a3bb
Found initrd image: /boot/initramfs-0-rescue-8be2b63ef43643f786bd865127a5a3bb.img

[root@hdp-vm ~]# reboot

[root@hdp-vm ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]


[root@hdp-vm ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.11.1.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet transparent_hugepage=never
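For comparison, the conventional runtime method writes to sysfs directly; it takes effect immediately but is lost on reboot, which is why the GRUB approach above is preferable on cluster nodes:

[root@hdp-vm ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
[root@hdp-vm ~]# echo never > /sys/kernel/mm/transparent_hugepage/defrag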

Hive table sampling explained with examples

-- Leverage prebuilt buckets
select * from monthly_taxi_fleet6 tablesample(bucket 1 out of 3 on month);

-- Leverage prebuilt buckets, splitting the 3 buckets into 10 buckets dynamically
select * from monthly_taxi_fleet6 tablesample(bucket 1 out of 10 on month);

-- Dynamically build buckets on the company column
select * from monthly_taxi_fleet6 tablesample(bucket 1 out of 3 on company);

-- Block-based sampling, limiting input by percentage
select * from monthly_taxi_fleet6 tablesample(5 percent);

-- Block-based sampling, limiting input by storage size
select * from monthly_taxi_fleet6 tablesample(5M);

-- Row-based sampling, limiting input by row count
select * from monthly_taxi_fleet6 tablesample(10 rows);

Hive Bucketing with examples

Think of it as a hash-based index in an RDBMS, more suitable for high-cardinality data columns (e.g. customer_id, product_id, station_id, etc.)

Basic Bucket example:


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create table monthly_taxi_fleet6
. . . . . . . . . . . . . . . . . . . . . . .> (month char(7),fleet smallint,company varchar(50))
. . . . . . . . . . . . . . . . . . . . . . .> clustered by (company) into 3 buckets
. . . . . . . . . . . . . . . . . . . . . . .> stored as avro;

Example using Apache Hive version 1.1.0-cdh5.13.1, where hive.enforce.bucketing=false by default:
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet6
. . . . . . . . . . . . . . . . . . . . . . .> select month,fleet,company from monthly_taxi_fleet;

[donghua@cdh-vm ~]$ hdfs dfs -ls -R /user/hive/warehouse/monthly_taxi_fleet6
-rwxrwxrwt   1 donghua hive      25483 2017-12-26 10:40 /user/hive/warehouse/monthly_taxi_fleet6/000000_0

-- hive.enforce.bucketing: Whether bucketing is enforced. If true, while inserting into the table, bucketing is enforced.
-- Default Value: Hive 0.x: false, Hive 1.x: false, Hive 2.x: removed, which effectively makes it always true (HIVE-12331)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.enforce.bucketing=true;

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet6
. . . . . . . . . . . . . . . . . . . . . . .> select month,fleet,company from monthly_taxi_fleet;

[donghua@cdh-vm ~]$ hdfs dfs -ls -R /user/hive/warehouse/monthly_taxi_fleet6
-rwxrwxrwt   1 donghua hive      13611 2017-12-26 10:43 /user/hive/warehouse/monthly_taxi_fleet6/000000_0
-rwxrwxrwt   1 donghua hive       6077 2017-12-26 10:43 /user/hive/warehouse/monthly_taxi_fleet6/000001_0
-rwxrwxrwt   1 donghua hive       6589 2017-12-26 10:43 /user/hive/warehouse/monthly_taxi_fleet6/000002_0
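With bucketing enforced, each row lands in the file whose index roughly equals the hash of the clustering column modulo the bucket count (Hive actually uses (hashCode & Integer.MAX_VALUE) % numBuckets). A rough sanity check using the built-in hash() and pmod() functions:

select company, pmod(hash(company),3) as bucket_no, count(*) as cnt
from monthly_taxi_fleet6
group by company
order by bucket_no;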

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> describe extended monthly_taxi_fleet6;
+-----------------------------+----------------------------------------------------+----------+--+
|          col_name           |                     data_type                      | comment  |
+-----------------------------+----------------------------------------------------+----------+--+
| month                       | char(7)                                            |          |
| fleet                       | int                                                |          |
| company                     | varchar(50)                                        |          |
|                             | NULL                                               | NULL     |
| Detailed Table Information  | Table(tableName:monthly_taxi_fleet6, dbName:default, owner:donghua, createTime:1514256031, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:month, type:char(7), comment:null), FieldSchema(name:fleet, type:smallint, comment:null), FieldSchema(name:company, type:varchar(50), comment:null)], location:hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet6, inputFormat:org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat, compressed:false, numBuckets:3, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.avro.AvroSerDe, parameters:{serialization.format=1}), bucketCols:[company], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=26277, numRows=1128, rawDataSize=0, COLUMN_STATS_ACCURATE=true, numFiles=3, transient_lastDdlTime=1514256192}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) |          |
+-----------------------------+----------------------------------------------------+----------+--+
5 rows selected (0.075 seconds)

Advanced Bucket example: Partition + Bucketing + Sorted by


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create table monthly_taxi_fleet7
. . . . . . . . . . . . . . . . . . . . . . .> (month char(7),fleet smallint)
. . . . . . . . . . . . . . . . . . . . . . .> partitioned by (company varchar(50))
. . . . . . . . . . . . . . . . . . . . . . .> clustered by (month) sorted by (month) into 3 buckets
. . . . . . . . . . . . . . . . . . . . . . .> stored as avro;

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet7
. . . . . . . . . . . . . . . . . . . . . . .> partition (company)
. . . . . . . . . . . . . . . . . . . . . . .> select month,fleet,company from monthly_taxi_fleet;

[donghua@cdh-vm ~]$ hdfs dfs -ls -R /user/hive/warehouse/monthly_taxi_fleet7
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=CityCab
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=CityCab/000000_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=CityCab/000001_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=CityCab/000002_0

drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Comfort
-rwxrwxrwt   1 donghua hive        913 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Comfort/000000_0
-rwxrwxrwt   1 donghua hive        913 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Comfort/000001_0
-rwxrwxrwt   1 donghua hive        913 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Comfort/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Individual Yellow- Top
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Individual Yellow- Top/000000_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Individual Yellow- Top/000001_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Individual Yellow- Top/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Premier
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Premier/000000_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Premier/000001_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Premier/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Prime
-rwxrwxrwt   1 donghua hive        765 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Prime/000000_0
-rwxrwxrwt   1 donghua hive        765 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Prime/000001_0
-rwxrwxrwt   1 donghua hive        766 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Prime/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=SMRT
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=SMRT/000000_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=SMRT/000001_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=SMRT/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Smart
-rwxrwxrwt   1 donghua hive        720 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Smart/000000_0
-rwxrwxrwt   1 donghua hive        719 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Smart/000001_0
-rwxrwxrwt   1 donghua hive        719 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=Smart/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=TransCab
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=TransCab/000000_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=TransCab/000001_0
-rwxrwxrwt   1 donghua hive        865 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=TransCab/000002_0
drwxrwxrwt   - donghua hive          0 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=YTC
-rwxrwxrwt   1 donghua hive        432 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=YTC/000000_0
-rwxrwxrwt   1 donghua hive        432 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=YTC/000001_0
-rwxrwxrwt   1 donghua hive        432 2017-12-26 11:05 /user/hive/warehouse/monthly_taxi_fleet7/company=YTC/000002_0
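This layout lets a query prune twice: a predicate on company skips whole partition directories, and bucket sampling on month reads only one of the three files inside the matching directory. A sketch:

select * from monthly_taxi_fleet7 tablesample(bucket 1 out of 3 on month)
where company='Comfort';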

Monday, December 25, 2017

Hive Partition by Examples

[donghua@cdh-vm ~]$  beeline -u jdbc:hive2://cdh-vm.dbaglobe.com:10000/default -n donghua
Connecting to jdbc:hive2://cdh-vm.dbaglobe.com:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.13.1)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.13.0 by Apache Hive
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def>

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> select substr(month,0,4) as year,month,company,fleet from monthly_taxi_fleet limit 10;
+-------+----------+-------------------------+--------+--+
| year  |  month   |         company         | fleet  |
+-------+----------+-------------------------+--------+--+
| 2005  | 2005-01  | Comfort                 | 9952   |
| 2005  | 2005-01  | CityCab                 | 4965   |
| 2005  | 2005-01  | SMRT                    | 2441   |
| 2005  | 2005-01  | YTC                     | 1223   |
| 2005  | 2005-01  | Individual Yellow- Top  | 696    |
| 2005  | 2005-01  | Smart                   | 320    |
| 2005  | 2005-01  | TransCab                | 560    |
| 2005  | 2005-01  | Premier                 | 370    |
| 2005  | 2005-02  | Comfort                 | 10046  |
| 2005  | 2005-02  | CityCab                 | 4968   |
+-------+----------+-------------------------+--------+--+

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create table monthly_taxi_fleet_company_year_part
. . . . . . . . . . . . . . . . . . . . . . .> (month char(7),fleet smallint)
. . . . . . . . . . . . . . . . . . . . . . .> partitioned by (company varchar(50),year char(4))
. . . . . . . . . . . . . . . . . . . . . . .> stored as avro;


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> desc monthly_taxi_fleet_company_year_part;

+--------------------------+-----------------------+-----------------------+--+
|         col_name         |       data_type       |        comment        |
+--------------------------+-----------------------+-----------------------+--+
| month                    | char(7)               |                       |
| fleet                    | int                   |                       |
| company                  | varchar(50)           |                       |
| year                     | char(4)               |                       |
|                          | NULL                  | NULL                  |
| # Partition Information  | NULL                  | NULL                  |
| # col_name               | data_type             | comment               |
|                          | NULL                  | NULL                  |
| company                  | varchar(50)           |                       |
| year                     | char(4)               |                       |
+--------------------------+-----------------------+-----------------------+--+


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet_company_year_part (month,fleet,company,year)
. . . . . . . . . . . . . . . . . . . . . . .> values('2017-10',10000,'Comfort','2017');
Error: Error while compiling statement: FAILED: SemanticException 1:50 '[year, company]' in insert schema specification are not found among regular columns of default.monthly_taxi_fleet_company_year_part nor dynamic partition columns.. Error encountered near token 'year' (state=42000,code=40000)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet_company_year_part
. . . . . . . . . . . . . . . . . . . . . . .> partition (company='Comfort',year='2017')
. . . . . . . . . . . . . . . . . . . . . . .> values('2017-10',10000);

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> select * from monthly_taxi_fleet_company_year_part o;
+----------+----------+------------+---------+--+
| o.month  | o.fleet  | o.company  | o.year  |
+----------+----------+------------+---------+--+
| 2017-10  | 10000    | Comfort    | 2017    |
+----------+----------+------------+---------+--+
1 row selected (0.102 seconds)

[donghua@cdh-vm ~]$ hdfs dfs -ls -R /user/hive/warehouse/monthly_taxi_fleet_company_year_part
drwxrwxrwt   - donghua hive          0 2017-12-25 21:25 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=Comfort
drwxrwxrwt   - donghua hive          0 2017-12-25 21:25 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=Comfort/year=2017
-rwxrwxrwt   1 donghua hive        318 2017-12-25 21:25 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=Comfort/year=2017/000000_0

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> drop table monthly_taxi_fleet_company_year_part;

-- Dynamic partition example (recreate the table as above before re-inserting)

-- These settings apply to the current session only;
-- to make them permanent, set the properties in hive-site.xml


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.exec.dynamic.partition = true;
No rows affected (0.003 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.exec.dynamic.partition.mode = nonstrict;
No rows affected (0.004 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet_company_year_part
. . . . . . . . . . . . . . . . . . . . . . .> partition (company,year)
. . . . . . . . . . . . . . . . . . . . . . .> values('2017-10',10000,'Comfort','2017');


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet_company_year_part
. . . . . . . . . . . . . . . . . . . . . . .> partition (company,year)
. . . . . . . . . . . . . . . . . . . . . . .> select month,fleet,company,substr(month,0,4) as year from monthly_taxi_fleet;
INFO  : Compiling command(queryId=hive_20171225221515_3c4f0c8e-9d61-40e4-9d21-9e6c3249c230): insert into monthly_taxi_fleet_company_year_part
partition (company,year)
select month,fleet,company,substr(month,0,4) as year from monthly_taxi_fleet
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:char(7), comment:null), FieldSchema(name:_col1, type:int, comment:null), FieldSchema(name:_col2, type:varchar(50), comment:null), FieldSchema(name:_col3, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171225221515_3c4f0c8e-9d61-40e4-9d21-9e6c3249c230); Time taken: 0.166 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171225221515_3c4f0c8e-9d61-40e4-9d21-9e6c3249c230): insert into monthly_taxi_fleet_company_year_part
partition (company,year)
select month,fleet,company,substr(month,0,4) as year from monthly_taxi_fleet
INFO  : Query ID = hive_20171225221515_3c4f0c8e-9d61-40e4-9d21-9e6c3249c230
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1513984921012_0069
INFO  : The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0069/
INFO  : Starting Job = job_1513984921012_0069, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0069/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0069
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2017-12-25 22:15:51,245 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-12-25 22:16:04,336 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 810 msec
INFO  : Ended Job = job_1513984921012_0069
INFO  : Starting task [Stage-7:CONDITIONAL] in serial mode
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Starting task [Stage-4:MOVE] in serial mode
INFO  : Moving data to: hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet_company_year_part/.hive-staging_hive_2017-12-25_22-15-43_889_5842295461208605082-6/-ext-10000 from hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet_company_year_part/.hive-staging_hive_2017-12-25_22-15-43_889_5842295461208605082-6/-ext-10002
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table default.monthly_taxi_fleet_company_year_part partition (company=null, year=null) from hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet_company_year_part/.hive-staging_hive_2017-12-25_22-15-43_889_5842295461208605082-6/-ext-10000
INFO  :          Time taken for load dynamic partitions : 15475
INFO  :         Loading partition {company=CityCab, year=2005}
INFO  :         Loading partition {company=Prime, year=2008}
INFO  :         Loading partition {company=CityCab, year=2009}
INFO  :         Loading partition {company=SMRT, year=2014}
INFO  :         Loading partition {company=Individual Yellow- Top, year=2013}
INFO  :         Loading partition {company=Comfort, year=2016}
<omitted for reading clarity>
INFO  :         Loading partition {company=Comfort, year=2008}
INFO  :         Loading partition {company=TransCab, year=2007}
INFO  :         Loading partition {company=Individual Yellow- Top, year=2008}
INFO  :         Loading partition {company=Prime, year=2016}
INFO  :         Loading partition {company=Prime, year=2013}
INFO  :         Loading partition {company=TransCab, year=2008}
INFO  :         Loading partition {company=Comfort, year=2009}
INFO  :         Loading partition {company=TransCab, year=2011}
INFO  :          Time taken for adding to write entity : 3
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2005} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2006} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2007} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2008} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2009} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2010} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2011} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2012} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2013} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2014} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2015} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=CityCab, year=2016} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=Comfort, year=2005} stats: [numFiles=1, numRows=12, totalSize=462, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=Comfort, year=2006} stats: [numFiles=1, numRows=12, totalSize=462, rawDataSize=0]

<omitted for reading clarity>

INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=YTC, year=2005} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=YTC, year=2006} stats: [numFiles=1, numRows=12, totalSize=450, rawDataSize=0]
INFO  : Partition default.monthly_taxi_fleet_company_year_part{company=YTC, year=2007} stats: [numFiles=1, numRows=12, totalSize=447, rawDataSize=0]
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 4.81 sec   HDFS Read: 31633 HDFS Write: 50547 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 4 seconds 810 msec
INFO  : Completed executing command(queryId=hive_20171225221515_3c4f0c8e-9d61-40e4-9d21-9e6c3249c230); Time taken: 40.317 seconds
INFO  : OK
No rows affected (40.505 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def>


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> show partitions monthly_taxi_fleet_company_year_part;
+-------------------------------------------+--+
|                 partition                 |
+-------------------------------------------+--+
| company=CityCab/year=2005                 |
| company=CityCab/year=2006                 |
| company=CityCab/year=2007                 |
| company=CityCab/year=2008                 |
| company=CityCab/year=2009                 |
| company=CityCab/year=2010                 |
| company=CityCab/year=2011                 |
| company=CityCab/year=2012                 |
| company=CityCab/year=2013                 |
| company=CityCab/year=2014                 |
<omitted for reading clarity>
| company=TransCab/year=2012                |
| company=TransCab/year=2013                |
| company=TransCab/year=2014                |
| company=TransCab/year=2015                |
| company=TransCab/year=2016                |
| company=YTC/year=2005                     |
| company=YTC/year=2006                     |
| company=YTC/year=2007                     |
+-------------------------------------------+--+
94 rows selected (0.093 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def>

[donghua@cdh-vm ~]$ hdfs dfs -ls -R /user/hive/warehouse/monthly_taxi_fleet_company_year_part
drwxrwxrwt   - donghua hive          0 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=CityCab
drwxrwxrwt   - donghua hive          0 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=CityCab/year=2005
-rwxrwxrwt   1 donghua hive        450 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=CityCab/year=2005/000000_0

<omitted for reading clarity>
-rwxrwxrwt   1 donghua hive        450 2017-12-25 22:15 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=TransCab/year=2016/000000_0
drwxrwxrwt   - donghua hive          0 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC
drwxrwxrwt   - donghua hive          0 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC/year=2005
-rwxrwxrwt   1 donghua hive        450 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC/year=2005/000000_0
drwxrwxrwt   - donghua hive          0 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC/year=2006
-rwxrwxrwt   1 donghua hive        450 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC/year=2006/000000_0
drwxrwxrwt   - donghua hive          0 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC/year=2007
-rwxrwxrwt   1 donghua hive        447 2017-12-25 22:16 /user/hive/warehouse/monthly_taxi_fleet_company_year_part/company=YTC/year=2007/000000_0

Possible Errors & Solutions:

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into monthly_taxi_fleet_company_year_part
. . . . . . . . . . . . . . . . . . . . . . .> partition (company,year)
. . . . . . . . . . . . . . . . . . . . . . .> select month,fleet,company,substr(month,0,4) as year from monthly_taxi_fleet;
INFO  : Compiling command(queryId=hive_20171225214545_54b5b6f8-8a03-451a-929f-fb175747a0ec): insert into monthly_taxi_fleet_company_year_part
partition (company,year)
select month,fleet,company,substr(month,0,4) as year from monthly_taxi_fleet
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:char(7), comment:null), FieldSchema(name:_col1, type:int, comment:null), FieldSchema(name:_col2, type:varchar(50), comment:null), FieldSchema(name:_col3, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171225214545_54b5b6f8-8a03-451a-929f-fb175747a0ec); Time taken: 0.166 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171225214545_54b5b6f8-8a03-451a-929f-fb175747a0ec): insert into monthly_taxi_fleet_company_year_part
partition (company,year)
select month,fleet,company,substr(month,0,4) as year from monthly_taxi_fleet
INFO  : Query ID = hive_20171225214545_54b5b6f8-8a03-451a-929f-fb175747a0ec
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1513984921012_0065
INFO  : The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0065/
INFO  : Starting Job = job_1513984921012_0065, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0065/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0065
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2017-12-25 21:45:12,194 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-12-25 21:46:12,857 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-12-25 21:46:19,235 Stage-1 map = 100%,  reduce = 0%
ERROR : Ended Job = job_1513984921012_0065 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=hive_20171225214545_54b5b6f8-8a03-451a-929f-fb175747a0ec); Time taken: 76.105 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

2017-12-25 21:45:24,056 FATAL [IPC Server handler 17 on 44101] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1513984921012_0065_m_000000_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"month":"2006-02","company":"YTC","fleet":876}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"month":"2006-02","company":"YTC","fleet":876}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:897)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:677)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:98)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    ... 9 more

   
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.exec.max.dynamic.partitions=1000;
No rows affected (0.006 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.exec.max.dynamic.partitions.pernode=1000;
No rows affected (0.002 seconds)
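A set statement without a value echoes the current setting, which is a quick way to confirm the change before re-running the insert:

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.exec.max.dynamic.partitions;
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> set hive.exec.max.dynamic.partitions.pernode;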

Hive Bitmap Indexes with example

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> show create table monthly_taxi_fleet5;
createtab_stmt
CREATE TABLE `monthly_taxi_fleet5`(
  `month` char(7),
  `company` varchar(50),
  `fleet` smallint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'last_modified_by'='donghua',
  'last_modified_time'='1514073485',
  'numFiles'='1',
  'numRows'='-1',
  'rawDataSize'='-1',
  'totalSize'='628',
  'transient_lastDdlTime'='1514073485')
21 rows selected (0.069 seconds)


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> !outputformat csv2
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> explain select * from monthly_taxi_fleet5 where company='Comfort';

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: monthly_taxi_fleet5
            filterExpr: (company = 'Comfort') (type: boolean)
            Statistics: Num rows: 10 Data size: 628 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (company = 'Comfort') (type: boolean)
              Statistics: Num rows: 5 Data size: 314 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: month (type: char(7)), company (type: varchar(50)), fleet (type: smallint)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 5 Data size: 314 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 5 Data size: 314 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

33 rows selected (0.183 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create index monthly_taxi_fleet5_company on table monthly_taxi_fleet5(company) as 'bitmap';
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ".
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ". (state=08S01,code=1)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create index monthly_taxi_fleet5_company on table monthly_taxi_fleet5(company) as 'bitmap' with deferred rebuild;


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> alter index monthly_taxi_fleet5_company on monthly_taxi_fleet5 rebuild;
No rows affected (21.826 seconds)
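To confirm the index and its backing table (default__monthly_taxi_fleet5_monthly_taxi_fleet5_company__, which appears in the plans below), Hive 1.x provides show index:

show formatted index on monthly_taxi_fleet5;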

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> explain select * from monthly_taxi_fleet5 where company='Comfort';

Explain
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-2 depends on stages: Stage-3
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: default__monthly_taxi_fleet5_monthly_taxi_fleet5_company__
            filterExpr: ((company = 'Comfort') and (not EWAH_BITMAP_EMPTY(_bitmaps))) (type: boolean)
            Filter Operator
              predicate: ((company = 'Comfort') and (not EWAH_BITMAP_EMPTY(_bitmaps))) (type: boolean)
              Select Operator
                expressions: _bucketname (type: string), _offset (type: bigint)
                outputColumnNames: _col0, _col1
                Group By Operator
                  aggregations: collect_set(_col1)
                  keys: _col0 (type: string)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Reduce Output Operator
                    key expressions: _col0 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: string)
                    value expressions: _col1 (type: array<bigint>)
      Reduce Operator Tree:
        Group By Operator
          aggregations: collect_set(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://cdh-vm.dbaglobe.com:8020/tmp/hive/donghua/6086c443-dac8-4517-8118-a37d42ce56a4/hive_2017-12-25_19-08-59_026_6029178632909749720-6/-mr-10003

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: monthly_taxi_fleet5
            filterExpr: (company = 'Comfort') (type: boolean)
            Statistics: Num rows: 10 Data size: 628 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (company = 'Comfort') (type: boolean)
              Statistics: Num rows: 5 Data size: 314 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: month (type: char(7)), company (type: varchar(50)), fleet (type: smallint)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 5 Data size: 314 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 5 Data size: 314 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

75 rows selected (0.39 seconds)


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> select * from monthly_taxi_fleet5 where company='Comfort';
INFO  : Compiling command(queryId=hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785): select * from monthly_taxi_fleet5 where company='Comfort'
INFO  : Compiling command(queryId=hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785): INSERT OVERWRITE DIRECTORY "hdfs://cdh-vm.dbaglobe.com:8020/tmp/hive/donghua/5a539317-9cca-43b8-b460-de23c3d7c159/hive_2017-12-25_19-13-24_864_2203428447630231527-6/-mr-10003" SELECT bucketname AS `_bucketname` , COLLECT_SET(offset) AS `_offsets` FROM (SELECT `_bucketname` AS bucketname , `_offset` AS offset FROM (SELECT * FROM `default__monthly_taxi_fleet5_monthly_taxi_fleet5_company__` WHERE (company = 'Comfort')) ind0 WHERE NOT EWAH_BITMAP_EMPTY(ind0.`_bitmaps`) ) tmp_index GROUP BY bucketname
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_bucketname, type:string, comment:null), FieldSchema(name:_offsets, type:array<bigint>, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785); Time taken: 0.07 seconds
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:monthly_taxi_fleet5.month, type:char(7), comment:null), FieldSchema(name:monthly_taxi_fleet5.company, type:varchar(50), comment:null), FieldSchema(name:monthly_taxi_fleet5.fleet, type:smallint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785); Time taken: 0.071 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785): select * from monthly_taxi_fleet5 where company='Comfort'
INFO  : Query ID = hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785
INFO  : Total jobs = 2
INFO  : Launching Job 1 out of 2
INFO  : Starting task [Stage-3:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1513984921012_0058
INFO  : The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0058/
INFO  : Starting Job = job_1513984921012_0058, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0058/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0058
INFO  : Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
INFO  : 2017-12-25 19:13:33,536 Stage-3 map = 0%,  reduce = 0%
INFO  : 2017-12-25 19:13:38,875 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.62 sec
INFO  : 2017-12-25 19:13:45,406 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 3.47 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 470 msec
INFO  : Ended Job = job_1513984921012_0058
INFO  : Starting task [Stage-2:MOVE] in serial mode
INFO  : Moving data to: hdfs://cdh-vm.dbaglobe.com:8020/tmp/hive/donghua/5a539317-9cca-43b8-b460-de23c3d7c159/hive_2017-12-25_19-13-24_864_2203428447630231527-6/-mr-10003 from hdfs://cdh-vm.dbaglobe.com:8020/tmp/hive/donghua/5a539317-9cca-43b8-b460-de23c3d7c159/hive_2017-12-25_19-13-24_864_2203428447630231527-6/-mr-10003/.hive-staging_hive_2017-12-25_19-13-24_972_8970662548653177884-6/-ext-10000
INFO  : Launching Job 2 out of 2
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_1513984921012_0059
INFO  : The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0059/
INFO  : Starting Job = job_1513984921012_0059, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0059/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0059
INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
INFO  : 2017-12-25 19:13:54,773 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-12-25 19:14:04,866 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.79 sec
INFO  : 2017-12-25 19:14:06,983 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.9 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 900 msec
INFO  : Ended Job = job_1513984921012_0059
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 3.47 sec   HDFS Read: 11432 HDFS Write: 88 SUCCESS
INFO  : Stage-Stage-1: Map: 2   Cumulative CPU: 4.9 sec   HDFS Read: 11938 HDFS Write: 43 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 8 seconds 370 msec
INFO  : Completed executing command(queryId=hive_20171225191313_3bbd504c-a4ec-4cf8-8427-fea0f832f785); Time taken: 43.095 seconds
INFO  : OK
monthly_taxi_fleet5.month,monthly_taxi_fleet5.company,monthly_taxi_fleet5.fleet
2005-02,Comfort,10046
2005-01,Comfort,9952
2 rows selected (43.346 seconds)

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> !outputformat vertical


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> select * from default__monthly_taxi_fleet5_monthly_taxi_fleet5_company__ company_index;


company_index.company      CityCab
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      62
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      Comfort
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      62
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      Premier
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      125
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      TransCab
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      188
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      Smart
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      251
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      Individual Yellow- Top
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      314
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      YTC
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      376
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      SMRT
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      439
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      CityCab
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      502
company_index._bitmaps     [1,2,4,8589934592,1,0]

company_index.company      Comfort
company_index._bucketname  hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/monthly_taxi_fleet5/000000_0
company_index._offset      565
company_index._bitmaps     [1,2,4,8589934592,1,0]

10 rows selected (0.143 seconds)

Example to drop the index:

0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> drop index monthly_taxi_fleet5_company on monthly_taxi_fleet5;