Saturday, December 30, 2017

Kudu “distribute by” syntax error in Impala

Below example follows syntax in various Cloudera Kudu training (kudu v1.0) and documentations/blogs, with syntax error. (replace “distribute by” to equivalent syntax of “partition by” doesn’t help)

[donghua@cdh-vm ~]$ impala-shell -i cdh-vm.dbaglobe.com -k
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
Connected to cdh-vm.dbaglobe.com:21000
Server version: impalad version 2.10.0-cdh5.13.1 RELEASE (build 1e4b23c4eb52dac95c5be6316f49685c41783c51)
***********************************************************************************
Welcome to the Impala shell.


(Impala Shell v2.10.0-cdh5.13.1 (1e4b23c) built on Thu Nov  9 08:29:47 PST 2017)


[cdh-vm.dbaglobe.com:21000] > create table kudu_iotdatademo3
                            > distribute by hash (eventts) into 3 buckets
                             > tblproperties (
                            > 'storage_handler'='com.cloudera.kudu.hive.KuduStorageHandler',
                             > 'kudu_tablename'='kudu_iotdatademo2',
                            > 'kudu_master_addresses'='cdh-vm.dbaglobe.com:8051',
                            > 'kudu_key_columns'='stationid,eventts')
                            > as select * from iotdatademo2;
Query: create table kudu_iotdatademo3
distribute by hash (eventts) into 3 buckets
tblproperties (
'storage_handler'='com.cloudera.kudu.hive.KuduStorageHandler',
'kudu_tablename'='kudu_iotdatademo2',
'kudu_master_addresses'='cdh-vm.dbaglobe.com:8051',
'kudu_key_columns'='stationid,eventts')
as select * from iotdatademo2
Query submitted at: 2017-12-30 07:17:43 (Coordinator:
http://cdh-vm.dbaglobe.com:25000)
ERROR: AnalysisException: Syntax error in line 2:
distribute by hash (eventts) into 3 buckets
^
Encountered: IDENTIFIER
Expected: ADD, ALTER, AS, CACHED, CHANGE, COMMENT, DROP, FROM, LIKE, LOCATION, PARTITION, PARTITIONED, PRIMARY, PURGE, RECOVER, RENAME, REPLACE, ROW, SELECT, SET, SORT, STORED, STRAIGHT_JOIN, TBLPROPERTIES, TO, UNCACHED, VALUES, WITH

CAUSED BY: Exception: Syntax error

Workaround: (Tested in kudu 1.5 + impala 2.1 in CDH 5.13.1)


[cdh-vm.dbaglobe.com:21000] > create table kudu_iotdatademo2
                            > (stationid int,
                             > eventts timestamp,
                            > eventdate int,
                            > eventday tinyint,
                            > speed float,
                             > volume int,
                            > primary key (stationid,eventts))
                            > partition by hash partitions 3
                            > stored as kudu;
Query: create table kudu_iotdatademo2
(stationid int,
eventts timestamp,
eventdate int,
eventday tinyint,
speed float,
volume int,
primary key (stationid,eventts))
partition by hash partitions 3
stored as kudu
Fetched 0 row(s) in 1.15s

[cdh-vm.dbaglobe.com:21000] > insert into kudu_iotdatademo2
                             > select stationid,eventts,eventdate,eventday,speed,volume from iotdatademo2;
Query: insert into kudu_iotdatademo2
select stationid,eventts,eventdate,eventday,speed,volume from iotdatademo2
Query submitted at: 2017-12-30 07:18:56 (Coordinator:
http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=a4acf75a7302750:8317ee4000000000
Modified 3456001 row(s), 0 row error(s) in 31.43s
[cdh-vm.dbaglobe.com:21000] >